Learn how to work with annotated corpora for linguistic analysis and pattern validation.
Corpora are collections of annotated linguistic examples used for analyzing patterns, validating grammatical theories, and training machine learning models. In EGlossa, corpora are versioned, searchable, and collaborative.
The quick brown fox jumps over the lazy dog
Query linguistic patterns across corpora with regex, POS tags, or semantic features.
Track changes to corpora over time using Git-style diffs and branches.
Download corpora in multiple formats including XML, JSON, and CSV for offline analysis.
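Once downloaded, a corpus can be converted between these formats with standard tooling. The sketch below is only an illustration: the token records and field names (form, pos, role) are hypothetical, and the actual layout of EGlossa's exported files is not specified here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical annotated tokens; the field names are illustrative only
# and are not taken from EGlossa's real export schema.
tokens = [
    {"form": "fox", "pos": "NOUN", "role": "subject"},
    {"form": "jumps", "pos": "VERB", "role": "predicate"},
]


def to_json(rows):
    """Serialize the token records as a JSON array."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows):
    """Serialize the token records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize the token records as <token> elements under <corpus>."""
    root = ET.Element("corpus")
    for row in rows:
        token = ET.SubElement(root, "token", pos=row["pos"], role=row["role"])
        token.text = row["form"]
    return ET.tostring(root, encoding="unicode")
```

Calling to_json(tokens), to_csv(tokens), or to_xml(tokens) returns a string that can be written to disk for offline analysis.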
Use the search interface to find patterns with filters for word class, syntactic role, or semantic frame.
Apply grammatical tags interactively and review consensus annotations from other researchers.
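How consensus is computed is not described here. One common approach is a majority vote over the tags that different annotators assigned to the same token. The sketch below assumes that approach; the function name consensus_tag and the min_agreement threshold are hypothetical and are not part of EGlossa.

```python
from collections import Counter


def consensus_tag(tags, min_agreement=0.5):
    """Return the most frequent tag if its share of votes exceeds
    min_agreement; otherwise return None (no consensus)."""
    if not tags:
        return None
    tag, count = Counter(tags).most_common(1)[0]
    return tag if count / len(tags) > min_agreement else None
```

With three annotators voting NOUN, NOUN, and VERB, the function returns NOUN. An even split between two tags returns None, which could be used to flag the token for further review.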
Query all instances of transitive verbs that take a direct object in the Spanish-English parallel corpus.
SELECT * FROM corpus WHERE verb = 'transitive' AND object = 'direct'
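For offline work on an exported corpus, the same filter can be approximated in plain Python. This is a sketch under stated assumptions: the Clause record and its sample rows are hypothetical, with the verb and obj fields standing in for the verb and object columns used in the query above.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    text: str
    verb: str  # stands in for the `verb` column in the query above
    obj: str   # stands in for the `object` column in the query above


# Hypothetical sample rows, not taken from a real corpus.
clauses = [
    Clause("María lee el libro", "transitive", "direct"),
    Clause("María duerme", "intransitive", "none"),
    Clause("María dice que llueve", "transitive", "clausal"),
]


def select_transitive_direct(rows):
    """Keep rows whose verb is transitive and whose object is direct."""
    return [r for r in rows if r.verb == "transitive" and r.obj == "direct"]
```

Dropping the obj condition from the list comprehension would return every transitive clause, which is useful when the goal is to compare object types rather than restrict to one.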
Analyze case marking patterns across Germanic and Slavic language corpora.
FILTER [case: nominative] in German + Polish datasets
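A cross-language comparison usually follows the filter with a count per language. The sketch below shows that step on a handful of hypothetical (language, form, case) annotations. The helper names filter_case and count_by_language are illustrative and do not correspond to EGlossa functions.

```python
from collections import Counter

# Hypothetical (language, form, case) annotations for illustration.
annotations = [
    ("German", "der Hund", "nominative"),
    ("German", "den Hund", "accusative"),
    ("Polish", "pies", "nominative"),
    ("Polish", "psa", "accusative"),
    ("Polish", "kot", "nominative"),
]


def filter_case(rows, case, languages):
    """Keep annotations with the given case in the given languages."""
    return [r for r in rows if r[2] == case and r[0] in languages]


def count_by_language(rows):
    """Count how many annotations each language contributes."""
    return Counter(language for language, _form, _case in rows)


nominative = filter_case(annotations, "nominative", {"German", "Polish"})
counts = count_by_language(nominative)
```

Here counts maps each language to its number of nominative forms in the sample, which can then be compared across the Germanic and Slavic datasets.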