BulPosCor

The Bulgarian Part of Speech-annotated Corpus (BulPosCor) (in Bulgarian: Български Пос анотиран корпус (БулПосКор)) is a morphologically annotated general monolingual corpus of written language in which each item in a text is assigned a grammatical tag. BulPosCor was created by the Department of Computational Linguistics at the Institute for Bulgarian Language of the Bulgarian Academy of Sciences. It was compiled from the Structured "Brown" Corpus of Bulgarian (BCB) by sampling excerpts of 300 or more words (each extended to a sentence boundary) from the original BCB files in a way that preserves the overall structure of the BCB. Annotation proceeded in two stages: tags from the Bulgarian Grammar Dictionary were first assigned automatically, and morphological ambiguities were then resolved manually. The disambiguated corpus consists of 174,697 lexical units.
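The compilation and annotation steps described above can be illustrated with a minimal Python sketch. It is not the tooling used to build BulPosCor. The function names, the sentence-final punctuation set, the toy lexicon, and the tag labels `N`, `A`, and `V` are all assumptions made for illustration. The sketch takes an excerpt of at least a given number of words, extends it to the next sentence boundary, looks up candidate tags in a dictionary, and flags tokens that would still need manual disambiguation.

```python
SENTENCE_END = {".", "!", "?"}  # assumed sentence-final punctuation


def is_word(token):
    """Count a token as a word if it contains a letter or digit."""
    return any(ch.isalnum() for ch in token)


def sample_excerpt(tokens, start, min_words=300):
    """Return tokens from `start` holding at least `min_words` words,
    extended to the next sentence boundary (or to the end of the text)."""
    words = 0
    end = start
    # Consume tokens until the word quota is met.
    while end < len(tokens) and words < min_words:
        if is_word(tokens[end]):
            words += 1
        end += 1
    # Extend until the last included token closes a sentence.
    while start < end < len(tokens) and tokens[end - 1] not in SENTENCE_END:
        end += 1
    return tokens[start:end]


def tag_tokens(tokens, lexicon):
    """Automatic stage: attach every candidate tag found in the lexicon."""
    return [(tok, lexicon.get(tok.lower(), [])) for tok in tokens]


def needs_manual_review(tagged):
    """Manual stage input: tokens with more than one candidate tag."""
    return [(i, tok, tags) for i, (tok, tags) in enumerate(tagged) if len(tags) > 1]


# Toy lexicon with made-up tag labels (hypothetical, not the BulPosCor tagset).
TOY_LEXICON = {"син": ["N", "A"], "пее": ["V"]}

if __name__ == "__main__":
    text = ["a", "b", ".", "c", "d", "e", ".", "f"]
    print(sample_excerpt(text, 0, min_words=3))
    print(needs_manual_review(tag_tokens(["Син", "пее"], TOY_LEXICON)))
```

In this sketch an empty candidate list marks a token missing from the lexicon, and only tokens with several candidates are passed on for manual resolution.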

Access

BulPOSCor Search Interface

References

Koeva, Sv. Gramatichen Rechnik na Balgarskiya ezik. Opisanie na koncepciyata za organizaciyata na lingvistichnite danni. (Grammatical Dictionary of Bulgarian.) In: Български език (Bulgarian Language), 6, 1998, pp. 49–58.

Koeva, Sv., Sv. Leseva, I. Stoyanova, E. Tarpomanova, M. Todorova. Bulgarian Tagged Corpora. Proceedings of the Fifth International Conference Formal Approaches to South Slavic and Balkan Languages, 18–20 October 2006, Sofia, Bulgaria, pp. 78–86.

Todorova, Maria, Rositsa Dekova. Balgarski POS anotiran korpus – osobenosti na gramatichnata anotaciya. (Bulgarian POS annotated corpus – specifics of the grammatical annotation.) In: Езикови ресурси и технологии за български език (Language Resources and Technologies for Bulgarian). Compiled and edited by Sv. Koeva, D. Blagoeva, T. Tinchev. Sofia: Marin Drinov Academic Publishing House, 2014.
