Berri corpus manager: a corpus analysis tool using MongoDB technology

dc.contributor.author: Sanjurjo González, Hugo
dc.date.accessioned: 2024-11-13T09:13:18Z
dc.date.available: 2024-11-13T09:13:18Z
dc.date.issued: 2020-09-15
dc.date.updated: 2024-11-13T09:13:18Z
dc.description: Paper presented at the 9th International Conference, Human Language Technologies - The Baltic Perspective (Baltic HLT 2020), held in Kaunas, Lithuania, on 22 and 23 September 2020.
dc.description.abstract [en]: Nowadays, there are many options for corpus linguistic analysis that make use of different approaches for corpus storage. There are tools based on SQL databases, dedicated implementations such as CQP/CWB and others that employ plain-text corpora. NoSQL databases have been widely used for big data, data mining and even sentiment analysis. However, as far as we can see, there is a lack of a widespread concordancer or consolidated framework that makes use of MongoDB architecture for the purposes of corpus linguistics. This paper aims to describe the architecture of a software that allows users to analyse monolingual and bilingual parallel corpora with grammatical annotation using MongoDB technology. Our premises are that MongoDB is ideal for non-structured data and provides high flexibility and scalability, so it may be also useful for corpus linguistic research. We analyse functionalities of MongoDB such as text search indexes and query format in order to examine its suitability.
dc.identifier.citation: Sanjurjo-González, H. (2020). Berri corpus manager: a corpus analysis tool using MongoDB technology. Frontiers in Artificial Intelligence and Applications, 328, 166-173.
dc.identifier.doi: 10.3233/FAIA200619
dc.identifier.isbn: 9781643681160
dc.identifier.issn: 0922-6389
dc.identifier.uri: http://hdl.handle.net/20.500.14454/1808
dc.language.iso: eng
dc.publisher: IOS Press BV
dc.rights: © 2020 The authors and IOS Press
dc.subject.other: Concordancer
dc.subject.other: Corpus analysis tool
dc.subject.other: Corpus linguistics
dc.subject.other: MongoDB
dc.subject.other: NoSQL database
dc.title [en]: Berri corpus manager: a corpus analysis tool using MongoDB technology
dc.type: conference paper
dcterms.accessRights: open access
oaire.citation.endPage: 173
oaire.citation.startPage: 166
oaire.citation.title: Frontiers in Artificial Intelligence and Applications
oaire.citation.volume: 328
oaire.licenseCondition: https://creativecommons.org/licenses/by-nc/4.0/
oaire.version: VoR
Files in this item
Original bundle
Name: sanjurjo_berri_2020.pdf
Size: 639.97 KB
Format: Adobe Portable Document Format