Comparable Corpora: Compilation Methods and Areas of Application
DOI:
https://doi.org/10.32859/kadmos/16/237-253
Keywords:
Comparable corpora, compilation methods, application
Abstract
Comparable corpora and their application in research have been an object of interest since the 1990s. Since the annual workshop series “Building and Using Comparable Corpora” (BUCC) was established in 2008, interest in comparable corpora and in their effectiveness for bilingual and multilingual projects has grown. A general comparable corpus of the Georgian language exists, compiled as part of the “Aranea” project, a family of web-crawled comparable corpora, but no specialized comparable corpora are currently available for Georgian. More broadly, the application of comparable corpora to bilingual and multilingual specialized lexicography in Georgia is a novel research topic that has not yet been explored. This review paper therefore analyzes the concept and types of comparable corpora and discusses the advantages of using them and their areas of application. It also examines methods of compiling specialized comparable corpora, in particular the issues of representativeness, balance, corpus size, and comparability criteria.
License
Copyright (c) 2024 Ketevan Mchedlishvili

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors retain the copyright. They grant the journal the right of first publication and permit the use of their work under a CC BY-NC license, which allows others to download and share articles, provided that Kadmos. A Journal of the Humanities is credited as the source. Works derived from them may be used for noncommercial purposes.