For the annotation of Titlo Diacritic



The paper describes the different levels of annotation used in the Corpus of Modern, Middle and Old Georgian Texts. The Corpus was compiled with the financial support of the Shota Rustaveli National Science Foundation and Ilia State University (AR/266/1-31/13), with the aim of building a new, extensive and representative tool for the Georgian language. In particular, the Corpus is envisaged as collecting the substantial amount of data needed for research. The scope and representativeness of the texts included, together with free access to them, make the corpus one of the most necessary tools for the study of texts in Modern, Middle and Old Georgian (see ). The corpus covers Modern, Middle and Old Georgian and consists of different kinds of texts, mainly: a) manuscript-based publications; b) reprints; c) previously unpublished manuscripts; and d) previously published manuscripts.

The paper presents the research area, the design and structure of the corpus, and the applications related to its compilation. In particular, it covers the different levels of annotation, namely metadata, structural mark-up and word-level linguistic annotation, with special attention to the Titlo Diacritic.

This paper is structured as follows: Section 1 gives the background and research questions; Section 2 presents the methodological approach and briefly summarizes its theoretical prerequisites; Section 3 sets out the findings and the hypothesis, which concern in general the differences between the annotation of Modern and Old Georgian texts; and Section 4 presents the answers to the research questions.

