Liebeskind, Chaya and Dagan, Ido and Schler, Jonathan (2019) An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language. Applied Artificial Intelligence, 33 (6). pp. 483-496. ISSN 0883-9514
Abstract
Corpus-based automatic thesaurus construction uses linguistic methods, such as part-of-speech taggers and parsers, which often perform poorly on morphologically rich languages (MRLs). Therefore, in this paper, we focused on the complex task of adapting corpus-based thesaurus construction methods to MRLs. We investigated two statistical approaches to thesaurus construction: (a) a first-order, co-occurrence-based approach and (b) a second-order, distributional approach. We explored alternative levels of morphological term representation, complemented by grouping of morphological variants. We then introduced and adopted a generic algorithmic scheme for thesaurus construction in MRLs that covers both the first-order and second-order approaches. Our scheme investigated alternative representation levels and offered alternative configurations. We demonstrated the empirical benefits of our methodology for the construction of a diachronic Hebrew thesaurus. We used morphological analysis tools, defined and applied a new annotation scheme, and demonstrated the scheme's optimal configuration, which outperforms the baseline for both first-order and second-order corpus-based thesaurus construction approaches.
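The distinction between the two approaches named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the `LEMMA` map (a stand-in for a real morphological analyzer), and the choice of the Dice coefficient and cosine similarity, which are not necessarily the measures used in the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Toy corpus: each document is a list of surface tokens (hypothetical data).
DOCS = [
    ["houses", "roof", "wall"],
    ["house", "roof", "door"],
    ["home", "roof", "door"],
    ["home", "wall", "garden"],
]

# Hypothetical surface-form -> lemma map standing in for a morphological analyzer.
LEMMA = {"houses": "house"}


def normalize(doc):
    """Map each token to its lemma so morphological variants are grouped."""
    return [LEMMA.get(tok, tok) for tok in doc]


def cooccurrence(docs):
    """Return per-term document frequencies and pairwise co-occurrence counts."""
    doc_freq = Counter()
    pair_freq = defaultdict(Counter)
    for doc in docs:
        terms = set(normalize(doc))
        doc_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[a][b] += 1
            pair_freq[b][a] += 1
    return doc_freq, pair_freq


def first_order(a, b, doc_freq, pair_freq):
    """Dice coefficient: how often a and b occur in the same document."""
    return 2 * pair_freq[a][b] / (doc_freq[a] + doc_freq[b])


def second_order(a, b, pair_freq):
    """Cosine between the co-occurrence vectors of a and b (shared contexts)."""
    va, vb = pair_freq[a], pair_freq[b]
    dot = sum(va[t] * vb[t] for t in va if t in vb)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_freq, pair_freq = cooccurrence(DOCS)
print(first_order("house", "home", doc_freq, pair_freq))
print(second_order("house", "home", pair_freq))
```

In this toy data, "house" and "home" never share a document, so their first-order score is zero, yet they share the contexts "roof", "wall", and "door", so their second-order score is high. Grouping "houses" under "house" before counting is what lets the evidence from both surface forms accumulate on one entry.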
Item Type: | Article |
---|---|
Subjects: | STM Academic > Computer Science |
Depositing User: | Unnamed user with email support@stmacademic.com |
Date Deposited: | 21 Jun 2023 10:28 |
Last Modified: | 18 Nov 2023 05:49 |
URI: | http://article.researchpromo.com/id/eprint/1114 |