Correcting writing errors in Turkish with a character-level neural language model [Dahi anlamındaki de ayrı yazılır: Türkçe yazım hatalarının karakter-seviyeli bir sinirsel dil modeli ile düzeltilmesi]
Abstract
A large part of the written content on the Internet consists of social media posts, articles written for content platforms, and user comments. Unlike content prepared for print media, these texts contain a large number of writing errors. Automating the detection and correction of writing errors in content created for commercial purposes would decrease editing costs dramatically. Although word-level language models perform well on analytic languages, they are not ideal for agglutinative languages such as Turkish, for which models built on smaller units such as morphemes or characters are more suitable. In this study, we propose a method that uses a character-level language model to correct writing errors in Turkish. Character-level text generation is used to calculate the probability of each possible spelling, and the most probable spelling is taken to be correct. The proposed method is applied to errors in writing the conjunction 'de' and the suffix '-de'.
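The scoring idea in the abstract (compute a probability for each possible spelling, keep the most probable) can be illustrated with a small sketch. This is not the authors' implementation: the study uses a neural character-level model, whereas the code below substitutes a smoothed character trigram model so that it runs without any dependencies. The names `CharNgramLM`, `candidates`, and `correct`, the smoothing constant, and the four-sentence toy corpus are all assumptions made for illustration, and only the 'de/da' forms are handled.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character n-gram model with additive smoothing (a stand-in for a neural LM)."""

    def __init__(self, order=3, alpha=0.1):
        self.order = order
        self.alpha = alpha
        self.ngram_counts = Counter()    # (context, next_char) -> count
        self.context_counts = Counter()  # context -> count
        self.vocab = set()

    def _pad(self, text):
        # '^' marks the start of the sentence, '$' marks the end.
        return "^" * (self.order - 1) + text + "$"

    def fit(self, sentences):
        for sentence in sentences:
            padded = self._pad(sentence)
            self.vocab.update(padded)
            for i in range(self.order - 1, len(padded)):
                context = padded[i - self.order + 1:i]
                self.ngram_counts[(context, padded[i])] += 1
                self.context_counts[context] += 1
        return self

    def log_prob(self, text):
        """Sum of log P(char | previous order-1 chars) over the whole sentence."""
        padded = self._pad(text)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            context = padded[i - self.order + 1:i]
            num = self.ngram_counts[(context, padded[i])] + self.alpha
            den = self.context_counts[context] + self.alpha * vocab_size
            total += math.log(num / den)
        return total


def candidates(sentence):
    """Return the sentence plus variants with one 'de/da' joined or split off."""
    words = sentence.split()
    variants = [sentence]
    for i, word in enumerate(words):
        if word in ("de", "da") and i > 0:
            # Standalone 'de/da': try attaching it to the previous word.
            joined = words[:i - 1] + [words[i - 1] + word] + words[i + 1:]
            variants.append(" ".join(joined))
        elif len(word) > 2 and word.endswith(("de", "da")):
            # Word ending in 'de/da': try writing the ending separately.
            split = words[:i] + [word[:-2], word[-2:]] + words[i + 1:]
            variants.append(" ".join(split))
    return variants


def correct(lm, sentence):
    """Pick the candidate spelling that the language model scores highest."""
    return max(candidates(sentence), key=lm.log_prob)


# Toy corpus, invented for this sketch; a real model needs far more text.
toy_corpus = ["ben de geldim", "sen de geldin", "evde kaldım", "okulda kaldım"]
lm = CharNgramLM(order=3).fit(toy_corpus)

print(correct(lm, "bende geldim"))
print(correct(lm, "ev de kaldım"))
```

Because the candidates differ only around the 'de', nearly all of their character transitions are shared, so the comparison is decided by the few transitions near the inserted or removed space. A neural model would play the same role as `log_prob` here while conditioning on a much longer context.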
Source
26th IEEE Signal Processing and Communications Applications Conference, SIU 2018