dc.contributor.authorAslan, Özkan
dc.contributor.authorGünal, Serkan
dc.contributor.authorDinçer, Bekir Taner
dc.date.accessioned2019-10-21T19:44:22Z
dc.date.available2019-10-21T19:44:22Z
dc.date.issued2018
dc.identifier.issn0306-4573
dc.identifier.issn1873-5371
dc.identifier.urihttps://dx.doi.org/10.1016/j.ipm.2018.05.004
dc.identifier.urihttps://hdl.handle.net/11421/19866
dc.descriptionWOS: 000445713800024en_US
dc.description.abstractChunking is a task which divides a sentence into non-recursive structures. The primary aim is to specify chunk boundaries and classes. Although chunking generally refers to simple chunks, it is possible to customize the concept. A simple chunk is a small structure, such as a noun phrase, while a constituent chunk is a structure that functions as a single unit in a sentence, such as a subject. For an agglutinative language with a rich morphology, constituent chunking is a significant problem in comparison to simple chunking. Most Turkish studies on this issue use the IOB tagging schema to mark the boundaries. In this study, we proposed a new, simpler tagging schema, namely OE, for constituent chunking in Turkish. "E" represents the rightmost token of a chunk, while "O" stands for all other items. In reference to OE, we also used a schema called OB, where "B" represents the leftmost token of a chunk. We aimed to identify both chunk boundaries and chunk classes using the conditional random fields (CRF) method. The initial motivation was to exploit, for chunking, the fact that Turkish phrases are head-final. In this context, we assumed that marking the end of a chunk (OE) would be more advantageous than marking the beginning of a chunk (OB). In support of the assumption, the test results reveal that OB has the worst performance and that OE is a significantly more successful schema in many cases. This contrast is more obvious in long sentences. Indeed, using OE means simply marking the head of the phrase (chunk). Since the head and the distinctive label "E" are aligned, CRF finds the chunk class more easily by using the information contained in the head. OE also produced more successful results than the schemas available in the literature. In addition to comparing tagging schemas, we performed four analyses. An examination of window size, which is a parameter of CRF, showed that it is adequate to set this value to 3. A comparison of the evaluation measures for chunking revealed that F-score was a more balanced measure than token accuracy and sentence accuracy. The feature analysis showed that syntactic features improve chunking performance significantly under all conditions. Yet when these features are withdrawn, a pronounced difference between OB and OE emerges. In addition, the flexibility analysis shows that OE is more successful on different data.en_US
dc.description.sponsorshipFund of Scientific Research Projects, Anadolu University [1410F415]en_US
dc.description.sponsorshipThis work was supported by the Fund of Scientific Research Projects, Anadolu University under grant number 1410F415.en_US
dc.language.isoengen_US
dc.publisherElsevier Sci LTDen_US
dc.relation.isversionof10.1016/j.ipm.2018.05.004en_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectChunkingen_US
dc.subjectShallow Parsingen_US
dc.subjectTurkishen_US
dc.subjectConstituenten_US
dc.subjectConditional Random Fieldsen_US
dc.subjectNatural Language Processingen_US
dc.titleOn constituent chunking for Turkishen_US
dc.typearticleen_US
dc.relation.journalInformation Processing & Managementen_US
dc.contributor.departmentAnadolu University, Faculty of Engineering, Department of Computer Engineeringen_US
dc.identifier.volume54en_US
dc.identifier.issue6en_US
dc.identifier.startpage1262en_US
dc.identifier.endpage1276en_US
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.contributor.institutionauthorGünal, Serkan
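
The three tagging schemas named in the abstract (IOB, OB and OE) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name tag_chunks, the "B-CLASS" / "E-CLASS" way of attaching the chunk class to the boundary tag, the example sentence, and the class names SUBJ, OBJ and PRED are all assumptions made for illustration; the abstract only states that "E" marks the rightmost token of a chunk, "B" marks the leftmost token, and "O" stands for all other items.

```python
from typing import List, Tuple

# A chunk is (tokens, chunk_class). The class names used below are hypothetical.
Chunk = Tuple[List[str], str]


def tag_chunks(chunks: List[Chunk], schema: str) -> List[str]:
    """Return one tag per token under the given schema ("IOB", "OB" or "OE").

    Assumption: the chunk class is attached to the boundary tag as
    "B-<class>" or "E-<class>"; the abstract does not specify this format.
    """
    tags: List[str] = []
    for tokens, label in chunks:
        last = len(tokens) - 1
        for i in range(len(tokens)):
            if schema == "OE":
                # Only the rightmost token (the head, in a head-final phrase) is marked.
                tags.append(f"E-{label}" if i == last else "O")
            elif schema == "OB":
                # Only the leftmost token is marked.
                tags.append(f"B-{label}" if i == 0 else "O")
            elif schema == "IOB":
                # Leftmost token gets B, every other token inside the chunk gets I.
                tags.append(f"B-{label}" if i == 0 else f"I-{label}")
            else:
                raise ValueError(f"unknown schema: {schema}")
    return tags


# Hypothetical example: "Küçük çocuk kitabı okudu" ("The little child read the book").
sentence = [
    (["Küçük", "çocuk"], "SUBJ"),
    (["kitabı"], "OBJ"),
    (["okudu"], "PRED"),
]

for schema in ("IOB", "OB", "OE"):
    print(schema, tag_chunks(sentence, schema))
```

In this sketch every token belongs to some chunk, so the "O" tag of IOB never appears. Under OE the marked token of the two-word subject is "çocuk", the head of the phrase, which is the alignment between head and label that the abstract gives as the reason OE helps CRF find the chunk class.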