On classification of abstracts obtained from medical journals

Parlak, Bekir; Uysal, Alper Kurşat

dc.contributor.author	Parlak, Bekir
dc.contributor.author	Uysal, Alper Kurşat
dc.date.accessioned	2019-10-21T19:44:30Z
dc.date.available	2019-10-21T19:44:30Z
dc.date.issued	2019
dc.identifier.issn	0165-5515
dc.identifier.issn	1741-6485
dc.identifier.uri	https://dx.doi.org/10.1177/0165551519860982
dc.identifier.uri	https://hdl.handle.net/11421/19892
dc.description	WOS: 000474950600001	en_US
dc.description.abstract	Classification of medical documents has mostly been carried out on English data sets, and these studies have been performed on hospital records rather than academic texts. The main reasons for this are the lack of publicly available data sets and the fact that such tasks are costly and time-consuming. As the first contribution of this study, two data sets were constructed that contain the Turkish and English versions of the same abstracts published in Turkish medical journals. Turkish is a widely used agglutinative language, whereas English is a good example of a non-agglutinative language. The English abstracts were obtained automatically from the MEDLINE database with a computer program, while their Turkish counterparts were collected manually from the Internet. As the second contribution, these two equivalent data sets were used for an extensive comparison of the classification of abstracts from Turkish medical journals. Features were extracted from the text documents with three approaches: unigram, bigram and hybrid, where the hybrid approach combines unigram and bigram features. Three feature selection methods and seven classifiers were used in the experiments. On both data sets, classification performance on the English abstracts was higher than on their Turkish counterparts. For both data sets, the highest accuracies were obtained with the combination of unigram features, the distinguishing feature selector (DFS) and the multinomial naive Bayes (MNB) classifier. Unigram features were generally more effective than bigram and hybrid features. However, an analysis of the top-10 features showed that nearly half of the features in the Turkish and English data sets were translations of each other.	en_US
dc.description.sponsorship	Anadolu University [1503F136]	en_US
dc.description.sponsorship	The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Anadolu University, Fund of Scientific Research Projects under grant number 1503F136.	en_US
dc.language.iso	eng	en_US
dc.publisher	SAGE Publications LTD	en_US
dc.relation.isversionof	10.1177/0165551519860982	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Feature Selection	en_US
dc.subject	Medical Documents	en_US
dc.subject	Preprocessing	en_US
dc.subject	Text Classification	en_US
dc.subject	Text Representation	en_US
dc.title	On classification of abstracts obtained from medical journals	en_US
dc.type	article	en_US
dc.relation.journal	Journal of Information Science	en_US
dc.contributor.department	Anadolu University, Faculty of Engineering, Department of Computer Engineering	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Academic Staff	en_US
dc.contributor.institutionauthor	Uysal, Alper Kurşat
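The abstract describes a pipeline with three stages: n-gram feature extraction (unigram, bigram or hybrid), feature selection with the distinguishing feature selector (DFS), and classification with multinomial naive Bayes (MNB). The sketch below illustrates that kind of pipeline under stated assumptions; it is not the authors' code. Several pieces are hypothetical choices made for this illustration:

* The tokenizer, including its non-Turkish-aware lowercasing.
* The DFS score, assumed here to have the form sum over classes of P(C_i|t) / (P(not t|C_i) + P(t|not C_i) + 1), with probabilities estimated from document frequencies.
* The smoothing parameter `alpha` and the cut-off `k`.
* The toy corpus.

The sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of word characters.
    # Note: str.lower() is not Turkish-aware ("I" becomes "i", not dotless "ı").
    return re.findall(r"\w+", text.lower())


def extract_features(text, mode="unigram"):
    """Bag of n-gram counts; mode is 'unigram', 'bigram' or 'hybrid'."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    bigrams = Counter(a + " " + b for a, b in zip(tokens, tokens[1:]))
    if mode == "unigram":
        return unigrams
    if mode == "bigram":
        return bigrams
    if mode == "hybrid":
        # Hybrid: unigram and bigram features pooled into one bag.
        return unigrams + bigrams
    raise ValueError(f"unknown mode: {mode}")


def dfs_scores(docs, labels):
    """Assumed DFS form: sum_i P(C_i|t) / (P(not t|C_i) + P(t|not C_i) + 1),
    with probabilities estimated from document frequencies."""
    classes = sorted(set(labels))
    n_docs = len(labels)
    n_class = Counter(labels)             # documents per class
    df = defaultdict(Counter)             # df[t][c]: docs of class c containing t
    for feats, c in zip(docs, labels):
        for t in feats:                   # each term counted once per document
            df[t][c] += 1
    scores = {}
    for t, per_class in df.items():
        n_t = sum(per_class.values())     # documents containing t
        score = 0.0
        for c in classes:
            n_other = n_docs - n_class[c]
            p_c_given_t = per_class[c] / n_t
            p_not_t_given_c = 1 - per_class[c] / n_class[c]
            p_t_given_not_c = (n_t - per_class[c]) / n_other if n_other else 0.0
            score += p_c_given_t / (p_not_t_given_c + p_t_given_not_c + 1)
        scores[t] = score
    return scores


def select_top_k(scores, k):
    # Highest scores first; ties broken alphabetically for determinism.
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                # additive (Laplace) smoothing

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        n_class = Counter(labels)
        self.log_prior = {c: math.log(n_class[c] / len(labels)) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for feats, c in zip(docs, labels):
            for t, n in feats.items():
                if t in self.vocab:       # only selected features are modelled
                    counts[c][t] += n
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values())
            self.log_lik[c] = {t: math.log((counts[c][t] + self.alpha) / (total + self.alpha * v))
                               for t in self.vocab}
        return self

    def predict(self, feats):
        # Class with the highest log prior plus summed, count-weighted log likelihoods.
        def log_post(c):
            return self.log_prior[c] + sum(n * self.log_lik[c][t]
                                           for t, n in feats.items() if t in self.vocab)
        return max(self.classes, key=log_post)


# Hypothetical toy corpus (not the paper's data sets).
train = [
    ("Acute myocardial infarction and coronary artery disease", "cardiology"),
    ("Heart failure and coronary artery bypass surgery", "cardiology"),
    ("Breast cancer chemotherapy and tumor response", "oncology"),
    ("Lung cancer tumor staging and radiotherapy", "oncology"),
]
docs = [extract_features(text, "unigram") for text, _ in train]
labels = [label for _, label in train]
vocab = select_top_k(dfs_scores(docs, labels), k=6)
model = MultinomialNB().fit(docs, labels, vocab)
print(model.predict(extract_features("coronary artery stenosis", "unigram")))  # cardiology
```

Passing `"bigram"` or `"hybrid"` to `extract_features` switches the representation without touching the rest of the pipeline. In the toy run, DFS ranks class-exclusive terms such as "coronary" above terms shared by every class such as "and". For real experiments, a maintained implementation such as scikit-learn's `MultinomialNB` would be the more practical classifier choice.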

Files in this item:

Name: 19892.pdf
Size: 321.2 KB
Format: PDF
Description: Full Text

This item appears in the following collection(s):

Article Collection [100]
Scopus Indexed Publications Collection [8325]
WoS Indexed Publications Collection [7605]