Medical dataset classification for Kurdish short text over social media

Abstract

The comments in this dataset were collected from Facebook. The resulting Medical Kurdish Dataset (MKD) consists of 6756 comments. The samples are user comments gathered from posts on pages in five categories (Medical, News, Economy, Education, and Sport). Six preprocessing steps are applied to the raw dataset to clean the comments and remove noise by replacing characters. For text classification, each comment (short text) is labeled as either the positive class (medical comment) or the negative class (non-medical comment). The negative class makes up 55% of the dataset and the positive class 45%.
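
The abstract does not list the six preprocessing steps, so the following is only a minimal sketch of what character-replacement cleaning and a class-balance check might look like. The replacement map, the URL and whitespace handling, and the function names `clean_comment` and `class_ratio` are all assumptions made for illustration, not the author's published pipeline.

```python
import re

# Hypothetical replacement map (an assumption, not taken from the article):
# fold common Arabic letter variants into the forms used in Central Kurdish
# and drop the tatweel (elongation) character.
REPLACEMENTS = {
    "\u0643": "\u06a9",  # Arabic kaf -> keheh
    "\u064a": "\u06cc",  # Arabic yeh -> Farsi yeh
    "\u0640": "",        # tatweel removed
}


def clean_comment(text: str) -> str:
    """Clean one raw comment by replacing characters and trimming noise."""
    # Assumed step: replace URLs with a space, since they carry no label signal.
    text = re.sub(r"https?://\S+", " ", text)
    # Assumed step: apply the character replacements one by one.
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    # Assumed step: collapse runs of whitespace and strip the ends.
    return re.sub(r"\s+", " ", text).strip()


def class_ratio(labels):
    """Return (positive_share, negative_share) for labels of 1 (medical) and 0 (non-medical)."""
    total = len(labels)
    positive = sum(1 for label in labels if label == 1)
    return positive / total, (total - positive) / total
```

Applied to the full set of labels, `class_ratio` should give shares close to the reported 45% positive and 55% negative, assuming the labels are encoded as 1 and 0.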

Author and article information

Contributors

Ari M. Saeed

Journal

Journal ID (nlm-ta): Data Brief

Journal ID (iso-abbrev): Data Brief

Title: Data in Brief

Publisher: Elsevier

ISSN (Electronic): 2352-3409

Publication date PMC-release: 23 March 2022

Publication date Collection: June 2022

Publication date (Electronic): 23 March 2022

Volume: 42

Electronic Location Identifier: 108089

Affiliations

[a] Computer Science Department, University of Halabja, KRG, Halabja, Kurdistan, Iraq

[b] Computer Science and Engineering Department, University of Kurdistan-Hawlêr, KRG, Erbil, Kurdistan, Iraq

Author notes

[*] Corresponding author. ari.said@uoh.edu.iq

Article

Publisher Item ID: S2352-3409(22)00300-6

Publisher ID: 108089

DOI: 10.1016/j.dib.2022.108089

PMC ID: 8980624

SO-VID: c21da945-b5c7-462c-b83f-5324edae49a1

License:

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History

Date received: 17 January 2022

Date revision received: 14 March 2022

Date accepted: 16 March 2022
