      Is Open Access

      Medical dataset classification for Kurdish short text over social media

      data-paper


          Abstract

          The comments in this dataset were collected from Facebook. The Medical Kurdish Dataset (MKD) consists of 6756 user comments gathered from posts on pages in five categories (Medical, News, Economy, Education, and Sport). Six preprocessing steps are applied to the raw dataset to clean the comments and remove noise by replacing characters. Each comment (short text) is labeled for text classification as either the positive class (medical comment) or the negative class (non-medical comment). The negative class makes up 55% of the dataset and the positive class 45%.
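The abstract does not list the six preprocessing steps, so the following is only a minimal Python sketch of what character-replacement cleaning and the reported class split could look like. The replacement table, the regular expressions, and the names `CHAR_MAP`, `clean_comment`, and `estimated_class_sizes` are illustrative assumptions, not the authors' procedure. The class sizes are approximations derived from the rounded 55%/45% ratio, not counts reported in the abstract.

```python
import re

# Hypothetical replacement table (an assumption for illustration only;
# the dataset authors' six steps are not specified in the abstract).
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Farsi yeh, as used in Central Kurdish
    "\u0643": "\u06A9",  # Arabic kaf -> keheh
    "\u0640": None,      # tatweel (elongation mark) is dropped
})

URL_RE = re.compile(r"https?://\S+")   # assumed noise: links in comments
SPACE_RE = re.compile(r"\s+")          # runs of whitespace


def clean_comment(text: str) -> str:
    """Clean one raw comment by removing or replacing noisy characters."""
    text = URL_RE.sub(" ", text)        # remove links
    text = text.translate(CHAR_MAP)     # replace look-alike characters
    return SPACE_RE.sub(" ", text).strip()


def estimated_class_sizes(total: int, negative_share: float) -> tuple:
    """Approximate (negative, positive) counts from a rounded percentage."""
    negative = round(total * negative_share)
    return negative, total - negative


if __name__ == "__main__":
    raw = "\u0643   https://example.com  \u064A"
    print(clean_comment(raw))
    # 6756 comments with a 55% negative share (figures from the abstract)
    print(estimated_class_sizes(6756, 0.55))
```

Keeping the replacements in a single translation table makes each rule easy to audit or swap out once the actual six steps are known.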


                Author and article information

                Contributors
                Journal
                Data in Brief (Data Brief)
                Elsevier
                ISSN: 2352-3409
                23 March 2022
                June 2022
                Volume: 42
                Article number: 108089
                Affiliations
                [a] Computer Science Department, University of Halabja, KRG, Halabja, Kurdistan, Iraq
                [b] Computer Science and Engineering Department, University of Kurdistan-Hawlêr, KRG, Erbil, Kurdistan, Iraq
                Author notes
                [*] Corresponding author. ari.said@uoh.edu.iq
                Article
                PII: S2352-3409(22)00300-6
                DOI: 10.1016/j.dib.2022.108089
                8980624
                c21da945-b5c7-462c-b83f-5324edae49a1
                © 2022 The Author(s). Published by Elsevier Inc.

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 17 January 2022
                Revised: 14 March 2022
                Accepted: 16 March 2022
                Categories
                Data Article

                machine learning, medical text classification, Kurdish short text, text pre-processing
