      Challenges in Persian Electronic Text Analysis

      Preprint

          Abstract

          Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. Farsi uses a unified Arabic-based script as its writing system. In this paper we briefly introduce the writing standards of Farsi and highlight the problems one faces when analyzing Farsi electronic texts, especially the transcription and encoding of Farsi e-texts during the development of Farsi corpora. The points raised may sound trivial, but they are crucial when developing and processing written corpora of Farsi.
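          The abstract does not list the specific encoding problems. As one illustration, assume the common case where Persian text was typed on an Arabic keyboard layout or converted from Arabic code pages. Visually identical letters then end up stored under different Unicode code points (for example ARABIC LETTER YEH U+064A instead of ARABIC LETTER FARSI YEH U+06CC). The Python sketch below normalizes such text before corpus processing. The mapping table and the function name `normalize_persian` are hypothetical and do not come from the paper.

```python
import unicodedata

# Hypothetical mapping: Arabic code points that often appear in Persian
# e-texts, mapped to the code points standard Persian orthography uses.
_ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits (U+0660..U+0669) -> Extended Arabic-Indic (Persian) digits (U+06F0..U+06F9).
for i in range(10):
    _ARABIC_TO_PERSIAN[chr(0x0660 + i)] = chr(0x06F0 + i)

_TABLE = str.maketrans(_ARABIC_TO_PERSIAN)


def normalize_persian(text: str) -> str:
    """Sketch of Persian e-text normalization (assumed policy, not the paper's)."""
    # NFC composes canonically decomposed sequences, e.g. ALEF + HAMZA ABOVE
    # (U+0627 U+0654) becomes ALEF WITH HAMZA ABOVE (U+0623).
    text = unicodedata.normalize("NFC", text)
    # Replace Arabic code points with their Persian counterparts.
    return text.translate(_TABLE)
```

For example, `normalize_persian("\u0643\u062A\u0627\u0628")` ("book" typed with an Arabic Kaf) returns the same word spelled with Keheh, so both spellings index as a single corpus token. Whether to also fold Alef Maksura or handle the zero-width non-joiner (U+200C) is a policy decision for a given corpus.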

          Author and article information

          18 April 2014
          Article
          1404.4740
          53a80f46-e0a1-4772-9836-39c851b33978

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          Custom metadata
          68T50
          Appeared at a local conference in 2006; available online here for the first time
          cs.CL