      Is Open Access

      Filtering Chinese microblog topics noise algorithm based on a semi-supervised model

      research-article


          Abstract

          Social network feeds contain large amounts of spam, such as marketing posts, recruitment advertisements, and short posts without real content, which reduces user interest. Spam also seriously interferes with academic research and business applications. This paper presents a noise-filtering algorithm for Chinese microblog text based on the pSVM-kNN model, which combines the SVM and kNN algorithms. Starting from the hyperplane computed by the SVM, the kNN algorithm iteratively searches the local neighborhood for the optimal classification hyperplane. Penalty costs and proportional weights are introduced into the SVM and kNN stages, respectively, to improve noise filtering and reduce misclassification. Tests on real Sina Weibo datasets of various sizes show that the precision and recall of this algorithm are significantly better than those of the other methods, with a marked improvement in the F-measure.
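The combination described in the abstract can be illustrated with a small sketch. This is not the paper's pSVM-kNN implementation: the subgradient-descent trainer, the fixed decision band around the hyperplane, the self-training loop, the toy 2-D data, and every name and parameter value (`cost`, `weight`, `band`, `k`) are assumptions chosen only to show the general idea of a cost-sensitive SVM whose near-boundary decisions are handed to a weighted kNN vote.

```python
import math

def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, cost, epochs=200, lr=0.05, lam=0.01):
    """Linear SVM trained by subgradient descent on a class-weighted hinge loss.
    cost[label] is an assumed per-class penalty for margin violations."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, b, xi) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage
            if violated:
                step = lr * cost[yi] * yi
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
    return w, b

def knn_vote(X, y, x, k, weight):
    """kNN vote in which each neighbour's vote is scaled by an assumed class weight."""
    nearest = sorted(zip(X, y), key=lambda p: math.dist(p[0], x))[:k]
    score = sum(weight[yi] * yi for _, yi in nearest)
    return 1 if score >= 0 else -1

def classify(X, y, w, b, x, band=1.0, k=3, weight=None):
    """Trust the SVM far from the hyperplane; defer to kNN inside the band."""
    weight = weight or {1: 1.0, -1: 1.0}
    d = decision(w, b, x)
    if abs(d) >= band:
        return 1 if d > 0 else -1
    return knn_vote(X, y, x, k, weight)

def self_train(X_lab, y_lab, X_unlab, cost, weight, rounds=3, band=1.0, k=3):
    """Hypothetical semi-supervised loop: pseudo-label confident points, retrain,
    then label the remaining near-boundary points with the weighted kNN vote."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    w, b = train_linear_svm(X, y, cost)
    for _ in range(rounds):
        confident = [x for x in pool if abs(decision(w, b, x)) >= band]
        if not confident:
            break
        for x in confident:
            X.append(x)
            y.append(1 if decision(w, b, x) > 0 else -1)
        pool = [x for x in pool if x not in confident]
        w, b = train_linear_svm(X, y, cost)
    for x in pool:
        label = knn_vote(X, y, x, k, weight)
        X.append(x)
        y.append(label)
    w, b = train_linear_svm(X, y, cost)
    return X, y, w, b

# Toy data (made up for illustration): +1 = spam, -1 = normal post.
X_lab = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y_lab = [1, 1, 1, -1, -1, -1]
X_unlab = [(2.5, 2.5), (-2.5, -2.5), (0.3, 0.4)]
cost = {1: 1.0, -1: 2.0}     # assumed: misjudging a normal post costs more
weight = {1: 1.0, -1: 1.5}   # assumed kNN class weights

X, y, w, b = self_train(X_lab, y_lab, X_unlab, cost, weight)
print(classify(X, y, w, b, (3, 3), weight=weight))
print(classify(X, y, w, b, (-3, -3), weight=weight))
```

In this sketch the asymmetric `cost` and `weight` values stand in for the penalty costs and proportional weights mentioned in the abstract; how the paper actually sets them is not stated in this record.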

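          Abstract (translated from the Chinese)

          Social networks contain large amounts of spam, such as marketing and recruitment posts, as well as short texts without substantive content. This spam interferes with topic modeling and, more seriously, affects academic research on and commercial applications of social networks. This paper therefore proposes a semi-supervised topic noise filtering method that combines the support vector machine and k-nearest neighbor models (pSVM-kNN). The method integrates the SVM and kNN algorithms: starting from the hyperplane computed by the SVM, the kNN algorithm iteratively searches the local region for the optimal classification hyperplane. To reduce misclassification, a penalty cost and proportional weights are introduced into the SVM and kNN stages, respectively, to improve the noise filtering. Experiments on Sina Weibo datasets of different sizes compare the method with other methods. The results show that the method, trained with only a small number of labeled samples, outperforms the comparison methods in precision, recall, and F-measure.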
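The evaluation reported in the abstract rests on precision, recall, and the F-measure. A minimal sketch of how these are computed for a binary spam filter follows; the function name `precision_recall_f`, the `beta` parameter, and the example labels are illustrative assumptions and are not taken from the paper.

```python
def precision_recall_f(y_true, y_pred, positive=1, beta=1.0):
    """Precision, recall and F-beta for one positive class (here: spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f

# Made-up labels: 1 = spam, 0 = normal post.
truth = [1, 1, 1, 0, 0, 0]
guess = [1, 1, 0, 1, 0, 0]
p, r, f = precision_recall_f(truth, guess)
print(p, r, f)
```

With `beta=1.0` the F-measure is the harmonic mean of precision and recall, so it only improves when both improve together.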

          Author and article information

          Journal
          J Tsinghua Univ (Sci & Technol)
          Journal of Tsinghua University (Science and Technology)
          Tsinghua University Press
          1000-0054
          15 March 2019
          19 March 2019
          Volume: 59
          Issue: 3
          Pages: 178-185
          Affiliations
          [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
          [2] CAS Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
          [3] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
          Article
          j.cnki.qhdxxb.2019.26.060
          10.16511/j.cnki.qhdxxb.2019.26.060
          663f8a72-cf85-4a5c-971b-c56d427e52be
          Copyright © Journal of Tsinghua University

          This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc/4.0/.

          History
          : 22 August 2018

          Software engineering, Data structures & Algorithms, Applied computer science, Computer science, Artificial intelligence, Hardware architecture
          social networks, support vector machine, penalty cost, k-nearest neighbor, noise filtering
