      Is Open Access

      Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy

      Preprint


          Abstract

          Data collection for scientific applications is increasing exponentially and is forecast to soon reach peta- and exabyte scales. Applications that process and analyze scientific data must be scalable and focus on execution performance to keep pace. In radio astronomy, in addition to increasingly large datasets, tasks such as the identification of transient radio signals from extrasolar sources are computationally expensive. We present a scalable approach to radio pulsar detection, written in Scala, that parallelizes candidate identification to take advantage of in-memory task processing using Apache Spark on a YARN distributed system. Furthermore, we introduce a novel automated multiclass supervised machine learning technique that we combine with feature selection to reduce the time required for candidate classification. Experimental testing on a Beowulf cluster with 15 data nodes shows that the parallel implementation of the identification algorithm offers a speedup of up to 5x over a similar multithreaded implementation. Further, we show that the combination of automated multiclass classification and feature selection speeds up the execution of the RandomForest machine learning algorithm by an average of 54%, with less than a 2% average reduction in the algorithm's ability to correctly classify pulsars. The generalizability of these results is demonstrated using two real-world radio astronomy datasets.
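The abstract does not spell out the identification algorithm, so the following is only a minimal Python sketch of the general partition-then-map pattern it describes, not the authors' Scala/Spark implementation. The event layout `(time, dm, snr)`, the SNR threshold, and the function names `find_candidates`, `partition`, and `identify_parallel` are all assumptions made for illustration; a simple threshold filter stands in for the real candidate-identification step.

```python
from concurrent.futures import ThreadPoolExecutor


def find_candidates(block, snr_threshold=6.0):
    """Stand-in for candidate identification on one block.

    Each event is a hypothetical (time, dm, snr) tuple; an event is kept
    when its signal-to-noise ratio reaches the threshold.
    """
    return [event for event in block if event[2] >= snr_threshold]


def partition(events, n_parts):
    """Split the events into n_parts independent, roughly equal blocks."""
    return [events[i::n_parts] for i in range(n_parts)]


def identify_parallel(events, n_parts=4, snr_threshold=6.0):
    """Run find_candidates on every block concurrently and merge the results."""
    blocks = partition(events, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        per_block = list(
            pool.map(lambda b: find_candidates(b, snr_threshold), blocks)
        )
    # Flatten the per-block results; sorting restores time order.
    return sorted(c for block in per_block for c in block)
```

Because each block is processed independently, the same structure maps naturally onto a cluster framework, where blocks would live on different nodes. The thread pool here only shows the shape of the computation; it is not expected to reproduce the speedups reported above.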


          Most cited references (19)


          Wrappers for feature subset selection


            Selection of relevant features and examples in machine learning

              Is Open Access

              SMOTE: Synthetic Minority Over-sampling Technique

              An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
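The over-sampling step described above, creating synthetic minority class examples, can be sketched in a few lines of Python. This is a simplified illustration and not the reference SMOTE implementation: it picks base samples at random rather than iterating over every minority sample, and the function name `smote` and the parameters `n_synthetic`, `k`, and `seed` are assumptions chosen for the example. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its `k` nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (the minority class).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # gap in [0, 1) fixes the position along the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Since every synthetic point lies between two existing minority samples, the new examples stay inside the region the minority class already occupies instead of merely duplicating existing rows.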

                Author and article information

                Date: 07 October 2018
                DOI: 10.1145/3225058.3225101
                arXiv: 1810.03190

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                In Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018). ACM, New York, NY, USA, Article 11, 11 pages
                astro-ph.IM

                Instrumentation & Methods for astrophysics
