A Survey on Malicious Domains Detection through DNS Data Analysis

Most cited references 94


Some Studies in Machine Learning Using the Game of Checkers

A. L. Samuel (1959)

Cited 502 times


Is Open Access

The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics

Qiong Wei, Roland L. Dunbrack (2013)

Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand.
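The metrics and the undersampling step named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0/1 label encoding (1 for the minority or positive class), and the toy labels are assumptions made for illustration only.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Average of True Positive Rate and True Negative Rate."""
    # A rate is treated as 0.0 when its class is absent (an assumption).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample_majority(samples, labels, seed=0):
    """Randomly drop samples from the larger class until both classes match."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy imbalanced data: 2 positives, 8 negatives (hypothetical values).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * len(y_true)

counts = confusion_counts(y_true, always_negative)
plain_accuracy = (counts[0] + counts[1]) / len(y_true)
print(plain_accuracy, balanced_accuracy(*counts), matthews_corrcoef(*counts))

balanced_x, balanced_y = undersample_majority(list(range(10)), y_true)
print(balanced_x, balanced_y)
```

On the toy labels, a classifier that always predicts the majority class scores well on plain accuracy while its balanced accuracy stays at chance level, which is why imbalance-aware metrics matter when the class proportions are skewed or unknown.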

Cited 76 times


Phishing Detection: A Literature Survey

Mahmoud Khonji, Andrew Jones, Youssef Iraqi (2014)

Cited 43 times


Author and article information

Journal

Title: ACM Computing Surveys

Abbreviated Title: ACM Comput. Surv.

Abbreviated Title: CSUR

Publisher: Association for Computing Machinery (ACM)

ISSN (Print): 0360-0300

Publication date (Created): September 06 2018

Publication date (Print): July 06 2018

Volume: 51

Issue: 4

Pages: 1-36

Affiliations

[1] Qatar Computing Research Institute, HBKU, Qatar

[2] Eurecom, France

Article

DOI: 10.1145/3191329

SO-VID: cafd213b-c82a-444a-8054-5179e0d2e5b2

License:

http://www.acm.org/publications/policies/copyright_policy#Background


Cited by 7