One of the major research directions in bioinformatics is the assignment of superfamily
classifications to a given set of proteins. Such a classification reflects structural,
evolutionary, and functional relatedness. These relationships are embodied in hierarchical
classifications such as the Structural Classification of Proteins (SCOP), which is
mostly manually curated. Such a classification is essential for the structural and
functional analyses of proteins. Yet a large number of proteins remain unclassified.
In this study, we propose an unsupervised machine learning approach to classify
and assign a given set of proteins to SCOP superfamilies. We construct a database and a
similarity matrix from the P-values of an all-against-all BLAST run, then train a network
with the ART2 unsupervised learning algorithm, using the rows of the similarity matrix as
input vectors. The trained network classifies the proteins with F-measure values ranging
from 0.82 to 0.97. The performance of ART2 has been
compared with that of spectral clustering, random forest, SVM, and HHpred. ART2
outperforms all of these except HHpred, whose sum of errors is smaller than that of the
other methods evaluated.
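
As an illustration of the similarity-matrix step described above, the following is a minimal sketch and not the implementation used in this study. The function name build_similarity_matrix, the cap parameter, the -log10(P) scaling, and the rule for symmetrizing the two BLAST directions are all assumptions made for the example; the study itself states only that the matrix is built from BLAST P-values and that its rows serve as input vectors.

```python
import math


def build_similarity_matrix(ids, hits, cap=100.0):
    """Return a symmetric matrix of similarity scores in [0, 1].

    ids  : list of protein identifiers (fixes the row/column order)
    hits : iterable of (query_id, subject_id, p_value) tuples taken
           from an all-against-all BLAST run
    cap  : assumed ceiling on -log10(P), used only for scaling
    """
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # a protein is maximally similar to itself

    for query, subject, p_value in hits:
        if query not in index or subject not in index:
            continue  # ignore hits outside the chosen protein set
        i, j = index[query], index[subject]
        if i == j:
            continue
        # Assumed transform: a smaller P-value gives a larger similarity.
        score = min(-math.log10(max(p_value, 1e-300)), cap) / cap
        score = max(score, 0.0)
        # The two search directions can disagree, so keep the stronger one.
        best = max(sim[i][j], score)
        sim[i][j] = sim[j][i] = best
    return sim


# Hypothetical P-values for three proteins, for illustration only.
ids = ["a", "b", "c"]
hits = [("a", "b", 1e-50), ("b", "a", 1e-20), ("a", "c", 1.0)]
sim = build_similarity_matrix(ids, hits)
input_vectors = sim  # each row is one input vector for the clustering network
```

Pairs with no reported hit keep a similarity of 0 in this sketch, so each row has one entry per protein in the set regardless of how many hits BLAST returned.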