Comparison of classification techniques in bioinformatics
Rashpal Ahluwalia, (West Virginia University), rashpal.ahluwalia@mail.wvu.edu, and
Sundar Chidambaram, (West Virginia University), schidam@mix.wvu.edu
Abstract
This paper compares four classification algorithms used in bioinformatics: Discriminant Function Analysis (DFA), Logistic Regression (LR), Decision Trees (DT), and Artificial Neural Networks (ANN). DFA predicts membership in naturally occurring groups by finding a dimension along which the groups differ. It is used when a Dependent Variable (DV) is predicted from a set of Independent Variables (IVs), and its prediction success is determined by the choice of predictors. DFA assumes that the predictors are normally distributed and linearly related, and it provides accurate predictions when the group sizes are equal and the IVs are continuous and well distributed. LR allows the prediction of a DV from a set of IVs that may be discrete, continuous, or a mix. The models produced by LR are non-linear, in that the predicted probability is a logistic function of the predictors. Unlike DFA, LR makes no assumptions about the distribution of the predictor variables. It predicts the probability of a particular outcome for each sample and is robust for complex datasets. DTs are generally used to predict discrete-valued outputs and can generally be represented by a set of if-then rules. DTs are suitable when the instances are represented by disjoint values and when the training data contain errors or have missing values. The classification algorithms utilized by ANNs provide better generalization than other classification algorithms. Generalization in an ANN is influenced by three critical factors: the learning rule, the network architecture, and the training set. The ANN algorithm discussed in this paper is Cascade-Correlation, which starts with a minimal network and trains it by adding hidden nodes dynamically. The results obtained from the four classes of algorithms are analyzed for accuracy, sensitivity, and specificity.
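As a minimal sketch of the three evaluation measures named above, the following Python fragment computes accuracy, sensitivity, and specificity from true and predicted labels for a two-class problem. It uses the standard definitions: sensitivity is the fraction of positive samples predicted as positive, and specificity is the fraction of negative samples predicted as negative. The function names, the label encoding (1 for the positive class), and the sample labels are assumptions made for illustration and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # positive sample, predicted positive
            else:
                fp += 1  # negative sample, predicted positive
        else:
            if truth == positive:
                fn += 1  # positive sample, predicted negative
            else:
                tn += 1  # negative sample, predicted negative
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity as a dictionary.

    A measure whose denominator is zero is reported as NaN.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else math.nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


# Hypothetical labels for ten samples (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
```

Applying the same function to the predictions of each classifier on a common test set gives directly comparable figures for the four classes of algorithms.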