Date of Award
Quantitative Research Methods
Antonio Olmos, Ph.D.
Classification, Institutional research, Logistic regression, Machine learning, Multilayer perceptron, Persistence
Multilayer perceptron neural networks, Gaussian naïve Bayes, and logistic regression classifiers were compared when used to make early predictions regarding one-year college student persistence. Two iterations of each model were built, using a grid search process within 10-fold cross-validation to tune model parameters for optimal performance on the classification metrics F-Beta and F-1. The results of logistic regression, the historically favored approach in the domain, were compared to the alternative approaches of multilayer perceptron and naïve Bayes based primarily on F-Beta and F-1 score performance on a hold-out dataset. A single logistic regression model was found to perform optimally on both F-1 and F-Beta. The logistic regression model outperformed all four of the individual alternative models on the evaluation criteria of concern. A majority voting ensemble and two additional ensembles with empirically derived weights were also applied to the hold-out set. The logistic regression model also outperformed all three ensemble models on the scoring metrics of concern. A visualization technique for comparing and summarizing case-level classifier performance was introduced. The features used in the modeling process comprised traditional and non-traditional elements.
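As a rough illustration of the two scoring metrics and the majority voting ensemble named in the abstract, the sketch below computes F-Beta (with F-1 as the special case beta = 1) and a simple equal-weight vote across three classifiers. It is not the author's implementation: the function names, the toy labels, and the beta value of 2 are assumptions made for illustration, since the abstract does not state the beta used or any of the data.

```python
from collections import Counter


def f_beta(y_true, y_pred, beta=1.0, positive=1):
    """F-Beta for a single positive class; beta=1.0 gives F-1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def majority_vote(*predictions):
    """Equal-weight, case-level majority vote across classifiers.

    With an odd number of classifiers and binary labels there are no ties.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical hold-out labels (1 = persisted) and hypothetical predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
lr_pred = [1, 1, 0, 0, 1, 1, 0, 1]   # stand-in for logistic regression
mlp_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # stand-in for multilayer perceptron
nb_pred = [1, 0, 1, 1, 0, 0, 0, 1]   # stand-in for Gaussian naive Bayes

vote_pred = majority_vote(lr_pred, mlp_pred, nb_pred)

for name, pred in [("logistic regression", lr_pred), ("majority vote", vote_pred)]:
    f1 = f_beta(y_true, pred, beta=1.0)
    f2 = f_beta(y_true, pred, beta=2.0)  # beta=2 is an assumed value
    print(f"{name}: F-1={f1:.3f}, F-Beta(beta=2)={f2:.3f}")
```

A beta above 1 weights recall more heavily than precision and a beta below 1 does the reverse, so the choice of beta determines which kind of misclassification the tuning process penalizes more. The toy numbers here say nothing about the relative performance reported in the thesis.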
Copyright is held by the author. User is responsible for all copyright compliance.
Siebrase, Ben, "Classification of One-Year Student Persistence: A Machine Learning Approach" (2018). Electronic Theses and Dissertations. 1514.
Received from ProQuest
Statistics, Higher education