Date of Award

6-1-2018

Document Type

Dissertation

Degree Name

Ph.D.

Department

Quantitative Research Methods

First Advisor

Antonio Olmos, Ph.D.

Keywords

Classification, Institutional research, Logistic regression, Machine learning, Multilayer perceptron, Persistence

Abstract

Multilayer perceptron neural networks, Gaussian naïve Bayes, and logistic regression classifiers were compared when used to make early predictions regarding one-year college student persistence. Two iterations of each model were built, using a grid search within 10-fold cross-validation to tune model parameters for optimal performance on the classification metrics F-Beta and F-1. The results of logistic regression, the historically favored approach in the domain, were compared to the alternative approaches of multilayer perceptron and naïve Bayes based primarily on F-Beta and F-1 score performance on a hold-out dataset. A single logistic regression model was found to perform optimally on both F-1 and F-Beta. The logistic regression model outperformed all four of the individual alternative models on the evaluation criteria of concern. A majority voting ensemble and two additional ensembles with empirically derived weights were also applied to the hold-out set. The logistic regression model also outperformed all three ensemble models on the scoring metrics of concern. A visualization technique for comparing and summarizing case-level classifier performance was introduced. The features used in the modeling process comprised traditional and non-traditional elements.
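
As a rough illustration of the F-1 and F-Beta metrics and of the majority voting ensemble mentioned in the abstract, the following Python sketch computes both scores for an unweighted vote across three classifiers. It is a minimal sketch, not the dissertation's code: the function names, the toy labels and predictions, and the choice of beta = 2 are all assumptions made for illustration, since the abstract does not state the beta value or the data used.

```python
from collections import Counter


def f_beta(y_true, y_pred, beta=1.0):
    """F-beta score for the positive class (label 1); beta=1 gives F-1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def majority_vote(*model_preds):
    """Unweighted majority vote, case by case, across per-model predictions."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]


# Hypothetical hold-out labels (1 = persisted) and per-model predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
lr_pred = [1, 1, 0, 0, 0, 1, 1, 1]   # stand-in for logistic regression
nb_pred = [1, 0, 0, 0, 1, 1, 0, 1]   # stand-in for Gaussian naive Bayes
mlp_pred = [1, 1, 1, 0, 0, 0, 0, 1]  # stand-in for multilayer perceptron

ensemble_pred = majority_vote(lr_pred, nb_pred, mlp_pred)
f1_score = f_beta(y_true, ensemble_pred, beta=1.0)
f2_score = f_beta(y_true, ensemble_pred, beta=2.0)  # beta=2 is an assumption
print(ensemble_pred, round(f1_score, 3), round(f2_score, 3))
```

A beta greater than 1 weights recall more heavily than precision, and a beta below 1 does the opposite. With an odd number of binary classifiers the vote cannot tie, which keeps this sketch simple.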

Provenance

Received from ProQuest

Rights holder

Ben Siebrase

File size

153 p.

File format

application/pdf

Language

en

Discipline

Statistics, Higher education
