Electronic Theses and Dissertations

A Comparison of Logistic, RIDGE, and LASSO Regression with Heart Failure Risk Data: Effects of Sample Size, Predictor Correlation, and Predictor Weight on Outcome Accuracy

Mahmoud M. AlJuhani, University of Denver

Date of Award

12-2022

Document Type

Dissertation

Degree Name

Ph.D.

Organizational Unit

College of Natural Science and Mathematics, Mathematics

First Advisor

Nicholas Cutforth

Second Advisor

Frederique Chevillot

Third Advisor

Kathy Green

Fourth Advisor

Antonio Olmos

Keywords

Collinearity, LASSO, Logistic regression, RIDGE, Sample size, Weight

Abstract

Logistic Regression (LR), LASSO regression, and RIDGE regression are standard classification techniques for predicting a dichotomous outcome. Because these methods serve similar purposes but differ in their features, it is crucial to evaluate their performance under different controlled conditions. With this information, researchers can apply the optimal method for specific conditions.

Previous research has reported the effects of conditions such as sample size and multicollinearity on the performance of classification methods. Following that work, this research examined how sample size, level of predictor collinearity, and predictor variable weight affect the performance of LR, LASSO, and RIDGE regression. Data were simulated in R statistical software, with 100 iterations generating a total of n = 2,400 observations. A factorial ANOVA with follow-up analyses was employed to evaluate the effect of these conditions on the performance of each technique, as measured by accuracy and F-measure.
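As an illustration of the two performance measures named above, the following is a minimal sketch of how accuracy and F-measure can be computed for a dichotomous outcome. It is not taken from the dissertation, which used R. It assumes that F-measure means the balanced F1 score (the harmonic mean of precision and recall), and the function names and example labels are hypothetical.

```python
# Illustrative sketch only: the function names and data below are hypothetical,
# and F-measure is assumed here to be the balanced F1 score.

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined,
        # so this sketch reports 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for a dichotomous outcome (1 = event, 0 = no event).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))
print(f_measure(y_true, y_pred))
```

Accuracy counts correct predictions in both classes, while the F-measure depends only on how the positive class is handled, so the two can diverge when the outcome classes are imbalanced.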

In most conditions, for both performance measures (accuracy and F-measure), predictor variable weight had the largest effect on performance. However, when the weight was low, all three regression methods performed better overall under high correlation and a large sample size. Moreover, high-weight conditions suppressed the effects of every other controlled condition on accuracy and F-measure values. Therefore, when the data included a high-weighted variable, there were no marked differences between the methods, regardless of the level of correlation or sample size.

Based on these results, researchers are encouraged first to consider the problem they are trying to solve. Understanding the nature of the data and its features can lead to more accurate and efficient implementation of methods, while making it easier to pivot to new analytic problems, adapt when model accuracy drifts, and save data scientists and business users considerable time and effort.

Publication Statement

Rights Holder

Mahmoud M. AlJuhani

Provenance

Received from ProQuest

File Format

application/pdf

Language

File Size

315 pgs

Recommended Citation

AlJuhani, Mahmoud M., "A Comparison of Logistic, RIDGE, and LASSO Regression with Heart Failure Risk Data: Effects of Sample Size, Predictor Correlation, and Predictor Weight on Outcome Accuracy" (2022). Electronic Theses and Dissertations. 2169.
https://digitalcommons.du.edu/etd/2169

Copyright date

2022

Discipline

Statistics

Download

Available for download on Friday, April 11, 2025

Included in

Multivariate Analysis Commons, Statistical Methodology Commons
