Comparison of the Penalized Regression Techniques with Classical Least Squares in Minimizing the Effect of Multicollinearity

Filed in Articles on November 26, 2022



ABSTRACT

Penalized regression techniques, which perform variable selection, have been developed specifically to eliminate the problem of multicollinearity and to reduce the flaws inherent in the prediction accuracy of the classical ordinary least squares (OLS) regression technique. In this dissertation, we focus on a numerical study of four penalized regression methods.

A diabetes dataset was used to compare four of these well-known techniques, namely the Least Absolute Shrinkage and Selection Operator (LASSO), Smoothly Clipped Absolute Deviation (SCAD), Correlation Adjusted Elastic Net (CAEN) and Elastic Net (EN).

The whole solution paths (in λ) for the LASSO, SCAD and CAEN models were computed using the pathwise Cyclic Coordinate Descent (CCD) algorithm in glmnet in R. We used 10-fold cross-validation (CV) within glmnet to search exhaustively for the optimal λ. Regularization profile plots of the coefficient paths for the three methods are also shown.
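The CV search for the optimal λ described above can be sketched as follows. This is a minimal illustration, assuming Python/scikit-learn (whose LASSO solver also uses cyclic coordinate descent) as a stand-in for glmnet in R, and the Efron et al. diabetes data shipped with scikit-learn as a stand-in for the dissertation's diabetes dataset; scikit-learn names the penalty weight `alpha` rather than λ.

```python
# Sketch: fitting the whole LASSO regularization path and picking the
# optimal penalty by 10-fold cross-validation, in the spirit of
# cv.glmnet in R. Assumes scikit-learn's diabetes data as the example.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)  # penalties assume standardized inputs

# LassoCV fits coefficients along a grid of penalties (the "path")
# and selects the penalty with the best 10-fold CV error.
model = LassoCV(cv=10, random_state=0).fit(X, y)
print("optimal penalty (lambda):", model.alpha_)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```

The shrinkage is visible in the output: some of the ten coefficients are driven exactly to zero, which is the variable-selection behavior the abstract refers to.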

Predictive accuracy was also assessed using the mean squared error (MSE); the penalized regression models produced feasible and efficient models that captured the linearity in the data better than the ordinary least squares model.
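The MSE comparison can be sketched as below. This is a hedged illustration, not the dissertation's exact experiment: it assumes scikit-learn's diabetes data, a single 70/30 train/test split, and sklearn's `LinearRegression`, `LassoCV` and `ElasticNetCV` as analogues of the OLS, LASSO and EN fits (SCAD and CAEN have no standard scikit-learn implementation and are omitted).

```python
# Sketch: comparing held-out MSE of OLS against two penalized fits.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

results = {}
for name, est in [("OLS", LinearRegression()),
                  ("LASSO", LassoCV(cv=10)),
                  ("Elastic Net", ElasticNetCV(cv=10))]:
    fit = est.fit(X_tr, y_tr)
    results[name] = mean_squared_error(y_te, fit.predict(X_te))
    print(f"{name}: test MSE = {results[name]:.1f}")
```

A lower test MSE for a penalized model than for OLS is the kind of evidence the abstract summarizes, though the exact ranking depends on the data and the split.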

TABLE OF CONTENTS

Page Title Page……………………… 2

Declaration……………………….. 3

Certification…………………… 4

Dedication………………………….. 5

Acknowledgements………………. 6

Abstract…………………. 7

Table of Contents……………………….. 8

List of Tables……………………. 10

List of Figures……………………. 11

CHAPTER ONE…………………………. 12

INTRODUCTION……………………. 12

  • Background of the Study………………….. 12
  • Research Motivation………………….. 13
  • Statement of the Problem……………………… 13
  • Aim and Objectives of the Study………………………… 14
  • Significance of the Study…………………………. 14
  • Scope and Limitations of the Study…………………… 14

CHAPTER TWO

LITERATURE REVIEW…………….. 15

  • Introduction………………… 15
  • Classical Regression Methods…………………… 15
  • Penalized Regression……………….. 18
    • LASSO Regression……………… 19
    • Elastic Net Regression……………. 22
    • Correlation Adjusted Elastic Net (CAEN) Regression…………… 23
    • Smoothly Clipped Absolute Deviation (SCAD) Regression………… 23
  • Application of Penalized Regression………………….. 24

CHAPTER THREE

METHODOLOGY…………………………. 26

  • Penalized Regression Techniques………………….. 26
    • LASSO Regression Approach…………………… 28
    • Elastic Net Regression Approach……………….. 30
    • Correlation Adjusted Elastic Net Approach……………… 31
    • SCAD Regression Approach…………………… 32
  • Ordinary Least Squares……………………. 33
  • Assumptions of Multiple Linear Regression…………. 33

  • Variance Inflation Factor……………. 34

  • Mean Square Error………………………… 35
  • Choice of Tuning Parameters…………………… 36
  • Source of Data…………………………… 36

CHAPTER FOUR

RESULTS AND DISCUSSION…………….. 37

  • Introduction…………. 37
  • Determining the Ordinary Least Squares Regression…………. 37
  • Determining the Correlation among Independent Variables…………… 39
  • Results Based on LASSO Regression………….. 44
  • Results Based on Elastic Net Regression………………… 44
  • Results Based on Correlation Adjusted Elastic Net Regression………. 48
  • Results Based on Smoothly Clipped Absolute Deviation Regression……….. 51

CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATION………. 55

  • Summary……………………….. 55
  • Conclusion…………………….. 55
  • Recommendation………………. 55
  • Suggestions for Further Study…………… 56
  • Contribution to Knowledge………………. 56

REFERENCES……….. 57

APPENDIX A  61

INTRODUCTION 

In order to reduce possible bias, a large number of predictor variables is often introduced into a model, which leads to a serious concern of multicollinearity among the predictor variables in multiple linear regression; variable selection is therefore an important issue (Mathew and Yahaya, 2015).

Multicollinearity and high dimensionality are two problems, with attendant computational issues, that pose challenges to regression analysis. To deal with these challenges, variable selection and shrinkage estimation have become important and useful.

The traditional automatic selection approaches (such as forward selection, backward elimination and stepwise selection) and best subset selection are computationally expensive and do not necessarily produce the best model.
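The computational cost of best subset selection comes from its combinatorics: every subset of the p predictors is a candidate model, 2^p in total. A minimal sketch, using p = 10 purely as an example (the diabetes data has ten baseline predictors):

```python
# Sketch: best-subset selection must consider every subset of the
# p predictors as a candidate model -- 2**p models in total.
from itertools import combinations

p = 10  # illustrative predictor count
n_models = sum(1 for k in range(p + 1) for _ in combinations(range(p), k))
print(n_models)  # 2**10 = 1024 candidate models
```

At p = 10 this is still feasible, but the count doubles with every added predictor, which is why exhaustive subset search quickly becomes impractical.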

The penalized least squares (PLS) method deals with the multicollinearity problem by putting constraints on the values of the estimated parameters. As a result, the entries of the variance-covariance matrix are significantly reduced. When multicollinearity exists, predictor variables that are highly correlated form groups.
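The constrained objective described above can be sketched directly. This is a minimal illustration assuming the L1 (LASSO) penalty as one concrete choice of constraint; the helper name `penalized_ls` and the synthetic data are for illustration only.

```python
# Sketch of a penalized least squares objective: residual sum of
# squares plus a penalty (here L1) that constrains the coefficients.
import numpy as np

def penalized_ls(beta, X, y, lam):
    """RSS + lam * ||beta||_1 -- the constraint that shrinks estimates."""
    resid = y - X @ beta
    return resid @ resid + lam * np.sum(np.abs(beta))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, 0.0, -1.0])
y = X @ true_beta + rng.normal(size=50)

# The same coefficients incur a larger objective under a larger penalty,
# so the minimizer is pulled toward smaller (shrunken) coefficients.
lo = penalized_ls(true_beta, X, y, 0.1)
hi = penalized_ls(true_beta, X, y, 5.0)
print(lo < hi)
```

Different penalized methods (LASSO, EN, CAEN, SCAD) correspond to different choices of the penalty term in this objective.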

One way the collinearity problem can be dealt with is to remove one or more of the predictor variables within the same group, but deciding which variable in a group should be eliminated tends to be difficult and complicated.

The aftermath of multicollinearity is that the parameter estimates and their variances or standard errors tend to be large, and prediction may be very inaccurate.
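The inflated variances just described are what the variance inflation factor (listed in Section 3.4 of the table of contents) quantifies: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A minimal NumPy sketch, with `vif` as an illustrative helper and synthetic data in which two columns are nearly collinear:

```python
# Sketch: variance inflation factors. VIF_j = 1/(1 - R_j^2), where
# R_j^2 is from regressing column j on the other columns.
import numpy as np

def vif(X):
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)              # independent predictor
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # the collinear pair shows large VIFs; x3 stays near 1
```

Large VIFs (a common rule of thumb flags values above 10) signal exactly the inflated standard errors that penalized methods are designed to mitigate.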

