empirical-methods

Homepage for 17-803 "Empirical Methods" at Carnegie Mellon University


Project maintained by bvasiles

L14: Linear Regression Diagnostics (pdf, video)


This is the second lecture in a series dedicated to regression modeling. It covers some of the things that can go wrong when estimating linear models and how to diagnose them.
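As a rough illustration of what such diagnostics can look like in practice, here is a minimal sketch in Python using only NumPy. The choice of diagnostics (leverage, studentized residuals, Cook's distance, variance inflation factors), the function name `ols_diagnostics`, and the synthetic data are all assumptions made for this example; they are not taken from the lecture slides.

```python
import numpy as np


def ols_diagnostics(X, y):
    """Fit OLS with an intercept and return a few common diagnostics.

    Hypothetical helper written for illustration only.
    """
    n = X.shape[0]
    # Design matrix with a leading column of ones for the intercept.
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]

    # Least-squares coefficients and raw residuals.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverage: diagonal of the hat matrix H = A (A'A)^-1 A'.
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))

    # Residual variance estimate and internally studentized residuals.
    sigma2 = resid @ resid / (n - p)
    student = resid / np.sqrt(sigma2 * (1.0 - leverage))

    # Cook's distance: influence of each observation on the fitted values.
    cooks = student**2 * leverage / ((1.0 - leverage) * p)

    # Variance inflation factor per predictor: 1 / (1 - R^2), where R^2
    # comes from regressing that predictor on all the others.
    vif = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        ss_res = np.sum((X[:, j] - others @ coef) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vif.append(ss_tot / ss_res)

    return {
        "beta": beta,
        "leverage": leverage,
        "student": student,
        "cooks": cooks,
        "vif": np.array(vif),
    }


# Synthetic data with two deliberate problems: x2 is nearly collinear
# with x1, and the first observation is a gross outlier in y.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x3 + rng.normal(scale=0.5, size=n)
y[0] += 10.0

diag = ols_diagnostics(X, y)
print("VIF per predictor:", np.round(diag["vif"], 1))
print("Row with largest |studentized residual|:",
      int(np.argmax(np.abs(diag["student"]))))
print("Largest Cook's distance:", round(float(diag["cooks"].max()), 3))
```

In this sketch, large VIF values for `x1` and `x2` point to multicollinearity, while the studentized residuals and Cook's distance help locate unusual or influential observations. Which thresholds count as "large" is a judgment call and depends on the analysis.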

The importance of having a good understanding of linear regression before studying more complex statistical models cannot be overstated.

Lecture Readings

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.

Chapter 3 reviews some of the key ideas underlying the linear regression model, as well as the least squares approach that is most commonly used to fit this model.


Grolemund, G., & Wickham, H. (2018). R for data science.

This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it, and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides.

Chapters 22-24 (Modeling) are the most relevant for this lecture.


Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python. O’Reilly Media.

Chapter 4 covers regression modeling. Note that the book emphasizes prediction rather than explanation, which is the more common goal in empirical research.


Goodman, S. (2008). A dirty dozen: Twelve p-value misconceptions. In Seminars in Hematology (Vol. 45, No. 3, pp. 135-140). WB Saunders.

Among others, the paper addresses a common false belief that the probability of a conclusion being in error can be calculated from the data in a single experiment without reference to external evidence or the plausibility of the underlying mechanism.
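To make that point concrete, the following small Python sketch applies Bayes' rule to a simplified version of the problem. It treats the only evidence as "the result was significant at level alpha", and the values for `alpha`, `power`, and the prior probability that the null hypothesis is true are illustrative assumptions, not figures from the paper. The function name is hypothetical.

```python
def prob_null_given_significant(prior_null, alpha=0.05, power=0.8):
    """P(null is true | result is significant), by Bayes' rule.

    prior_null: assumed prior probability that the null hypothesis is true.
    alpha:      assumed significance level (false positive rate under the null).
    power:      assumed probability of a significant result when the null is false.
    """
    false_pos = alpha * prior_null
    true_pos = power * (1.0 - prior_null)
    return false_pos / (false_pos + true_pos)


# The same significance level leads to very different error probabilities
# depending on how plausible the hypothesis was before the experiment.
for prior_null in (0.1, 0.5, 0.9):
    post = prob_null_given_significant(prior_null)
    print(f"prior P(null) = {prior_null:.1f} -> "
          f"P(null | significant) = {post:.3f}")
```

The probability that a "significant" conclusion is wrong changes with the prior, which is exactly the external information that a single experiment's data cannot supply.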

Additional Readings

Wooldridge, J. M. (2003). Introductory econometrics: A modern approach. Thomson, Mason.

Probably the most in-depth coverage of regression modeling on this list, with more emphasis on theory than the other sources.


Harrell, F. E., Jr. Regression modeling strategies. Springer Series in Statistics.

  • Chapter 1: Introduction
  • Chapter 2: General aspects of fitting regression models (especially 2.1–2.3, 2.7)
  • Chapter 4: Multivariable modeling strategies

Freedman, D., Pisani, R., & Purves, R. (2007). Statistics. W. W. Norton & Company.