empirical-methods

Homepage for 17-803 "Empirical Methods" at Carnegie Mellon University


Project maintained by bvasiles. Hosted on GitHub Pages; theme by mattgraham.

L15: Linear Regression (Part II) (pdf, video)


This is the second lecture in a series dedicated to regression modeling. We discussed some of the things that can go wrong when estimating linear models and how to diagnose them, how to model categorical variables and interpret the corresponding regression coefficients, how to model and interpret interaction effects, and how to work with standardized regression coefficients.

The importance of having a good understanding of linear regression before studying more complex statistical models cannot be overstated.
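To make the topics listed above concrete, here is a minimal sketch, not taken from the lecture materials, that fits a linear model with a dummy-coded categorical variable and an interaction term, then computes a standardized coefficient. The data are synthetic, and every variable name and "true" coefficient value is an assumption chosen for illustration. Only NumPy is used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data; all names and values here are made up for illustration.
x = rng.normal(size=n)              # numeric predictor
group = rng.integers(0, 2, size=n)  # binary category, dummy coded (0 = reference level)

# Assumed true model: the slope of x is 2.0 in the reference group
# and 2.0 + 1.5 = 3.5 in the other group.
y = 1.0 + 2.0 * x + 0.5 * group + 1.5 * x * group + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, x, dummy, and the x-by-group interaction.
X = np.column_stack([np.ones(n), x, group, x * group])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_x, b_group, b_inter = beta

# Interpretation of the coefficients:
#   b_x      slope of x in the reference group (group == 0)
#   b_group  difference in intercepts between the two groups (at x == 0)
#   b_inter  difference in slopes between the two groups
slope_ref = b_x
slope_other = b_x + b_inter
print(f"slope in reference group: {slope_ref:.2f}")
print(f"slope in other group:     {slope_other:.2f}")


def standardize(v):
    """Center to mean 0 and scale to standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)


# Standardized coefficient for the simple model y ~ x: regress z-scored y
# on z-scored x. With a single predictor this equals Pearson's correlation.
Xz = np.column_stack([np.ones(n), standardize(x)])
beta_z, *_ = np.linalg.lstsq(Xz, standardize(y), rcond=None)
r = np.corrcoef(x, y)[0, 1]
print(f"standardized slope: {beta_z[1]:.3f}, correlation: {r:.3f}")
```

The interaction coefficient is read as a difference in slopes, so the slope for the non-reference group is the sum of two coefficients rather than a single entry of `beta`.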

Lecture Readings

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112, p. 18). Springer.

Chapter 3 reviews some of the key ideas underlying the linear regression model, as well as the least squares approach that is most commonly used to fit this model.
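As a rough illustration of the least squares approach and of a basic residual diagnostic (a sketch under stated assumptions, not code from the book), the example below fits a straight line to synthetic data whose true relationship is curved, then checks the residuals for leftover structure. The data-generating model and the helper name `fit_ols` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Synthetic data with an assumed curved (quadratic) true relationship.
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.2, size=n)


def fit_ols(X, y):
    """Least squares fit; returns the coefficients and the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


# Misspecified model: straight line only.
X_lin = np.column_stack([np.ones(n), x])
beta_lin, resid_lin = fit_ols(X_lin, y)

# Better-specified model: adds the squared term.
X_quad = np.column_stack([np.ones(n), x, x**2])
beta_quad, resid_quad = fit_ols(X_quad, y)

# Least squares residuals are orthogonal to every column of the design
# matrix, so they average zero and are uncorrelated with x even when the
# model is wrong. A diagnostic has to look for structure the model omits,
# here curvature, measured as the correlation of the residuals with x**2.
curv_lin = np.corrcoef(resid_lin, x**2)[0, 1]
curv_quad = np.corrcoef(resid_quad, x**2)[0, 1]

print(f"mean residual (linear fit):      {resid_lin.mean():.2e}")
print(f"corr(resid, x^2), linear fit:    {curv_lin:.3f}")
print(f"corr(resid, x^2), quadratic fit: {curv_quad:.3f}")
```

In practice the same check is usually done visually with a residuals-versus-fitted plot; the correlation is used here only to keep the sketch free of plotting dependencies.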


Grolemund, G., & Wickham, H. (2018). R for data science.

This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it, and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides.

Chapters 22-24 (Modeling) are the most relevant for this lecture.


Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python. O’Reilly Media.

Chapter 4 covers regression modeling. Note the emphasis on prediction, instead of the more common goal of explanation in empirical research.


Goodman, S. (2008). A dirty dozen: Twelve p-value misconceptions. In Seminars in Hematology (Vol. 45, No. 3, pp. 135-140). WB Saunders.

Among other misconceptions, the paper addresses the common false belief that the probability of a conclusion being in error can be calculated from the data in a single experiment, without reference to external evidence or the plausibility of the underlying mechanism.

Additional Readings

Wooldridge, J. M. (2003). Introductory econometrics: A modern approach. Thomson, Mason.

Probably the most in-depth coverage of regression modeling among the sources listed here, with more emphasis on theory than the others.


Harrell, F. E., Jr. Regression Modeling Strategies. Springer Series in Statistics.

  • Chapter 1: Introduction
  • Chapter 2: General aspects of fitting regression models (especially 2.1–2.3, 2.7)
  • Chapter 4: Multivariable modeling strategies

Freedman, D., Pisani, R., & Purves, R. (2007). Statistics. W. W. Norton & Company.