Homepage for 17-803 "Empirical Methods" at Carnegie Mellon University

Project maintained by bvasiles. Hosted on GitHub Pages.

This is the archived site for the Fall 2018 offering of this course.


Empirical methods play a key role in evaluating tools and technologies, and in testing the social and technical theories they embody. Whatever your research area, chances are you will conduct empirical studies as part of your research. Are you looking to evaluate a new algorithm? A new tool? Analyze (big) data? Understand the challenges practitioners face in some domain? This course is a survey of empirical methods, appropriate for all computer science PhD students, including those in Software Engineering and Societal Computing.

This course provides an overview of, and hands-on experience with, a core set of qualitative and quantitative empirical research methods, including interviews, qualitative coding, survey design, and large-scale mining and analysis of data. Students will mine and integrate data from and across online software repositories (e.g., GitHub and Stack Overflow) and apply a spectrum of data analysis techniques, ranging from statistical modeling to social network analysis.

The course involves extensive reading, with occasional in-class student presentations on the readings; weekly homework assignments; and a semester-long research project, for which students prepare in-class kickoff and final presentations as well as a final report.

After completing this course, you will:


Thursdays, 9:00-11:20 a.m., Wean 5328

Course materials and assignments on Canvas

Bogdan Vasilescu
WEH 5115

Course Syllabus and Policies

The syllabus covers the course overview and objectives, evaluation, time management, the late-work policy, and the collaboration policy.

Learning Goals

The learning goals describe what I want students to know or be able to do by the end of the semester. I evaluate whether learning goals have been achieved through assignments, written project reports, and in-class presentations.


We cover the following topics (slides or notes posted when available):

| Date | Topic | Deadlines |
|---|---|---|
| 08/30 | Introduction | |
| 09/06 | Literature Review and Theory | HW1 due (comparison of methods) |
| 09/13 | Interviews | HW2 due (literature review) |
| 09/20 | Grounded Theory | HW3 due (interviews) |
| 09/27 | Surveys | HW4 due (grounded theory) |
| 10/04 | Introduction to Measurement (no slides) | HW5 due (survey) |
| 10/11 | Your Research Project Proposal (no slides) | |
| 10/18 | Experimentation | HW6 due (hypothesis testing) |
| 10/25 | Quasi-experimentation | HW7 due (experiment) |
| 11/01 | Time Series Analysis | HW8 due (regression) |
| 11/08 | Mixed Methods (no slides) | HW9 due (time series analysis) |
| 11/15 | Text Mining (slides by David Blei) | HW10 due (mixed methods) |
| 11/22 | No class (Thanksgiving) | |
| 11/29 | Social Network Analysis | HW11 due (text mining) |
| 12/06 | Final Presentations (no slides) | HW12 due (social network analysis) |
| 12/13 | | Final project report due |


Introduction (8/30)

We will start by looking broadly at the range of empirical methods you might consider using, and at the assumptions and philosophical points of view they rely on. We also want to hear about the research problems you are working on or plan to work on, and the empirical questions they raise. Please be prepared to share this with the class; knowing a little about each other’s research areas will help us have fruitful discussions. We will also use this information to tailor the course, emphasizing topics for which there is a clear need.


🔹 Chapter 1 from Creswell, J. W., & Creswell, J. D. (2017). Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications. (philosophical worldviews)

Literature Review and Theory (9/6)


🔹 Stol, K.-J., & Fitzgerald, B. (2015). Theory-oriented software engineering. Science of Computer Programming, 101, 79-98.

🔹 In the following paper, read up to “DESIGNING RECOMMENDERS FOR TWITTER” on p. 1187:

Chen, J., Nairn, R., Nelson, L., Bernstein, M., & Chi, E. (2010). Short and tweet: experiments on recommending content from information streams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

🔹 In the following paper, read up to METHODOLOGY AND DATA SOURCES on page 302:

Mockus, A., Fielding, R. T., & Herbsleb, J. D. (2002). Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3), 309-346.

Discussion points:

Be prepared to contrast these two literature reviews. In each case, how much prior work was published? What kinds of gaps and questions were the papers addressing? How did the authors choose the papers they discussed, and the specific points they focused on?

Interviewing (9/13)


🔹 Paul Goodman’s (2005) Building Effective Interviewing Skills.

🔹 King, N. (2004). Using interviews in qualitative research. In C. Cassell & G. Symon (Eds.), Essential Guide to Qualitative Methods in Organizational Research (pp. 11-22). London: Sage.

🔹 Seidman, I. (2012). Interviewing as qualitative research: A guide for researchers in education and the social sciences. Teachers College Press. (Ch. 4)

🔹 Seidman, I. (2012). Interviewing as qualitative research: A guide for researchers in education and the social sciences. Teachers College Press. (Ch. 6)


Grounded Theory (9/20)


🔹 Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative Data Analysis: A Methods Sourcebook (3rd ed.). Los Angeles: Sage:


Surveys (9/27)


🔹 Chapters from Dillman, D., Smyth, J. D., & Christian, L. M. (2014). Internet, Phone, Mail and Mixed-Mode Surveys: The Tailored Design Method (4th ed.). Hoboken, NJ: Wiley.


Introduction to Measurement (10/4)


🔹 Chapter 10 from C. Wohlin et al., Experimentation in Software Engineering, Springer-Verlag Berlin Heidelberg 2012

🔹 Chapter 6 from F. Shull et al. (eds.), Guide to Advanced Empirical Software Engineering. Springer 2008 (similar content as the Wohlin chapter but slightly different presentation; read one or the other)

🔹 Chapter 6 from I. S. MacKenzie, Human-Computer Interaction: An Empirical Research Perspective. Elsevier 2013

Optional readings:

🔹 Neuman, W. L. (2007). Basics of Social Research: Qualitative and Quantitative Approaches:

Example papers for in-class presentations:

Experiments (10/18)

Please read the Shadish, Cook, and Campbell (SCC) chapters before the lecture. No paper presentations are assigned; we will discuss the examples in class.


🔹 Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Cengage Learning:

Example papers:

Plus the following, in this order:

Quasi-experimental Design & Linear Regression (10/25)


🔹 Wooldridge, J. M. (2003). Introductory econometrics: A modern approach. Thomson, Mason. Chapter 2 - Simple Regression [skim]

🔹 F. E. Harrell, Jr., Regression Modeling Strategies, Springer Series in Statistics. Chapters 1 and 2, general aspects of regression: [Chapter 1: skim] [Chapter 2: read 2.1–2.3, 2.7]

Optional reading:

🔹 Shadish, Cook, & Campbell, Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Chapter 3, Construct Validity and External Validity.

🔹 Oktay, H., Taylor, B. J., & Jensen, D. D. (2010, July). Causal discovery in social media using quasi-experimental designs. In Proceedings of the First Workshop on Social Media Analytics (pp. 1-9). ACM.


Time Series Analysis (11/1)


🔹 Cowpertwait, P. S., & Metcalfe, A. V. (2009). Introductory time series with R. Springer Science & Business Media. [great practical book with applications in R; read selectively for topics you’re interested in and otherwise keep for reference; decompositions, e.g., seasonality+trend, are particularly useful]

🔹 Wooldridge, J. M. (2003). Introductory econometrics: A modern approach. Thomson, Mason. Chapter 10 - Time series [read if you want to see how the sausage is made, otherwise keep for reference]

🔹 Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Cengage Learning:

🔹 Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross‐Degnan, D. (2002). Segmented regression analysis of interrupted time series studies in medication use research. Journal of Clinical Pharmacy and Therapeutics, 27(4), 299-309. [great example of how to apply the technique; read carefully after you’ve skimmed Shadish]

Examples to discuss in class:

Mixed-Method Design (11/8)


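🔹 Chapter 10 from Creswell, J. W., & Creswell, J. D. (2017). Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications.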

🔹 Venkatesh, V., Brown, S. A., & Bala, H. (2013). Bridging the qualitative-quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS Quarterly, 37(1), 21-54.

🔹 Onwuegbuzie, A. J., & Collins, K. M. (2007). A typology of mixed methods sampling designs in social science research. The Qualitative Report, 12(2), 281-316. [skim only; good discussion of how to select sample sizes for mixed-methods research, depending on the study goals]

Examples: In your in-class presentations, describe clearly which methods the paper mixes, how those methods are combined, and which threats to the validity of each method the mixture alleviates.

Text Mining (11/15)


🔹 Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge: MIT Press. - Chapter 1 [skim, interesting background reading]

🔹 Bird, C., Menzies, T., & Zimmermann, T. (Eds.). (2015). The Art and Science of Analyzing Software Data. Elsevier:

Examples to discuss in class:

Social Network Analysis (11/29)


🔹 From “Network Science” by Albert-László Barabási. Cambridge University Press, 2016:

🔹 From “Networks, Crowds, and Markets: Reasoning about a Highly Connected World” by David Easley and Jon Kleinberg. Cambridge University Press, 2010:

Examples to discuss in class:

Additional examples: