Homepage for 17-803 "Empirical Methods" at Carnegie Mellon University
This is the archived site for the Fall 2018 offering of this course. Go to the current offering here.
Empirical methods play a key role in the evaluation of tools and technologies, and in testing the social and technical theories they embody. No matter what your research area is, chances are you will be conducting empirical studies as part of your research. Are you looking to evaluate a new algorithm? New tool? Analyze (big) data? Understand what challenges practitioners face in some domain? This course is a survey of empirical methods, appropriate for all computer science PhD students, including Software Engineering and Societal Computing.
This course provides an overview of and hands-on experience with a core set of qualitative and quantitative empirical research methods, including interviews, qualitative coding, survey design, and large-scale mining and analysis of data. Students will mine and integrate data from and across online software repositories (e.g., GitHub and Stack Overflow) and employ a spectrum of data analysis techniques, ranging from statistical modeling to social network analysis.
There will be extensive reading with occasional student presentations about the reading in class, weekly homework assignments, and a semester-long research project for which students must prepare in-class kickoff and final presentations as well as a final report.
After completing this course, you will:
Th 09:00 - 11:20 a.m. in Wean 5328
Course materials and assignments on Canvas
Bogdan Vasilescu
vasilescu@cmu.edu
WEH 5115
The syllabus covers course overview and objectives, evaluation, time management, late work policy, and collaboration policy.
The learning goals describe what I want students to know or be able to do by the end of the semester. I evaluate whether learning goals have been achieved through assignments, written project reports, and in-class presentations.
We cover the following topics (slides or notes posted when available):
Date | Topic | Deadlines |
---|---|---|
08/30 | Introduction | |
09/06 | Literature Review and Theory | HW1 due (comparison of methods) |
09/13 | Interviews | HW2 due (literature review) |
09/20 | Grounded Theory | HW3 due (interviews) |
09/27 | Surveys | HW4 due (grounded theory) |
10/04 | Introduction to Measurement (no slides) | HW5 due (survey) |
10/11 | Your Research Project Proposal (no slides) | |
10/18 | Experimentation | HW6 due (hypothesis testing) |
10/25 | Quasi-experimentation | HW7 due (experiment) |
11/01 | Time Series Analysis | HW8 due (regression) |
11/08 | Mixed-methods (no slides) | HW9 due (time series analysis) |
11/15 | Text Mining (slides by David Blei) | HW10 due (mixed methods) |
11/22 | No Class - Thanksgiving | |
11/29 | Social Network Analysis | HW11 due (text mining) |
12/06 | Final Presentations (no slides) | HW12 due (social network analysis) |
12/13 | | Final project report due |
We will start out by looking broadly over the range of empirical methods you might consider using, and the assumptions and philosophical points of view they rely on. We also want to hear about the kinds of research problems you are working on or plan to work on, and the sorts of empirical questions they give rise to. Please be prepared to share this with the class, as it will help us to have fruitful discussions if we know a little about each other’s research areas. We will also use the information to customize the course a bit to emphasize things for which there is a clear need.
Methods:
🔹 Chapter 1 from Creswell, J. W., & Creswell, J. D. (2017). Research design: Qualitative, quantitative, and mixed methods approaches. Sage publications. (philosophical world views)
Methods:
🔹 Stol, K.-J., & Fitzgerald, B. (2015). Theory-oriented software engineering. Science of computer programming, 101, 79-98.
🔹 In the following paper, read up to “DESIGNING RECOMMENDERS FOR TWITTER” on p. 1187:
Chen, J., Nairn, R., Nelson, L., Bernstein, M., & Chi, E. (2010). Short and tweet: experiments on recommending content from information streams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
🔹 In the following paper, read up to METHODOLOGY AND DATA SOURCES on page 302:
Mockus, A., Fielding, R. T., & Herbsleb, J. D. (2002). Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3), 309-346.
Discussion points:
Be prepared to contrast these two literature reviews. In each case, how much prior work was published? What kinds of gaps and questions were the papers addressing? How did the authors choose the papers they discussed, and the specific points they focused on?
Methods:
🔹 Paul Goodman’s (2005) Building Effective Interviewing Skills.
🔹 King, N. (2004). Using interviews in qualitative research. In C. Cassell & G. Symon (Eds.), Essential Guide to Qualitative Methods in Organizational Research (pp. 11-22). London: Sage.
🔹 Seidman, I. (2012). Interviewing as qualitative research: A guide for researchers in education and the social sciences: Teachers college press. (Ch 4).
🔹 Seidman, I. (2012). Interviewing as qualitative research: A guide for researchers in education and the social sciences: Teachers college press. (Ch. 6).
Examples:
Grinter, R. E., & Palen, L. (2002). Instant messaging in teen life, Paper presented at the 2002 ACM Conference on Computer-Supported Cooperative Work (pp. 21-30): ACM.
Dabbish, L., Stuart, C., Tsay, J., & Herbsleb, J. (2012). Social Coding in GitHub: Transparency and Collaboration in an Open Software Repository. Paper presented at the 2012 ACM Conference on Computer-Supported Cooperative Work, Seattle, WA.
Manotas, I., Bird, C., Zhang, R., Shepherd, D., Jaspan, C., Sadowski, C., . . . Clause, J. (2016). An empirical study of practitioners’ perspectives on green software engineering. Paper presented at the International Conference on Software Engineering (ICSE), Austin, TX.
Methods:
🔹 Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative Data Analysis: A Methods Sourcebook. 3rd ed. Sage: Los Angeles.:
Examples:
Razavi, M. N., & Iverson, L. (2006). A grounded theory of information sharing behavior in a personal learning space, Proceedings of the ACM Conference on Computer Supported Cooperative Work (pp. 459-468).
de Souza, C. R., & Redmiles, D. F. (2008). An empirical study of software developers’ management of dependencies and changes, Proceedings of the 30th International Conference on Software Engineering (pp. 241-250).
Deterding, S. (2016). Contextual autonomy support in video game play: a grounded theory. Paper presented at the Conference on Human Factors in Computing Systems (CHI).
Methods:
🔹 Chapters from Dillman, D., Smyth, J. D., & Christian, L. M. (2014). Internet, Phone, Mail and Mixed-Mode Surveys: The Tailored Design Method (4th ed.). Hoboken, NJ: Wiley.
Examples:
Methods:
🔹 Chapter 10 from C. Wohlin et al., Experimentation in Software Engineering, Springer-Verlag Berlin Heidelberg 2012
🔹 Chapter 6 from F. Shull et al. (eds.), Guide to Advanced Empirical Software Engineering. Springer 2008 (similar content as the Wohlin chapter but slightly different presentation; read one or the other)
🔹 Chapter 6 from MacKenzie. Human-Computer Interaction. Elsevier 2013
Optional readings:
🔹 Neuman, W. L. (2007). Basics of Social Research: Qualitative and Quantitative Approaches:
Example papers for in class presentations:
Please read the SCC chapters in preparation for the lecture. No paper presentations are assigned; we will discuss the examples in class.
Methods:
🔹 Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference: Wadsworth Cengage learning:
Example papers:
Plus the following, in this order:
Sobel, A. E. K., & Clarkson, M. R. (2002). Formal methods application: An empirical tale of software development. IEEE Transactions on Software Engineering, 28(3), 308-320.
Berry, D. M., & Tichy, W. F. (2003). Comments on “Formal methods application: an empirical tale of software development”. IEEE Transactions on Software Engineering, 29(6), 567-571.
Sobel, A. E. K., & Clarkson, M. R. (2003). Response to “Comments on ‘Formal methods application: an empirical tale of software development’”. IEEE Transactions on Software Engineering, 29(6), 572-575.
Methods:
🔹 Wooldridge, J. M. (2003). Introductory econometrics: A modern approach. Thomson, Mason. Chapter 2 - Simple Regression [skim]
🔹 F.E. Harrell, Jr., Regression Modeling Strategies, Springer Series in Statistics, Chapters 1&2 - Regression general aspects: [Chapter 1: skim] [Chapter 2: read 2.1–2.3, 2.7]
Optional reading:
🔹 Shadish, Cook, & Campbell, Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Chapter 3, Construct Validity and External Validity.
🔹 Oktay, H., Taylor, B. J., & Jensen, D. D. (2010, July). Causal discovery in social media using quasi-experimental designs. In Proceedings of the First Workshop on Social Media Analytics (pp. 1-9). ACM.
Examples:
Methods:
🔹 Cowpertwait, P. S., & Metcalfe, A. V. (2009). Introductory time series with R. Springer Science & Business Media. [great practical book with applications in R; read selectively for topics you’re interested in and otherwise keep for reference; decompositions, e.g., seasonality+trend, are particularly useful]
🔹 Wooldridge, J. M. (2003). Introductory econometrics: A modern approach. Thomson, Mason. Chapter 10 - Time series [read if you want to see how the sausage is made, otherwise keep for reference]
🔹 Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference: Wadsworth Cengage learning:
🔹 Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross‐Degnan, D. (2002). Segmented regression analysis of interrupted time series studies in medication use research. Journal of clinical pharmacy and therapeutics, 27(4), 299-309. [great example of how to apply the technique; read carefully after you’ve skimmed Shadish]
Examples to discuss in class:
Kenmei, B., Antoniol, G., & Di Penta, M. (2008). Trend analysis and issue prediction in large-scale open source systems. In Software Maintenance and Reengineering, 2008. CSMR 2008. 12th European Conference on (pp. 73-82). IEEE.
Trockman, A., Zhou, S., Kästner, C., & Vasilescu, B. (2017). Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the npm Ecosystem. [there’s a lot in this paper; focus on only one example of applying ITS/RDD, I recommend dependency management]
Jadidi, M., Karimi, F., & Wagner, C. (2017). Gender Disparities in Science? Dropout, Productivity, Collaborations and Success of Male and Female Computer Scientists. arXiv preprint arXiv:1704.05801.
Methods:
🔹 Creswell Chapter 10
🔹 Venkatesh, V., Brown, S. A., & Bala, H. (2013). Bridging the qualitative-quantitative divide: Guidelines for conducting mixed methods research in information systems. MIS quarterly, 37(1), 21-54.
🔹 Onwuegbuzie, A. J., & Collins, K. M. (2007). A typology of mixed methods sampling designs in social science research. The qualitative report, 12(2), 281-316. [skim only; good discussion of how to select sample sizes for mixed-methods research, depending on the study goals]
Examples: In your presentations in class, describe clearly: which methods are being mixed in the paper; how are the different methods combined; which threats to validity of each method does the mixture alleviate.
Methods:
🔹 Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing (Vol. 999). Cambridge: MIT press. - Chapter 1 [skim, interesting background reading]
🔹 Bird, C., Menzies, T., & Zimmermann, T. (Eds.). (2015). The Art and Science of Analyzing Software Data. Elsevier:
Examples to discuss in class:
Methods:
🔹 From “Network Science” by Albert-László Barabási. Cambridge University Press, 2016:
🔹 From “Networks, Crowds, and Markets: Reasoning about a Highly Connected World.” by David Easley and Jon Kleinberg. Cambridge University Press, 2010:
Examples to discuss in class:
Additional examples: