PP422     
Data Science for Public Policy

This information is for the 2024/25 session.

Teacher responsible

Dr Casey Kearney

Availability

This course is compulsory on the MPA in Data Science for Public Policy. This course is not available as an outside option.

Pre-requisites

Students must have completed Pre-Sessional Coding and Mathematics Bootcamp (PP407).

This will ensure that students have basic fluency in Maths and Statistics along with Python and its main Data Science libraries. 

Course content

This course covers the theory and practice of the Data Science project lifecycle in Python for public policy, from problem definition and data sourcing and cleaning to exploration, visualization, and modelling. Emphasis will be placed on identifying problems that are suitable for different Data Science techniques and on good practices for managing data. Linear and logistic models and regularization techniques will be covered in the Autumn Term (AT); machine learning, clustering, and introductory text analysis models will be covered in the Winter Term (WT). Key concepts underlying modelling (bias vs. variance, types of error, training vs. test data), together with data ethics and the ethics of Data Science practice, will be illustrated and implemented with examples from healthcare, education, urban policy, international development, and other policy areas. By the end of the course, students will have a strong coding workflow and will be able to source and experiment with data for analysis and research, both individually and collaboratively.
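As a rough illustration of two of the concepts listed above (training vs. test data, and regularization as a way to trade variance for bias), here is a short Python sketch. It is not taken from the course materials: the synthetic data, the helper names (`split_train_test`, `fit_ridge`, `mse`), and the choice of ridge (L2) regularization are illustrative assumptions. It holds out a quarter of the rows as test data and fits closed-form ridge regression for several penalty strengths. With as many features as training rows, the unpenalized fit (lambda = 0) reproduces the training data almost exactly, which is exactly the kind of overfitting that held-out test error is meant to expose.

```python
import numpy as np


def split_train_test(X, y, test_frac=0.25, seed=0):
    """Hypothetical helper: randomly hold out a fraction of rows as test data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]


def fit_ridge(X, y, lam):
    """Closed-form ridge estimate beta = (X'X + lam*I)^(-1) X'y.

    lam = 0 gives ordinary least squares. No intercept is fitted, which is
    an assumption that suits this zero-mean synthetic data.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def mse(X, y, beta):
    """Mean squared prediction error of coefficients beta on (X, y)."""
    return float(np.mean((y - X @ beta) ** 2))


# Synthetic data (illustrative only): 40 rows, 30 features, 3 true signals.
rng = np.random.default_rng(42)
n, p = 40, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(scale=1.0, size=n)

X_tr, X_te, y_tr, y_te = split_train_test(X, y)
for lam in [0.0, 1.0, 10.0]:
    beta = fit_ridge(X_tr, y_tr, lam)
    print(f"lambda={lam:>5}: train MSE={mse(X_tr, y_tr, beta):.2f}, "
          f"test MSE={mse(X_te, y_te, beta):.2f}")
```

Training error can only rise as the penalty grows, so the comparison that matters for model choice is the test column; in practice the penalty would usually be chosen by cross-validation rather than from a fixed list.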

Teaching

15 hours of lectures and 15 hours of seminars in the AT. 15 hours of lectures and 15 hours of seminars in the WT.

Formative coursework

Students will be expected to complete weekly problem sets throughout the AT and WT.

Indicative reading


These books provide an excellent starting point and can be used as the main reference for many topics. A full reading list will be provided at the beginning of the course.

  1. James, Gareth, et al. An introduction to statistical learning: With applications in Python. Springer Nature, 2023.
  2. Chen, Jeffrey C., Edward A. Rubin, and Gary J. Cornwall. Data science for public policy. Springer, 2021.
  3. Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Ke