This repository contains learning materials for CP4813 Urban Data Science at the Georgia Institute of Technology, School of City and Regional Planning.
Course name: CP4815 Urban Data Science
Instructor: Yiyi He
Teaching assistant: Yuehan Zhang
Contact email addresses:
- [email protected] (Instructor)
- [email protected] (GTA)
In today’s world, understanding cities requires more than just traditional methods. Urban planners and social scientists are increasingly turning to data science techniques to gain deeper insights into the complex issues that cities face. This course serves as an introduction to data science for undergraduate and graduate students in urban planning and related fields.
Throughout this course, you will delve into the interdisciplinary field of data science, which combines scientific methods, algorithms, and systems to extract valuable insights from diverse datasets. We will explore how data from various sectors, such as transportation, housing, and the physical environment, can be analyzed to understand urban dynamics better.
You will develop a solid foundation in key data science concepts and techniques using the programming language Python. We will begin by covering essential topics such as data import, cleansing, and transformation, laying the groundwork for more advanced analyses. As we progress, we will introduce data visualization techniques tailored to the needs of urban planners, emphasizing effective communication of findings and insights.
By the end of this course, you will have acquired fundamental skills and tools essential for conducting data-driven analyses in urban planning. Whether you're an undergraduate embarking on your academic journey or a graduate student preparing for advanced research, this course will provide you with the necessary expertise to tackle real-world urban challenges. Moreover, the knowledge gained here will serve as a solid foundation for future coursework and research endeavors, empowering you to confidently apply data science techniques to your capstone, thesis, or dissertation work.
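As a rough illustration of the import, cleansing, and transformation steps mentioned in the course description, here is a minimal sketch using only the Python standard library. The dataset, column names, and helper functions are all hypothetical and are not taken from the course labs, which may use different libraries.

```python
import csv
import io

# Hypothetical records: tract names and dollar values are made up for illustration.
RAW_CSV = """tract,monthly_rent,monthly_income
Tract A,1800,6000
Tract B,,5200
Tract C,1200,4000
"""


def load_rows(text):
    """Import: parse CSV text into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Cleanse: drop rows with any empty field and convert numeric fields to float."""
    cleaned = []
    for row in rows:
        if any(value.strip() == "" for value in row.values()):
            continue  # skip incomplete records
        cleaned.append({
            "tract": row["tract"],
            "monthly_rent": float(row["monthly_rent"]),
            "monthly_income": float(row["monthly_income"]),
        })
    return cleaned


def add_rent_burden(rows):
    """Transform: add rent as a share of income to each record."""
    return [
        {**row, "rent_burden": row["monthly_rent"] / row["monthly_income"]}
        for row in rows
    ]


rows = add_rent_burden(clean_rows(load_rows(RAW_CSV)))
for row in rows:
    print(f"{row['tract']}: rent burden {row['rent_burden']:.0%}")
```

The three functions mirror the three stages one at a time, so each stage can be inspected or replaced independently.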
- Understand the fundamental concepts, theories, and models of urban data analytics.
- Collect, import, tidy, export, and manipulate data effectively and efficiently.
- Develop the quantitative, GIS, and Python programming skills needed to analyze urban issues and problems through a series of hands-on lab exercises.
- Identify urban problems/research questions and solve them in a reproducible way using spatial analysis/visualization techniques through final projects.
- Engage in hands-on projects and case studies that apply data science techniques to real-world urban challenges, fostering problem-solving and critical thinking skills.
Students with disabilities needing academic accommodation should provide documentation to the Access Disabled Assistance Program for Tech Students (http://www.adapts.gatech.edu/) and bring an ADAPTS accommodation letter to the instructor indicating the nature of accommodations required. This should be done within the first week of class or as soon as possible after a new disability condition arises. All efforts will be made to provide reasonable accommodations.
The course will be structured as a combined lecture-lab course. You are expected to have read all assigned readings ahead of time and be prepared to participate actively in class discussions. Students will be evaluated on three sets of tasks:
- Three lab assignments (15 pts total, 5 pts each): Through completing three lab assignments, students will gain first-hand knowledge of the use and operation of Python in urban data science challenges.
- Hackathon challenge (30 pts total): The hackathon leverages open datasets and advanced analytics to address critical issues. Teams will use data visualization, machine learning, and geospatial analysis to develop actionable solutions, presenting their insights to the class.
- Final Project (55 pts total: 15 pts on Presentation, 40 pts on Final project paper): The final class requirement is a team-based project that applies the knowledge learned in the class to a real-world problem. Fifteen percent of the total course grade will be a formal presentation to the class.
All assignments (unless otherwise noted) are to be submitted through the “Assignments” tab on Canvas. It is the student’s responsibility to ensure that assignments submitted through Canvas are successfully uploaded into the system on time. Late submissions of any of the assignments above (final project paper excluded) incur a penalty of 0.5 points per day. Late project paper submissions are not accepted. In the case of illness or other special circumstances, notify the instructor as soon as possible and before the assignment deadline.
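The point breakdown and late penalty described above can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and the choices to count whole days and to floor a penalized score at zero are assumptions, not stated course policy.

```python
# Point values taken from the grading scheme above.
LAB_POINTS_EACH = 5.0
NUM_LABS = 3
HACKATHON_POINTS = 30.0
PRESENTATION_POINTS = 15.0
PAPER_POINTS = 40.0
LATE_PENALTY_PER_DAY = 0.5

TOTAL_POINTS = (
    LAB_POINTS_EACH * NUM_LABS
    + HACKATHON_POINTS
    + PRESENTATION_POINTS
    + PAPER_POINTS
)


def apply_late_penalty(score, days_late):
    """Subtract 0.5 points per day late.

    Assumptions (not stated in the syllabus): days_late is a whole number
    of days, and the penalized score never drops below zero. This does not
    apply to the final project paper, which is not accepted late.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return max(0.0, score - LATE_PENALTY_PER_DAY * days_late)


print(f"Total points available: {TOTAL_POINTS}")
print(f"Full-credit lab submitted 2 days late: {apply_late_penalty(5.0, 2)}")
```

Under these assumptions, a lab that would have earned full credit loses one point when it is submitted two days late.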
📖 Python for Data Analysis (3rd Edition)
Author: Wes McKinney
Link to book content: https://wesmckinney.com/book
Link to GitHub Repo: https://github.com/wesm/pydata-book/tree/3rd-edition
📖 Geographic Data Science with Python
Authors: Sergio J. Rey, Dani Arribas-Bel, and Levi J. Wolf
Link to book content: https://geographicdata.science/book/intro.html
📖 Python Data Science Handbook
Author: Jake VanderPlas
Link to book content: https://jakevdp.github.io/PythonDataScienceHandbook
Link to GitHub Repo: https://github.com/jakevdp/PythonDataScienceHandbook
📖 The Elements of Statistical Learning
Authors: Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Link to book content: https://www.sas.upenn.edu/~fdiebold/NoHesitations/BookAdvanced.pdf
GitHub and Git
Link to GitHub Documentation: https://docs.github.com/en/get-started/start-your-journey/about-github-and-git
Python Crash Course
Author: Srebalaji Thirumalai
Link to GitHub Repo: https://github.com/srebalaji/python-crash-course
Week 1: Introduction
- Lecture: Introduction and course overview
- Lab session: Getting things started: Anaconda and Jupyter
Week 2: Python fundamentals
- Lab session: Introduction to Python 1
- Lab session: Introduction to Python 2
Week 3: Data Cleaning and Exploration
- Lecture: Exploratory Data Analysis (EDA)
Week 4: Geographic Data Science
- Lecture: Spatial Data Science
Week 5: Spatial patterns and visualization
- Lecture: Point pattern analysis
- Lab session: Spatial data visualization
Week 6: Unsupervised Machine Learning
- Lecture: K-means clustering and beyond
- Lab session: Crime patterns in Atlanta
Week 7: 🌟 Mid-semester project check-in 🌟
- Lecture: Principal Component Analysis
- Mid-semester project presentations
Week 8: Supervised Machine Learning 1
- Lecture: Classification and Regression Trees
- Lecture: Random Forest
Week 9: Supervised Machine Learning 2
- Lab session: Understanding housing affordability in California
- Lecture: Support Vector Machine
Week 10: 🥊 Hackathon Challenge 🥊
- Day 1: Introduction, Forming teams, Exploratory Analysis
- Day 2: Team presentations and discussion
Week 11: Urban Networks
- Lecture: Urban networks and graph representation
- Lab session: Traffic congestion in NYC
Week 12: Data Science Agent
- Lecture: Introduction to computer vision
Week 13: Computer Vision for Urban Planning
- Lecture: Introduction to computer vision
- Lab session: Google Street View
Weeks 14-15: Final Project Presentations