
Repository for UDel PHYS167 LincolnU Mat115


zhrvdt77/FDSFE_FBianco

 
 


FDSfE

Repo for Foundations of Data Science for Everyone, a class taught at Lincoln University and the University of Delaware.

This course teaches the basics of data-driven research. Students will acquire basic computational skills, learn the fundamentals of statistical analysis and error analysis, become familiar with good practices for handling small and big data, and learn the basics of Machine Learning. After this class, students should be able to formulate a question, find appropriate data to answer it, prepare and analyze the data, get an answer, and understand the confidence level of that answer. The course is organized in a modular fashion, with labs and projects assigned to students for group work.

Philosophy and good practices of data science:

The flow chart of a data-driven project from idea to dissemination, the concepts of falsifiability, reproducibility, and open science, and the importance of version control

Lab: setting up GitHub repositories, creating Jupyter notebooks (on the free Colab platform)

Data manipulation:

Data types, missing data, censored data, organization of data in tables, data hygiene

Lab: Acquiring and preparing data (CSV, TSV, downloadable ASCII files, basic SQL, APIs) in Pandas: reading data collections from CSV files into data frames, selecting columns, selecting rows, and merging data frames built from different files
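As a rough illustration of the Pandas steps listed in this lab, here is a minimal sketch. The two small CSV tables, their column names, and their values are invented for the example and are not course data; only standard Pandas calls (`read_csv`, boolean indexing, `merge`) are used.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory, standing in for downloaded files.
stations_csv = io.StringIO("station_id,name\n1,Main St\n2,Park Ave\n3,Dock Rd\n")
trips_csv = io.StringIO(
    "trip_id,station_id,duration_min\n10,1,12.5\n11,2,7.0\n12,1,\n13,4,20.0\n"
)

# Read each CSV file into a data frame.
stations = pd.read_csv(stations_csv)
trips = pd.read_csv(trips_csv)

# Select columns by name.
durations = trips[["trip_id", "duration_min"]]

# Select rows with a boolean condition (rows with missing durations drop out).
long_trips = trips[trips["duration_min"] > 10]

# Count missing values, a basic data hygiene check.
n_missing = int(trips["duration_min"].isna().sum())

# Merge the two data frames on their shared key; a left join keeps every trip,
# even the one whose station is absent from the stations table.
merged = trips.merge(stations, on="station_id", how="left")
print(merged)
```

A left join is used here so that trips with no matching station stay visible as missing values rather than silently disappearing.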

Inference and prediction:

Inference from plots: plotting histograms and scatter plots, data types including ordinal, continuous, and categorical data, visual inspection of correlation between variables

Lab: read and clean data, Citibikes, Pluto, Census

Hypothesis testing:

p-value, chi-square, z-test

Lab: basic statistics on Pluto, Census, Citibikes data, moment extraction, deviations from Gaussianity/Poissonity, histograms, proper binning

PDF/CDF, data dredging, error analysis, testing models (KS, Anderson-Darling, KL divergence), goodness of fit

Lab: creating and testing simple distribution models in NumPy
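To make the idea of testing a distribution model concrete, here is a minimal NumPy sketch of a one-sample Kolmogorov-Smirnov (KS) statistic. The helper names `normal_cdf` and `ks_statistic`, the random seed, and the synthetic samples are assumptions made for this example. The threshold 1.36/sqrt(n) is the usual large-sample approximation of the 5% critical value when the model is fully specified in advance.

```python
import math
import numpy as np

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Gaussian model, evaluated element-wise."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF and the model CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    model = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - model)   # empirical CDF above model
    d_minus = np.max(model - np.arange(0, n) / n)      # empirical CDF below model
    return float(max(d_plus, d_minus))

rng = np.random.default_rng(167)  # arbitrary seed, for reproducibility
n = 1000
gaussian_sample = rng.normal(0.0, 1.0, n)      # drawn from the model being tested
exponential_sample = rng.exponential(1.0, n)   # skewed, not Gaussian

# Compare each sample to a Gaussian model with the matching mean and width.
d_gauss = ks_statistic(gaussian_sample, normal_cdf)
d_expo = ks_statistic(exponential_sample, lambda x: normal_cdf(x, 1.0, 1.0))

critical_5pct = 1.36 / math.sqrt(n)  # approximate 5% threshold for large n
print(f"Gaussian sample:    D = {d_gauss:.3f}")
print(f"Exponential sample: D = {d_expo:.3f}")
print(f"Approximate 5% critical value: {critical_5pct:.3f}")
```

A statistic above the critical value means the model is rejected at the 5% level; the exponential sample should fail this test by a wide margin.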

Basic Bayesian concepts:

Bayesian vs. frequentist statistics, prior, likelihood, posterior
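The relationship between prior, likelihood, and posterior can be sketched with a coin-flip example evaluated on a grid. The data (7 heads in 10 flips), the flat prior, and the grid resolution are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical data: 7 heads observed in 10 coin flips.
n_flips, n_heads = 10, 7

# Grid of candidate values for theta, the probability of heads.
theta = np.linspace(0.0, 1.0, 1001)

# Prior: flat, meaning every value of theta is equally plausible beforehand.
prior = np.ones_like(theta)

# Likelihood of the data for each theta (binomial, constant factor dropped).
likelihood = theta**n_heads * (1.0 - theta) ** (n_flips - n_heads)

# Posterior is proportional to prior times likelihood; normalize to sum to 1.
posterior = prior * likelihood
posterior /= posterior.sum()

posterior_mean = float((theta * posterior).sum())
theta_map = float(theta[np.argmax(posterior)])  # most probable theta
frequentist_estimate = n_heads / n_flips        # maximum-likelihood estimate

print(f"Posterior mean:       {posterior_mean:.3f}")
print(f"Posterior maximum:    {theta_map:.3f}")
print(f"Frequentist estimate: {frequentist_estimate:.3f}")
```

With a flat prior the posterior maximum coincides with the frequentist estimate, while the posterior mean is pulled slightly toward 0.5; a more informative prior would shift both.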

Regression: least-squares methods (OLS, WLS)
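As one possible illustration of the difference between ordinary (OLS) and weighted (WLS) least squares, the sketch below fits a straight line to synthetic data whose noise grows with x. The helper names `fit_ols` and `fit_wls`, the true line y = 2 + 3x, and the noise model are assumptions made for the example.

```python
import numpy as np

def fit_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [intercept, slope]

def fit_wls(x, y, sigma):
    """Weighted least squares: each point is weighted by 1 / sigma**2."""
    X = np.column_stack([np.ones_like(x), x])
    w = 1.0 / sigma  # scaling rows by 1/sigma is equivalent to 1/sigma**2 weights
    coef, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return coef  # [intercept, slope]

rng = np.random.default_rng(115)  # arbitrary seed, for reproducibility
x = np.linspace(0.0, 10.0, 200)
sigma = 0.1 + 0.5 * x                        # uncertainty grows with x
y = 2.0 + 3.0 * x + rng.normal(0.0, sigma)   # synthetic noisy measurements

ols_intercept, ols_slope = fit_ols(x, y)
wls_intercept, wls_slope = fit_wls(x, y, sigma)
print(f"OLS: intercept = {ols_intercept:.2f}, slope = {ols_slope:.2f}")
print(f"WLS: intercept = {wls_intercept:.2f}, slope = {wls_slope:.2f}")
```

WLS trusts the precise low-x points more than the noisy high-x points, so its estimates are usually closer to the true line when the uncertainties are known.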

Time series techniques: smoothing, detrending, stationary and nonstationary series, homoscedastic and heteroscedastic noise
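Smoothing and detrending can be sketched in a few lines of NumPy. The synthetic series below (a linear trend plus an oscillation plus noise), the window size, and the choice of a linear trend model are assumptions for illustration; a moving average and a polynomial fit are only two of several possible techniques.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
t = np.arange(100, dtype=float)

# Synthetic nonstationary series: linear trend + periodic signal + noise.
series = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 20) + rng.normal(0.0, 1.0, t.size)

# Smoothing: moving average over a 5-point window.
# mode="valid" keeps only positions where the window fits entirely.
window = 5
kernel = np.ones(window) / window
smoothed = np.convolve(series, kernel, mode="valid")

# Detrending: fit a straight line and subtract it from the series.
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)

print(f"Fitted trend slope: {slope:.3f}")
print(f"Mean before detrending: {series.mean():.2f}")
print(f"Mean after detrending:  {detrended.mean():.2e}")
```

After the trend is removed, the mean no longer drifts with time, which is a first step toward a stationary series.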

Machine Learning methods: Clustering

Machine Learning methods for text analysis: Natural Language Processing

Machine Learning methods: Decision trees and Tree ensemble methods

Multidimensional data: spatial and temporal data
