FDSfE

Repository for Foundations of Data Science for Everyone, a class taught at Lincoln University and the University of Delaware.

This course teaches the basics of data-driven research. Students will acquire basic computational skills and a basic knowledge of statistical analysis and error analysis, become familiar with good practices for handling small and big data, and learn the basics of machine learning. After this class students should be able to formulate a question, find appropriate data to answer it, prepare and analyze the data, get an answer, and understand the answer's confidence level. The course is organized in a modular fashion, with labs and projects assigned to students for group work.

Philosophy and good practices of data science:

The flow chart of a data-driven project from idea to dissemination; the concepts of falsifiability, reproducibility, and open science; the importance of version control.

Lab: setting up GitHub repositories, creating Jupyter notebooks (on the free Colab platform)

Data manipulation:

Data types, missing data, censored data, organization of data in tables, data hygiene.

Lab: acquiring and preparing data (CSV, TSV, downloadable ASCII files, basic SQL, APIs) in pandas: reading data collections from CSV files into data frames, selecting columns, selecting rows, and merging data frames built from different files
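The lab steps above can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the two tables, their column names, and their values are invented for the example, and the CSV text is read from in-memory strings rather than from real files.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for two downloaded files.
trips_csv = """trip_id,station_id,duration_min
1,A,12.5
2,B,
3,A,7.0
4,C,30.2
"""
stations_csv = """station_id,borough
A,Manhattan
B,Brooklyn
"""

# Read each CSV into a data frame (a file path works in place of StringIO).
trips = pd.read_csv(io.StringIO(trips_csv))
stations = pd.read_csv(io.StringIO(stations_csv))

# Select columns by name.
durations = trips[["trip_id", "duration_min"]]

# Select rows with a boolean mask: duration present and longer than 10 minutes.
has_duration = trips["duration_min"].notna()
long_trips = trips[has_duration & (trips["duration_min"] > 10)]

# Merge the two frames on their shared key; a left join keeps every trip,
# filling the borough with NaN when the station is unknown.
merged = trips.merge(stations, on="station_id", how="left")
```

The left join is one possible choice: it makes unmatched keys visible as missing values instead of silently dropping rows.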

Inference and prediction:

Inference from plots: plotting histograms and scatter plots; data types, including ordinal, continuous, and categorical data; visual inspection of correlation between variables.

Lab: reading and cleaning data (Citibikes, Pluto, Census)
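As a rough numerical companion to visual inspection, the sketch below bins one variable into a histogram and computes the Pearson correlation between two variables. It assumes NumPy is installed and uses synthetic data invented for the example; the same arrays could be passed to `matplotlib.pyplot.hist` and `matplotlib.pyplot.scatter` to draw the plots.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)

# Histogram of x: counts per bin and the bin edges (one more edge than bins).
counts, edges = np.histogram(x, bins=10)

# Pearson correlation coefficient between x and y,
# read from the off-diagonal of the 2x2 correlation matrix.
r = np.corrcoef(x, y)[0, 1]

print("counts per bin:", counts)
print("correlation:", round(float(r), 3))
```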

Hypothesis testing:

p-value, chi-square, z-test.

Lab: basic statistics on Pluto, Census, and Citibikes data: moment extraction, deviations from Gaussianity/Poissonity, histograms, proper binning.

PDF/CDF, data dredging, error analysis, testing models (KS, Anderson-Darling, KL divergence), goodness of fit.

Lab: creating and testing simple distribution models in NumPy
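Two of the tools named above can be sketched in a few lines, assuming NumPy is installed. The function names `z_test_p_value`, `ks_statistic`, and `normal_cdf` are hypothetical helpers written for this example, not part of the course material, and the samples are synthetic. The KS function returns only the test statistic, not a p-value.

```python
import math

import numpy as np


def z_test_p_value(sample, mu0, sigma):
    """Two-sided z-test of H0: mean == mu0, with known population sigma."""
    sample = np.asarray(sample, dtype=float)
    z = (sample.mean() - mu0) / (sigma / math.sqrt(sample.size))
    # P(|Z| > |z|) for a standard normal Z.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def normal_cdf(x):
    """CDF of the standard normal distribution, evaluated elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of sample and a model CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    model = cdf(x)
    # The empirical CDF jumps at each point, so check both sides of each step.
    d_plus = np.max(np.arange(1, n + 1) / n - model)
    d_minus = np.max(model - np.arange(0, n) / n)
    return max(d_plus, d_minus)


rng = np.random.default_rng(1)
gaussian = rng.normal(size=1000)
uniform = rng.uniform(-3.0, 3.0, size=1000)

z, p = z_test_p_value([3.0, 3.0, 3.0, 3.0], mu0=2.0, sigma=1.0)
d_gauss = ks_statistic(gaussian, normal_cdf)
d_unif = ks_statistic(uniform, normal_cdf)
print("z =", z, "p =", round(p, 4))
print("KS distance, Gaussian sample:", round(float(d_gauss), 3))
print("KS distance, uniform sample:", round(float(d_unif), 3))
```

A small KS distance means the sample is compatible with the model; the uniform sample should sit visibly farther from the Gaussian model than the Gaussian sample does.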

Basic Bayesian concepts:

Bayesian vs. frequentist statistics; prior, likelihood, posterior.
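The relation between prior, likelihood, and posterior can be illustrated with a coin whose bias is unknown. The sketch below is an assumed toy example (a flat prior on a grid of candidate biases, and invented counts of 7 heads and 3 tails); `grid_posterior` is a hypothetical helper name.

```python
def grid_posterior(heads, tails, n_grid=101):
    """Posterior over a coin's bias theta on an evenly spaced grid in [0, 1]."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    # Prior: every candidate bias is equally plausible before seeing data.
    prior = [1.0 / n_grid] * n_grid
    # Likelihood of the observed flips for each candidate bias
    # (the binomial coefficient is constant in theta and cancels out).
    likelihood = [t**heads * (1.0 - t) ** tails for t in thetas]
    # Bayes' rule: posterior is prior times likelihood, normalized by the evidence.
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


thetas, posterior = grid_posterior(heads=7, tails=3)
map_theta = thetas[posterior.index(max(posterior))]
print("most probable bias on the grid:", map_theta)
```

With a flat prior the most probable bias coincides with the frequentist estimate (the observed fraction of heads); a non-flat prior would pull it away from that value.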

Repository contents:

CodeExamples
HW1
HW2
HW5
classdemo
data
imgs
midterm
statistics
test
.gitignore
Python_Crash_Course.ipynb
README.md
Resources.md
fbb.mplstyle
gitallrepos.py
pullallgits.py
pythoncrashcourse.ipynb

Regression: least-squares methods (OLS, WLS)
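A straight-line fit shows the difference between the two methods. The sketch below assumes NumPy is installed; `fit_line` is a hypothetical helper and the data points are invented. Weighted least squares is implemented here by scaling each row by 1/sigma, which reduces it to an ordinary least-squares problem.

```python
import numpy as np


def fit_line(x, y, sigma=None):
    """Fit y = a + b*x and return (a, b).

    With sigma=None this is ordinary least squares (OLS).
    With per-point uncertainties sigma it is weighted least squares (WLS),
    where each point is weighted by 1/sigma**2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    if sigma is None:
        w = np.ones_like(y)
    else:
        w = 1.0 / np.asarray(sigma, dtype=float)
    # Dividing each equation by its sigma turns WLS into an OLS problem.
    coef, *_ = np.linalg.lstsq(design * w[:, None], y * w, rcond=None)
    return coef


# Three points on the line y = 1 + 2x, plus one outlier with a huge uncertainty.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]
sigma = [1.0, 1.0, 1.0, 1e6]

ols = fit_line(x, y)
wls = fit_line(x, y, sigma=sigma)
print("OLS intercept, slope:", ols)
print("WLS intercept, slope:", wls)
```

OLS treats the outlier like any other point and the slope is dragged upward; WLS nearly ignores it because its stated uncertainty is so large.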

Timeseries techniques: smoothing, detrending, stationarity and nonstationarity, homo- and heteroscedastic noise
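Smoothing and detrending can be sketched on a synthetic series, assuming NumPy is installed. The helper names `moving_average` and `detrend_linear`, the window size, and the series itself (linear trend plus a sine plus Gaussian noise) are choices made for this example; a moving average and a straight-line fit are only one option for each step.

```python
import numpy as np


def moving_average(y, window):
    """Smooth y with an unweighted moving average of the given window length."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(y, kernel, mode="valid")


def detrend_linear(t, y):
    """Subtract the best-fit straight line from y."""
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (slope * t + intercept)


# Synthetic series: linear trend + periodic signal + homoscedastic noise.
t = np.arange(200, dtype=float)
rng = np.random.default_rng(2)
y = 0.05 * t + np.sin(2 * np.pi * t / 20) + rng.normal(scale=0.3, size=t.size)

smooth = moving_average(y, window=5)
residual = detrend_linear(t, y)
print("smoothed length:", smooth.size)
print("mean after detrending:", float(residual.mean()))
```

Removing the trend is a common first step toward a stationary series, since a drifting mean is one of the simplest forms of nonstationarity.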

Machine Learning methods: Clustering
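As one example of a clustering method, the sketch below implements a basic k-means loop, assuming NumPy is installed. The source does not say which clustering algorithms the course covers, so k-means is an assumption, and the `kmeans` helper and the two synthetic blobs are invented for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Basic k-means: returns (labels, centers) after a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers


# Two well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(3)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centers = kmeans(points, k=2)
print("cluster sizes:", np.bincount(labels))
```

The result of k-means depends on the starting centers; with blobs this far apart any start recovers the two groups, but real data usually calls for several random restarts.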

Machine Learning methods for text analysis: Natural Language Processing

Machine Learning methods: Decision trees and Tree ensemble methods

Multidimensional data: Spatial+Temporal data ( data)

zhrvdt77/FDSFE_FBianco
