UtrechtUniversity
diff --git a/book/slides/images/pivot_longer_smaller-cut.gif b/book/slides/images/pivot_longer_smaller-cut.gif
295 KB
diff --git a/book/slides/slides_introduction.html b/book/slides/slides_introduction.html
Lines changed: 122 additions & 67 deletions
diff --git a/book/slides/slides_introduction.qmd b/book/slides/slides_introduction.qmd
Lines changed: 21 additions & 2 deletions
diff --git a/book/slides/slides_tidyverse.html b/book/slides/slides_tidyverse.html
Lines changed: 342 additions & 296 deletions
diff --git a/book/slides/slides_tidyverse.qmd b/book/slides/slides_tidyverse.qmd
Lines changed: 98 additions & 81 deletions
@@ -484,15 +484,15 @@ country_fac
 . . .
 
 ::: columns
-::: {.column width="50%"}
+::: {.column width="40%"}
 
 ```{r}
 #| label: factors-df
 df <- data.frame(name, age, country_fac)
 df
 ```
 :::
-::: {.column width="50%"}
+::: {.column width="60%"}
 ```{r}
 #| label: summary-factor-df
 summary(df)
@@ -1010,6 +1010,25 @@ for(the_age in df$age){
 }
 ```
 
+## Answers Bonus question
+
+Bonus question: add age category as a new column in df.
+
+```{r}
+# We first set an index that will be increased every time the for-loop runs
+i <- 1
+
+for(nr in df$age){
+  # Add the age category to a new column in df
+  df$age_category[i] <- test_age(nr)
+
+  # Increase the index by 1 after running the code in this for-loop iteration
+  i <- i + 1
+}
+
+df
+```
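The bonus answer uses a manual counter `i`. Below is a minimal sketch of two common alternatives. It assumes the slides' `test_age()` takes one age and returns one category label. `seq_along()` generates the positions for you, and `vapply()` applies the function to every age in one call. The `test_age()` here is a hypothetical stand-in with assumed thresholds. The names and ages come from the tidyverse slides.

```r
# Hypothetical stand-in for test_age() from the slides; the real thresholds may differ
test_age <- function(age) {
  if (age < 18) {
    "minor"
  } else if (age < 65) {
    "adult"
  } else {
    "senior"
  }
}

df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
                 age = c(15, 22, 50, 51))

# Alternative 1: seq_along() gives the positions 1, 2, ..., so no manual counter is needed
df$age_category <- NA_character_
for (i in seq_along(df$age)) {
  df$age_category[i] <- test_age(df$age[i])
}

# Alternative 2: vapply() calls test_age() on every age and checks
# that each call returns exactly one character string
df$age_category_2 <- vapply(df$age, test_age, character(1))

df
```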
+
 # Recap "Basics of R" {background-color=#FFCD00}
 
 ## Which bracket does what?
 
@@ -56,6 +56,43 @@ dplyr::rename(iris, petal_length = Petal.Length)
 
 # Go to Exercise 0 in `datascience_exercises.Rmd`
 
+# Some good coding practices {background-color=#FFCD00}
+
+## R projects and working directories
+
+When you start programming for yourself:
+
+- Create a folder dedicated to your project
+- Start a new R project: **File > New Project > Existing Directory**
+- An `.Rproj` file will be created
+
+. . .
+
+Advantages:
+
+- Automatically set your working directory to that folder
+- Automatically retrieve only the history and objects from that R project
+- More reproducible (relative vs. absolute paths)
+
+```r
+getwd()
+```
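Here is a small sketch that makes the relative-versus-absolute point concrete. Inside an R project, the working directory is the project folder. A path can therefore be written relative to that folder with `file.path()`. The file and folder names below are hypothetical.

```r
# In an R project, getwd() returns the project folder
getwd()

# Relative path: resolved against the working directory, so it keeps working
# when the project folder is copied to another computer (hypothetical file)
relative_path <- file.path("data", "penguins.csv")
relative_path

# Absolute path: tied to one computer's folder layout (hypothetical example)
absolute_path <- "C:/Users/me/Documents/my_project/data/penguins.csv"

# file.exists() checks the relative path against the current working directory
file.exists(relative_path)
```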
+
+## Comments
+
+Everything after `#` on a line of R code is a comment and is ignored.
+
+- Help your future self and collaborators/supervisors understand what your code does
+- Explain (the purpose of) code that is not self-explanatory
+- Comment and un-comment to debug code without having to delete it[^1]
+
+```{r}
+# Print a very important line
+print("A very important line")
+```
+
+[^1]: You should delete non-working code when done debugging though.
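This short sketch shows both uses of comments described above. The first is a comment that follows code on the same line. The second is a line that is commented out while debugging. The values are arbitrary.

```r
x <- c(3, 5, 8)   # everything after # is ignored, so comments can follow code

# While debugging, disable a line by commenting it out instead of deleting it:
# x <- x * 100

total <- sum(x)
total
```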
+
 ## The Data Science workflow
 
 ![Source: [R 4 Data Science](https://r4ds.had.co.nz/introduction.html)](https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png "The data science workflow."){fig-align="center"}
@@ -85,8 +122,8 @@ This means that often the results of one step will be the input for the next
 ::: {.incremental}
 1. Read a file into R
 2. Clean the data:
-   - Filter on relevant rows
    - Select only relevant columns
+   - Filter on relevant rows
 3. Calculate a new column
 4. Rename a column
 5. Put the dataframe into a long (tidy) format
@@ -111,26 +148,6 @@ This means that often the results of one step will be the input for the next
 
 ![](../images/reading_writing.png "Importing files into R, and exporting them into a file again")
 
-## R projects and working directories
-
-When you start programming for yourself:
-
-- Create a folder dedicated to your project
-- Start a new R project: **File > New Project > Existing Directory**
-- An `.RProj` file will be created
-
-. . .
-
-Advantages:
-
-- Automatically set your working directory to that folder
-- Automatically retrieve only the history and objects from that R project
-- More reproducible (relative vs. absolute paths)
-
-```r
-getwd()
-```
-
 ## readr: Read Rectangular Text Data
 
 To read text data, you need:
@@ -239,52 +256,26 @@ write_csv(penguins_isotopes,
 
 # Selecting & filtering data {background-color=#FFCD00}
 
-## dplyr: Data Manipulation
-
-`dplyr` contains functions for many types of data manipulation, such as:
-
-- `filter()`: select **rows** that meet one or several logical criteria
-- `select()`: select (or drop) **columns**  
-- `rename()`: change column name
-- `mutate()`: transform column values or create new column
-- `group_by()`: group data on one or more columns
-- `summarize()`: reduces a group of data into a single row
-
-## Filter
-
-Selects **rows** in your dataframe.
-
-Use:
-```r
-filter(your-dataframe, your-condition)
-```
-
-. . .
-
 ```{r}
 #| label: load-df-morning-session
 #| output: false
 #| echo: false
 df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
-                 age = c(35,22,50,51),
+                 age = c(15,22,50,51),
                  country = c("UK","USA","USA","UK")
 )
 ```
 
-From the morning session: "From your dataframe `df`, return complete rows for everyone living in a country of your choice."
-
-```{r}
-#| label: filter-baser
-#| eval: false
-df[df$country=="UK", ]        # Base R
-```
+## dplyr: Data Manipulation
 
-. . .
+`dplyr` contains functions for many types of data manipulation, such as:
 
-```{r}
-#| label: filter-tidyverse
-filter(df, country == "UK")   # Tidyverse
-```
+- `filter()`: select **rows** that meet one or several logical criteria
+- `select()`: select (or drop) **columns**  
+- `rename()`: change column name
+- `mutate()`: transform column values or create new column
+- `group_by()`: group data on one or more columns
+- `summarize()`: reduce a group of data into a single row
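The following minimal sketch calls each function from the list once on the small `df` from these slides (name, age, country). It assumes the dplyr package is installed. Each call returns a new data frame and leaves `df` unchanged.

```r
library(dplyr)

df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
                 age = c(15, 22, 50, 51),
                 country = c("UK", "USA", "USA", "UK"))

adults        <- filter(df, age >= 18)              # rows: Bob, Chloe, Dan
name_country  <- select(df, name, country)          # columns: name, country
renamed       <- rename(df, years = age)            # new_name = old_name
with_future   <- mutate(df, age_in_10 = age + 10)   # adds a column
by_country    <- group_by(df, country)              # marks groups, rows unchanged
country_means <- summarize(by_country, mean_age = mean(age))  # one row per group

country_means
```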
 
 ## Select
 
@@ -325,63 +316,89 @@ df[, c("name","age")]       # Base R
 select(df, name, age)       # Tidyverse
 ```
 
+## Filter
+
+Selects **rows** in your dataframe.
+
+Use:
+```r
+filter(your-dataframe, your-condition)
+```
+
+. . .
+
+From the morning session: "From your dataframe `df`, return complete rows for everyone living in a country of your choice."
+
+```{r}
+#| label: filter-baser
+#| eval: false
+df[df$country=="UK", ]        # Base R
+```
+
+. . .
+
+```{r}
+#| label: filter-tidyverse
+filter(df, country == "UK")   # Tidyverse
+```
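`filter()` also accepts several conditions. The sketch below uses the same `df` and assumes dplyr is installed. Conditions separated by commas must all be `TRUE`, just like `&`. The `|` operator keeps rows where at least one condition holds. `%in%` checks whether a value belongs to a set of values.

```r
library(dplyr)

df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
                 age = c(15, 22, 50, 51),
                 country = c("UK", "USA", "USA", "UK"))

# Both conditions must hold: UK and 18 or older
uk_adults <- filter(df, country == "UK", age >= 18)

# At least one condition must hold: UK or younger than 30
uk_or_young <- filter(df, country == "UK" | age < 30)

# Membership test: country is one of the listed values
uk_or_usa <- filter(df, country %in% c("UK", "USA"))

uk_adults
```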
+
 # Go to exercises 4 and 5
 
 ## Answers exercise 4
 
-4. Filter `penguins` to leave out the NAs.
+4. Select the columns Individual_ID, Species, Sex, Island, Culmen_Depth_mm and Culmen_Length_mm
 
 ```{r}
 #| label: exercise-4a
-penguins_subset <- filter(penguins, !is.na(Sex))
+penguins_subset <- select(penguins, Individual_ID, Species,
+                          Sex, Island,
+                          Culmen_Depth_mm, Culmen_Length_mm)
 ```
 
 . . .
 
 Or
 ```{r}
 #| label: exercise-4b
-penguins_subset <- filter(penguins, Sex == "MALE" | Sex == "FEMALE")
+penguins_subset <- select(penguins, Individual_ID, Species,
+                          Sex, Island,
+                          starts_with("Culmen"))
 ```
 
 . . .
 
 Or
-
 ```{r}
 #| label: exercise-4c
-penguins_subset <- filter(penguins, Sex %in% c("MALE", "FEMALE"))
+penguins_subset <- select(penguins, Individual_ID, Species,
+                          Sex, Island,
+                          contains("Culmen"))
 ```
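These answers select the same columns, but not always in the same order. Helpers such as `starts_with()` keep the order in which columns appear in the data. Listing names explicitly uses the order you type them. The sketch below runs on a zero-row stand-in for `penguins` that has only a few of its column names, in an assumed order. It is not the real data set. It also shows `-`, which drops a column instead of keeping it.

```r
library(dplyr)

# Zero-row stand-in with some penguins column names (assumed order, no data)
penguins_cols <- data.frame(Individual_ID = character(),
                            Species = character(),
                            Sex = character(),
                            Island = character(),
                            Culmen_Length_mm = numeric(),
                            Culmen_Depth_mm = numeric(),
                            Flipper_Length_mm = numeric())

explicit <- select(penguins_cols, Individual_ID, Species, Sex, Island,
                   Culmen_Depth_mm, Culmen_Length_mm)
helper   <- select(penguins_cols, Individual_ID, Species, Sex, Island,
                   starts_with("Culmen"))
dropped  <- select(penguins_cols, -Flipper_Length_mm)   # - removes a column

names(explicit)  # typed order: Culmen_Depth_mm before Culmen_Length_mm
names(helper)    # data order: Culmen_Length_mm before Culmen_Depth_mm
```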
 
 ## Answers exercise 5
 
-5. Select the columns Individual_ID, Species, Sex, Island, Culmen_Depth_mm and Culmen_Length_mm
+5. Filter `penguins` to leave out the NAs.
 
 ```{r}
 #| label: exercise-5a
-penguins_subset_2 <- select(penguins_subset, Individual_ID, Species,
-                            Sex, Island, 
-                            Culmen_Depth_mm, Culmen_Length_mm)
+penguins_subset_2 <- filter(penguins_subset, !is.na(Sex))
 ```
 
 . . .
 
 Or
 ```{r}
 #| label: exercise-5b
-penguins_subset_2 <- select(penguins_subset, Individual_ID, Species,
-                            Sex, Island, 
-                            starts_with("Culmen"))
+penguins_subset_2 <- filter(penguins_subset, Sex == "MALE" | Sex == "FEMALE")
 ```
 
 . . .
 
 Or
+
 ```{r}
 #| label: exercise-5c
-penguins_subset_2 <- select(penguins_subset, Individual_ID, Species,
-                            Sex, Island, 
-                            contains("Culmen"))
+penguins_subset_2 <- filter(penguins_subset, Sex %in% c("MALE", "FEMALE"))
 ```
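Here is why the three answers agree. `filter()` keeps only rows where the condition is `TRUE`, and drops rows where it is `FALSE` or `NA`. For a missing `Sex`, `Sex == "MALE"` evaluates to `NA`, so the second answer also drops missing values. `NA %in% c("MALE", "FEMALE")` evaluates to `FALSE`, so the third answer drops them too. The sketch below uses a hypothetical three-row data frame. If `Sex` contained other codes besides MALE, FEMALE and NA, the first answer would keep those rows while the other two would drop them.

```r
library(dplyr)

# Hypothetical mini data set with one missing Sex value
birds <- data.frame(id = 1:3, Sex = c("MALE", NA, "FEMALE"))

birds$Sex == "MALE"   # TRUE NA FALSE: comparing with NA gives NA

v1 <- filter(birds, !is.na(Sex))
v2 <- filter(birds, Sex == "MALE" | Sex == "FEMALE")
v3 <- filter(birds, Sex %in% c("MALE", "FEMALE"))

v1$id
```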
 
 
@@ -467,7 +484,7 @@ penguins_subset_4 <- rename(penguins_subset_3,
 
 A key tidyverse component that chains all data science steps together: 
 
-`%>%`[^1]
+`%>%`[^2]
 
 . . .
 
@@ -478,7 +495,7 @@ Why?
 - no need to save intermediate R objects with `<-`
 - easily add and/or delete steps in your pipeline without breaking the code
 
-[^1]: Base R now also has a piping operator: `|>`     
+[^2]: Base R now also has a piping operator: `|>`     
 which [works very similarly](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/) to the magrittr piping operator `%>%`
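A pipe passes the result on its left as the first argument of the function on its right. The three versions below therefore compute the same data frame. The sketch uses the slides' `df`. It assumes dplyr is installed, which re-exports `%>%` from magrittr. It also assumes R 4.1 or later, which is needed for `|>`.

```r
library(dplyr)

df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
                 age = c(15, 22, 50, 51),
                 country = c("UK", "USA", "USA", "UK"))

# Nested calls are read from the inside out
nested <- select(filter(df, country == "UK"), name, age)

# The magrittr pipe reads top to bottom, one step per line
piped <- df %>%
  filter(country == "UK") %>%
  select(name, age)

# The base R pipe (R >= 4.1) gives the same result here
native <- df |>
  filter(country == "UK") |>
  select(name, age)

piped
```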
 
 
@@ -492,10 +509,10 @@ which [works very similarly](https://www.tidyverse.org/blog/2023/04/base-vs-magr
 my_new_df <- df %>%
   
   # Perform a function using that object as input
-  filter(country == "UK") %>%
+  select(name, age, country) %>%
   
   # Add another operation
-  select(name, age) %>%
+  filter(country == "UK") %>%
   
   # And another, etc.
   mutate(old_age = age + 20)
@@ -521,12 +538,12 @@ Make a workflow that starts with the data `penguins` and subsequently applies yo
 #| label:  exercise-8
 penguins_subset_5 <- penguins %>%
   
-  # Filter out NAs
-  filter(!is.na(Sex)) %>%
-  
   # Select only relevant columns
   select(Individual_ID, Species, Sex, Island, starts_with("Culmen")) %>%
   
+  # Filter out NAs
+  filter(!is.na(Sex)) %>%
+  
   # Add a new column culmen_ratio
   mutate(culmen_ratio = Culmen_Length_mm / Culmen_Depth_mm) %>%
   
@@ -636,8 +653,8 @@ Tidy data is a consistent way of storing data + most R functions work with vecto
 
 Do It Yourself:
 
-- `pivot_longer()`: lengthen data: more rows, fewer columns (long format, tidy)
-- `pivot_wider()`: widen data: fewer rows, more columns (wide format)
+- `pivot_longer()`: lengthen data: more rows, often fewer columns (long format, tidy)
+- `pivot_wider()`: widen data: fewer rows, often more columns (wide format)
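Below is a minimal round trip with tidyr, assuming the package is installed. The data are hypothetical scores for two people in two years. In this example the row count doubles, while the long version keeps three columns, just like the wide one. That is why the list says "often" fewer columns.

```r
library(tidyr)

# Hypothetical wide data: one row per person, one column per year
wide <- data.frame(name = c("Ann", "Bob"),
                   score_2022 = c(7, 6),
                   score_2023 = c(8, 9))

# Wide -> long: one row per person per year
long <- pivot_longer(wide,
                     cols = starts_with("score_"),
                     names_to = "year",
                     names_prefix = "score_",
                     values_to = "score")

# Long -> wide again: one column per year
back <- pivot_wider(long,
                    names_from = year,
                    values_from = score,
                    names_prefix = "score_")

long
```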
 
 . . .
 
@@ -665,7 +682,7 @@ pivot_longer(df_ext,
 
 # Go to Exercise 9
 
-![](https://raw.githubusercontent.com/apreshill/teachthat/master/pivot/pivot_longer_smaller.gif "Visualization of the pivot_longer() process")
+![](images/pivot_longer_smaller-cut.gif "Visualization of the pivot_longer() process")
 *Source: [Allison Hill](https://github.com/apreshill/teachthat/blob/master/pivot/pivot_longer_smaller.gif)*
 
 ## Answer exercise 9