Merge pull request #50 from UtrechtUniversity/dorien-changes

DorienHuijser · web-flow · commit db6acec50855 · 2025-03-18T17:33:10.000+01:00
Update materials
diff --git a/book/slides/slides_introduction.html b/book/slides/slides_introduction.html
diff --git a/book/slides/slides_introduction.qmd b/book/slides/slides_introduction.qmd
@@ -627,7 +627,8 @@ df[df$name=="Bob", "age"]
 
 ## Answers to exercise 5
 
-1. From your dataframe `df`, return complete rows for everyone living in a country of your choice.
+1. From your dataframe `df`, return all columns for everyone living in a country of your choice.
+
 ```{r}
 #| label: exercise-5.1
 df[df$country=="UK", ]
@@ -844,6 +845,23 @@ if(number >= 18){
 print(age_category)
 ```
 
+Bonus exercise: expand the if-else statement to assign "toddler" if number is smaller than 2:
+
+```{r}
+#| label: exercise-7-bonus
+number <- 8
+
+if(number >= 18){
+  age_category <- "adult"
+} else if(number < 2) {
+  age_category <- "toddler"
+} else {
+  age_category <- "minor"
+}
+
+print(age_category)
+```
+
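The bonus answer above can be checked against several inputs at once. The following is a minimal sketch, not part of the course materials: `test_numbers` and `categories` are hypothetical names introduced for illustration, and the `for` loop is just one way to repeat the same if-else chain.

```r
# Hypothetical test values; only `number` and `age_category` come from the exercise.
test_numbers <- c(1, 8, 18, 30)
categories <- character(length(test_numbers))

for (i in seq_along(test_numbers)) {
  number <- test_numbers[i]

  # Conditions are checked top to bottom; the first one that is TRUE wins.
  if (number >= 18) {
    age_category <- "adult"
  } else if (number < 2) {
    age_category <- "toddler"
  } else {
    age_category <- "minor"
  }

  categories[i] <- age_category
}

print(categories)
```

Because the conditions are evaluated in order, a value only reaches the final `else` branch when it is neither 18 or above nor below 2.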
 # Programming: functions {background-color=#FFCD00}
 
 ## Functions: a sequence
@@ -862,34 +880,9 @@ mean(df$age)
 mean(1:100)
 ```
 
-## Functions
-
-**Functions can also be used to make a complex line of code easier to write/read:**
-
-You write the function once:
-```{r}
-#| label: function-find-bobs-age
-find_bobs_age <- function(data){
-  bobs_age <- data[data$name == "Bob", "age"]
-  return(bobs_age)
-}
-```
-
-Now, every time you want to find Bob's age you use:
-```{r}
-#| label: use-find-bobs-age
-find_bobs_age(df)
-```
-
-. . .
-
-**Functions are the bread and butter of programming!**
-
-A good script will consist mostly of functions, with a minimal amount of code that applies the functions.
-
 ## Functions in R
 
-- To make a function, use the function `function()`:
+- To make a function yourself, use the function `function()`:
   ```r
   myFun <- function()
   ```
@@ -936,6 +929,26 @@ myFun(3, 4)
 myFun(90, 71)
 ```
 
+## Functions
+
+**Functions are the bread and butter of programming!**
+
+A good script will consist mostly of functions, with a minimal amount of code that applies the functions.
+
+. . .
+
+```r
+myFun <- function(arg1, arg2){
+    multiplication <- arg1 * arg2
+    return(multiplication)
+}
+```
+
+Note that:
+
+- `arg1` and `arg2` are **internal variables**: they exist only inside the function, where they are used to run its code.
+- The output of a function is handed back with `return()` (**not** `print()`); without an explicit `return()`, R returns the value of the last evaluated expression.
+
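The two notes above can be illustrated with a short sketch. It reuses `myFun` from the slide; the names `result`, `arg1_exists` and `multiplication_exists` are hypothetical and only serve the illustration. It assumes a fresh R session in which no variables called `arg1` or `multiplication` have been defined.

```r
myFun <- function(arg1, arg2){
    multiplication <- arg1 * arg2
    return(multiplication)
}

# The returned value can be stored in a new variable and reused.
result <- myFun(3, 4)
print(result)

# The internal variables are not visible outside the function
# (assuming nothing with these names was created elsewhere in the session).
arg1_exists <- exists("arg1")
multiplication_exists <- exists("multiplication")
print(c(arg1_exists, multiplication_exists))
```

Storing the result works because `return()` hands the value back to the caller; the variables created inside the function disappear once the function has finished.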
 # Go to exercise 8
 
 ## Answers to exercise 8
diff --git a/book/slides/slides_tidyverse.html b/book/slides/slides_tidyverse.html
@@ -2404,6 +2404,7 @@ <h2>Learn more</h2>
 <li>Statistical Inference via Data Science: A ModernDive into R and the Tidyverse: <a href="https://moderndive.com/" class="uri">https://moderndive.com/</a></li>
 <li>Big Book of R: <a href="https://www.bigbookofr.com/" class="uri">https://www.bigbookofr.com/</a></li>
 <li>Better spreadsheets: <a href="https://better-spreadsheets.netlify.app/" class="uri">https://better-spreadsheets.netlify.app/</a></li>
+<li><a href="https://bookdown.org/ndphillips/YaRrr/">YaRrr! The Pirate’s Guide to R</a></li>
 </ul>
 <p><strong>See also the <a href="https://utrechtuniversity.github.io/workshop-introduction-to-R-and-data/what-next.html">What’s next page</a>.</strong></p>
 </section>
diff --git a/book/slides/slides_tidyverse.qmd b/book/slides/slides_tidyverse.qmd
@@ -1008,6 +1008,7 @@ What have we learned this afternoon?
 - Statistical Inference via Data Science: A ModernDive into R and the Tidyverse: <https://moderndive.com/>
 - Big Book of R: <https://www.bigbookofr.com/>
 - Better spreadsheets: <https://better-spreadsheets.netlify.app/>
+- [YaRrr! The Pirate’s Guide to R](https://bookdown.org/ndphillips/YaRrr/)
 
 **See also the [What's next page](https://utrechtuniversity.github.io/workshop-introduction-to-R-and-data/what-next.html).**
 
diff --git a/book/what-next.qmd b/book/what-next.qmd
@@ -10,6 +10,7 @@ Not finished learning? Feel free to check our website, [uu.nl/rdm](https://www.u
 - Statistical Inference via Data Science: A ModernDive into R and the Tidyverse: <https://moderndive.com/>
 - Big Book of R: <https://www.bigbookofr.com/>
 - Better spreadsheets: <https://better-spreadsheets.netlify.app/>
+- [YaRrr! The Pirate’s Guide to R](https://bookdown.org/ndphillips/YaRrr/)
 
 ### Data visualization
 
diff --git a/course-materials/baseR_exercises.Rmd b/course-materials/baseR_exercises.Rmd
@@ -96,7 +96,7 @@ Before you start, please run this code:
 rm(name,age,country)
 ```
 
-1. From your dataframe df, return complete rows for everyone living in a country of your choice.
+1. From your dataframe df, return all columns for everyone living in a country of your choice.
 
 2. Return only the names of everyone in your data frame df under 40. 
 
@@ -131,7 +131,7 @@ is.na(NA)
 ### Exercise 7: If statement
 
 Make an if statement that tests if a number is larger than 18. 
-If the number is larger, the variable age_category should be assigned the value "adult". If not, the variable age_category should be assigned the vaue "minor".
+If the number is larger, the variable age_category should be assigned the value "adult". If not, the variable age_category should be assigned the value "minor".
 
 ```{r}
 number <- 8
@@ -143,6 +143,21 @@ if(){
 print(age_category)
 ```
 
+BONUS exercise:
+
+Expand your if-else statement from above with an additional condition: if the number is less than 2, age_category should be assigned the value "toddler".
+
+```{r}
+number <- 8
+
+if(){
+  
+}
+
+print(age_category)
+```
+
+
 ### Exercise 8: Function
 
 Turn the if statement from the last exercise into a function. Let the user provide the value for number, and return the age_category.
diff --git a/course-materials/datascience_exercises.Rmd b/course-materials/datascience_exercises.Rmd
@@ -56,7 +56,7 @@ library(readxl)
 Use the function `read_excel()` to read a related data set `penguins_isotopes.xlsx` (an Excel file!), also located in your `data` folder, into R.
 
 ```{r}
-penguins_isotopes <- ???(path = "???")
+penguins_isotopes <- ???
 ```
 
 ## Exercise 3
@@ -68,9 +68,7 @@ For this function, you need to provide:
 - The name of the folder and the file name: "data/penguins_isotopes.csv"
 
 ```{r}
-???(penguins_isotopes,
-    # Name of the file, including extension (.csv)
-    file = "???")
+
 ```
 
 # Chapter 2: Selecting & Filtering Data
@@ -95,18 +93,20 @@ Use the function `select()` to include the right columns in our data subset.
 **Note** that in order to work with the subset from Exercise 4, we need to use that data frame (`penguin_subset`) as our data input!
 
 ```{r}
-penguins_subset <- select(???)
+penguins_subset <- ???
 ```
 
 
 ## Exercise 5
 
 For some of the penguins, the sex was not determined. Use the function `filter()` to keep only the rows where the column `Sex` specifies either "FEMALE" or "MALE".
 
+Use the `penguins_subset` dataframe that you created in the previous exercise as input.
+
 One way to do this (though not the only way) is to filter out the rows where `Sex` is NA. Remember the function `is.na()` and remember that to negate a condition, you use the operator `!`. For example, `!is.na(NA)` returns `FALSE`.
 
 ```{r}
-penguins_subset_2 <- filter(penguins_subset, ???)
+penguins_subset_2 <- ???
 ```
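As a small illustration of negating `is.na()`, here is a sketch on a made-up vector. The vector `sex` and the name `known_sex` are hypothetical and not part of the penguins data, and base R subsetting is used instead of `filter()` so that the example runs without any packages.

```r
# Hypothetical example vector; not taken from the penguins data.
sex <- c("FEMALE", NA, "MALE", NA)

# is.na() marks the missing values, ! flips TRUE and FALSE.
missing <- is.na(sex)
present <- !is.na(sex)

# Keep only the elements where a value is present.
known_sex <- sex[present]
print(known_sex)
```

The same logical condition, applied to a column of a data frame, is the kind of expression `filter()` expects as its second argument.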
 
 
@@ -117,17 +117,17 @@ Use the `mutate()` function to create a new column called `culmen_ratio`, which
 Make sure to work with the `penguins_subset_2` dataframe.
 
 ```{r}
-penguins_subset_3 <- mutate(penguins_subset_2, ???)
+penguins_subset_3 <- ???
 ```
 
 ## Exercise 7
 
-In the next chapter, we will make our dataframe tidy. Before we do that, we will rename the `Culmen_Length_mm` and `Culmen_Depth_mm` columns in order to make subsequent operations easier.
+In the next chapter, we will make our dataframe tidy. Before we do that, we will rename the `Culmen_Length_mm` column as `length` and `Culmen_Depth_mm` as `depth` in order to make subsequent operations easier.
+
+Use the `rename` function to rename these columns. Make sure to take `penguins_subset_3` as the input dataframe.
 
 ```{r}
-penguins_subset_4 <- rename(penguins_subset_3,
-                            length = ???,
-                            depth = ???)
+penguins_subset_4 <- ???
 ```
 
 # Exercise 8
@@ -157,8 +157,8 @@ For this, you need to specify:
 ```{r}
 penguins_long <- penguins_subset_5 %>%
   pivot_longer(cols = c(???, ???),
-               names_to = "culmen_element",
-               values_to = "measurement")
+               ??? = "culmen_element",
+               ??? = "measurement")
 ```
 
 ### Exercise 10 
@@ -175,29 +175,28 @@ Make sure to feed the right data frame into the workflow: the column `culmen_ele
 
 ```{r}
 penguins_summary <- penguins_long %>%
-  group_by(???) %>%
-  summarize(avg = ???,
-            sd = ???)
+  ??? %>%
+  ???
 ```
 
 ## Exercise 11 
 
 Use the function `full_join()` to merge the `penguins_summary` and `penguins_long` data frames, so that each row in the long data frame will have additional columns with the mean and standard deviation for its group.
 
 ```{r}
-penguins_join <- full_join(???)
+penguins_join <- ???
 ```
 
 # Chapter 4: Data Visualization
 
 ## Exercise 12
 
 Using the ggplot2 package, let's plot the penguins' culmen length against their flipper length, using the `penguins` dataframe.
-Culmen_Length_mm and Flipper_Length_mm are both continuous variables, and therefore we are now choosing a scatterplot.
+`Culmen_Length_mm` and `Flipper_Length_mm` are both continuous variables, and therefore we are now choosing a scatterplot.
 
 In the code chunk below:
 
-- Put Culmen_Length_mm on the x-axis, and Flipper_Length_mm on the y-axis
+- Put `Culmen_Length_mm` on the x-axis, and `Flipper_Length_mm` on the y-axis
 - Color the points according to Species
 - Add scatter points using the geom `geom_point()`
 - Add x- and y-labels and a title with `labs()`