Commit 20c6369

Merge pull request #47 from UtrechtUniversity/dorien-changes
Small fixes issues 45 and 46
2 parents 8fc38b3 + 3f5243d commit 20c6369

7 files changed: +620 -458 lines changed

book/slides/slides_introduction.html

Lines changed: 122 additions & 67 deletions
Large diffs are not rendered by default.

book/slides/slides_introduction.qmd

Lines changed: 21 additions & 2 deletions
@@ -484,15 +484,15 @@ country_fac
 . . .

 ::: columns
-::: {.column width="50%"}
+::: {.column width="40%"}

 ```{r}
 #| label: factors-df
 df <- data.frame(name, age, country_fac)
 df
 ```
 :::
-::: {.column width="50%"}
+::: {.column width="60%"}
 ```{r}
 #| label: summary-factor-df
 summary(df)
@@ -1010,6 +1010,25 @@ for(the_age in df$age){
 }
 ```

+## Answers Bonus question
+
+Bonus question: add age category as a new column in df.
+
+```{r}
+# We first set an index that will be increased every time the for-loop runs
+i <- 1
+
+for(nr in df$age){
+  # Add the age category to a new column in df
+  df$age_category[i] <- test_age(nr)
+
+  # Increase the index with 1 after running the code in this for-loop iteration
+  i <- i + 1
+}
+
+df
+```
+
 # Recap "Basics of R" {background-color=#FFCD00}

 ## Which bracket does what?
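
Review note on the bonus-question answer in `slides_introduction.qmd`: the added loop keeps a manual counter `i` in step with `df$age`. The sketch below shows two base R alternatives that should give the same column without a separate counter. It assumes a hypothetical `test_age()` (the real definition is not part of this diff) and a made-up `df`, so treat it as an illustration rather than the course's intended answer.

```r
# Hypothetical stand-in for test_age(); the real function is not shown in this diff
test_age <- function(age) {
  if (age < 18) "minor" else "adult"
}

# Made-up example data, loosely modelled on the slides
df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
                 age = c(15, 22, 50, 51))

# Alternative 1: loop over row positions, so no counter has to be updated by hand
for (i in seq_along(df$age)) {
  df$age_category[i] <- test_age(df$age[i])
}

# Alternative 2: apply test_age() to every age without an explicit loop
df$age_category_2 <- sapply(df$age, test_age)

df
```

Both alternatives fill the new column in row order, which is also what the counter-based loop in the commit does.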

book/slides/slides_tidyverse.html

Lines changed: 342 additions & 296 deletions
Large diffs are not rendered by default.

book/slides/slides_tidyverse.qmd

Lines changed: 98 additions & 81 deletions
@@ -56,6 +56,43 @@ dplyr::rename(iris, petal_length = Petal.Length)

 # Go to Exercise 0 in `datascience_exercises.Rmd`

+# Some good coding practices {background-color=#FFCD00}
+
+## R projects and working directories
+
+When you start programming for yourself:
+
+- Create a folder dedicated to your project
+- Start a new R project: **File > New Project > Existing Directory**
+- An `.RProj` file will be created
+
+. . .
+
+Advantages:
+
+- Automatically set your working directory to that folder
+- Automatically retrieve only the history and objects from that R project
+- More reproducible (relative vs. absolute paths)
+
+```r
+getwd()
+```
+
+## Comments
+
+If a line of R code starts with `#`, it is ignored.
+
+- Help your future you and collaborators/supervisors understand what your code does
+- Explain (the purpose of) code that is not self-explanatory
+- Comment and un-comment to debug code without having to delete it[^1]
+
+```{r}
+# Print a very important line
+print("A very important line")
+```
+
+[^1]: You should delete non-working code when done debugging though.
+
 ## The Data Science workflow

 ![Source: [R 4 Data Science](https://r4ds.had.co.nz/introduction.html)](https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png "The data science workflow."){fig-align="center"}
@@ -85,8 +122,8 @@ This means that often the results of one step will be the input for the next
 ::: {.incremental}
 1. Read a file into R
 2. Clean the data:
-    - Filter on relevant rows
     - Select only relevant columns
+    - Filter on relevant rows
 3. Calculate a new column
 4. Rename a column
 5. Put the dataframe into a long (tidy) format
@@ -111,26 +148,6 @@ This means that often the results of one step will be the input for the next

 ![](../images/reading_writing.png "Importing files into R, and exporting them into a file again")

-## R projects and working directories
-
-When you start programming for yourself:
-
-- Create a folder dedicated to your project
-- Start a new R project: **File > New Project > Existing Directory**
-- An `.RProj` file will be created
-
-. . .
-
-Advantages:
-
-- Automatically set your working directory to that folder
-- Automatically retrieve only the history and objects from that R project
-- More reproducible (relative vs. absolute paths)
-
-```r
-getwd()
-```
-
 ## readr: Read Rectangular Text Data

 To read text data, you need:
@@ -239,52 +256,26 @@ write_csv(penguins_isotopes,

 # Selecting & filtering data {background-color=#FFCD00}

-## dplyr: Data Manipulation
-
-`dplyr` contains functions for many types of data manipulation, such as:
-
-- `filter()`: select **rows** that meet one or several logical criteria
-- `select()`: select (or drop) **columns**
-- `rename()`: change column name
-- `mutate()`: transform column values or create new column
-- `group_by()`: group data on one or more columns
-- `summarize()`: reduces a group of data into a single row
-
-## Filter
-
-Selects **rows** in your dataframe.
-
-Use:
-```r
-filter(your-dataframe, your-condition)
-```
-
-. . .
-
 ```{r}
 #| label: load-df-morning-session
 #| output: false
 #| echo: false
 df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
-                 age = c(35,22,50,51),
+                 age = c(15,22,50,51),
                  country = c("UK","USA","USA","UK")
 )
 ```

-From the morning session: "From your dataframe `df`, return complete rows for everyone living in a country of your choice."
-
-```{r}
-#| label: filter-baser
-#| eval: false
-df[df$country=="UK", ] # Base R
-```
+## dplyr: Data Manipulation

-. . .
+`dplyr` contains functions for many types of data manipulation, such as:

-```{r}
-#| label: filter-tidyverse
-filter(df, country == "UK") # Tidyverse
-```
+- `filter()`: select **rows** that meet one or several logical criteria
+- `select()`: select (or drop) **columns**
+- `rename()`: change column name
+- `mutate()`: transform column values or create new column
+- `group_by()`: group data on one or more columns
+- `summarize()`: reduces a group of data into a single row

 ## Select

@@ -325,63 +316,89 @@ df[, c("name","age")] # Base R
 select(df, name, age) # Tidyverse
 ```

+## Filter
+
+Selects **rows** in your dataframe.
+
+Use:
+```r
+filter(your-dataframe, your-condition)
+```
+
+. . .
+
+From the morning session: "From your dataframe `df`, return complete rows for everyone living in a country of your choice."
+
+```{r}
+#| label: filter-baser
+#| eval: false
+df[df$country=="UK", ] # Base R
+```
+
+. . .
+
+```{r}
+#| label: filter-tidyverse
+filter(df, country == "UK") # Tidyverse
+```
+
 # Go to exercises 4 and 5

 ## Answers exercise 4

-4. Filter `penguins` to leave out the NAs.
+4. Select the columns Individual_ID, Species, Sex, Island, Culmen_Depth_mm and Culmen_Length_mm

 ```{r}
 #| label: exercise-4a
-penguins_subset <- filter(penguins, !is.na(Sex))
+penguins_subset <- select(penguins, Individual_ID, Species,
+                          Sex, Island,
+                          Culmen_Depth_mm, Culmen_Length_mm)
 ```

 . . .

 Or
 ```{r}
 #| label: exercise-4b
-penguins_subset <- filter(penguins, Sex == "MALE" | Sex == "FEMALE")
+penguins_subset <- select(penguins, Individual_ID, Species,
+                          Sex, Island,
+                          starts_with("Culmen"))
 ```

 . . .

 Or
-
 ```{r}
 #| label: exercise-4c
-penguins_subset <- filter(penguins, Sex %in% c("MALE", "FEMALE"))
+penguins_subset <- select(penguins, Individual_ID, Species,
+                          Sex, Island,
+                          contains("Culmen"))
 ```

 ## Answers exercise 5

-5. Select the columns Individual_ID, Species, Sex, Island, Culmen_Depth_mm and Culmen_Length_mm
+5. Filter `penguins` to leave out the NAs.

 ```{r}
 #| label: exercise-5a
-penguins_subset_2 <- select(penguins_subset, Individual_ID, Species,
-                            Sex, Island,
-                            Culmen_Depth_mm, Culmen_Length_mm)
+penguins_subset_2 <- filter(penguins_subset, !is.na(Sex))
 ```

 . . .

 Or
 ```{r}
 #| label: exercise-5b
-penguins_subset_2 <- select(penguins_subset, Individual_ID, Species,
-                            Sex, Island,
-                            starts_with("Culmen"))
+penguins_subset_2 <- filter(penguins_subset, Sex == "MALE" | Sex == "FEMALE")
 ```

 . . .

 Or
+
 ```{r}
 #| label: exercise-5c
-penguins_subset_2 <- select(penguins_subset, Individual_ID, Species,
-                            Sex, Island,
-                            contains("Culmen"))
+penguins_subset_2 <- filter(penguins_subset, Sex %in% c("MALE", "FEMALE"))
 ```


@@ -467,7 +484,7 @@ penguins_subset_4 <- rename(penguins_subset_3,

 A key tidyverse component that chains all data science steps together:

-`%>%`[^1]
+`%>%`[^2]

 . . .

@@ -478,7 +495,7 @@ Why?
 - no need to save intermediate R objects with `<-`
 - easily add and/or delete steps in your pipeline without breaking the code

-[^1]: Base R now also has a piping operator: `|>`
+[^2]: Base R now also has a piping operator: `|>`
 which [works very similarly](https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/) to the magrittr piping operator `%>%`


@@ -492,10 +509,10 @@ which [works very similarly](https://www.tidyverse.org/blog/2023/04/base-vs-magr
 my_new_df <- df %>%

   # Perform a function using that object as input
-  filter(country == "UK") %>%
+  select(name, age) %>%

   # Add another operation
-  select(name, age) %>%
+  filter(country == "UK") %>%

   # And another, etc.
   mutate(old_age = age + 20)
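
Review note on this hunk: after the reorder, `select(name, age)` runs before `filter(country == "UK")`. Assuming `df` has the columns from the hidden `load-df-morning-session` chunk (`name`, `age`, `country`) and no separate `country` object exists in the session, the filter step would no longer find `country`, because `select()` has already dropped it. A minimal sketch of two orderings that should still work follows; the object names `uk_first` and `uk_second` are made up for illustration.

```r
library(dplyr)

# Same example data as the hidden chunk earlier in this file's diff
df <- data.frame(name = c("Ann", "Bob", "Chloe", "Dan"),
                 age = c(15, 22, 50, 51),
                 country = c("UK", "USA", "USA", "UK"))

# Option 1: filter on country first, then keep only name and age
uk_first <- df %>%
  filter(country == "UK") %>%
  select(name, age) %>%
  mutate(old_age = age + 20)

# Option 2: select first, but keep country until the filter has used it
uk_second <- df %>%
  select(name, age, country) %>%
  filter(country == "UK") %>%
  select(-country) %>%
  mutate(old_age = age + 20)
```

Option 2 keeps the select-then-filter order that the commit introduces elsewhere, at the cost of one extra `select()` step.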
@@ -521,12 +538,12 @@ Make a workflow that starts with the data `penguins` and subsequently applies yo
 #| label: exercise-8
 penguins_subset_5 <- penguins %>%

-  # Filter out NAs
-  filter(!is.na(Sex)) %>%
-
   # Select only relevant columns
   select(Individual_ID, Species, Sex, Island, starts_with("Culmen")) %>%

+  # Filter out NAs
+  filter(!is.na(Sex)) %>%
+
   # Add a new columns culmen_ratio
   mutate(culmen_ratio = Culmen_Length_mm / Culmen_Depth_mm) %>%

@@ -636,8 +653,8 @@ Tidy data is a consistent way of storing data + most R functions work with vecto

 Do It Yourself:

-- `pivot_longer()`: lengthen data: more rows, fewer columns (long format, tidy)
-- `pivot_wider()`: widen data: fewer rows, more columns (wide format)
+- `pivot_longer()`: lengthen data: more rows, often fewer columns (long format, tidy)
+- `pivot_wider()`: widen data: fewer rows, often more columns (wide format)

 . . .
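
Review note on the added word "often": a small sketch of a case where `pivot_longer()` adds rows but leaves the column count unchanged, which is presumably why the wording was softened. The data frame `wide` and its column names are invented for this illustration and do not come from the slides.

```r
library(tidyr)

# Invented wide-format data: one row per person, one column per year
wide <- data.frame(name = c("Ann", "Bob"),
                   score_2020 = c(1, 2),
                   score_2021 = c(3, 4))

# Two value columns collapse into one name column plus one value column
long <- pivot_longer(wide,
                     cols = c(score_2020, score_2021),
                     names_to = "year",
                     values_to = "score")

dim(wide)
dim(long)
```

With only two columns pivoted, the result has twice as many rows but still three columns; pivoting three or more columns would reduce the column count.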

@@ -665,7 +682,7 @@ pivot_longer(df_ext,

 # Go to Exercise 9

-![](https://raw.githubusercontent.com/apreshill/teachthat/master/pivot/pivot_longer_smaller.gif "Visualization of the pivot_longer() process")
+![](images/pivot_longer_smaller-cut.gif "Visualization of the pivot_longer() process")
 *Source: [Allison Hill](https://github.com/apreshill/teachthat/blob/master/pivot/pivot_longer_smaller.gif)*

 ## Answer exercise 9
