---
title: "Project Demo"
author: "Phoebe"
subtitle: "Investigation into Chicago taxis"
echo: false
bibliography: references.bib
editor_options:
  chunk_output_type: console
---
## Packages
### Description
A taxi is a vehicle for hire with a driver. Taxis are often yellow, and their drivers charge passengers a fare to transport them from one place to another. These data contain information on a subset of taxi trips in the city of Chicago in 2022. Two questions motivate the analysis. First, it may be reasonable to think that the longer a taxi ride is, the more likely the passenger is to tip. Second, someone considering working for a taxi company may wonder whether one company tends to receive more tips than the others.
### Research Questions:
1. Is there a relationship between the distance (in miles) someone travels in a taxi and whether they tip?
2. Do taxi passengers tend to tip more for the company Chicago Independents than the other companies?
```{r}
#| warning: false
#| message: false
library(tidyverse)
library(tidymodels)
library(scales)
```
## Question 1
Is there a relationship between the distance (in miles) someone travels in a taxi and whether they tip?
```{r}
# Boxplot of distance by tipping behavior
ggplot(taxi, aes(x = tip, y = distance)) +
  geom_boxplot() +
  labs(
    title = "Comparison of Taxi Trip Distances by Tipping Behavior",
    x = "Tipped", y = "Distance (miles)"
  )

# Violin plot of distance by tipping behavior
ggplot(taxi, aes(x = tip, y = distance)) +
  geom_violin(trim = FALSE) +
  labs(
    title = "Density and Distribution of Taxi Trip Distances by Tipping Behavior",
    x = "Tipped", y = "Distance (miles)"
  )

# Assuming the tip column is a factor with levels "yes" and "no",
# code it as 1 (tipped) / 0 (not tipped) for the logistic curve
taxi$tip_numeric <- as.numeric(taxi$tip == "yes")

# Jittered points with a fitted logistic regression curve
ggplot(taxi, aes(x = distance, y = tip_numeric)) +
  geom_jitter(alpha = 0.5, color = "blue", width = 0.1, height = 0.1) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), color = "red") +
  labs(
    title = "Relationship Between Taxi Trip Distance and Tipping",
    subtitle = "Logistic Regression with Jitter",
    x = "Distance (miles)", y = "Probability of Tipping (1 = Yes, 0 = No)"
  ) +
  theme_minimal()
```
### Conclusion:
The analysis suggests a positive relationship between the distance traveled in a taxi and the likelihood of a tip, although the plots alone do not establish statistical significance. Longer trips show a higher probability of tipping in the fitted logistic regression curve, and the box and violin plot distributions support this. Passengers therefore appear more inclined to tip on longer journeys, possibly because of higher fares or greater satisfaction with the service over an extended trip.
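
To put a number on the relationship described above, one option is to fit the logistic regression directly instead of only drawing it. The chunk below is a minimal sketch, not part of the original analysis. It assumes `taxi` is the data set that `tidymodels` attaches through `modeldata`, with a factor `tip` ("yes"/"no") and a numeric `distance`. The helper name `fit_tip_by_distance` is hypothetical.

```{r}
# Sketch: quantify the distance/tip relationship with a logistic regression.
# Assumption: `data` has a column `tip` ("yes"/"no") and a numeric `distance`.

# Hypothetical helper: model P(tip = "yes") as a function of distance.
fit_tip_by_distance <- function(data) {
  # Code the outcome explicitly so that 1 always means "tipped",
  # regardless of the order of the factor levels.
  data$tipped <- as.integer(data$tip == "yes")
  glm(tipped ~ distance, family = binomial, data = data)
}

# Only run on the real data when `taxi` is available in the session.
if (exists("taxi")) {
  tip_fit <- fit_tip_by_distance(taxi)
  # Coefficient table: estimate, standard error, z value, p-value
  print(coef(summary(tip_fit)))
  # Multiplicative change in the odds of tipping per additional mile
  print(exp(coef(tip_fit)["distance"]))
}
```

A positive `distance` coefficient with a small p-value would support the reading of the plots. A coefficient near zero would suggest the visual trend is weak.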
## Question 2
Do taxi passengers tend to tip more for the company Chicago Independents than the other companies?
```{r}
# Create a new variable to compare Chicago Independents vs others
taxi <- taxi %>%
  mutate(
    company_group = ifelse(
      company == "Chicago Independents",
      "Chicago Independents",
      "Other Companies"
    )
  )

# Tipping percentage by company group
tipping_percentage <- taxi %>%
  group_by(company_group) %>%
  summarise(
    Tipping_Percentage = mean(tip == "yes", na.rm = TRUE) * 100
  )

# Bar plot for tipping percentage
ggplot(tipping_percentage, aes(x = company_group, y = Tipping_Percentage, fill = company_group)) +
  geom_col() +
  labs(
    title = "Tipping Percentage: Chicago Independents vs Other Companies",
    x = "", y = "Tipping Percentage (%)"
  )

# Summary statistics for trip distance by company group
distance_stats <- taxi %>%
  group_by(company_group) %>%
  summarise(
    Mean_Distance = mean(distance, na.rm = TRUE),
    Median_Distance = median(distance, na.rm = TRUE),
    SD_Distance = sd(distance, na.rm = TRUE),
    .groups = "drop" # Drop grouping
  )
print(distance_stats)
```
### Conclusion:
The data show a difference in tipping behavior between Chicago Independents and the other taxi companies: passengers tip more frequently when riding with Chicago Independents.

The tipping percentage for Chicago Independents is approximately 94.8%, compared with around 91.8% for the other companies, a gap of about 3 percentage points. This suggests that passengers are slightly more inclined to leave a tip when they use the services of Chicago Independents.
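
Whether a gap of this size could arise by chance can be checked with a two-sample test of proportions. The chunk below is a sketch under stated assumptions, not the author's method. It uses base R's `prop.test()` and assumes `taxi` has a `tip` column ("yes"/"no") and a `company` column. The helper name `compare_tip_rates` is hypothetical.

```{r}
# Sketch: test whether the tipping proportion differs between one company
# and all other companies combined.
# Assumption: `data` has columns `tip` ("yes"/"no") and `company`.

# Hypothetical helper wrapping stats::prop.test()
compare_tip_rates <- function(data, company_name = "Chicago Independents") {
  # Drop rows where either variable is missing
  data <- data[!is.na(data$tip) & !is.na(data$company), ]
  is_target <- data$company == company_name
  tipped <- data$tip == "yes"
  # Number of tipped trips and total trips: target company first, others second
  successes <- c(sum(tipped[is_target]), sum(tipped[!is_target]))
  trials <- c(sum(is_target), sum(!is_target))
  prop.test(x = successes, n = trials)
}

# Only run on the real data when `taxi` is available in the session.
if (exists("taxi")) {
  print(compare_tip_rates(taxi))
}
```

The printed result reports both sample proportions, a confidence interval for their difference, and a p-value. Because the "other" group pools several companies, the test says nothing about any single competitor.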
## Factors to Consider:
- Service Quality: It could be inferred that Chicago Independents may offer better service, leading to higher tips.
- Customer Satisfaction: High tipping rates could also suggest greater customer satisfaction with Chicago Independents.
- Company Policies: Differences in tipping could be influenced by company policies regarding driver interaction, fare rates, or other service-related factors.