Commit 06358d1

Tweaks
1 parent 9f737ce commit 06358d1

1 file changed

+45
-34
lines changed

docs/index.md

Lines changed: 45 additions & 34 deletions
@@ -164,25 +164,26 @@ If an error appears, verify that you entered your token correctly. If you are su
164164

165165
## Import data
166166

167-
Now that we're connected to Datawrapper, it's time to introduct the data that we'll use to create our charts. We'll use a dataset of arrests made by the Baltimore Police Department that is published [on the city's data portal](https://data.baltimorecity.gov/datasets/baltimore::bpd-arrests/about). To speed up the class, we've created [a simplified version](https://raw.githubusercontent.com/palewire/first-automated-chart/main/_notebooks/arrests.csv) for use here.
167+
Now that you're connected to Datawrapper, it's time to introduce the data you'll use to create your charts. You'll use a dataset of arrests made by the Baltimore Police Department that is published [on the city's data portal](https://data.baltimorecity.gov/datasets/baltimore::bpd-arrests/about). To speed up the class, we've created [a simplified version](https://raw.githubusercontent.com/palewire/first-automated-chart/main/_notebooks/arrests.csv) that doesn't require any data cleaning.
168168

169-
We'll read in the data using the [`pandas`](https://pandas.pydata.org/) library, which is a popular tool for working with data in Python that covered in depth by ["First Python Notebook."](https://palewi.re/docs/first-python-notebook/) Before you can use it, you'll need to import it in your Jupyter Desktop environment using the same technique you used to install the `datawrapper` library.
169+
We'll read in the data using the [`pandas`](https://pandas.pydata.org/) library, a popular tool for working with data in Python covered in depth by ["First Python Notebook."](https://palewi.re/docs/first-python-notebook/) Before you can use it, you'll need to import it in your Jupyter Desktop environment using the same technique you used to install the `datawrapper` library.
170170

171171
```python
172172
import pandas as pd
173173
```
174174

175-
```{note}
175+
``````{note}
176176
If your notebook throws an error and says pandas can't be found, you can install it using the technique we employed for the datawrapper library.
177177
178178
```bash
179179
%pip install pandas
180180
```
181181
182182
After that completes, try importing pandas again.
183-
```
183+
``````
184+
184185

185-
We'll read in the data using the `read_csv` function and save it as a variable named `df`.
186+
Read in the data using the `read_csv` function and save it as a variable named `df`. First we'll use the URL of the dataset, which is hosted on GitHub, and then we'll pass in a list of the columns that contain dates so that pandas can parse them correctly.
186187

187188
```python
188189
df = pd.read_csv(
@@ -191,7 +192,7 @@ df = pd.read_csv(
191192
)
192193
```
193194

194-
That can be inspected by running the `head` method on the `df` object, which will show the first five rows.
195+
The table, known in pandas as a DataFrame, can be inspected by running the `head` method on the `df` object. That will show the first five rows.
195196

196197
```python
197198
df.head()
@@ -201,19 +202,19 @@ You can see that the dataset features one row for each arrest, with columns for
201202

202203
## Create one chart
203204

204-
With these materials, any number of charts could be created. As a simple start, lets consider a chart that shows the number of arrests in Baltimore by year. We could look into the idea by creating a new column in the `df` object that contains the year of each arrest.
205+
With these materials, any number of charts could be created. As a simple start, let's consider a chart that shows the number of arrests in Baltimore by year. Let's look into the idea by creating a new column in the `df` object that contains the year of each arrest.
205206

206207
```python
207208
df['year'] = df.ArrestDateTime.dt.year
208209
```
209210

210-
And then we could count the tally of arrests in each year.
211+
Then tally the arrests logged in each year.
211212

212213
```python
213214
df.year.value_counts()
214215
```
215216

216-
That will return some eye-opening numbers. It looks like the number of arrests in Baltimore has been falling over the years, exactly the kind of thing we might want to visualize with a chart.
217+
That will return some eye-opening numbers. The number of arrests in Baltimore has been falling dramatically in recent years, exactly the kind of trend we might want to visualize with a chart.
217218

218219
```python
219220
2010 45224
@@ -233,13 +234,17 @@ That will return some eye-opening numbers. It looks like the number of arrests i
233234
Name: year, dtype: int64
234235
```
235236

236-
Before we can pass our data into Datawrapper, we need to reshape it into a pandas DataFrame, the kind of data structure that our Python library expects. We can do that by calling the `sort_index` and `reset_index` methods on the end of the `value_counts` method.
237+
```{note}
238+
You can read about this long-term trend in stories by [the Baltimore Banner](https://www.thebaltimorebanner.com/community/criminal-justice/driven-by-warrants-arrests-are-up-in-baltimore-for-the-first-time-in-more-than-a-decade-SXXOPBKJSVBY7IN7GWRHQ5IDAM/), the [BBC](https://www.bbc.com/news/world-us-canada-32889836), the [Washington Post](https://www.washingtonpost.com/outlook/baltimore-police-reforms-crime/2020/06/18/7d60e91e-b041-11ea-8758-bfd1d045525a_story.html) and the [New York Post](https://nypost.com/2015/05/28/baltimore-gets-bloodier-as-arrests-drop-sharply/).
239+
```
240+
241+
Before we can pass our data into Datawrapper, we need to reshape it into a pandas DataFrame, the kind of data structure that our datawrapper library expects. We can do that by calling the `sort_index` and `reset_index` methods on the end of the `value_counts` method.
237242

238243
```python
239244
totals_by_year = df.year.value_counts().sort_index().reset_index()
240245
```
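
To make the reshaping step concrete, here is a minimal sketch that runs the same chain of methods on a tiny made-up table. The six dates and the `arrests` column label are assumptions for illustration, not values from the Baltimore dataset. The columns are renamed explicitly because the default names produced by `reset_index` after `value_counts` differ between pandas versions.

```python
import pandas as pd

# A tiny, made-up stand-in for the arrests table (not real data)
toy_df = pd.DataFrame(
    {
        "ArrestDateTime": pd.to_datetime(
            [
                "2020-01-05",
                "2021-03-01",
                "2020-07-09",
                "2022-02-02",
                "2021-11-30",
                "2020-12-25",
            ]
        )
    }
)

# Pull the year out of each date, as in the tutorial
toy_df["year"] = toy_df.ArrestDateTime.dt.year

# Count rows per year, order by year, then turn the result into a two-column DataFrame
toy_totals = toy_df.year.value_counts().sort_index().reset_index()

# Name the columns explicitly; the defaults vary across pandas versions
toy_totals.columns = ["year", "arrests"]

print(toy_totals)
```

The result has one row per year, with the year in the first column and the count in the second.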
241246

242-
That should output a tidy table that's ready for the API. The only other things you need to make a basic chart are a title and a chart type. You can write whatever headline you like, but every chart type has a strict code name that you can find in the [Datawrapper documentation](https://developer.datawrapper.de/docs/chart-types).
247+
That should output a tidy table that's ready for the API. The only other things you need to make a basic chart are a headline and a chart type. You can write whatever headline you like, but every chart type has a strict code name that you can find in the [Datawrapper documentation](https://developer.datawrapper.de/docs/chart-types).
243248

244249
[![](_static/datawrapper-chart-types.png)](https://developer.datawrapper.de/docs/chart-types)
245250

@@ -257,11 +262,11 @@ chart_config = dw.create_chart(
257262
)
258263
```
259264

260-
If the cell runs without error, a new chart is born. You can see it by visiting [https://app.datawrapper.de/](https://https://app.datawrapper.de/) in your logged in browser.
265+
If the cell runs without error, a new chart is born. You can see it by visiting [https://app.datawrapper.de/](https://app.datawrapper.de/) in your browser.
261266

262-
![](_static/first-chart.png)
267+
![A new chart on the Datawrapper dashboard](_static/first-chart.png)
263268

264-
Congratulations! You've created your first chart using the Datawrapper API. While it's ready for review in the dashboard, it won't be published by default. Let's learn how to do that next.
269+
Congratulations! You've created your first chart using the Datawrapper API. While it's ready for review in the dashboard, it won't be published for others to see. Let's learn how to do that next.
265270

266271
Back in our notebook, the method returned a dictionary with information about the chart that was created. You can inspect it by running the variable name in a new cell.
267272

@@ -294,7 +299,7 @@ dw.display_chart(chart_id)
294299

295300
### Set the chart description
296301

297-
A common practice in data journalism is to provide a citation of the sourcing of the data behind a chart. This is often done in the "Describe" tab of the Datawrapper interface. You can also do it using the `update_description` method of the `dw` object. Here we'll set the source name, source URL and byline.
302+
A common practice in journalism is to provide a citation for the source data behind a chart. This can be done manually in the "Describe" tab of the Datawrapper interface. You can also do it using the `update_description` method of the `dw` object. Here we'll set the source name, source URL and byline.
298303

299304
```python
300305
dw.update_description(
@@ -311,7 +316,7 @@ Run that cell and republish your chart.
311316
dw.publish_chart(chart_id)
312317
```
313318

314-
You can see the changes by, again, asking the `dw` object to display the chart's embed.
319+
You can see the changes by, again, asking the `dw` object to display the chart's embed. Take a look at the bottom line of the chart to see the citation.
315320

316321
```python
317322
dw.display_chart(chart_id)
@@ -322,18 +327,18 @@ dw.display_chart(chart_id)
322327

323328
### Style the chart
324329

325-
You can much more than that by using Python to configure the chart's metadata. There are literally dozens of different ways to customize axis labels, annotations, colors, legends, lines, bars and much more. A simple example is to change the color of the bars to match the IRE's accent color.
330+
You can do much more than that by using Python to configure the chart's metadata. There are literally dozens of different ways to customize axis labels, annotations, colors, legends, lines, bars and other features.
326331

327332
```{note}
328333
You can find a list of many of the available options in the [Datawrapper documentation](https://developer.datawrapper.de/docs/chart-properties).
329334
```
330335

331-
That can be done by creating a dictionary of metadata to the `metadata` parameter of the `update_chart` method. Here we'll set the "base-color" to the IRE's accent color, which is a nice shade of orange. It must conform precisely with the format expected by Datawrapper's API.
336+
A simple example is to change the color of the bars. That can be done by passing a dictionary of configuration options to the `metadata` parameter of the `update_chart` method. It must conform precisely to the format expected by Datawrapper's API. Here we'll set the "base-color" to a nice shade of orange.
332337

333338
```python
334339
metadata = {
335340
"visualize": {
336-
"base-color": "#bf7836" # IRE's accent color
341+
"base-color": "#bf7836" # Our accent color
337342
}
338343
}
339344
```
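
Because the API is strict about the shape of this dictionary, one option is to build it with a small helper that checks the color before anything is sent. The sketch below is an illustration only: `build_color_metadata` is a hypothetical function, not part of the datawrapper library, and the six-digit hex check is an assumption about what counts as a valid color.

```python
import re


# Hypothetical helper, not part of the datawrapper library
def build_color_metadata(hex_color):
    """Return a metadata dictionary that sets the chart's base color."""
    # Assumption: colors are written as a "#" followed by six hex digits
    if not re.fullmatch(r"#[0-9a-fA-F]{6}", hex_color):
        raise ValueError(f"Not a six-digit hex color: {hex_color!r}")
    # Same nesting as the tutorial: "visualize" holds the "base-color" key
    return {"visualize": {"base-color": hex_color}}


metadata = build_color_metadata("#bf7836")
print(metadata)
```

A malformed value such as `"orange"` raises a `ValueError` in the notebook, which is easier to spot than a chart that silently keeps its old color.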
@@ -363,15 +368,17 @@ dw.display_chart(chart_id)
363368

364369
## Create many charts
365370

366-
Now on to our next challenge. While using Python to make one chart is a nice trick, it's also pretty easy to do with a mouse and keyboard. One benefit of automating chart creation with Python is that the code you right can be reused to make many charts.
371+
While using Python to make one chart is a nice trick, it's also pretty easy to do with a mouse and keyboard. One benefit of automating chart creation with Python is that the code you write to make one chart can be reused to make many charts.
372+
373+
For instance, we could use the tricks we learned making our citywide chart to create a separate chart for each of Baltimore's police districts.
367374

368-
For instance, we could create a chart for each of Baltimore's police districts. Take a look at our sample data again by running the `head` command.
375+
If you take a look at our sample data again by running the `head` command, you'll notice that there is a column called “District.”
369376

370377
```python
371378
df.head()
372379
```
373380

374-
You'll notice that there is column called District. We can have a closer at what's in it by running the `value_counts` method, just as we did with the year.
381+
Have a closer look at what's in it by running the `value_counts` method, just as we did with the year. It will show that there are nine unique districts in the dataset.
375382

376383
```python
377384
df.District.value_counts()
@@ -391,7 +398,9 @@ Southwest 21822
391398
Northern 13087
392399
```
393400

394-
We can use a Python to loop through the district and create an annual arrests chart for each one. While there are numerous ways to accomplish this task, for this example we'll write a function that takes the name of a district as an argument and returns a chart. We'll then use a `for` loop to call that function for each district.
401+
That means we could use Python and our datawrapper library to create nine different annual arrests charts.
402+
403+
While there are numerous ways to accomplish this task, in this example we'll write a function that takes the name of a district as an argument and returns a chart. We'll then use a `for` loop to call that function for each district.
395404

396405
Here's a function that does exactly that. We don't have enough time to walk through every step of it, but if you look closely you can see that it's very similar to the code we used to create the first chart. You should copy and paste it into a new cell in your notebook.
397406

@@ -479,27 +488,27 @@ for district in df.District.dropna().unique():
479488
chart_list.append(c)
480489
```
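
The pattern at work here, a function called once per district inside a `for` loop with the results collected in a list, can be tried out without touching the API. In this sketch `make_title` is a hypothetical stand-in for the chart-making function, and the short series of districts, including one missing value, is made up for illustration.

```python
import pandas as pd

# Made-up sample with a missing value, standing in for df.District
districts = pd.Series(["Southwest", "Northern", None, "Southwest"])


# Hypothetical stand-in for the chart-making function; it only builds a headline
def make_title(district):
    return f"Arrests in Baltimore's {district} district"


title_list = []
# dropna() discards the missing value and unique() keeps one copy of each name
for district in districts.dropna().unique():
    t = make_title(district)
    title_list.append(t)

print(title_list)
```

Swapping the stand-in for a function that calls the API leaves the loop itself unchanged, which is why the approach scales from one district to all of them.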
481490

482-
The charts will be created and published in Datawrapper. You can see them all at once in your notebook by introducing the `display` function from the `IPython.display` library.
491+
You can see them all at once in your notebook by introducing the `display` function from the `IPython.display` library ...
483492

484493
```python
485494
from IPython.display import display
486495
```
487496

488-
And passing in the list of charts as arguments.
497+
... and passing in the list of charts as arguments.
489498

490499
```python
491500
display(*chart_list)
492501
```
493502

494-
Not bad, right! You've just created a dozen charts in a few seconds. You could do the same with an unlimited number of charts, as long as you have the data to supply the API.
503+
Not bad, right? You've just created nine charts in a few seconds. You could do the same with an unlimited number of charts, as long as you have the data to supply the API.
495504

496505
## Create a chart that runs on a schedule
497506

498-
That's one example of how Python can supercharge your chart production. Here's another: You can write computer code that, when run on a schedule, will automatically create and publish a chart. This is a powerful way to publish charts whenever new data is available, or to create a series of charts that update on a regular basis.
507+
That's one example of how Python can supercharge your chart production. Here's another: You can write computer code that, when run on a schedule, will create a chart. This is a powerful way to publish whenever new records are available.
499508

500-
As an example, let's automate a chart that could be useful to a newsroom. We'll create a chart that shows the top 10 arrest charges in Baltimore over the last week which could, in theory, be published every Monday morning when new data is posted to the city's data portal.
509+
As an example, let's automate a chart that could be useful to a newsroom. We'll create a chart that shows the top 10 arrest charges in Baltimore over the last week. It could, in theory, be published every Monday morning after new data is posted to the city data portal.
501510

502-
First we'll find the most recent date in our dataset, which will be the end of the week we want to chart. That can be done with pandas by calling the `max` method on the `ArrestDateTime` column.
511+
First we'll find the most recent date in our dataset, which will be the end of the week we want to chart. That can be done with pandas by calling the `max` method on a date column.
503512

504513
```python
505514
df.ArrestDateTime.dt.date.max()
@@ -511,19 +520,21 @@ Let's save that into a variable.
511520
end_date = df.ArrestDateTime.dt.date.max()
512521
```
513522

514-
Then we'll use the `timedelta` class from the `datetime` module to find the date one week before. First we import the tool.
523+
Then we'll use the `timedelta` class from the `datetime` module to find the date one week before, which will be the start of our date range.
524+
525+
First we import the tool.
515526

516527
```python
517528
from datetime import timedelta
518529
```
519530

520-
Then we can subtract sevent days from the end date to find the start date.
531+
Then we can subtract seven days from the end date.
521532

522533
```python
523534
seven_days_ago = end_date - timedelta(days=7)
524535
```
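
The date arithmetic can be checked on its own with the standard library. In this sketch the end date is a made-up value rather than one taken from the dataset. It shows that subtracting seven days gives a boundary date, and that a strictly-greater-than comparison against that boundary keeps exactly seven calendar days, ending on the end date.

```python
from datetime import date, timedelta

# Made-up end date for illustration, not taken from the dataset
sample_end_date = date(2024, 3, 4)

# Step back one week to find the boundary of the range
sample_start = sample_end_date - timedelta(days=7)

# Every day from the boundary through the end date
candidates = [sample_start + timedelta(days=n) for n in range(8)]

# Keep only the days strictly after the boundary
window = [d for d in candidates if d > sample_start]

print(sample_start)  # 2024-02-26
print(len(window))   # 7
```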
525536

526-
Filter the dataset to the last week by creating a new DataFrame that only contains the rows where the `ArrestDateTime` is after the start date.
537+
Now filter the dataset to the last week by creating a new DataFrame that only contains the rows where the `ArrestDateTime` is after the start date.
527538

528539
```python
529540
last_week_df = df[df.ArrestDateTime.dt.date > seven_days_ago]
@@ -583,8 +594,8 @@ dw.display_chart(chart_id)
583594

584595
Boom. We've created a little Python routine that, provided with an updated dataset, could be rerun at any time to create a fresh chart.
585596

586-
There are numerous ways you could run such a script according to a schedule, a task beyond the scope of this course. One popular tool is [GitHub Actions](https://docs.github.com/en/actions), a free service linked to GitHub respositories. You can learn how journalists use it to automate data work in our complimentary class [“First GitHub Scraper"](https://palewi.re/docs/first-github-scraper/).
597+
There are numerous ways you could run such a script according to a schedule, a task beyond the scope of this course. One popular tool is [GitHub Actions](https://docs.github.com/en/actions), a free service linked to GitHub repositories. You can learn how journalists use it to automate data work in our complimentary class ["First GitHub Scraper."](https://palewi.re/docs/first-github-scraper/)
587598

588599
## About this class
589600

590-
This guide was prepared by [Ben Welsh](https://palewi.re/who-is-ben-welsh/) and [Sergio Sanchez Zavala](https://github.com/chekos) for [a training session](https://schedules.ire.org/nicar-2024/index.html#2110) at the National Institute for Computer-Assisted Reporting (NICAR)’s 2024 conference in Baltimore. Some of the copy was written with the assistance of GitHub's Copilot, an AI-powered code completion tool. The materials are available as free and open source on [GitHub](https://github.com/palewire/first-automated-chart)
601+
This guide was prepared by [Ben Welsh](https://palewi.re/who-is-ben-welsh/) and [Sergio Sanchez Zavala](https://github.com/chekos) for [a training session](https://schedules.ire.org/nicar-2024/index.html#2110) at the National Institute for Computer-Assisted Reporting’s 2024 conference in Baltimore. Some of the copy was written with the assistance of GitHub's Copilot, an AI-powered text generator. The materials are available as free and open source on [GitHub](https://github.com/palewire/first-automated-chart).
