Commit 2f88da2

enh: add data processing workflow explanation
1 parent 408afaa commit 2f88da2

4 files changed: +84 -0 lines changed

community/github/data-process.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# pyOpenSci Infrastructure Overview

This page will help you understand how we collect and process peer review and contributor data to:

* Highlight pyOpenSci contributors
* Track our peer review process
* Showcase peer-reviewed Python packages

## How it works

We use a Python package called `pyosMeta` to **extract and transform contributor and peer review data** into machine-readable formats (`.yml` and `.csv`).

This data allows us to **automatically update**:

* our [public website contributor listing](https://www.pyopensci.org/our-community/index.html)
* our [website accepted package listing](https://www.pyopensci.org/python-packages.html)
* our [metrics dashboard](https://www.pyopensci.org/metrics)

with up-to-date contributor and review information, directly from GitHub.

## Data collection and processing

We collect two types of data from GitHub:

1. **Contributor data**
   Parsed from [All Contributors bot config files](https://github.com/pyOpenSci/pyopensci.github.io/blob/main/.all-contributorsrc) found in each pyOpenSci repo.

2. **Peer review submission data**
   Extracted from [issues in the software-submission repo](https://github.com/pyOpenSci/software-submission/issues), including:
   * package name and repo URL
   * editor and reviewers
   * maintainers and authors

This data is processed by `pyosMeta`, which generates:

* `_data/contributors.yml`
* `_data/packages.yml`
* `.csv` files for metrics

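As a rough illustration of the contributor step above, the sketch below parses a config shaped like an All Contributors file (JSON with a `contributors` list) and flattens it into CSV rows. It is not `pyosMeta`'s actual code: the sample data, the helper names `parse_contributors` and `to_csv`, and the output column names are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical sample shaped like an `.all-contributorsrc` file (JSON).
SAMPLE_RC = """
{
  "projectName": "example-repo",
  "contributors": [
    {"login": "octocat", "name": "Mona Lisa", "contributions": ["code", "doc"]},
    {"login": "hubot", "name": "Hubot", "contributions": ["review"]}
  ]
}
"""

# Assumed output columns; the real files may use different names.
FIELDS = ["github_username", "name", "contribution_types"]


def parse_contributors(rc_text):
    """Return one flat record per contributor listed in the config."""
    config = json.loads(rc_text)
    return [
        {
            "github_username": person["login"],
            "name": person.get("name", ""),
            # Join the contribution types so each person fits on one CSV row.
            "contribution_types": ";".join(sorted(person.get("contributions", []))),
        }
        for person in config.get("contributors", [])
    ]


def to_csv(records):
    """Serialize the flat records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = parse_contributors(SAMPLE_RC)
csv_text = to_csv(records)
print(csv_text)
```

The same flat records could be written to YAML instead of CSV with a YAML library; this sketch sticks to the standard library so it runs without extra dependencies.
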
## Where the data goes

The processed data files are used in two main parts of our website:

* **Website GitHub Repo**
  * A cron job reads the `.yml` files to populate our
    👉 [Contributors page](https://www.pyopensci.org/our-community/index.html#pyopensci-community-contributors)
    👉 [Packages page](https://www.pyopensci.org/python-packages.html)

* **Metrics GitHub Repo**
  * A cron job reads `.csv` files to generate the
    👉 [Peer review status dashboard](https://www.pyopensci.org/metrics/peer-review/current-review-status.html)

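To make the metrics side more concrete, here is a minimal sketch of how a scheduled job might summarize a `.csv` file before plotting. The column names (`package_name`, `status`), the sample rows, and the helper `count_by_status` are hypothetical; the real files generated by `pyosMeta` may be structured differently.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content; real column names and values may differ.
SAMPLE_CSV = """package_name,status
example-pkg-a,under-review
example-pkg-b,accepted
example-pkg-c,under-review
"""


def count_by_status(csv_text):
    """Count how many submissions are in each review status."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["status"] for row in reader)


status_counts = count_by_status(SAMPLE_CSV)
for status, count in status_counts.most_common():
    print(f"{status}: {count}")
```

A dashboard job could pass counts like these to a plotting library to draw the status chart.
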
## Workflow diagram

The diagram below explains the basic workflow that we use.

:::{mermaid}
graph TD
    subgraph Sources
        A1[All Contributors Bot]
        A2[Peer Review Submissions *GitHub Issues*]
    end

    subgraph pyosmeta
        A3[pyosmeta]
    end

    A1 --> A3
    A2 --> A3

    A3 -->|DATA:
    _data/contributors.yml,
    _data/packages.yml| B1[Website GitHub Repo]
    A3 -->|DATA:
    _/*.CSV | B2[Metrics GitHub Repo]

    B1 -->|Cron job reads YAML| C1[🔗 Contributor listing page]
    B2 -->|Cron job reads CSV| C2[Generate metric plots]

    click C1 "https://www.pyopensci.org/our-community/index.html#pyopensci-community-contributors" "View pyOpenSci Contributor Page"
:::

community/index.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ GitHub Issue Guidelines <github/issues>
 Pull Requests <github/pull-requests>
 Continuous Integration (CI) <github/continuous-integration>
 GitHub permissions <github/permissions>
+Data Workflows <github/data-process>
 :::

 :::{toctree}

conf.py

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@
     "sphinx_sitemap",
     "sphinxext.opengraph",
     "sphinx_favicon",
+    "sphinxcontrib.mermaid",
 ]

 # colon fence for card support in md

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ dependencies = [
     # Support for social / adds meta tags
     "sphinxext-opengraph",
     "sphinx-inline-tabs",
+    "sphinxcontrib-mermaid",
     # for project cards
     "matplotlib"
 ]
