
feat: add npm download statistics tracking system #366


Merged
merged 11 commits into from
Jun 23, 2025

Conversation

Janpot
Member

@Janpot Janpot commented Jun 20, 2025

Summary

  • Add weekly GitHub Action to collect npm download statistics
  • Implement TypeScript script with parallel API fetching for performance
  • Store historical data grouped by major version for efficiency
  • Support for @mui/material and @base-ui/components packages

Features

  • Weekly automation: Runs every Sunday at midnight UTC
  • Manual trigger: Can be triggered manually via workflow_dispatch
  • Parallel fetching: Processes multiple packages simultaneously
  • Historical tracking: Maintains timestamped download history
  • Major version grouping: Aggregates downloads by major version using semver
  • Automatic commits: Commits and pushes updated data files
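The features above map onto a GitHub Actions workflow roughly like the following. This is a sketch only: the workflow name, Node version, script path, and commit message are assumptions, not the PR's actual contents.

```yaml
name: npm-download-stats

on:
  schedule:
    - cron: '0 0 * * 0' # every Sunday at midnight UTC
  workflow_dispatch: # allow manual runs

permissions:
  contents: write # required to push the updated data files

jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      # Hypothetical script path; the real entry point may differ.
      - run: npx tsx scripts/collect-npm-downloads.ts
      - run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data/npm-versions
          git diff --cached --quiet || git commit -m "Update npm download stats"
          git push
```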

Data Structure

Data is stored in data/npm-versions/{package}.json in the following format:

{
  "package": "@mui/material",
  "timestamps": [1234567890],
  "downloads": {
    "5": [1000000],
    "6": [2000000]
  }
}
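A sketch of how the weekly script might append one datapoint to this shape. The `appendDatapoint` helper and the zero-padding of newly seen majors are illustrative assumptions, not the PR's actual code.

```javascript
// Append one weekly datapoint to the column-store shape shown above.
// `downloadsByMajor` maps a major version string to its weekly download count.
function appendDatapoint(stats, timestamp, downloadsByMajor) {
  stats.timestamps.push(timestamp);
  // Assumption: majors that appear for the first time are zero-padded so
  // every column stays the same length as `timestamps`.
  for (const major of Object.keys(downloadsByMajor)) {
    if (!stats.downloads[major]) {
      stats.downloads[major] = new Array(stats.timestamps.length - 1).fill(0);
    }
  }
  // Push this week's count for every known major (0 if it had no downloads).
  for (const major of Object.keys(stats.downloads)) {
    stats.downloads[major].push(downloadsByMajor[major] ?? 0);
  }
  return stats;
}
```

Because each major is a parallel array indexed against `timestamps`, reading the file back into a chart only requires one `JSON.parse`.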

Test Results

✅ Successfully tested with both packages
✅ Proper directory structure created automatically
✅ Historical data updates working correctly

🤖 Generated with Claude Code

- Add weekly GitHub Action to collect npm download stats
- Implement TypeScript script with parallel API fetching
- Store historical data grouped by major version
- Support for @mui/material and @base-ui/components packages
- Automatic git commits with collected data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@Janpot Janpot added the scope: code-infra Specific to the core-infra product label Jun 20, 2025
@Janpot Janpot requested a review from a team June 20, 2025 15:25
Janpot and others added 2 commits June 20, 2025 21:04
Co-authored-by: Michał Dudak <[email protected]>
Signed-off-by: Jan Potoms <[email protected]>
Co-authored-by: Michał Dudak <[email protected]>
Signed-off-by: Jan Potoms <[email protected]>

// Determine file path
const dataDir = join(process.cwd(), 'data', 'npm-versions');
const filePath = join(dataDir, `${packageName}.json`);

@brijeshb42 brijeshb42 Jun 20, 2025


The JSONL file format seems more apt for this use case than JSON.
There's no need to read the existing contents, merge in the new data, and write everything back.
You just append the new JSON string as a line at the end of the file.

const fs = require('fs');
const path = require('path');

// Your JSON object
const obj = {
  id: 123,
  name: "Example",
  active: true
};

// Convert the object to a one-line JSON string
const jsonLine = JSON.stringify(obj);

// Path to the .jsonl file
const filePath = path.join(__dirname, 'data.jsonl');

// Append it as a new line to the file
fs.appendFile(filePath, jsonLine + '\n', (err) => {
  if (err) {
    console.error('Error writing to file:', err);
  } else {
    console.log('JSON object appended successfully!');
  }
});

Member

@dav-is dav-is Jun 20, 2025


I think we should be optimizing for the read of this data, not the write. Data is only written once per week as a background task, whereas it might be read much more often than that (like in a dashboard). I think the number of datapoints here (52/year) isn't enough to justify using JSONL. Once we have 5 years of datapoints (260 points per package major), we could archive old data and incorporate it into a "by month" dataset, so I don't think memory usage is a long-term concern either.


JSONL is optimal for reading as well, since we can read it line by line for the data points.
But I agree that, given it's weekly, there's not much point in optimizing this early.
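The line-by-line reading mentioned here can be sketched like this (`parseJsonl` is a hypothetical helper, not anything from the PR):

```javascript
// Parse a .jsonl string: one JSON document per line, blank lines skipped.
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}
```

For very large files the same idea works incrementally with Node's `readline` over a read stream, so the whole file never has to sit in memory at once.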

Member

@dav-is dav-is Jun 20, 2025


I agree it's optimal performance-wise, but it has worse DX compared to simply JSON.parse(stats) or import stats from './stats.json' (parsed at build time by a webpack loader).

Member Author

@Janpot Janpot Jun 21, 2025


What I was initially aiming to optimize for was:

  1. Simple, cheap, and maintenance free: no servers or databases.
  2. Read performance: the plan is to read this directly from raw.githubusercontent.com in the infra dashboard, so I want the file to be small.

@dav-is that is exactly where my mind was when building this. I've seen those npm API results range from a few KB up to a few hundred KB. I picked this format (it's basically a column store) because it's so well size-optimized that we can avoid ever building in rollover or expiration logic.

The drawback, though, is that we lose all the individual version information. I'm removing the per-major aggregation and will do that on the client instead.
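The client-side per-major aggregation could look roughly like this. It assumes the raw file stores per-version counts (e.g. `{ "5.15.0": 1000 }`); the PR uses the semver package, while this sketch just splits off the major component.

```javascript
// Aggregate per-version download counts into per-major totals.
// Assumption: keys are plain release versions like "5.15.0".
function groupByMajor(downloadsByVersion) {
  const byMajor = {};
  for (const [version, count] of Object.entries(downloadsByVersion)) {
    // The real script uses semver; splitting on "." suffices for release versions.
    const major = version.split('.')[0];
    byMajor[major] = (byMajor[major] ?? 0) + count;
  }
  return byMajor;
}
```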

Janpot added 8 commits June 21, 2025 07:54
Introduced a fetchWithRetry function to handle network errors and transient server issues when fetching NPM package stats. This improves reliability by retrying failed requests up to three times with a delay.
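The retry behavior described in that commit can be sketched as follows. The real script's signature and delays may differ; the fetch implementation is injected here so the sketch needs no network access.

```javascript
// Retry a fetch-like call on network errors or non-OK responses.
// `doFetch` is injected (e.g. the global fetch) to keep the sketch testable.
async function fetchWithRetry(doFetch, url, retries = 3, delayMs = 1000) {
  for (let attempt = 1; attempt <= retries; attempt += 1) {
    try {
      const res = await doFetch(url);
      if (!res.ok) {
        throw new Error(`HTTP ${res.status}`);
      }
      return res;
    } catch (err) {
      if (attempt === retries) {
        throw err; // out of attempts, surface the last error
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```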
@Janpot Janpot merged commit c54cf4f into master Jun 23, 2025
7 checks passed
@Janpot Janpot deleted the feat/npm-download-stats-tracker branch June 23, 2025 10:55
Janpot added a commit that referenced this pull request Jun 23, 2025
Member

@LukasTy LukasTy left a comment


Nice initiative and great use of Claude Code. 👍

On a related note: have you considered adding any of the X packages to the mix? 🤔

Comment on lines +9 to +10
permissions:
contents: write
Member


Nitpick: If I'm not mistaken, we usually set permissions on the job level. 🤔

I.e.:

permissions: {}

instead of this.
And the following after L14:

permissions:
  contents: write

@Janpot
Member Author

Janpot commented Jun 23, 2025

On a related note: have you considered adding any of the X packages to the mix? 🤔

@LukasTy Yes, I'll add them, but I'm going to put this in a personal repo for now. I know it's possible here, but I don't want to work around branch protection rules for this, and I don't want to create overly powerful bypasses just for this functionality.
