feat: add npm download statistics tracking system #366
Conversation
- Add weekly GitHub Action to collect npm download stats
- Implement TypeScript script with parallel API fetching
- Store historical data grouped by major version
- Support for @mui/material and @base-ui/components packages
- Automatic git commits with collected data

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
Co-authored-by: Michał Dudak <[email protected]> Signed-off-by: Jan Potoms <[email protected]>
scripts/collect-npm-stats.ts
Outdated
// Determine file path
const dataDir = join(process.cwd(), 'data', 'npm-versions');
const filePath = join(dataDir, `${packageName}.json`);
A JSONL file format instead of JSON seems more apt for this use case: there's no need to read the existing contents, merge in the new data, and write everything back. You just append the new JSON string as a line at the end of the file.
const fs = require('fs');
const path = require('path');
// Your JSON object
const obj = {
id: 123,
name: "Example",
active: true
};
// Convert the object to a one-line JSON string
const jsonLine = JSON.stringify(obj);
// Path to the .jsonl file
const filePath = path.join(__dirname, 'data.jsonl');
// Append it as a new line to the file
fs.appendFile(filePath, jsonLine + '\n', (err) => {
if (err) {
console.error('Error writing to file:', err);
} else {
console.log('JSON object appended successfully!');
}
});
I think we should be optimizing for reads of this data, not writes. Data is only written once per week as a background task, while it might be read much more often than that (e.g. in a dashboard). I don't think the number of data points here (52/year) is enough to justify using JSONL. Once we have 5 years of data points (260 points per package major), we could archive old data and fold it into a "by month" dataset, so I don't think memory usage is a long-term concern either.
JSONL is optimal for reading as well, since we can read the data points line by line.
But I agree that, given the weekly cadence, it doesn't make much sense as an early optimization.
I agree it's optimal performance-wise, but it has worse DX compared to simply JSON.parse(stats)
or import stats from './stats.json'
(parsed at build time by a webpack loader).
What I was initially aiming to optimize for was:
- Simple, cheap, and maintenance-free: no servers or databases.
- Read performance: the plan is to read this directly from raw.githubusercontent.com in the infra dashboard, so I want the file to be small.
@dav-is That is exactly where my mind was when building this. I saw those npm API results range from a few KB up to a few hundred KB. I picked this format (it's basically a column store) because it's so well size-optimized that we can avoid building rollover or expiration logic forever.
One drawback, though, is that we lose all the individual version information. I'm removing the per-major aggregation and doing it on the client instead.
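Client-side per-major aggregation could look like the following sketch. The input shape (a map from full version string to weekly download count) is a hypothetical stand-in; the actual stored format isn't shown in this thread:

```typescript
// Hypothetical shape: one week of downloads keyed by full version string.
type VersionDownloads = Record<string, number>;

// Collapse individual versions into per-major totals on the client,
// so the stored file can keep full per-version granularity.
function aggregateByMajor(downloads: VersionDownloads): Record<string, number> {
  const byMajor: Record<string, number> = {};
  for (const [version, count] of Object.entries(downloads)) {
    const major = version.split('.')[0]; // e.g. "5.15.2" -> "5"
    byMajor[major] = (byMajor[major] ?? 0) + count;
  }
  return byMajor;
}
```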
Introduced a fetchWithRetry function to handle network errors and transient server issues when fetching NPM package stats. This improves reliability by retrying failed requests up to three times with a delay.
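A retry helper along those lines could be sketched as follows. This is an illustration of the retry-with-delay idea, not the PR's actual implementation; the function name, signature, and npm endpoint usage are assumptions:

```typescript
// Retry an async operation up to `retries` times, waiting `delayMs`
// between attempts. Throws the last error if all attempts fail.
async function withRetry<T>(
  operation: () => Promise<T>,
  retries = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < retries - 1) {
        // Wait before the next attempt to ride out transient failures.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Usage would wrap each stats request, e.g. `withRetry(() => fetch(url).then((res) => { if (!res.ok) throw new Error(String(res.status)); return res.json(); }))`, so transient 5xx responses and network hiccups trigger a retry instead of failing the workflow run.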
Nice initiative and great use of Claude Code. 👍
On a related note: have you considered adding any of the X packages to the mix? 🤔
permissions:
  contents: write
Nitpick: if I'm not mistaken, we usually set permissions at the job level. 🤔
I.e.:
permissions: {}
instead of this, and the following after L14:
permissions:
  contents: write
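Putting that suggestion into a workflow file could look like the sketch below. The job name, checkout step, and run command are illustrative assumptions, not the PR's actual workflow:

```yaml
# Deny all permissions by default at the workflow level...
permissions: {}

jobs:
  collect-stats:
    runs-on: ubuntu-latest
    # ...and grant write access only to the job that commits data.
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
      - run: npx tsx scripts/collect-npm-stats.ts
```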
@LukasTy Yes, I'll add them. But I'm going to put this on a personal repo for now. I know it's possible, but I don't want to build around branch protection rules for this, and I don't want to create overly powerful bypasses just for this functionality.
Summary
Features
Data Structure
Data is stored in data/npm-versions/{package}.json with format:

Test Results
✅ Successfully tested with both packages
✅ Proper directory structure created automatically
✅ Historical data updates working correctly
🤖 Generated with Claude Code