[WEB-4399] Compress static assets post-build #2601
Conversation
Walkthrough

This change introduces a post-build asset compression step for Gatsby sites, using a new script and worker pool to gzip static assets. It adds a verification script to ensure all assets are compressed, updates the build and test workflows in CircleCI, and modifies the Nginx configuration to serve pre-compressed files. Dependencies for compression and file globbing are added.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Gatsby as Gatsby Build
    participant Compress as compressAssets (onPostBuild)
    participant Worker as Piscina Worker Threads
    participant Nginx as Nginx Server
    Gatsby->>Compress: onPostBuild()
    Compress->>Worker: Compress static assets (.css, .js, .json, .svg)
    Worker-->>Compress: Write .gz files
    Compress-->>Gatsby: Compression complete
    Nginx->>Nginx: Serve request
    Nginx->>Nginx: gzip_static on (serve .gz if present)
```
Force-push history:
- ab52887 → 3a51511
- 1cd1e28 → d699740
- d699740 → e05cfe4
- 7112eae → a1413aa
- a1413aa → 7d4767b
- 7d4767b → 7e9a9ef
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (4)
.circleci/config.yml (1)

54-55: Consider dynamic thread allocation based on environment.

While setting a fixed thread count works well for CI, you might want to make this configurable based on the environment for local development or production deployments.

```diff
 environment:
-  COMPRESS_MAX_THREADS: 8
+  COMPRESS_MAX_THREADS: ${COMPRESS_MAX_THREADS:-8}
```
data/onPostBuild/llmstxt.ts (2)

31-35: Consider adding URL validation.

The URL construction looks good, but consider adding validation to ensure the constructed URL is valid, especially since you're handling URL path prefixes.

```diff
 const prefixPath = ({ url, siteUrl, pathPrefix = `` }: { url: string; siteUrl: string; pathPrefix?: string }) => {
-  return new URL(pathPrefix + withoutTrailingSlash(url), siteUrl).toString();
+  try {
+    return new URL(pathPrefix + withoutTrailingSlash(url), siteUrl).toString();
+  } catch (error) {
+    throw new Error(`Invalid URL: Could not construct URL from ${siteUrl}, ${pathPrefix}, and ${url}`);
+  }
 };
```
102-108: Consider using async file operations.

Since you're already in an async function, consider using fs.promises.writeFile instead of fs.writeFileSync for consistency and to avoid blocking the event loop.

```diff
 const llmsTxtPath = path.join(process.cwd(), 'public', 'llms.txt');
 try {
-  fs.writeFileSync(llmsTxtPath, serializedPages.join('\n'));
+  await fs.promises.writeFile(llmsTxtPath, serializedPages.join('\n'));
   reporter.info(`${REPORTER_PREFIX} Successfully wrote llms.txt with ${serializedPages.length} pages`);
 } catch (err) {
   reporter.panic(`${REPORTER_PREFIX} Error writing llms.txt file`, err as Error);
 }
```
data/onPostBuild/compressAssets.ts (1)

56-58: Consider making compression options configurable.

The number of iterations is hardcoded to 15. Consider making this configurable through an environment variable, similar to how you handle thread count.

```diff
 const options = {
-  numiterations: 15,
+  numiterations: parseInt(process.env.COMPRESS_ITERATIONS || '15', 10),
 };
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (7)
- .circleci/config.yml (2 hunks)
- bin/assert-compressed.sh (1 hunks)
- config/nginx.conf.erb (1 hunks)
- data/onPostBuild/compressAssets.ts (1 hunks)
- data/onPostBuild/index.ts (1 hunks)
- data/onPostBuild/llmstxt.ts (1 hunks)
- package.json (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
data/onPostBuild/llmstxt.ts (1)
- data/onPostBuild/index.ts (1): onPostBuild (5-9)

data/onPostBuild/index.ts (2)
- data/onPostBuild/compressAssets.ts (1): onPostBuild (24-49)
- data/onPostBuild/llmstxt.ts (1): onPostBuild (42-109)
🔇 Additional comments (20)
package.json (2)

46-46: LGTM: Good choice of compression library.

The @gfx/zopfli package is an excellent choice for this use case, providing better compression ratios than standard gzip while maintaining full gzip compatibility.
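For reference, a minimal sketch of the library's promise-based API, assuming a standalone helper (the function name is illustrative; the iteration count matches the value quoted later in this review):

```typescript
import { gzipAsync } from '@gfx/zopfli';
import { promises as fs } from 'fs';

// Hypothetical helper: produce a gzip-compatible .gz file next to the original.
// Higher numiterations trades build time for a smaller output.
const gzipFile = async (filePath: string): Promise<void> => {
  const content = await fs.readFile(filePath);
  const compressed = await gzipAsync(content, { numiterations: 15 });
  await fs.writeFile(`${filePath}.gz`, compressed);
};
```

The output is standard gzip, so any client or server that understands Content-Encoding: gzip can consume it without knowing zopfli was involved.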
55-55: Well-structured implementation for parallelized compression.

The combination of fast-glob for efficient file discovery and piscina for worker thread management will enable efficient parallel compression of static assets.

Also applies to: 81-81
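As a rough sketch of how the two libraries compose (the worker filename and thread count are assumptions, not the PR's actual values):

```typescript
import fg from 'fast-glob';
import Piscina from 'piscina';
import path from 'path';

const compressAll = async (): Promise<void> => {
  // Discover every asset that should get a pre-compressed sibling.
  const files = await fg('public/**/*.{css,js,json,svg}');

  // Fan the work out across worker threads; one task per file.
  const pool = new Piscina({
    filename: path.resolve(__dirname, 'compressWorker.js'), // hypothetical worker module
    maxThreads: 8,
  });

  await Promise.all(files.map((from) => pool.run({ from, to: `${from}.gz` })));
  await pool.destroy();
};
```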
config/nginx.conf.erb (2)

86-87: Excellent optimization for serving pre-compressed assets.

Moving gzip_static on; to the http level ensures all relevant static assets will be served in their pre-compressed form when available, which is more efficient than on-the-fly compression.

91-91: LGTM: Fixed indentation.

The indentation correction improves code readability while maintaining the same functionality.
.circleci/config.yml (2)

58-58: LGTM: Resource upgrade supports parallel compression.

Upgrading from the large to the xlarge resource class provides more CPU cores to effectively utilize the parallel compression threads.

107-109: Good validation step for compression process.

This verification step ensures all static assets are properly compressed before proceeding with the Nginx tests, providing an early failure if compression wasn't successful.
bin/assert-compressed.sh (2)

1-8: LGTM: Clear documentation and usage instructions.

The script header provides clear documentation about the purpose and usage of this utility.

24-24: LGTM: Clear success message.

The success message clearly indicates that all files have been properly compressed.
data/onPostBuild/index.ts (2)

2-3: Clean module imports for the new functionality.

The imports are clear and appropriately named, following the pattern of importing the specific onPostBuild functions from their respective modules.

5-8: Well-structured sequential execution of post-build tasks.

This implementation effectively chains the post-build steps in a clear, sequential manner. The comment accurately explains what's happening, and the code implements a clean orchestration of different post-build processes.
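Based on the line references above, the orchestration is presumably along these lines (a sketch; the execution order shown is an assumption):

```typescript
import type { GatsbyNode } from 'gatsby';
import { onPostBuild as compressAssets } from './compressAssets';
import { onPostBuild as generateLlmsTxt } from './llmstxt';

// Run each post-build step in sequence once Gatsby finishes writing `public/`.
export const onPostBuild: GatsbyNode['onPostBuild'] = async (args) => {
  await generateLlmsTxt(args);
  await compressAssets(args);
};
```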
data/onPostBuild/llmstxt.ts (4)

5-9: Clear documentation on module purpose.

The file header clearly explains the purpose of this script, which helps with maintainability.

37-40: The escapeMarkdown function is well implemented.

Good attention to detail in escaping all relevant Markdown special characters.
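The function body isn't shown in this review; an illustrative version of such an escaper (hypothetical, not the PR's code) might be:

```typescript
// Hypothetical sketch: backslash-escape Markdown control characters so page
// titles can be embedded in llms.txt without altering its Markdown structure.
const escapeMarkdown = (text: string): string =>
  text.replace(/([\\`*_{}[\]()#+\-.!|])/g, '\\$1');
```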
42-72: Good error handling for GraphQL query.

The implementation includes thorough error handling and validation for the GraphQL query results, with appropriate error messages.

87-100: Robust page serialization with error handling.

The implementation handles each page serialization individually with proper error handling, which prevents a single page error from breaking the entire process.
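The pattern being praised, sketched under assumptions about the surrounding types (the page shape and serializer are stand-ins):

```typescript
interface PageNode {
  path: string;
  title?: string;
}

const serializePages = (
  pages: PageNode[],
  serialize: (page: PageNode) => string, // e.g. builds one escaped Markdown line
  reporter: { warn: (msg: string) => void },
): string[] => {
  const lines: string[] = [];
  for (const page of pages) {
    try {
      lines.push(serialize(page));
    } catch (err) {
      // A single bad page is logged and skipped instead of failing the build.
      reporter.warn(`Skipping page ${page.path}: ${(err as Error).message}`);
    }
  }
  return lines;
};
```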
data/onPostBuild/compressAssets.ts (6)

9-22: Great documentation explaining code purpose and structure.

The comment block provides excellent context about the inspiration, functionality, and structure of the code. This kind of documentation is extremely helpful for maintenance and onboarding.

24-34: File identification and mapping looks good.

The code effectively identifies relevant static assets using fast-glob and properly maps them to their future gzipped versions.

35-38: Good use of environment variables for configuration.

Using COMPRESS_MAX_THREADS as an environment variable with a sensible default is a good practice for making the behavior configurable.
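A sketch of the kind of lookup being described; the CPU-count fallback is an assumption, not necessarily what the PR does:

```typescript
import os from 'os';

// Honour COMPRESS_MAX_THREADS when set (CircleCI sets it to 8),
// otherwise fall back to the number of available CPU cores.
const maxThreads =
  Number.parseInt(process.env.COMPRESS_MAX_THREADS ?? '', 10) || os.cpus().length;
```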
39-46: Effective use of worker pool for parallel processing.

The Piscina worker pool is well-configured for TypeScript and properly utilized for parallel compression of files.

71-77: Worker export implementation is correct.

The conditional export for worker threads is implemented correctly using the isMainThread check.
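In sketch form, the pattern looks like this (CompressInputs and the compression body are reconstructed from the snippets quoted elsewhere in this review):

```typescript
import { isMainThread } from 'worker_threads';
import { promises as fs } from 'fs';
import { gzipAsync } from '@gfx/zopfli';

interface CompressInputs {
  from: string;
  to: string;
}

// The task body Piscina invokes for each { from, to } pair.
const compress = async ({ from, to }: CompressInputs): Promise<void> => {
  const content = await fs.readFile(from, 'utf8');
  await fs.writeFile(to, await gzipAsync(content, { numiterations: 15 }));
};

// Only expose a task handler when the module is loaded inside a worker thread;
// on the main thread the import is side-effect free.
export default isMainThread ? undefined : compress;
```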
65-69: ❌ Incorrect review comment

Potential issue with binary file handling.

You're reading all files as UTF-8 text, but some of your target files (like SVGs) may contain binary data. This could cause corruption in the compressed output.

```diff
 const compress = async ({ from, to }: CompressInputs) => {
-  const fileContent = await fs.readFile(from, 'utf8');
+  const fileContent = await fs.readFile(from);
   const compressedContent = await gzipAsync(fileContent, options);
   await fs.writeFile(to, compressedContent);
 };
```
🏁 Script executed:

```bash
#!/bin/bash
# Check if any SVG files might contain binary data that would be corrupted by UTF-8 reading
echo "Checking for potential binary content in SVG files..."
fd ".svg$" public -t f --exec file {} \; | grep -v "text"
```

Length of output: 271
🏁 Script executed:

```bash
#!/bin/bash
# 1. Show the beginning of compressAssets.ts to identify which directories/files are targeted
if [ -f data/onPostBuild/compressAssets.ts ]; then
  echo "=== data/onPostBuild/compressAssets.ts (first 200 lines) ==="
  sed -n '1,200p' data/onPostBuild/compressAssets.ts
else
  echo "ERROR: data/onPostBuild/compressAssets.ts not found"
  exit 1
fi

# 2. Search the entire repo for .svg files and detect any non-text (binary) SVGs
echo
echo "=== Scanning all .svg files in repo for non-text content ==="
fd --extension svg --type f --exec file {} \; | grep -v text || echo "No binary SVGs detected"
```

Length of output: 4371
No changes needed for UTF-8 reading: only text assets are targeted.

The glob in onPostBuild only includes .css, .js, .json, and .svg files, all of which are text-based. Reading them with utf8 encoding will not corrupt any binary data. You can safely leave the readFile(from, 'utf8') call as-is.

Likely an incorrect or invalid review comment.
bin/assert-compressed.sh (outdated)

```bash
if [ ${ORIGINAL_COUNT} -ne ${GZ_COUNT} ]; then
  echo "Error: Found ${ORIGINAL_COUNT} original files but only ${GZ_COUNT} .gz files"
  echo "Original files:"
  find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \)
  echo "GZ files:"
  find public -type f -name "*.gz"
  exit 1
fi
```
🛠️ Refactor suggestion

Improve error reporting with specific missing files.

When compression verification fails, it would be more helpful to see exactly which files are missing their compressed versions rather than dumping all files.

```diff
-if [ ${ORIGINAL_COUNT} -ne ${GZ_COUNT} ]; then
-  echo "Error: Found ${ORIGINAL_COUNT} original files but only ${GZ_COUNT} .gz files"
-  echo "Original files:"
-  find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \)
-  echo "GZ files:"
-  find public -type f -name "*.gz"
+if [ $MISSING_COUNT -gt 0 ]; then
+  echo "Error: Found ${MISSING_COUNT} files without corresponding .gz versions"
+  echo "Missing compressed versions for:"
+  for file in "${MISSING_FILES[@]}"; do
+    echo "  $file"
+  done
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents

In bin/assert-compressed.sh around lines 15 to 22, the error reporting lists all original and compressed files when counts differ, which is not specific enough. Modify the script to identify and display only the original files that do not have corresponding .gz compressed versions. This can be done by comparing the lists of original files and compressed files and printing the missing compressed files explicitly to improve clarity in the error message.
bin/assert-compressed.sh (outdated)

```bash
# Count original files
ORIGINAL_COUNT=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \) | wc -l)

# Count .gz files
GZ_COUNT=$(find public -type f -name "*.gz" | wc -l)
```
🛠️ Refactor suggestion

Consider verifying file-by-file correspondence instead of just counts.

The current implementation only checks that the total counts match, but doesn't verify that each specific file has a corresponding .gz version.

```diff
-# Count original files
-ORIGINAL_COUNT=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \) | wc -l)
-
-# Count .gz files
-GZ_COUNT=$(find public -type f -name "*.gz" | wc -l)
+# Find all files that should be compressed
+FILES=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \))
+ORIGINAL_COUNT=$(echo "$FILES" | wc -l)
+
+# Check each file for a corresponding .gz version
+MISSING_FILES=()
+for file in $FILES; do
+  if [ ! -f "${file}.gz" ]; then
+    MISSING_FILES+=("$file")
+  fi
+done
+
+MISSING_COUNT=${#MISSING_FILES[@]}
```
🤖 Prompt for AI Agents

In bin/assert-compressed.sh around lines 9 to 14, the script currently compares only the total counts of original files and .gz files, which does not ensure each original file has a corresponding compressed version. Modify the script to iterate over each original file found and check if a corresponding .gz file exists in the same location. If any original file lacks a matching .gz file, the script should report this and exit with an error to ensure full correspondence.
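For comparison, the same file-by-file check can be expressed as a small Node script rather than shell (hypothetical, not part of this PR):

```typescript
import fg from 'fast-glob';
import { existsSync } from 'fs';

const main = async (): Promise<void> => {
  const files = await fg('public/**/*.{css,js,json,svg}');
  // Report exactly which originals lack a .gz sibling.
  const missing = files.filter((file) => !existsSync(`${file}.gz`));
  if (missing.length > 0) {
    console.error(`Error: ${missing.length} files without corresponding .gz versions`);
    missing.forEach((file) => console.error(`  ${file}`));
    process.exit(1);
  }
  console.log(`All ${files.length} assets have compressed counterparts`);
};

main();
```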
It is more thorough, but I don't see how it's more effective for our use case - by all means do accept tho
@jamiehenson That's fair! The count-based approach is certainly simpler. The file-by-file check would mainly help with debugging by identifying exactly which files missed compression if that ever happens. But if your compression process is reliable, the current approach works fine for your needs. Thanks for considering the suggestion!
ʕ•ᴥ•ʔ
@coderabbitai these are great suggestions, I've applied them
@kennethkalmer Thank you for implementing the suggestions! The file-by-file verification approach will make debugging easier if any compression issues occur in the future, as you'll be able to identify exactly which files are missing their .gz counterparts. This is a nice enhancement to your compression validation process.
ʕ•ᴥ•ʔ
Excellent, yes it's slow on build, but the effects downstream are worth it. Agree with both the bunny's recommendations but they're not blockers so will leave up to you
<3 I've applied Abbot's suggestions, the output is much better.
Force-pushed 7e597dd → 86f6333
Description
To improve the experience for our users we should be serving up pre-compressed versions of CSS, JS, JSON & SVG files.
This change takes inspiration from gatsby-plugin-zopfli and is essentially a smaller, inlined version of it.

The best way to test it is to open the review app and then look in the network inspector for savings like this:
Checklist
Summary by CodeRabbit
New Features
- llms.txt is generated after each build, listing key site pages for easier discovery.

Improvements

Chores