[WEB-4399] Compress static assets post-build #2601
Conversation
Walkthrough

This change introduces a post-build asset compression step for Gatsby sites, using a new script and worker pool to gzip static assets. It adds a verification script to ensure all assets are compressed, updates the build and test workflows in CircleCI, and modifies the Nginx configuration to serve pre-compressed files. Dependencies for compression and file globbing are added.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Gatsby as Gatsby Build
    participant Compress as compressAssets (onPostBuild)
    participant Worker as Piscina Worker Threads
    participant Nginx as Nginx Server
    Gatsby->>Compress: onPostBuild()
    Compress->>Worker: Compress static assets (.css, .js, .json, .svg)
    Worker-->>Compress: Write .gz files
    Compress-->>Gatsby: Compression complete
    Nginx->>Nginx: Serve request
    Nginx->>Nginx: gzip_static on (serve .gz if present)
```
Force-push history:
- ab52887 → 3a51511
- 1cd1e28 → d699740
- d699740 → e05cfe4
- 7112eae → a1413aa
- a1413aa → 7d4767b
- 7d4767b → 7e9a9ef
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (4)
.circleci/config.yml (1)

54-55: Consider dynamic thread allocation based on environment.

While setting a fixed thread count works well for CI, you might want to make this configurable based on the environment for local development or production deployments.

```diff
 environment:
-  COMPRESS_MAX_THREADS: 8
+  COMPRESS_MAX_THREADS: ${COMPRESS_MAX_THREADS:-8}
```
data/onPostBuild/llmstxt.ts (2)

31-35: Consider adding URL validation.

The URL construction looks good, but consider adding validation to ensure the constructed URL is valid, especially since you're handling URL path prefixes.

```diff
 const prefixPath = ({ url, siteUrl, pathPrefix = `` }: { url: string; siteUrl: string; pathPrefix?: string }) => {
-  return new URL(pathPrefix + withoutTrailingSlash(url), siteUrl).toString();
+  try {
+    return new URL(pathPrefix + withoutTrailingSlash(url), siteUrl).toString();
+  } catch (error) {
+    throw new Error(`Invalid URL: Could not construct URL from ${siteUrl}, ${pathPrefix}, and ${url}`);
+  }
 };
```
102-108: Consider using async file operations.

Since you're already in an async function, consider using fs.promises.writeFile instead of fs.writeFileSync for consistency and to avoid blocking the event loop.

```diff
 const llmsTxtPath = path.join(process.cwd(), 'public', 'llms.txt');
 try {
-  fs.writeFileSync(llmsTxtPath, serializedPages.join('\n'));
+  await fs.promises.writeFile(llmsTxtPath, serializedPages.join('\n'));
   reporter.info(`${REPORTER_PREFIX} Successfully wrote llms.txt with ${serializedPages.length} pages`);
 } catch (err) {
   reporter.panic(`${REPORTER_PREFIX} Error writing llms.txt file`, err as Error);
 }
```
data/onPostBuild/compressAssets.ts (1)

56-58: Consider making compression options configurable.

The number of iterations is hardcoded to 15. Consider making this configurable through an environment variable, similar to how you handle thread count.

```diff
 const options = {
-  numiterations: 15,
+  numiterations: parseInt(process.env.COMPRESS_ITERATIONS || '15', 10),
 };
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (7)
- .circleci/config.yml (2 hunks)
- bin/assert-compressed.sh (1 hunks)
- config/nginx.conf.erb (1 hunks)
- data/onPostBuild/compressAssets.ts (1 hunks)
- data/onPostBuild/index.ts (1 hunks)
- data/onPostBuild/llmstxt.ts (1 hunks)
- package.json (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
data/onPostBuild/llmstxt.ts (1)
- data/onPostBuild/index.ts (1): onPostBuild (5-9)

data/onPostBuild/index.ts (2)
- data/onPostBuild/compressAssets.ts (1): onPostBuild (24-49)
- data/onPostBuild/llmstxt.ts (1): onPostBuild (42-109)
🔇 Additional comments (20)
package.json (2)

46-46: LGTM: Good choice of compression library.

The @gfx/zopfli package is an excellent choice for this use case, providing better compression ratios than standard gzip while maintaining full gzip compatibility.
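For reference, a minimal sketch of the library's promise-based API, assuming a standalone helper (the function name is illustrative; the iteration count matches the value quoted later in this review):

```typescript
import { gzipAsync } from '@gfx/zopfli';
import { promises as fs } from 'fs';

// Hypothetical helper: produce a gzip-compatible .gz file next to the original.
// Higher numiterations trades build time for a smaller output.
const gzipFile = async (filePath: string): Promise<void> => {
  const content = await fs.readFile(filePath);
  const compressed = await gzipAsync(content, { numiterations: 15 });
  await fs.writeFile(`${filePath}.gz`, compressed);
};
```

The output is standard gzip, so any client or server that understands Content-Encoding: gzip can consume it without knowing zopfli was involved.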
55-55: Well-structured implementation for parallelized compression.

The combination of fast-glob for efficient file discovery and piscina for worker thread management will enable efficient parallel compression of static assets.

Also applies to: 81-81
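As a rough sketch of how the two libraries compose (the worker filename and thread count are assumptions, not the PR's actual values):

```typescript
import fg from 'fast-glob';
import Piscina from 'piscina';
import path from 'path';

const compressAll = async (): Promise<void> => {
  // Discover every asset that should get a pre-compressed sibling.
  const files = await fg('public/**/*.{css,js,json,svg}');

  // Fan the work out across worker threads; one task per file.
  const pool = new Piscina({
    filename: path.resolve(__dirname, 'compressWorker.js'), // hypothetical worker module
    maxThreads: 8,
  });

  await Promise.all(files.map((from) => pool.run({ from, to: `${from}.gz` })));
  await pool.destroy();
};
```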
config/nginx.conf.erb (2)

86-87: Excellent optimization for serving pre-compressed assets.

Moving gzip_static on; to the http level ensures all relevant static assets will be served in their pre-compressed form when available, which is more efficient than on-the-fly compression.

91-91: LGTM: Fixed indentation.

The indentation correction improves code readability while maintaining the same functionality.
.circleci/config.yml (2)

58-58: LGTM: Resource upgrade supports parallel compression.

Upgrading from the large to the xlarge resource class provides more CPU cores to effectively utilize the parallel compression threads.

107-109: Good validation step for compression process.

This verification step ensures all static assets are properly compressed before proceeding with the Nginx tests, providing an early failure if compression wasn't successful.
bin/assert-compressed.sh (2)

1-8: LGTM: Clear documentation and usage instructions.

The script header provides clear documentation about the purpose and usage of this utility.

24-24: LGTM: Clear success message.

The success message clearly indicates that all files have been properly compressed.
data/onPostBuild/index.ts (2)

2-3: Clean module imports for the new functionality.

The imports are clear and appropriately named, following the pattern of importing the specific onPostBuild functions from their respective modules.

5-8: Well-structured sequential execution of post-build tasks.

This implementation effectively chains the post-build steps in a clear, sequential manner. The comment accurately explains what's happening, and the code implements a clean orchestration of different post-build processes.
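Based on the line references above, the orchestration is presumably along these lines (a sketch; the execution order shown is an assumption):

```typescript
import type { GatsbyNode } from 'gatsby';
import { onPostBuild as compressAssets } from './compressAssets';
import { onPostBuild as generateLlmsTxt } from './llmstxt';

// Run each post-build step in sequence once Gatsby finishes writing `public/`.
export const onPostBuild: GatsbyNode['onPostBuild'] = async (args) => {
  await generateLlmsTxt(args);
  await compressAssets(args);
};
```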
data/onPostBuild/llmstxt.ts (4)

5-9: Clear documentation on module purpose.

The file header clearly explains the purpose of this script, which helps with maintainability.

37-40: The escapeMarkdown function is well implemented.

Good attention to detail in escaping all relevant Markdown special characters.
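The function body isn't shown in this review; an illustrative version of such an escaper (hypothetical, not the PR's code) might be:

```typescript
// Hypothetical sketch: backslash-escape Markdown control characters so page
// titles can be embedded in llms.txt without altering its Markdown structure.
const escapeMarkdown = (text: string): string =>
  text.replace(/([\\`*_{}[\]()#+\-.!|])/g, '\\$1');
```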
42-72: Good error handling for GraphQL query.

The implementation includes thorough error handling and validation for the GraphQL query results, with appropriate error messages.

87-100: Robust page serialization with error handling.

The implementation handles each page serialization individually with proper error handling, which prevents a single page error from breaking the entire process.
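The pattern being praised, sketched under assumptions about the surrounding types (the page shape and serializer are stand-ins):

```typescript
interface PageNode {
  path: string;
  title?: string;
}

const serializePages = (
  pages: PageNode[],
  serialize: (page: PageNode) => string, // e.g. builds one escaped Markdown line
  reporter: { warn: (msg: string) => void },
): string[] => {
  const lines: string[] = [];
  for (const page of pages) {
    try {
      lines.push(serialize(page));
    } catch (err) {
      // A single bad page is logged and skipped instead of failing the build.
      reporter.warn(`Skipping page ${page.path}: ${(err as Error).message}`);
    }
  }
  return lines;
};
```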
data/onPostBuild/compressAssets.ts (6)

9-22: Great documentation explaining code purpose and structure.

The comment block provides excellent context about the inspiration, functionality, and structure of the code. This kind of documentation is extremely helpful for maintenance and onboarding.

24-34: File identification and mapping looks good.

The code effectively identifies relevant static assets using fast-glob and properly maps them to their future gzipped versions.

35-38: Good use of environment variables for configuration.

Using COMPRESS_MAX_THREADS as an environment variable with a sensible default is a good practice for making the behavior configurable.
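A sketch of the kind of lookup being described; the CPU-count fallback is an assumption, not necessarily what the PR does:

```typescript
import os from 'os';

// Honour COMPRESS_MAX_THREADS when set (CircleCI sets it to 8),
// otherwise fall back to the number of available CPU cores.
const maxThreads =
  Number.parseInt(process.env.COMPRESS_MAX_THREADS ?? '', 10) || os.cpus().length;
```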
39-46: Effective use of worker pool for parallel processing.

The Piscina worker pool is well-configured for TypeScript and properly utilized for parallel compression of files.

71-77: Worker export implementation is correct.

The conditional export for worker threads is implemented correctly using the isMainThread check.
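In sketch form, the pattern looks like this (CompressInputs and the compression body are reconstructed from the snippets quoted elsewhere in this review):

```typescript
import { isMainThread } from 'worker_threads';
import { promises as fs } from 'fs';
import { gzipAsync } from '@gfx/zopfli';

interface CompressInputs {
  from: string;
  to: string;
}

// The task body Piscina invokes for each { from, to } pair.
const compress = async ({ from, to }: CompressInputs): Promise<void> => {
  const content = await fs.readFile(from, 'utf8');
  await fs.writeFile(to, await gzipAsync(content, { numiterations: 15 }));
};

// Only expose a task handler when the module is loaded inside a worker thread;
// on the main thread the import is side-effect free.
export default isMainThread ? undefined : compress;
```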
65-69: ❌ Incorrect review comment

Potential issue with binary file handling.

You're reading all files as UTF-8 text, but some of your target files (like SVGs) may contain binary data. This could cause corruption in the compressed output.

```diff
 const compress = async ({ from, to }: CompressInputs) => {
-  const fileContent = await fs.readFile(from, 'utf8');
+  const fileContent = await fs.readFile(from);
   const compressedContent = await gzipAsync(fileContent, options);
   await fs.writeFile(to, compressedContent);
 };
```
🏁 Script executed:

```bash
#!/bin/bash
# Check if any SVG files might contain binary data that would be corrupted by UTF-8 reading
echo "Checking for potential binary content in SVG files..."
fd ".svg$" public -t f --exec file {} \; | grep -v "text"
```

Length of output: 271
🏁 Script executed:

```bash
#!/bin/bash
# 1. Show the beginning of compressAssets.ts to identify which directories/files are targeted
if [ -f data/onPostBuild/compressAssets.ts ]; then
  echo "=== data/onPostBuild/compressAssets.ts (first 200 lines) ==="
  sed -n '1,200p' data/onPostBuild/compressAssets.ts
else
  echo "ERROR: data/onPostBuild/compressAssets.ts not found"
  exit 1
fi

# 2. Search the entire repo for .svg files and detect any non-text (binary) SVGs
echo
echo "=== Scanning all .svg files in repo for non-text content ==="
fd --extension svg --type f --exec file {} \; | grep -v text || echo "No binary SVGs detected"
```

Length of output: 4371
No changes needed for UTF-8 reading: only text assets are targeted.

The glob in onPostBuild only includes .css, .js, .json, and .svg files, all of which are text-based. Reading them with utf8 encoding will not corrupt any binary data. You can safely leave the readFile(from, 'utf8') call as-is.

Likely an incorrect or invalid review comment.
bin/assert-compressed.sh (outdated)

```bash
if [ ${ORIGINAL_COUNT} -ne ${GZ_COUNT} ]; then
  echo "Error: Found ${ORIGINAL_COUNT} original files but only ${GZ_COUNT} .gz files"
  echo "Original files:"
  find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \)
  echo "GZ files:"
  find public -type f -name "*.gz"
  exit 1
fi
```
🛠️ Refactor suggestion

Improve error reporting with specific missing files.

When compression verification fails, it would be more helpful to see exactly which files are missing their compressed versions rather than dumping all files.

```diff
-if [ ${ORIGINAL_COUNT} -ne ${GZ_COUNT} ]; then
-  echo "Error: Found ${ORIGINAL_COUNT} original files but only ${GZ_COUNT} .gz files"
-  echo "Original files:"
-  find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \)
-  echo "GZ files:"
-  find public -type f -name "*.gz"
+if [ $MISSING_COUNT -gt 0 ]; then
+  echo "Error: Found ${MISSING_COUNT} files without corresponding .gz versions"
+  echo "Missing compressed versions for:"
+  for file in "${MISSING_FILES[@]}"; do
+    echo "  $file"
+  done
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents

In bin/assert-compressed.sh around lines 15 to 22, the error reporting lists all original and compressed files when counts differ, which is not specific enough. Modify the script to identify and display only the original files that do not have corresponding .gz compressed versions. This can be done by comparing the lists of original files and compressed files and printing the missing compressed files explicitly to improve clarity in the error message.
bin/assert-compressed.sh (outdated)

```bash
# Count original files
ORIGINAL_COUNT=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \) | wc -l)

# Count .gz files
GZ_COUNT=$(find public -type f -name "*.gz" | wc -l)
```
🛠️ Refactor suggestion

Consider verifying file-by-file correspondence instead of just counts.

The current implementation only checks that the total counts match, but doesn't verify that each specific file has a corresponding .gz version.

```diff
-# Count original files
-ORIGINAL_COUNT=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \) | wc -l)
-
-# Count .gz files
-GZ_COUNT=$(find public -type f -name "*.gz" | wc -l)
+# Find all files that should be compressed
+FILES=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \))
+ORIGINAL_COUNT=$(echo "$FILES" | wc -l)
+
+# Check each file for a corresponding .gz version
+MISSING_FILES=()
+for file in $FILES; do
+  if [ ! -f "${file}.gz" ]; then
+    MISSING_FILES+=("$file")
+  fi
+done
+
+MISSING_COUNT=${#MISSING_FILES[@]}
```
🤖 Prompt for AI Agents

In bin/assert-compressed.sh around lines 9 to 14, the script currently compares only the total counts of original files and .gz files, which does not ensure each original file has a corresponding compressed version. Modify the script to iterate over each original file found and check if a corresponding .gz file exists in the same location. If any original file lacks a matching .gz file, the script should report this and exit with an error to ensure full correspondence.
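For comparison, the same file-by-file check can be expressed as a small Node script rather than shell (hypothetical, not part of this PR):

```typescript
import fg from 'fast-glob';
import { existsSync } from 'fs';

const main = async (): Promise<void> => {
  const files = await fg('public/**/*.{css,js,json,svg}');
  // Report exactly which originals lack a .gz sibling.
  const missing = files.filter((file) => !existsSync(`${file}.gz`));
  if (missing.length > 0) {
    console.error(`Error: ${missing.length} files without corresponding .gz versions`);
    missing.forEach((file) => console.error(`  ${file}`));
    process.exit(1);
  }
  console.log(`All ${files.length} assets have compressed counterparts`);
};

main();
```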
It is more thorough, but I don't see how it's more effective for our use case - by all means do accept tho
@jamiehenson That's fair! The count-based approach is certainly simpler. The file-by-file check would mainly help with debugging by identifying exactly which files missed compression if that ever happens. But if your compression process is reliable, the current approach works fine for your needs. Thanks for considering the suggestion!
ʕ•ᴥ•ʔ
@coderabbitai these are great suggestions, I've applied them
@kennethkalmer Thank you for implementing the suggestions! The file-by-file verification approach will make debugging easier if any compression issues occur in the future, as you'll be able to identify exactly which files are missing their .gz counterparts. This is a nice enhancement to your compression validation process.
ʕ•ᴥ•ʔ
Excellent, yes it's slow on build, but the effects downstream are worth it. Agree with both the bunny's recommendations but they're not blockers so will leave up to you
<3 I've applied Abbot's suggestions, the output is much better.
Force-pushed 7e597dd → 86f6333
Description
To improve the experience for our users we should be serving up pre-compressed versions of CSS, JS, JSON & SVG files.
This change takes inspiration from gatsby-plugin-zopfli and is essentially a smaller, inlined version of it.

The best way to test it is to open the review app and then look in the network inspector for savings like this:
Checklist
Summary by CodeRabbit
New Features
- llms.txt is generated after each build, listing key site pages for easier discovery.

Improvements

Chores