[WEB-4399] Compress static assets post-build #2601


Merged: 6 commits into main from chore/compress-assets on May 15, 2025

Conversation

@kennethkalmer (Member) commented May 14, 2025

Description

To improve the experience for our users, we should serve pre-compressed versions of CSS, JS, JSON & SVG files.

This change takes inspiration from gatsby-plugin-zopfli and is essentially a smaller, inlined version of it.
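
In outline, the hook is shaped roughly like the sketch below: a minimal, hedged reconstruction from the review comments further down this thread (the helper name `compress`, the glob pattern, and the default thread count are assumptions, not the merged code).

```typescript
// Sketch of data/onPostBuild/compressAssets.ts, reconstructed from the review below.
import { isMainThread } from 'worker_threads';
import { promises as fs } from 'fs';
import fg from 'fast-glob';
import Piscina from 'piscina';
import { gzipAsync } from '@gfx/zopfli';

// Worker task: read a text asset, zopfli-gzip it, write it next to the original.
const compress = async ({ from, to }: { from: string; to: string }) => {
  const content = await fs.readFile(from, 'utf8'); // text assets only, per the glob
  const compressed = await gzipAsync(content, { numiterations: 15 });
  await fs.writeFile(to, compressed);
};

export const onPostBuild = async () => {
  const files = await fg('public/**/*.{css,js,json,svg}');
  const pool = new Piscina({
    filename: __filename, // this module doubles as the worker script
    maxThreads: Number(process.env.COMPRESS_MAX_THREADS ?? 8),
  });
  await Promise.all(files.map((from) => pool.run({ from, to: `${from}.gz` })));
  await pool.destroy();
};

// When Piscina loads this file inside a worker thread, expose the task function.
export default isMainThread ? undefined : compress;
```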

The best way to test it is to open the review app and look in the network inspector for savings like this:

[Screenshot from May 15, 2025: network inspector showing compressed asset transfer sizes]


Summary by CodeRabbit

  • New Features

    • Static assets are now automatically compressed after each build to improve loading performance.
    • A new validation step ensures all required files are properly compressed before deployment.
    • A new file, llms.txt, is generated after each build, listing key site pages for easier discovery.
  • Improvements

    • Enhanced server configuration to serve pre-compressed assets more efficiently.
    • Increased build resources for faster and more reliable builds.
  • Chores

    • Added new dependencies to support asset compression and validation.

@coderabbitai bot commented May 14, 2025

Review skipped: automatic reviews are disabled on this repository. A single review can be triggered with the @coderabbitai review command.

Walkthrough

This change introduces a post-build asset compression step for Gatsby sites, using a new script and worker pool to gzip static assets. It adds a verification script to ensure all assets are compressed, updates the build and test workflows in CircleCI, and modifies the Nginx configuration to serve pre-compressed files. Dependencies for compression and file globbing are added.

Changes

  • .circleci/config.yml: Updated the build job to use COMPRESS_MAX_THREADS, increased the resource class, and added a compression verification step to test-nginx.
  • bin/assert-compressed.sh: New script asserting that all relevant files in public/ have corresponding .gz files.
  • config/nginx.conf.erb: Moved gzip_static on; to the global http block and removed it from the location block; minor indentation fix.
  • data/onPostBuild/compressAssets.ts: New Gatsby onPostBuild hook that compresses static assets using worker threads and zopfli gzip.
  • data/onPostBuild/llmstxt.ts: New Gatsby onPostBuild hook that generates an llms.txt file with site pages' metadata.
  • data/onPostBuild/index.ts: Refactored to sequentially invoke the llmstxt and compressAssets hooks instead of inline logic (see the sketch below).
  • package.json: Added dependencies: @gfx/zopfli, fast-glob, and piscina for compression and file operations.
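
The index.ts refactor plausibly reduces to a small orchestrator along these lines (an assumed shape based on the code graph analysis further down, not the merged file):

```typescript
// Assumed shape of data/onPostBuild/index.ts after the refactor.
import type { GatsbyNode } from 'gatsby';
import { onPostBuild as llmstxt } from './llmstxt';
import { onPostBuild as compressAssets } from './compressAssets';

export const onPostBuild: GatsbyNode['onPostBuild'] = async (args) => {
  // Run the post-build steps sequentially: llms.txt first, then compression.
  await llmstxt(args);
  await compressAssets(args);
};
```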

Sequence Diagram(s)

sequenceDiagram
    participant Gatsby as Gatsby Build
    participant Compress as compressAssets (onPostBuild)
    participant Worker as Piscina Worker Threads
    participant Nginx as Nginx Server

    Gatsby->>Compress: onPostBuild()
    Compress->>Worker: Compress static assets (.css, .js, .json, .svg)
    Worker-->>Compress: Write .gz files
    Compress-->>Gatsby: Compression complete

    Nginx->>Nginx: Serve request
    Nginx->>Nginx: gzip_static on (serve .gz if present)

Assessment against linked issues

  • Compress static assets in Gatsby build and configure Nginx to serve pre-compressed files (WEB-4399): addressed.

Suggested reviewers

  • m-hulbert
  • jamiehenson

Poem

A hop, a skip, a gzip run,
Compressing assets, oh what fun!
With worker threads and zopfli’s might,
Our files are smaller, servers light.
Nginx now serves with glee—
Fast and snappy, as it should be!
🐇✨



@kennethkalmer force-pushed the refactor/document-template-query branch from ab52887 to 3a51511 on May 14, 2025 13:45
Base automatically changed from refactor/document-template-query to main on May 14, 2025 13:52
@kennethkalmer force-pushed the chore/compress-assets branch from 1cd1e28 to d699740 on May 15, 2025 13:11
@kennethkalmer changed the title from "feat: add gatsby-plugin-zopfli for asset compression" to "[WEB-4399] Compress static assets post-build" on May 15, 2025
@kennethkalmer force-pushed the chore/compress-assets branch from d699740 to e05cfe4 on May 15, 2025 13:15
@kennethkalmer temporarily deployed to ably-docs-chore-compres-d3tiuw on May 15, 2025 13:25
@kennethkalmer temporarily deployed to ably-docs-chore-compres-d3tiuw on May 15, 2025 13:42
@kennethkalmer force-pushed the chore/compress-assets branch from 7112eae to a1413aa on May 15, 2025 13:50
@kennethkalmer had a problem deploying to ably-docs-chore-compres-d3tiuw on May 15, 2025 13:51
@kennethkalmer had a problem deploying to ably-docs-chore-compres-d3tiuw on May 15, 2025 13:57
@kennethkalmer temporarily deployed to ably-docs-chore-compres-d3tiuw on May 15, 2025 14:00
@kennethkalmer force-pushed the chore/compress-assets branch from a1413aa to 7d4767b on May 15, 2025 15:13
@kennethkalmer temporarily deployed to ably-docs-chore-compres-d3tiuw on May 15, 2025 15:13
@kennethkalmer force-pushed the chore/compress-assets branch from 7d4767b to 7e9a9ef on May 15, 2025 15:44
@kennethkalmer temporarily deployed to ably-docs-chore-compres-d3tiuw on May 15, 2025 15:44
@kennethkalmer had a problem deploying to ably-docs-chore-compres-d3tiuw on May 15, 2025 15:55
@kennethkalmer marked this pull request as ready for review on May 15, 2025 16:15
@kennethkalmer requested a review from jamiehenson on May 15, 2025 16:15
@kennethkalmer self-assigned this on May 15, 2025
@kennethkalmer requested a review from aralovelace on May 15, 2025 16:15
@kennethkalmer (Member, Author) commented:

@coderabbitai full review

@coderabbitai bot commented May 15, 2025

✅ Actions performed: full review triggered.

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
.circleci/config.yml (1)

54-55: Consider dynamic thread allocation based on environment.

While setting a fixed thread count works well for CI, you might want to make this configurable based on the environment for local development or production deployments.

 environment:
-  COMPRESS_MAX_THREADS: 8
+  COMPRESS_MAX_THREADS: ${COMPRESS_MAX_THREADS:-8}
data/onPostBuild/llmstxt.ts (2)

31-35: Consider adding URL validation

The URL construction looks good, but consider adding validation to ensure the constructed URL is valid, especially since you're handling URL path prefixes.

const prefixPath = ({ url, siteUrl, pathPrefix = `` }: { url: string; siteUrl: string; pathPrefix?: string }) => {
-  return new URL(pathPrefix + withoutTrailingSlash(url), siteUrl).toString();
+  try {
+    return new URL(pathPrefix + withoutTrailingSlash(url), siteUrl).toString();
+  } catch (error) {
+    throw new Error(`Invalid URL: Could not construct URL from ${siteUrl}, ${pathPrefix}, and ${url}`);
+  }
};

102-108: Consider using async file operations

Since you're already in an async function, consider using fs.promises.writeFile instead of fs.writeFileSync for consistency and to avoid blocking the event loop.

  const llmsTxtPath = path.join(process.cwd(), 'public', 'llms.txt');
  try {
-    fs.writeFileSync(llmsTxtPath, serializedPages.join('\n'));
+    await fs.promises.writeFile(llmsTxtPath, serializedPages.join('\n'));
    reporter.info(`${REPORTER_PREFIX} Successfully wrote llms.txt with ${serializedPages.length} pages`);
  } catch (err) {
    reporter.panic(`${REPORTER_PREFIX} Error writing llms.txt file`, err as Error);
  }
data/onPostBuild/compressAssets.ts (1)

56-58: Consider making compression options configurable

The number of iterations is hardcoded to 15. Consider making this configurable through an environment variable, similar to how you handle thread count.

const options = {
-  numiterations: 15,
+  numiterations: parseInt(process.env.COMPRESS_ITERATIONS || '15', 10),
};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16de4fa and 7e597dd.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (7)
  • .circleci/config.yml (2 hunks)
  • bin/assert-compressed.sh (1 hunks)
  • config/nginx.conf.erb (1 hunks)
  • data/onPostBuild/compressAssets.ts (1 hunks)
  • data/onPostBuild/index.ts (1 hunks)
  • data/onPostBuild/llmstxt.ts (1 hunks)
  • package.json (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
data/onPostBuild/llmstxt.ts (1)
data/onPostBuild/index.ts (1)
  • onPostBuild (5-9)
data/onPostBuild/index.ts (2)
data/onPostBuild/compressAssets.ts (1)
  • onPostBuild (24-49)
data/onPostBuild/llmstxt.ts (1)
  • onPostBuild (42-109)
🔇 Additional comments (20)
package.json (2)

46-46: LGTM: Good choice of compression library.

The @gfx/zopfli package is an excellent choice for this use case, providing better compression ratios than standard gzip while maintaining full gzip compatibility.
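
As a quick illustration of that compatibility (a standalone sketch, not code from this PR; the sample content is made up):

```typescript
import { gzipAsync } from '@gfx/zopfli';
import { gunzipSync } from 'zlib';

async function demo() {
  const input = 'body { color: rebeccapurple; }'; // hypothetical CSS content
  // zopfli searches harder for small output but still emits standard gzip.
  const compressed = await gzipAsync(input, { numiterations: 15 });
  // Node's built-in zlib can therefore decompress it unchanged.
  console.log(gunzipSync(Buffer.from(compressed)).toString('utf8') === input); // true
}

demo();
```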


55-55: Well-structured implementation for parallelized compression.

The combination of fast-glob for efficient file discovery and piscina for worker thread management will enable efficient parallel compression of static assets.

Also applies to: 81-81

config/nginx.conf.erb (2)

86-87: Excellent optimization for serving pre-compressed assets.

Moving gzip_static on; to the http level ensures all relevant static assets will be served in their pre-compressed form when available, which is more efficient than on-the-fly compression.
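
For reference, the relevant directive ends up at the top level of the http block, roughly like this (a sketch of the directive only, not the full config/nginx.conf.erb):

```nginx
http {
  # Serve foo.css.gz in place of foo.css whenever the client sends
  # Accept-Encoding: gzip and the pre-compressed file exists on disk.
  gzip_static on;

  # ... rest of the http block, including server/location sections ...
}
```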


91-91: LGTM: Fixed indentation.

The indentation correction improves code readability while maintaining the same functionality.

.circleci/config.yml (2)

58-58: LGTM: Resource upgrade supports parallel compression.

Upgrading from large to xlarge resource class provides more CPU cores to effectively utilize the parallel compression threads.


107-109: Good validation step for compression process.

This verification step ensures all static assets are properly compressed before proceeding with the Nginx tests, providing an early failure if compression wasn't successful.

bin/assert-compressed.sh (2)

1-8: LGTM: Clear documentation and usage instructions.

The script header provides clear documentation about the purpose and usage of this utility.


24-24: LGTM: Clear success message.

The success message clearly indicates that all files have been properly compressed.

data/onPostBuild/index.ts (2)

2-3: Clean module imports for the new functionality

The imports are clear and appropriately named, following the pattern of importing the specific onPostBuild functions from their respective modules.


5-8: Well-structured sequential execution of post-build tasks

This implementation effectively chains the post-build steps in a clear, sequential manner. The comment accurately explains what's happening, and the code implements a clean orchestration of different post-build processes.

data/onPostBuild/llmstxt.ts (4)

5-9: Clear documentation on module purpose

The file header clearly explains the purpose of this script, which helps with maintainability.


37-40: The escapeMarkdown function is well implemented

Good attention to detail in escaping all relevant Markdown special characters.
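
A hypothetical shape for such a helper (the exact set of characters escaped in the PR is not shown in this thread, so the character class here is an assumption):

```typescript
// Hypothetical sketch: backslash-escape common Markdown metacharacters.
const escapeMarkdown = (text: string): string =>
  text.replace(/([\\`*_{}[\]()#+\-.!|])/g, '\\$1');

escapeMarkdown('3 * 4 = [twelve]'); // => "3 \* 4 = \[twelve\]"
```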


42-72: Good error handling for GraphQL query

The implementation includes thorough error handling and validation for the GraphQL query results, with appropriate error messages.


87-100: Robust page serialization with error handling

The implementation handles each page serialization individually with proper error handling, which prevents a single page error from breaking the entire process.

data/onPostBuild/compressAssets.ts (6)

9-22: Great documentation explaining code purpose and structure

The comment block provides excellent context about the inspiration, functionality, and structure of the code. This kind of documentation is extremely helpful for maintenance and onboarding.


24-34: File identification and mapping looks good

The code effectively identifies relevant static assets using fast-glob and properly maps them to their future gzipped versions.


35-38: Good use of environment variables for configuration

Using COMPRESS_MAX_THREADS as an environment variable with a sensible default is a good practice for making the behavior configurable.


39-46: Effective use of worker pool for parallel processing

The Piscina worker pool is well-configured for TypeScript and properly utilized for parallel compression of files.


71-77: Worker export implementation is correct

The conditional export for worker threads is implemented correctly using the isMainThread check.


65-69: ❌ Incorrect review comment

Potential issue with binary file handling

You're reading all files as UTF-8 text, but some of your target files (like SVGs) may contain binary data. This could cause corruption in the compressed output.

const compress = async ({ from, to }: CompressInputs) => {
-  const fileContent = await fs.readFile(from, 'utf8');
+  const fileContent = await fs.readFile(from);
  const compressedContent = await gzipAsync(fileContent, options);
  await fs.writeFile(to, compressedContent);
};

🏁 Script executed:

#!/bin/bash
# Check if any SVG files might contain binary data that would be corrupted by UTF-8 reading

echo "Checking for potential binary content in SVG files..."
fd ".svg$" public -t f --exec file {} \; | grep -v "text"

Length of output: 271


🏁 Script executed:

#!/bin/bash
# 1. Show the beginning of the compressAssets.ts to identify which directories/files are targeted
if [ -f data/onPostBuild/compressAssets.ts ]; then
  echo "=== data/onPostBuild/compressAssets.ts (first 200 lines) ==="
  sed -n '1,200p' data/onPostBuild/compressAssets.ts
else
  echo "ERROR: data/onPostBuild/compressAssets.ts not found"
  exit 1
fi

# 2. Search the entire repo for .svg files and detect any non-text (binary) SVGs
echo
echo "=== Scanning all .svg files in repo for non-text content ==="
fd --extension svg --type f --exec file {} \; | grep -v text || echo "No binary SVGs detected"

Length of output: 4371


No changes needed for UTF-8 reading—only text assets are targeted

The glob in onPostBuild only includes .css, .js, .json, and .svg files, all of which are text-based. Reading them with utf8 encoding will not corrupt any binary data. You can safely leave the readFile(from, 'utf8') call as-is.

Likely an incorrect or invalid review comment.

Comment on lines 15 to 30
if [ ${ORIGINAL_COUNT} -ne ${GZ_COUNT} ]; then
  echo "Error: Found ${ORIGINAL_COUNT} original files but only ${GZ_COUNT} .gz files"
  echo "Original files:"
  find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \)
  echo "GZ files:"
  find public -type f -name "*.gz"
  exit 1
fi

🛠️ Refactor suggestion

Improve error reporting with specific missing files.

When compression verification fails, it would be more helpful to see exactly which files are missing their compressed versions rather than dumping all files.

-if [ ${ORIGINAL_COUNT} -ne ${GZ_COUNT} ]; then
-  echo "Error: Found ${ORIGINAL_COUNT} original files but only ${GZ_COUNT} .gz files"
-  echo "Original files:"
-  find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \)
-  echo "GZ files:"
-  find public -type f -name "*.gz"
+if [ $MISSING_COUNT -gt 0 ]; then
+  echo "Error: Found ${MISSING_COUNT} files without corresponding .gz versions"
+  echo "Missing compressed versions for:"
+  for file in "${MISSING_FILES[@]}"; do
+    echo "  $file"
+  done

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In bin/assert-compressed.sh around lines 15 to 22, the error reporting lists all
original and compressed files when counts differ, which is not specific enough.
Modify the script to identify and display only the original files that do not
have corresponding .gz compressed versions. This can be done by comparing the
lists of original files and compressed files and printing the missing compressed
files explicitly to improve clarity in the error message.

Comment on lines 9 to 20
# Count original files
ORIGINAL_COUNT=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \) | wc -l)

# Count .gz files
GZ_COUNT=$(find public -type f -name "*.gz" | wc -l)

@coderabbitai bot commented May 15, 2025

🛠️ Refactor suggestion

Consider verifying file-by-file correspondence instead of just counts.

The current implementation only checks that the total counts match, but doesn't verify that each specific file has a corresponding .gz version.

-# Count original files
-ORIGINAL_COUNT=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \) | wc -l)
-
-# Count .gz files
-GZ_COUNT=$(find public -type f -name "*.gz" | wc -l)
+# Find all files that should be compressed
+FILES=$(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \))
+ORIGINAL_COUNT=$(echo "$FILES" | wc -l)
+
+# Check each file for a corresponding .gz version
+MISSING_FILES=()
+for file in $FILES; do
+  if [ ! -f "${file}.gz" ]; then
+    MISSING_FILES+=("$file")
+  fi
+done
+
+MISSING_COUNT=${#MISSING_FILES[@]}
🤖 Prompt for AI Agents
In bin/assert-compressed.sh around lines 9 to 14, the script currently compares
only the total counts of original files and .gz files, which does not ensure
each original file has a corresponding compressed version. Modify the script to
iterate over each original file found and check if a corresponding .gz file
exists in the same location. If any original file lacks a matching .gz file, the
script should report this and exit with an error to ensure full correspondence.

@jamiehenson (Member) replied:

It is more thorough, but I don't see how it's more effective for our use case - by all means do accept tho

@coderabbitai bot replied:

@jamiehenson That's fair! The count-based approach is certainly simpler. The file-by-file check would mainly help with debugging by identifying exactly which files missed compression if that ever happens. But if your compression process is reliable, the current approach works fine for your needs. Thanks for considering the suggestion!

ʕ•ᴥ•ʔ

@kennethkalmer (Member, Author) replied:

@coderabbitai these are great suggestions, I've applied them

@coderabbitai bot replied:

@kennethkalmer Thank you for implementing the suggestions! The file-by-file verification approach will make debugging easier if any compression issues occur in the future, as you'll be able to identify exactly which files are missing their .gz counterparts. This is a nice enhancement to your compression validation process.

ʕ•ᴥ•ʔ

@jamiehenson (Member) left a comment:

Excellent, yes it's slow on build, but the effects downstream are worth it. Agree with both the bunny's recommendations but they're not blockers so will leave up to you

@kennethkalmer (Member, Author) commented:

> Excellent, yes it's slow on build, but the effects downstream are worth it. Agree with both the bunny's recommendations but they're not blockers so will leave up to you

<3

I've applied Abbot's suggestions; the output is much better.
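
With both suggestions applied, the verification script plausibly ends up shaped like this (a sketch reconstructed from the review diffs above, not the merged bin/assert-compressed.sh):

```bash
#!/bin/bash
# Assert that every compressible asset in public/ has a .gz counterpart.
set -euo pipefail

MISSING_FILES=()
while IFS= read -r file; do
  if [ ! -f "${file}.gz" ]; then
    MISSING_FILES+=("$file")
  fi
done < <(find public -type f \( -name "*.css" -o -name "*.js" -o -name "*.json" -o -name "*.svg" \))

if [ ${#MISSING_FILES[@]} -gt 0 ]; then
  echo "Error: Found ${#MISSING_FILES[@]} files without corresponding .gz versions"
  echo "Missing compressed versions for:"
  printf '  %s\n' "${MISSING_FILES[@]}"
  exit 1
fi

echo "All files are compressed"
```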

@kennethkalmer force-pushed the chore/compress-assets branch from 7e597dd to 86f6333 on May 15, 2025 21:15
@kennethkalmer enabled auto-merge on May 15, 2025 21:15
@kennethkalmer temporarily deployed to ably-docs-chore-compres-d3tiuw on May 15, 2025 21:15
@kennethkalmer merged commit 99ef08f into main on May 15, 2025 (6 checks passed)
@kennethkalmer deleted the chore/compress-assets branch on May 15, 2025 21:20