Use piped streams to avoid loading cache entry bytes into memory #49


Open
aSemy opened this issue Mar 16, 2024 · 2 comments

@aSemy
Contributor

aSemy commented Mar 16, 2024

Currently the Build Cache implementations load the build cache entries into memory as a ByteArray.

I believe this will negatively impact performance (although I admit I haven't done any testing, so I could be wrong!).
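
For context, here is a minimal standalone sketch of the in-memory pattern described above (the function and parameter names are hypothetical, not the plugin's actual code):

```kotlin
import java.io.ByteArrayOutputStream
import java.io.OutputStream

// Hypothetical sketch (not the actual plugin code): the entry writer
// fills an in-memory buffer, and the resulting ByteArray, i.e. the
// entire cache entry, is what gets handed to the storage client.
fun storeBuffered(writeEntry: (OutputStream) -> Unit): ByteArray {
    val buffer = ByteArrayOutputStream()
    writeEntry(buffer)           // whole entry is written into memory
    return buffer.toByteArray()  // full copy of the entry bytes
}
```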

It can be avoided by piping the streams. For example:

    override fun store(key: BuildCacheKey, writer: BuildCacheEntryWriter) {
        // ...

        val incoming = PipedOutputStream()
        val contents = PipedInputStream(incoming) // connect the pipe before writing

        // Write on a separate thread: PipedOutputStream blocks once the
        // pipe's buffer fills, so writing on the calling thread would
        // deadlock for entries larger than the buffer.
        thread { incoming.use { writer.writeTo(it) } }

        storageService.store(key, contents, writer.size) // must manually pass the size down
    }

I'd be happy to contribute a PR.

@liutikas
Member

If you have some benchmarks showing this helps, I'm happy to take a PR.

@aSemy
Contributor Author

aSemy commented Mar 21, 2024

I've done some experimenting (see https://github.com/aSemy/gcp-gradle-build-cache/tree/experiments/input-streams), running JMH benchmarks against the S3 bucket and testing three options:

  • Using ByteArray (the current version)
  • Using PipedOutputStream/PipedInputStream (which requires an additional thread)
  • Using an Okio Buffer
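
For reference, a minimal standalone sketch of the piped variant (the helper name is made up; the consumer here just drains the pipe, whereas in the benchmark the consumer is the S3 client):

```kotlin
import java.io.OutputStream
import java.io.PipedInputStream
import java.io.PipedOutputStream
import kotlin.concurrent.thread

// Sketch of the piped approach. The producer runs on its own thread
// because PipedOutputStream blocks once the pipe's internal buffer
// (1024 bytes by default) is full; writing and reading on the same
// thread would deadlock for entries larger than the buffer.
fun storePiped(writeEntry: (OutputStream) -> Unit): ByteArray {
    val sink = PipedOutputStream()
    val source = PipedInputStream(sink)           // connect before writing
    val producer = thread { sink.use(writeEntry) } // write concurrently
    val bytes = source.readBytes()                 // consumer drains the pipe
    producer.join()
    return bytes
}
```

The extra thread is the cost of this approach, and is where the inconsistent timings in the results below likely come from.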

I only measured throughput (operations per second), not memory usage. I'd like to measure memory usage too, but kotlinx.benchmark doesn't support profiler arguments yet.

tl;dr: Buffer is slowest. Piped streams are faster on average, but their timings are inconsistent. ByteArray is slower than piped streams, but more consistent.

Benchmark                         (mode)   Mode  Cnt   Score     Error  Units
AwsBenchmark.storeRandomData  byte-array  thrpt   10  50,267 ±  29,572  ops/s
AwsBenchmark.storeRandomData       piped  thrpt   10  64,742 ± 158,550  ops/s
AwsBenchmark.storeRandomData      buffer  thrpt   10  34,091 ±  19,698  ops/s

2024-03-21T13.01.18.356566 main.json

Based on this, I'd probably not move to Piped streams. However, it might be worth investigating concurrency.
