Support cache exclusion based on file name pattern #2704

Open
@siddharthab


Feature Request
The current on-disk caching logic assumes that a read starting at offset 0 means
the entire file will be read, and so it preemptively caches the whole file. This
assumption breaks when a file has a header at the beginning, or when clients check
a magic number, because those reads also start at offset 0.

For example, in bioinformatics, BAM files can be 100+ GB in size. They are typically
stored remotely and accessed only through random reads, which are usually enabled by
a separate index file. However, these files also store metadata in the file header,
which all clients will want to read first. Reading that header makes gcsfuse assume
that the entire file will be read, so it begins caching the entire file in the
on-disk cache. This can deplete the available cache capacity very quickly. A user
might therefore want to exclude only these special files while still caching all
other files.

Proposed solution
The configuration could include an option to exclude files from the on-disk cache when
their names match certain patterns. An example implementation is provided at #2043.
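
To make the proposal concrete, here is a minimal sketch of the decision it describes, written in Python purely for illustration. It is not gcsfuse code and does not reflect the implementation in #2043. The function names `compile_exclusions` and `should_cache`, the use of regular expressions as the pattern syntax, and the `\.bam$` pattern are all assumptions, and the offset-0 check is a simplified stand-in for the heuristic described above.

```python
import re
from typing import Iterable, List, Pattern


def compile_exclusions(patterns: Iterable[str]) -> List[Pattern[str]]:
    """Compile user-supplied regular expressions once, at configuration time.

    Hypothetical helper: the real option name and pattern syntax are undecided.
    """
    return [re.compile(p) for p in patterns]


def should_cache(object_name: str, read_offset: int,
                 exclusions: List[Pattern[str]]) -> bool:
    """Decide whether a read should trigger caching of the whole object.

    Simplified model of the heuristic described in this issue (a read that
    starts at offset 0 triggers caching), with a name-based exclusion added.
    """
    if read_offset != 0:
        return False
    # re.search matches anywhere in the name; anchor with ^ or $ as needed.
    return not any(p.search(object_name) for p in exclusions)


# Exclude BAM files by suffix; their index files (.bam.bai) remain cacheable.
exclusions = compile_exclusions([r"\.bam$"])
print(should_cache("samples/NA12878.bam", 0, exclusions))
print(should_cache("samples/NA12878.bam.bai", 0, exclusions))
```

With a check like this, reading the header of a large BAM file would no longer start a full download into the cache, while other files would keep the existing behavior.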

Labels: feature request, P2
