
feat: Initial foyer layer implementation #5906


Draft · wants to merge 29 commits into main

Conversation

@jorgehermo9 (Contributor) commented Mar 28, 2025

Closes #5678

I'm opening this as a draft to get some feedback on what I'm doing right now. Tests and documentation are still missing, but I would like to get some feedback before working more on this.

My main concern is the scope of this cache layer: should we only cache read calls, or also cache list or stat calls? If so, I could implement that in this PR as well, but I want to know if it makes sense first. Right now, only async read calls are cached.

I will fix CI and add tests and documentation once I get this initial feedback!

Thanks a lot~

@github-actions github-actions bot added the releases-note/feat The PR implements a new feature or has a title that begins with "feat" label Mar 28, 2025
@meteorgan (Contributor)

Hi, I've got a question: since it's a cache, when do we clear out expired data? I couldn't spot the code handling that.

@jorgehermo9 (Contributor, Author) commented Mar 30, 2025

@meteorgan I think there is no concept of TTL in foyer. You specify a maximum cache size in bytes, and entries are added until the cache is full. If a new key is going to be inserted while the cache is full, another entry is evicted (see https://foyer.rs/docs/tutorial/in-memory-cache#22-count-entries-by-weight)
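
For illustration, here is a rough sketch of such a size-bounded in-memory cache based on the linked tutorial (the builder method names and weighter signature are assumptions and may differ between foyer versions):

```rust
use foyer::{Cache, CacheBuilder};

// Sketch only: the capacity acts as a total weight budget. With a weighter
// that returns each value's size in bytes, eviction starts once the summed
// sizes of cached entries exceed 64 MiB.
fn build_cache() -> Cache<String, Vec<u8>> {
    CacheBuilder::new(64 * 1024 * 1024)
        .with_weighter(|_key: &String, value: &Vec<u8>| value.len())
        .build()
}
```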

@meteorgan (Contributor)

> @meteorgan I think there is no concept of TTL in foyer. You specify a maximum cache size in bytes, and entries are added until the cache is full. If a new key is going to be inserted while the cache is full, another entry is evicted (see https://foyer.rs/docs/tutorial/in-memory-cache#22-count-entries-by-weight)

What I mean is, if a key gets deleted or updated, we should refresh the cache. Otherwise, we'll end up with stale data.
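
For illustration only, a conceptual sketch of what invalidating on delete could look like (this is not the PR's actual code; a `remove`-style method on the cache and a plain path-based key are both assumptions here):

```rust
use opendal::{Buffer, Operator, Result};

// Conceptual sketch: whatever wraps the delete path should also drop the
// cached entry so later reads don't serve stale data.
async fn delete_and_invalidate(
    op: &Operator,
    cache: &foyer::HybridCache<String, Buffer>,
    path: &str,
) -> Result<()> {
    op.delete(path).await?;
    // Assumed API: remove the cached entry keyed the same way reads cache it.
    cache.remove(&path.to_string());
    Ok(())
}
```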

@Xuanwo (Member) left a comment

Thank you @jorgehermo9 for the work! I believe we are very close to having a working demo.

where
    Self: Sized,
{
    let mut buf = Vec::new();
Member

cc @MrCroxx, this could be much better if we could know the size before decoding.

@jorgehermo9 (Contributor, Author) commented Apr 15, 2025

Thanks for the review @Xuanwo. I was still working on the previous reviews and didn't have everything done; that's why I hadn't mentioned you yet. I will work a lot more on this today though :)

I didn't forget/ignore the previous comments; I have all of them in mind, I've just been working through them slowly.

Comment on lines 123 to 125
pub use self::foyer::CacheKey;
#[cfg(feature = "layers-foyer")]
pub use self::foyer::CacheValue;
@jorgehermo9 (Contributor, Author) — Apr 15, 2025

I have to expose both CacheKey and CacheValue because they are part of the public API (they are the generic params of the HybridCache we receive in FoyerLayer::new).

Should we expose those two this way, or use pub mod foyer instead?

Member

Negative. I don't want to expose anything other than FoyerLayer. The HybridCache should be HybridCache<String, Buffer>, which means we don't need any additional types. It's the user's responsibility to use compatible versions of Foyer.
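
As a rough sketch of what that would look like from the user's side (assuming FoyerLayer::new ends up accepting the cache directly, which is still being settled in this thread; the backend here is just an example):

```rust
use opendal::{layers::FoyerLayer, services, Buffer, Operator};

// `cache` is assumed to be an already-built foyer::HybridCache<String, Buffer>,
// constructed by the user with whichever foyer version they depend on.
fn build_operator(cache: foyer::HybridCache<String, Buffer>) -> opendal::Result<Operator> {
    Ok(Operator::new(services::Memory::default())?
        .finish()
        .layer(FoyerLayer::new(cache)))
}
```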

@jorgehermo9 (Contributor, Author)

I think we could do that for the CacheKey, but the cache value can't be Buffer because of what you suggested here:
[screenshot of the referenced review suggestion]

Since Code should not be implemented for the raw Buffer, the cache value should be the CacheValue wrapper. We can't use Buffer as the cache value because it does not implement Code (CacheValue does).
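
To make the constraint concrete, a minimal sketch of the wrapper (the conversion helpers are illustrative only, and the actual Code impl from the diff is elided):

```rust
use opendal::Buffer;

/// Newtype around Buffer so foyer's Code (de)serialization trait can be
/// implemented on a local type instead of directly on Buffer.
#[derive(Debug, Clone)]
pub struct CacheValue(Buffer);

impl From<Buffer> for CacheValue {
    fn from(buffer: Buffer) -> Self {
        CacheValue(buffer)
    }
}

impl From<CacheValue> for Buffer {
    fn from(value: CacheValue) -> Self {
        value.0
    }
}
```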

@jorgehermo9 (Contributor, Author) commented Apr 15, 2025

Hi @Xuanwo, I addressed all the comments I could, but left some doubts in the open threads. Could you take another look?

#[derive(Debug, Clone)]
pub struct CacheValue(Buffer);

impl Code for CacheValue {
@jorgehermo9 (Contributor, Author)

I wonder, @MrCroxx, how we should take care of CacheKey and CacheValue evolution over time. Should we worry about breaking changes, such as adding a new mandatory field to the CacheKey struct?

For example, if we add a new mandatory field to CacheKey, will the previous version's keys in the disk cache fail to deserialize? How should we handle this? Must we always make backwards-compatible changes to both CacheKey and CacheValue?

Member

There is no deserialization for CacheKey. The cache just tries to read the key we input. And CacheValue itself is a plain Buffer which contains the raw content.

I don't think we need to worry about backwards compatibility.

@jorgehermo9 (Contributor, Author)

I think deserialization is done for the cache key.

Member

> I think deserialization is done for the cache key.

Good point!

cc @MrCroxx, what happens if the cache key changes? Will Foyer silently ignore it and remove the stale cache value, or will we encounter an error?

    Self: Sized,
{
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf).map_err(CodeError::Io)?;
@jorgehermo9 (Contributor, Author)

Is it okay to read all contents from the reader at once? I don't know how to avoid this copy.
Is there any way to wrap the reader we receive in the parameters and return it instead of copying to an intermediate buffer? Or at least to read it lazily on demand rather than all up front?

I'm worried about #5906 (comment)

Member

> Is it okay to read all contents from the reader at once? I don't know how to avoid this copy.

That’s why we need to limit the cache’s read size. The current cache layer doesn’t handle chunking for users; it’s up to users to ensure the read size is optimal.
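
As an example of keeping each cached entry bounded, a caller can issue ranged reads instead of reading whole objects (a sketch; the chunk size and offsets are entirely up to the user):

```rust
use opendal::{Buffer, Operator, Result};

// Read at most `chunk` bytes starting at `offset`, so each cache entry stays
// bounded instead of holding an arbitrarily large object.
async fn read_bounded(op: &Operator, path: &str, offset: u64, chunk: u64) -> Result<Buffer> {
    op.read_with(path).range(offset..offset + chunk).await
}
```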

@jorgehermo9 (Contributor, Author)

We will handle this situation with #5906 (comment), right?

@jorgehermo9 (Contributor, Author) commented Apr 16, 2025

Hi @Xuanwo, I think I addressed everything (I left some comments in a few threads). I plan to work on documentation and tests once we are confident about what the final implementation looks like.

Thank you very much for your reviews; I learned a lot from them.

use super::*;

#[test]
fn test_build_cache_key_with_default_args() {
@jorgehermo9 (Contributor, Author)

I wonder if it would be better to have one parametrized test covering all these cases instead of one test per case. What is the current approach in the codebase?
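
For reference, a table-driven variant could look roughly like this (the helper name and expected values are placeholders, not the PR's actual API):

```rust
#[test]
fn test_build_cache_key_cases() {
    // Placeholder cases: (case name, input path, expected key string).
    let cases = [
        ("default args", "dir/file", "dir/file"),
        ("nested path", "a/b/c", "a/b/c"),
    ];

    for (name, path, expected) in cases {
        // `build_cache_key` stands in for whatever helper these tests exercise.
        assert_eq!(build_cache_key(path), expected, "failed case: {name}");
    }
}
```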

Labels
releases-note/feat — The PR implements a new feature or has a title that begins with "feat"

Successfully merging this pull request may close these issues:

new feature: Add cache layer for opendal

5 participants