Skip to content

feat(bindings/python): Enhance Reader and Writer #6086

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

chitralverma
Copy link

@chitralverma chitralverma commented Apr 24, 2025

Which issue does this PR close?

Closes #5943

What changes are included in this PR?

  • Support all read and write options exposed by reader_with and writer_with on async operator open(...)
  • Add missing capabilities to py bindings
  • Implement AsyncFile::write_from

Are there any user-facing changes?

No breaking changes, but users can use the new features.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. releases-note/feat The PR implements a new feature or has a title that begins with "feat" labels Apr 24, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 28, 2025
@chitralverma chitralverma changed the title WIP: feat(bindings/python): Enhance Reader and Writer feat(bindings/python): Enhance Reader and Writer Apr 28, 2025
@chitralverma
Copy link
Author

chitralverma commented Apr 28, 2025

@Xuanwo @messense @Zheaoli please have a look whenever you get a chance

the 2 failing tests, seem to show a different service specific bug. the if_not_exist capability is there but the behaviour is not correct.

@chitralverma
Copy link
Author

@Xuanwo @messense @Zheaoli any suggestions you have for this PR?
I think addition of reader/ writer options to AsyncOperator and the lazy streaming copy of data from 1 file to another can be quite useful features.

@chitralverma
Copy link
Author

@messense @Xuanwo added some benchmarks and updated the scope of this PR.

See Comment

Please let me know if you have some review comments

@@ -472,6 +475,20 @@ class AsyncFile:
Args:
bs (bytes): The content to write.
"""
async def write_from(self, other: AsyncFile) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if it's a good idea for us to provide something like shutil.copyfileobj. Is this more friendly for python users?

cc @messense and @erickguan for comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operator works with shutil.copyfileobj but not AsyncOperator. Perhaps something like this as a utility function:

async def async_copyfileobj(reader, writer, bufsize: int = 64 * 1024):
    while True:
        chunk = await reader.read(bufsize)
        if not chunk:
            break
        await writer.write(chunk)

It would be great if python offers an async version of shutil.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rust based write_from seems to have an improvement of 18~28% copy throughput compared to the naive python loop based on the benchmark shared here: #5943 (comment)

is the suggestion here to provide a function with similar function signature as shutil.copyfileobj but copying is still done on the rust side?

but on the rust side, the chunk size itself is already an option for both the reader and writer,
not sure if there is a need to ask for bufsize again.. (I assume the bufsize would be most optimal if set based on the chunk size options set on the reader and writer)


#[pyclass(module = "opendal")]
#[derive(FromPyObject, Default)]
pub struct ReaderOptions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we don't change the class name.

range_end = randint(range_start, len(content) - 1)

async with await async_operator.open(
filename, "rb", range_start=range_start, range_end=range_end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"range_start" and "range_end" don't seem ideal to me, since Python users may not be familiar with the term "range" in this context. How about using "offset" and "size" instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
releases-note/feat The PR implements a new feature or has a title that begins with "feat" size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new feature: Streaming Reader and Writer interfaces in python binding
4 participants