-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Torchcodec decoding #7616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Torchcodec decoding #7616
Conversation
…atter handles torchcodec objects. Fixed test scripts to work with new Audio backend
…_audio_feature_map_is_decoded test case. Implemented casting for VideoDecoder and AudioDecoder types
…ideo and Audio features. Fixed the the rest of the test files to be compatible with new Audio and Video features.
@lhoestq any updates on when this will be merged? Let me know if theres anything you need from my end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome ! I added a few comments :)
My main comment is about backward compatibility for audio, the rest looks good to me
num_ffmpeg_threads=self.num_ffmpeg_threads, | ||
device=self.device, | ||
seek_mode=self.seek_mode, | ||
) | ||
video._hf_encoded = {"path": path, "bytes": bytes_} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need _hf_encoded
in audio now that it's also a reader (useful to speed up re-encoding when saving the a dataset)
src/datasets/features/audio.py
Outdated
{'array': array([ 2.3443763e-05, 2.1729663e-04, 2.2145823e-04, ..., | ||
3.8356509e-05, -7.3497440e-06, -2.1754686e-05], dtype=float32), | ||
'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav', | ||
'sampling_rate': 16000} | ||
<torchcodec.decoders._audio_decoder.AudioDecoder object at 0x11642b6a0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for backward compatibility we'll need to subclass AudioDecoder to allow this:
audio_numpy_array = decorder["array"]
audio_sampling_rate = decorder["sampling_rate"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I'm understanding correctly, are you saying to instead return a object of a class that extends torchcodec.decoders._audio_decoder.AudioDecoder and overwrites the getitem() method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, hopefully torchcodec allows subclassing.
This is pretty necessary for audio since most training frameworks or scripts use the ["array"]
syntax at the moment.
Btw I plan to release |
Co-authored-by: Quentin Lhoest <[email protected]>
@lhoestq just pushed the new changes. |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Great ! I took the liberty to move the AudioDecoder to its own file and make small edits in the docs and docstrings If it looks good to you I think we can merge :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright this is ready ! Congrats on implementing this :)
I might do another pass at documentation before doing the new datasets
release but this is already quite in good shape
Closes #7607
New signatures
Audio
Video
Notes
Audio features constructor takes in 1 new optional param stream_index which is passed to the AudioDecoder constructor to select the stream index of a file.
Audio feature can now take in torchcodec.decoders.AudioDecoder as input to encode_example()
Audio feature decode_example() returns torchcodec.decoders.AudioDecoder
Video feature constructor takes in 5 new optional params stream_index, dimension_order, num_ffmpeg_threads, device, seek_mode all of which are passed to VideoDecoder constructor
Video feature decode_example() returns torchcodec.decoders.VideoDecoder
Video feature can now take in torchcodec.decoders.VideoDecoder as input to encode_example()
All test cases have been updated to reflect these changes
All documentation has also been updated to reflect these changes.
Both VideoDecoder and AudioDecoder when formatted with (np_formatter, tf_formatter, etc) will ignore the type and return themselves. Formatting test cases were updated accordingly to reflect this. (Pretty simple to make this not the case if we want though)
Errors
This test case from
tests/packaged_modules/test_audiofolder.py
Fails now because AudioDecoder needs to access the files after the lines below are run, but there seems to be some context issues. The file the decoder is trying to read is closed before the decoder gets the chance to decode it.