
Torchcodec decoding #7616


Status: Merged (29 commits merged into huggingface:main on Jun 19, 2025)


@TyTodd (Contributor) commented Jun 13, 2025

Closes #7607

New signatures

Audio

Audio(sampling_rate: Optional[int] = None, mono: bool = True, decode: bool = True, stream_index: Optional[int] = None)

Audio.encode_example(self, value: Union[str, bytes, bytearray, dict, "AudioDecoder"]) -> dict

Audio.decode_example(self, value: dict, token_per_repo_id: Optional[dict[str, Union[str, bool, None]]] = None) -> "AudioDecoder"

Video

Video(decode: bool = True, stream_index: Optional[int] = None, dimension_order: Literal['NCHW', 'NHWC'] = 'NCHW', num_ffmpeg_threads: int = 1, device: Optional[Union[str, "torch.device"]] = 'cpu', seek_mode: Literal['exact', 'approximate'] = 'exact')

Video.encode_example(self, value: Union[str, bytes, bytearray, Example, np.ndarray, "VideoDecoder"]) -> Example

Video.decode_example(self, value: Union[str, Example], token_per_repo_id: Optional[dict[str, Union[bool, str]]] = None) -> "VideoDecoder"

Notes

The Audio feature constructor takes one new optional parameter, stream_index, which is passed to the AudioDecoder constructor to select which stream of the file to decode.
The Audio feature's encode_example() now also accepts a torchcodec.decoders.AudioDecoder as input.
The Audio feature's decode_example() returns a torchcodec.decoders.AudioDecoder.

The Video feature constructor takes five new optional parameters (stream_index, dimension_order, num_ffmpeg_threads, device, seek_mode), all of which are passed to the VideoDecoder constructor.
The Video feature's decode_example() returns a torchcodec.decoders.VideoDecoder.
The Video feature's encode_example() now also accepts a torchcodec.decoders.VideoDecoder as input.

All test cases have been updated to reflect these changes.
All documentation has also been updated to reflect these changes.

When formatted with any formatter (np_formatter, tf_formatter, etc.), both VideoDecoder and AudioDecoder objects ignore the requested output type and are returned unchanged. The formatting test cases were updated accordingly. (This is simple to change if we want different behavior.)
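A minimal sketch of this pass-through behavior, assuming a formatter that applies a conversion function to each value but returns decoder objects untouched. `format_value`, `PASSTHROUGH_TYPES`, and the stub decoder classes are hypothetical names for illustration, not the actual datasets formatter code.

```python
from typing import Any, Callable


class StubAudioDecoder:
    """Hypothetical stand-in for torchcodec.decoders.AudioDecoder."""


class StubVideoDecoder:
    """Hypothetical stand-in for torchcodec.decoders.VideoDecoder."""


# Types that every formatter hands back unchanged (assumption for this sketch).
PASSTHROUGH_TYPES = (StubAudioDecoder, StubVideoDecoder)


def format_value(value: Any, convert: Callable[[Any], Any]) -> Any:
    """Apply a formatter's conversion, except to decoder objects."""
    if isinstance(value, PASSTHROUGH_TYPES):
        # Ignore the requested output type and return the decoder itself.
        return value
    return convert(value)


decoder = StubAudioDecoder()
```

With this structure, changing the behavior (for example, having the NumPy formatter return decoded samples) only touches the pass-through branch.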

Errors

This test case from tests/packaged_modules/test_audiofolder.py currently fails:

@require_librosa
@require_sndfile
@pytest.mark.parametrize("streaming", [False, True])
def test_data_files_with_metadata_and_archives(streaming, cache_dir, data_files_with_zip_archives):
    audiofolder = AudioFolder(data_files=data_files_with_zip_archives, cache_dir=cache_dir)
    audiofolder.download_and_prepare()
    datasets = audiofolder.as_streaming_dataset() if streaming else audiofolder.as_dataset()
    for split, data_files in data_files_with_zip_archives.items():
        num_of_archives = len(data_files)  # the metadata file is inside the archive
        expected_num_of_audios = 2 * num_of_archives
        assert split in datasets
        dataset = list(datasets[split])
        assert len(dataset) == expected_num_of_audios
        # make sure each sample has its own audio (all arrays are different) and metadata
        assert (
            sum(np.array_equal(dataset[0]["audio"].get_all_samples().data.numpy(), example["audio"].get_all_samples().data.numpy()) for example in dataset[1:])
            == 0
        )
        assert len({example["text"] for example in dataset}) == expected_num_of_audios
        assert all(example["text"] is not None for example in dataset)

It now fails because the AudioDecoder needs to access the audio files after the lines below have run, and there seems to be a context issue: the file the decoder is trying to read is closed before the decoder gets a chance to decode it.

audiofolder.download_and_prepare()
datasets = audiofolder.as_streaming_dataset() if streaming else audiofolder.as_dataset()
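The failure mode can be reproduced without torchcodec. This is a minimal sketch, assuming the decoder keeps a reference to a file object and only reads from it when decoding is requested; `LazySource` is a hypothetical stand-in, not a torchcodec class. Copying the bytes while the file is still open avoids the problem (torchcodec decoders can also be constructed from raw bytes).

```python
import io
from typing import BinaryIO, Union


class LazySource:
    """Hypothetical stand-in for a decoder that reads its input only on demand."""

    def __init__(self, source: Union[bytes, BinaryIO]):
        self.source = source  # stored as-is, not read yet

    def read_all(self) -> bytes:
        if isinstance(self.source, bytes):
            return self.source
        self.source.seek(0)
        return self.source.read()


with io.BytesIO(b"RIFF\x00\x00\x00\x00WAVE") as f:
    lazy = LazySource(f)          # keeps the file object itself
    eager = LazySource(f.read())  # copies the bytes while the file is open

# Leaving the `with` block closed the file, like the context issue above.
try:
    lazy.read_all()
    lazy_error = None
except ValueError as err:  # "I/O operation on closed file."
    lazy_error = err

eager_bytes = eager.read_all()
```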

TyTodd added 5 commits June 12, 2025 02:33
…atter handles torchcodec objects. Fixed test scripts to work with new Audio backend
…_audio_feature_map_is_decoded test case. Implemented casting for VideoDecoder and AudioDecoder types
…ideo and Audio features. Fixed the rest of the test files to be compatible with new Audio and Video features.
@TyTodd (Contributor Author) commented Jun 17, 2025

@lhoestq any updates on when this will be merged? Let me know if there's anything you need from my end.

@lhoestq (Member) left a comment

Awesome! I added a few comments :)

My main comment is about backward compatibility for audio; the rest looks good to me.

num_ffmpeg_threads=self.num_ffmpeg_threads,
device=self.device,
seek_mode=self.seek_mode,
)
video._hf_encoded = {"path": path, "bytes": bytes_}
@lhoestq (Member) commented Jun 17, 2025

We also need _hf_encoded in Audio now that it also returns a reader (it's useful to speed up re-encoding when saving a dataset).
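A minimal sketch of why `_hf_encoded` speeds up re-encoding, assuming `encode_example()` checks for the attribute and reuses the stored `{"path", "bytes"}` dict instead of re-serializing the decoded media. `StubDecoder`, `decode_example`, `encode_example`, and the `clip.wav` example are simplified hypothetical stand-ins, not the datasets implementation.

```python
from typing import Any, Optional


class StubDecoder:
    """Hypothetical stand-in for a torchcodec decoder."""

    def __init__(self, data: bytes):
        self.data = data


def decode_example(value: dict) -> StubDecoder:
    decoder = StubDecoder(value["bytes"])
    # Remember the original encoded form on the decoded object.
    decoder._hf_encoded = {"path": value["path"], "bytes": value["bytes"]}
    return decoder


def encode_example(value: Any) -> dict:
    cached: Optional[dict] = getattr(value, "_hf_encoded", None)
    if cached is not None:
        # Fast path: reuse the stored path/bytes, no re-encoding needed.
        return cached
    # Slow path (placeholder): re-serialize the decoder's contents.
    return {"path": None, "bytes": value.data}


example = {"path": "clip.wav", "bytes": b"RIFF...."}
decoded = decode_example(example)
```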

Comment on lines 58 to 62
Before:
{'array': array([ 2.3443763e-05,  2.1729663e-04,  2.2145823e-04, ...,
         3.8356509e-05, -7.3497440e-06, -2.1754686e-05], dtype=float32),
 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav',
 'sampling_rate': 16000}
After:
<torchcodec.decoders._audio_decoder.AudioDecoder object at 0x11642b6a0>
@lhoestq (Member) commented Jun 17, 2025

For backward compatibility, we'll need to subclass AudioDecoder to allow this:

audio_numpy_array = decoder["array"]
audio_sampling_rate = decoder["sampling_rate"]
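A minimal sketch of such a subclass, written as a mixin so it can be demonstrated without torchcodec installed. It assumes the decoder provides `get_all_samples()` returning an object with `.data` shaped (num_channels, num_samples) and `.sample_rate`, as torchcodec's AudioDecoder does. `LegacyDictAccessMixin` and `StubDecoder` are hypothetical names, and this version always downmixes to mono (mirroring the `mono=True` default); a real version would respect the `mono` flag.

```python
from types import SimpleNamespace
from typing import Any

import numpy as np


class LegacyDictAccessMixin:
    """Hypothetical mixin restoring decoder["array"] and decoder["sampling_rate"]."""

    def __getitem__(self, key: str) -> Any:
        if key == "array":
            # np.asarray accepts CPU torch tensors as well as nested lists.
            data = np.asarray(self.get_all_samples().data)
            # Average the channels so the result is a 1-D mono array,
            # like the old {"array": ...} output.
            return data.mean(axis=0) if data.ndim > 1 else data
        if key == "sampling_rate":
            return self.get_all_samples().sample_rate
        raise KeyError(key)


# Intended use (requires torchcodec; not executed here):
#   from torchcodec.decoders import AudioDecoder as _TorchcodecAudioDecoder
#   class AudioDecoder(LegacyDictAccessMixin, _TorchcodecAudioDecoder):
#       pass


class StubDecoder(LegacyDictAccessMixin):
    """Stand-in with a two-channel, two-sample signal, for demonstration only."""

    def get_all_samples(self) -> SimpleNamespace:
        return SimpleNamespace(data=[[0.0, 1.0], [1.0, 3.0]], sample_rate=16000)


decoder = StubDecoder()
```

Each key access decodes the full file again in this sketch; caching the decoded samples would be a reasonable refinement.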

@TyTodd (Contributor Author) replied

Just to make sure I'm understanding correctly: are you saying to instead return an object of a class that extends torchcodec.decoders._audio_decoder.AudioDecoder and overrides the __getitem__() method?

@lhoestq (Member) replied

Yes, hopefully torchcodec allows subclassing.
This is pretty necessary for audio since most training frameworks or scripts use the ["array"] syntax at the moment.

@lhoestq (Member) commented Jun 17, 2025

Btw, I plan to release datasets 4.0 after your PR; it will be a major milestone :)

@TyTodd (Contributor Author) commented Jun 17, 2025

@lhoestq just pushed the new changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq (Member) commented Jun 18, 2025

Great! I took the liberty of moving the AudioDecoder to its own file and making small edits to the docs and docstrings.

If it looks good to you I think we can merge :)

@lhoestq (Member) left a comment

Alright, this is ready! Congrats on implementing this :)

I might do another pass on the documentation before the new datasets release, but this is already in quite good shape.

@lhoestq merged commit 161f99d into huggingface:main on Jun 19, 2025
10 of 14 checks passed

Successfully merging this pull request may close these issues.

Video and audio decoding with torchcodec
3 participants