Torchcodec decoding #7616

TyTodd · 2025-06-13T19:06:07Z

New signatures

Audio

Audio(sampling_rate: Optional[int] = None, mono: bool = True, decode: bool = True, stream_index: Optional[int] = None)

Audio.encode_example(self, value: Union[str, bytes, bytearray, dict, "AudioDecoder"]) -> dict

Audio.decode_example(self, value: dict, token_per_repo_id: Optional[dict[str, Union[str, bool, None]]] = None) -> "AudioDecoder"

Video

Video(decode: bool = True, stream_index: Optional[int] = None, dimension_order: Literal['NCHW', 'NHWC'] = 'NCHW', num_ffmpeg_threads: int = 1, device: Optional[Union[str, "torch.device"]] = 'cpu', seek_mode: Literal['exact', 'approximate'] = 'exact')

Video.encode_example(self, value: Union[str, bytes, bytearray, Example, np.ndarray, "VideoDecoder"]) -> Example

Video.decode_example(self, value: Union[str, Example], token_per_repo_id: Optional[dict[str, Union[bool, str]]] = None) -> "VideoDecoder"

Notes

The Audio feature constructor takes one new optional parameter, stream_index, which is passed to the AudioDecoder constructor to select which stream of a file to decode.
The Audio feature's encode_example() now accepts a torchcodec.decoders.AudioDecoder as input.
The Audio feature's decode_example() now returns a torchcodec.decoders.AudioDecoder.

The Video feature constructor takes five new optional parameters (stream_index, dimension_order, num_ffmpeg_threads, device, seek_mode), all of which are passed to the VideoDecoder constructor.
The Video feature's decode_example() now returns a torchcodec.decoders.VideoDecoder.
The Video feature's encode_example() now accepts a torchcodec.decoders.VideoDecoder as input.
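
To make the parameter forwarding concrete, here is a minimal sketch. The defaults are copied from the Video signature above, but VideoOptions, FakeVideoDecoder and build_decoder are hypothetical names made up for illustration; they are not part of datasets or torchcodec.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VideoOptions:
    """Hypothetical container; defaults mirror the Video signature above."""

    stream_index: Optional[int] = None
    dimension_order: str = "NCHW"
    num_ffmpeg_threads: int = 1
    device: Optional[str] = "cpu"
    seek_mode: str = "exact"


class FakeVideoDecoder:
    """Hypothetical stand-in for torchcodec.decoders.VideoDecoder."""

    def __init__(self, source, **kwargs):
        self.source = source
        self.kwargs = kwargs


def build_decoder(source, options):
    # Forward every option to the decoder constructor unchanged.
    return FakeVideoDecoder(source, **asdict(options))


decoder = build_decoder("clip.mp4", VideoOptions(seek_mode="approximate"))
```

Only the option that was overridden differs from the defaults; everything else reaches the decoder as declared on the feature.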

All test cases and documentation have been updated to reflect these changes.

When a VideoDecoder or AudioDecoder goes through a formatter (np_formatter, tf_formatter, etc.), the formatter ignores the requested output type and returns the decoder object unchanged. The formatting test cases were updated to reflect this. (It would be pretty simple to change this behavior if we want to.)
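
As a rough illustration of that pass-through behavior, the sketch below converts plain values and leaves decoder objects alone. FakeAudioDecoder, FakeVideoDecoder, PASSTHROUGH_TYPES and format_value are hypothetical names for this example only; the real formatters in datasets are structured differently.

```python
class FakeAudioDecoder:
    """Hypothetical stand-in for torchcodec.decoders.AudioDecoder."""


class FakeVideoDecoder:
    """Hypothetical stand-in for torchcodec.decoders.VideoDecoder."""


# Types the formatter hands back unchanged (assumption for this sketch).
PASSTHROUGH_TYPES = (FakeAudioDecoder, FakeVideoDecoder)


def format_value(value, convert):
    """Apply `convert` to plain values, but return decoder objects as-is."""
    if isinstance(value, PASSTHROUGH_TYPES):
        return value
    if isinstance(value, dict):
        return {key: format_value(item, convert) for key, item in value.items()}
    if isinstance(value, list):
        return [format_value(item, convert) for item in value]
    return convert(value)


decoder = FakeAudioDecoder()
row = {"audio": decoder, "label": 3, "scores": [1, 2]}
formatted = format_value(row, float)
```

Here the "audio" entry is the very same object after formatting, while the numeric fields are converted.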

Errors

This test case from tests/packaged_modules/test_audiofolder.py now fails:

@require_librosa
@require_sndfile
@pytest.mark.parametrize("streaming", [False, True])
def test_data_files_with_metadata_and_archives(streaming, cache_dir, data_files_with_zip_archives):
    audiofolder = AudioFolder(data_files=data_files_with_zip_archives, cache_dir=cache_dir)
    audiofolder.download_and_prepare()
    datasets = audiofolder.as_streaming_dataset() if streaming else audiofolder.as_dataset()
    for split, data_files in data_files_with_zip_archives.items():
        num_of_archives = len(data_files)  # the metadata file is inside the archive
        expected_num_of_audios = 2 * num_of_archives
        assert split in datasets
        dataset = list(datasets[split])
        assert len(dataset) == expected_num_of_audios
        # make sure each sample has its own audio (all arrays are different) and metadata
        assert (
            sum(
                np.array_equal(
                    dataset[0]["audio"].get_all_samples().data.numpy(),
                    example["audio"].get_all_samples().data.numpy(),
                )
                for example in dataset[1:]
            )
            == 0
        )
        assert len({example["text"] for example in dataset}) == expected_num_of_audios
        assert all(example["text"] is not None for example in dataset)

It fails because AudioDecoder needs to access the files after the lines below have run, but the file the decoder is trying to read is closed before the decoder gets the chance to decode it. This looks like a context-management issue.

audiofolder.download_and_prepare()
datasets = audiofolder.as_streaming_dataset() if streaming else audiofolder.as_dataset()
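
The general failure mode can be reproduced with the standard library alone. This is only a sketch of the suspected pattern, not a diagnosis of the actual code path in AudioFolder; LazyReader, open_lazily and open_eagerly are hypothetical names.

```python
import io


class LazyReader:
    """Hypothetical lazy decoder: keeps the handle and reads only on demand."""

    def __init__(self, handle):
        self._handle = handle

    def get_all_bytes(self):
        self._handle.seek(0)
        return self._handle.read()


def open_lazily(payload):
    # The handle is closed when the `with` block exits, before any read happens.
    with io.BytesIO(payload) as handle:
        return LazyReader(handle)


def open_eagerly(payload):
    # One possible workaround: copy the bytes while the handle is still open.
    with io.BytesIO(payload) as handle:
        return LazyReader(io.BytesIO(handle.read()))


try:
    open_lazily(b"RIFF").get_all_bytes()
    lazy_error = None
except ValueError as error:
    # Reading from a closed file object raises ValueError.
    lazy_error = error

eager_bytes = open_eagerly(b"RIFF").get_all_bytes()
```

A decoder that defers reading is only safe if whatever it reads from outlives the scope that created it.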

TyTodd · 2025-06-17T13:52:48Z

@lhoestq any updates on when this will be merged? Let me know if there's anything you need from my end.

lhoestq

Awesome ! I added a few comments :)

My main comment is about backward compatibility for audio, the rest looks good to me

src/datasets/features/audio.py

lhoestq · 2025-06-17T14:28:12Z

src/datasets/features/video.py

+                num_ffmpeg_threads=self.num_ffmpeg_threads,
+                device=self.device,
+                seek_mode=self.seek_mode,
+            )
        video._hf_encoded = {"path": path, "bytes": bytes_}


We also need _hf_encoded in audio now that it's also a reader (useful to speed up re-encoding when saving a dataset)
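
A minimal sketch of why the cached attribute helps, assuming the encoder checks for _hf_encoded before doing any work. FakeDecoder, slow_encode and this encode_example are hypothetical stand-ins, not the functions in src/datasets/features.

```python
class FakeDecoder:
    """Hypothetical stand-in for a torchcodec decoder object."""


encode_calls = []


def slow_encode(decoder):
    # Placeholder for an expensive re-encode of decoded media (assumption).
    encode_calls.append(decoder)
    return {"path": None, "bytes": b"re-encoded"}


def encode_example(value):
    """Reuse the cached encoding if the decoder carries one."""
    cached = getattr(value, "_hf_encoded", None)
    if cached is not None:
        return cached
    return slow_encode(value)


decoded = FakeDecoder()
decoded._hf_encoded = {"path": "clip.wav", "bytes": None}
fresh = FakeDecoder()

cached_result = encode_example(decoded)
fresh_result = encode_example(fresh)
```

Only the decoder without a cached encoding goes through the slow path.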

lhoestq · 2025-06-17T14:32:45Z

src/datasets/features/audio.py

-    {'array': array([ 2.3443763e-05,  2.1729663e-04,  2.2145823e-04, ...,
-         3.8356509e-05, -7.3497440e-06, -2.1754686e-05], dtype=float32),
-     'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~JOINT_ACCOUNT/602ba55abb1e6d0fbce92065.wav',
-     'sampling_rate': 16000}
+    <torchcodec.decoders._audio_decoder.AudioDecoder object at 0x11642b6a0>


For backward compatibility we'll need to subclass AudioDecoder to allow this:

audio_numpy_array = decoder["array"]
audio_sampling_rate = decoder["sampling_rate"]

Just to make sure I'm understanding correctly, are you saying to instead return an object of a class that extends torchcodec.decoders._audio_decoder.AudioDecoder and overrides the __getitem__() method?

Yes, hopefully torchcodec allows subclassing.
This is pretty necessary for audio since most training frameworks or scripts use the ["array"] syntax at the moment.
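
The idea can be sketched as follows. FakeSamples and FakeAudioDecoder are hypothetical stand-ins so the example runs without torchcodec, and DictCompatibleAudioDecoder is an illustrative name; the subclass eventually merged in this PR may differ in name and details (for example, converting to NumPy).

```python
class FakeSamples:
    """Hypothetical stand-in for the samples object a decoder returns."""

    def __init__(self, data, sample_rate):
        self.data = data
        self.sample_rate = sample_rate


class FakeAudioDecoder:
    """Hypothetical stand-in for torchcodec.decoders.AudioDecoder."""

    def __init__(self, data, sample_rate):
        self._samples = FakeSamples(data, sample_rate)

    def get_all_samples(self):
        return self._samples


class DictCompatibleAudioDecoder(FakeAudioDecoder):
    """Adds the legacy ["array"] / ["sampling_rate"] lookups."""

    def __getitem__(self, key):
        if key == "array":
            return self.get_all_samples().data
        if key == "sampling_rate":
            return self.get_all_samples().sample_rate
        raise KeyError(key)


decoder = DictCompatibleAudioDecoder([0.0, 0.5, -0.5], 16000)
audio_array = decoder["array"]
audio_sampling_rate = decoder["sampling_rate"]
```

Existing training scripts that index the decoded audio like a dict keep working, while new code can call the decoder methods directly.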

lhoestq · 2025-06-17T14:36:49Z

Btw I plan to release datasets 4.0 after your PR, this will be a major milestone :)

TyTodd · 2025-06-17T19:05:55Z

@lhoestq just pushed the new changes.

HuggingFaceDocBuilderDev · 2025-06-18T14:32:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lhoestq · 2025-06-18T16:28:32Z

Great ! I took the liberty to move the AudioDecoder to its own file and make small edits in the docs and docstrings

If it looks good to you I think we can merge :)

lhoestq

Alright this is ready ! Congrats on implementing this :)

I might do another pass on the documentation before doing the new datasets release, but this is already in quite good shape

TyTodd added 5 commits June 12, 2025 02:33

passes all but 1 test case

ea9fda8

Migrated Audio feature to use torchcodec as a backend. Fixed how formatter handles torchcodec objects. Fixed test scripts to work with new Audio backend

7be0dcf

fixed audio and video features so they now pass the test_dataset_with_audio_feature_map_is_decoded test case. Implemented casting for VideoDecoder and AudioDecoder types

c0d3fce

added load dataset test case to test_video.py

12511a3

Modified documentation to document new torchcodec implementation of Video and Audio features. Fixed the rest of the test files to be compatible with new Audio and Video features.

72f3ade

TyTodd mentioned this pull request Jun 13, 2025

Video and audio decoding with torchcodec #7607

Closed

TyTodd and others added 2 commits June 14, 2025 08:59

code formatting for torchcodec changes

c1843c3

Merge branch 'main' into torchcodec-decoding

8b29d61

TyTodd and others added 2 commits June 17, 2025 09:56

Merge branch 'main' into torchcodec-decoding

c4a1ac0

Merge branch 'main' into torchcodec-decoding

4dfff64

lhoestq reviewed Jun 17, 2025

TyTodd and others added 2 commits June 17, 2025 10:37

Update src/datasets/features/audio.py

e8b68e5

Co-authored-by: Quentin Lhoest <[email protected]>

added backwards compatibility support and _hf_encoded for Audio feature.

e9a4a14

lhoestq added 4 commits June 18, 2025 17:53

move AudioDecoder to its own file

6c0e425

naming

e74a9ee

docs

28e0173

style

c50c505

lhoestq and others added 8 commits June 19, 2025 16:18

update tests

806a4ba

Merge branch 'main' into torchcodec-decoding

f5a53c4

no torchcodec for windows

3ee5f90

further cleaning

eb6324c

fix

8a1e0bc

install ffmpeg in ci

661b574

fix ffmpeg installation

8036265

fix mono backward compatibility

b582c5b

lhoestq added 6 commits June 19, 2025 18:09

fix ffmpeg

4e265db

again

f043c0c

fix mono backward compat

37763db

fix tests

5198748

fix tests

f06ef21

again

4a637bd

lhoestq approved these changes Jun 19, 2025

lhoestq merged commit 161f99d into huggingface:main Jun 19, 2025
10 of 14 checks passed

lhoestq mentioned this pull request Jun 20, 2025

Windows Support pytorch/torchcodec#640

Open
