
MusicCap Dataset Testing: Overfitting Issues in Model Predictions #9

oyzh888 opened this issue Jan 31, 2024 · 3 comments
oyzh888 commented Jan 31, 2024

Hello,

I appreciate your excellent work and have a question about the testing process, specifically how to evaluate the model without train-test overlap inflating the results.

We conducted a test using the MusicCaps dataset (https://huggingface.co/datasets/google/MusicCaps), which contains approximately 5.52K samples, compared with the 60K mentioned in your paper.

However, we encountered cases where the model appears to have memorized some samples. Is this expected? For instance, we observed predictions that exactly match the ground-truth caption in MusicCaps.

One example involves the YouTube video with the ID -FFx68qSAuY (https://www.youtube.com/watch?v=-FFx68qSAuY).
Audio file (you should uncompress it):
-FFx68qSAuY.wav.zip

The model predicted:

{
  'text': 'This is a punk rock music piece. There are male vocals singing in a grunt-like manner. The melody is being played by an electric guitar while a bass guitar plays in the background. The rhythm consists of a slightly fast-paced rock acoustic drum beat. The piece has an aggressive atmosphere. It could be used in the soundtrack of an action-filled video game.',
  'time': '0:00-10:00'
}

For our tests, we used the following code: https://github.com/seungheondoh/lp-music-caps/blob/main/lpmc/music_captioning/captioning.py#L52, and executed the command:

python3 captioning.py --audio_path ../music_cap/lp-music-caps/lpmc/music_captioning/workspace/audio/-FFx68qSAuY.wav

Additionally, we noticed similar overfitting issues in approximately 10% of the samples, including these YouTube links:
https://www.youtube.com/watch?v=PpJKo-JPVU0
https://www.youtube.com/watch?v=p0oRrGDrQQw
Could you provide insights or guidance on this matter?

Thank you.

@diggerdu

diggerdu commented Mar 4, 2024

@seungheondoh
Hello,
I really appreciate your excellent work, but the undocumented train-test split makes it hard to follow up on these results.


seungheondoh commented Apr 29, 2024

@oyzh888
Sorry for the late reply. My model was trained on the MusicCaps training split, so if you run inference on MusicCaps you may get near-identical captions for clips the model saw during training.
Is the data you uploaded from the MusicCaps test split?
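(For anyone running a similar evaluation: a minimal sketch of restricting inference to held-out clips before scoring, assuming each metadata row carries the `is_audioset_eval` flag listed on the google/MusicCaps dataset card; verify the field name against the card, and note the flags in the example rows below are illustrative, not the real split labels.)

```python
# Keep only MusicCaps rows marked as held-out (AudioSet eval) clips,
# so captioning is never scored on clips the model may have trained on.
# Assumes rows shaped like the google/MusicCaps metadata; the
# `is_audioset_eval` field name is taken from the dataset card.

def eval_split(rows):
    """Return only the rows flagged as belonging to the held-out split."""
    return [r for r in rows if r.get("is_audioset_eval")]

# Illustrative metadata rows (split flags here are made up for the demo):
rows = [
    {"ytid": "-FFx68qSAuY", "is_audioset_eval": False},
    {"ytid": "PpJKo-JPVU0", "is_audioset_eval": True},
]

held_out = eval_split(rows)
print([r["ytid"] for r in held_out])  # ['PpJKo-JPVU0']
```

Running `captioning.py` only over the `ytid`s that survive this filter avoids the memorization effect described above.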

@seungheondoh

@diggerdu please check #4
