
A collection of YouTube video transcripts: podcasts (Joe Rogan Experience, Tim Ferriss, Jocko Podcast, ...), lectures (YaleCourses, MIT lectures, Jordan B. Peterson talks, ...). A big transcript salad spanning history, geography, science, politics, filmmaking and more.


ericchagnon15/ScribeSalad

 
 


ScribeSalad

In the absence of searchable transcripts, many interesting YouTube videos, podcasts, lectures and talks are hard to explore, quote and summarize. ScribeSalad is an open data project gathering over 100k YouTube video transcripts that discuss social and political issues, psychology, history and scientific topics ranging from biology and mathematics to artificial intelligence: The Joe Rogan Experience, The Rubin Report, Jordan B. Peterson talks, Yale courses, MIT lectures and more. This project is a first step towards making great content more accessible and making inspiring speakers, storytellers, interviewers and scientists better heard.

Available transcripts

Transcription quality

Some of the transcripts come from YouTube (subtitles uploaded by the video's owner), while the rest are generated automatically by a high-accuracy large-vocabulary continuous speech recognition system (about 90% accuracy in clean conditions: no background noise, no heavy accents and good-quality audio).

Filenames and formats

The transcripts are identified by the IDs of the corresponding YouTube videos, and each one is available in three formats: text, vtt (Web Video Text Tracks Format) and srt (SubRip Subtitle Format).

To open the original video, replace "ID" in https://www.youtube.com/watch?v=ID with the transcript's filename (the video ID).
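The filename-to-URL rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `video_url` and the sample path are hypothetical, and it assumes each transcript file is named as the video ID followed by a single extension such as `.srt`, `.vtt` or `.txt`.

```python
from pathlib import Path

# URL pattern quoted in this README; "ID" is replaced by the video ID.
WATCH_URL = "https://www.youtube.com/watch?v={video_id}"


def video_url(transcript_path: str) -> str:
    """Build the YouTube URL for a transcript file.

    Assumes the filename is the video ID plus one extension
    (an assumption about the layout, not something the repo guarantees).
    """
    video_id = Path(transcript_path).stem  # drop the directory and the extension
    return WATCH_URL.format(video_id=video_id)


if __name__ == "__main__":
    # "abc123XYZ_-" is a made-up ID used purely for illustration.
    print(video_url("transcripts/abc123XYZ_-.srt"))
```

If the files turn out to carry more than one suffix, the stem extraction would need to be adjusted accordingly.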

Terms of use

This is an open data project: feel free to fork this repository and to download, share and use any of the transcripts.

TODO

  • Cleaning up transcripts: removing fillers (um, ah, etc.) and repetitions.
  • Topic modeling: automatically discovering the abstract "topics" that occur in each transcript.
  • Speaker identification: who spoke when, and for how long?
  • Creating a search engine: exploring subjects by speaker, topic, channel, etc.
  • Multilingual transcripts: translating all transcripts into other languages.
  • More channels & more videos.
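
As a rough idea of what the clean-up item could look like, here is a minimal Python sketch that strips a few filler words and collapses immediately repeated words. It is not part of the project: the function name `clean_transcript` and the filler list are assumptions, and a real pipeline would need care with words such as "hum" that are also ordinary vocabulary, and with legitimate repetitions like "that that".

```python
import re

# Assumed filler list; the TODO only gives a couple of examples.
FILLERS = ("hum", "um", "uh", "ah")

# A whole-word filler, an optional comma or period, and trailing whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*", re.IGNORECASE
)
# A word followed by one or more immediate repetitions of itself.
_REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Remove filler words and collapse immediate word repetitions."""
    text = _FILLER_RE.sub("", text)
    text = _REPEAT_RE.sub(r"\1", text)
    # Normalize any whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_transcript("so um I I think uh that that is right"))
```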
