Without further ado, let’s get to it.
Target:
Summarize a YouTube video by transcribing its audio with Whisper.
Installation
Install Python 3.10. I strongly recommend using pyenv to manage Python versions.
Install ffmpeg and yt-dlp. The commands below are for macOS.
# ffmpeg
brew install ffmpeg
# yt-dlp
wget https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos
chmod +x yt-dlp_macos
sudo mv yt-dlp_macos /usr/local/bin/yt-dlp
Install Whisper, preferably inside a virtualenv:
pip install -U openai-whisper
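Before moving on, it can help to confirm that everything installed above is actually reachable. The following is a minimal sketch using only the standard library; the helper names `missing_tools` and `missing_modules` are made up for illustration and are not part of any of the tools above.

```python
import importlib.util
import shutil


def missing_tools(tools=("ffmpeg", "yt-dlp")):
    """Return the command names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def missing_modules(modules=("whisper",)):
    """Return the top-level module names that cannot be imported in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report anything that still needs to be installed.
    problems = missing_tools() + missing_modules()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

Run it from inside the virtualenv you installed Whisper into, otherwise the `whisper` module will be reported as missing.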
Usage
Download only the audio track of the video. The format selector ba is short for bestaudio.
yt-dlp -f "ba" -o "to_transcribe.%(ext)s" "https://youtube.com/watch?v=6VhD5QyDEh4"
If yt-dlp downloaded a webm file, convert it to AAC:
ffmpeg -y -i "to_transcribe.webm" "to_transcribe.aac"
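If you want to script this step, the "convert only when it is webm" logic could look roughly like this. It is a sketch, not the only way to do it: `convert_if_webm` is a hypothetical helper name, and it simply wraps the same ffmpeg command shown above.

```python
import subprocess
from pathlib import Path


def convert_if_webm(path):
    """Convert a .webm file to .aac with ffmpeg; return other files unchanged.

    Returns the path of the file that should be passed to whisper.
    """
    path = Path(path)
    if path.suffix.lower() != ".webm":
        # Nothing to do: yt-dlp already produced a non-webm audio file.
        return path
    target = path.with_suffix(".aac")
    # Same command as above: -y overwrites the target if it already exists.
    subprocess.run(["ffmpeg", "-y", "-i", str(path), str(target)], check=True)
    return target
```

For example, `convert_if_webm("to_transcribe.webm")` runs ffmpeg and returns the `.aac` path, while any other extension is passed through without calling ffmpeg.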
Run whisper:
python -m whisper --model medium to_transcribe.aac
If you have more than 10 GB of RAM or GPU memory, you can switch the model from medium to large, but expect the transcription to take about twice as long.
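The same transcription can also be driven from Python instead of the command line. The sketch below assumes the `openai-whisper` package installed earlier and uses its `load_model` and `transcribe` functions. `pick_model` is a made-up helper that only encodes the 10 GB rule of thumb from this post; it is not an official recommendation.

```python
def pick_model(memory_gb):
    """Pick a model name using this post's rule of thumb (more than 10 GB -> large)."""
    return "large" if memory_gb > 10 else "medium"


def transcribe(audio_path, model_name="medium"):
    """Transcribe an audio file and return the recognized text."""
    # Imported lazily so that pick_model works even without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]


if __name__ == "__main__":
    # 8 GB of memory, as on the machine used in this post, selects "medium".
    print(transcribe("to_transcribe.aac", pick_model(8)))
```

Unlike the command-line run, this does not write a `to_transcribe.txt` file on its own; the text is only returned to the caller.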
Enjoy the result. By the way, bat is a nice drop-in replacement for cat here:
cat to_transcribe.txt
With 4 cores and 8 threads (i3-10105, 3.7 GHz) and 8 GB of RAM, it took 16 minutes to transcribe the video.
Do what you want with the text: summarize it, or anything else.
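As one possible starting point for the summary, here is a deliberately naive extractive summarizer. It is not the method used in this post, just a sketch under simple assumptions: it scores each sentence by the average frequency of its longer words and keeps the top few in their original order. The function name `summarize` and its parameters are made up for illustration.

```python
import re
from collections import Counter


def summarize(text, max_sentences=3):
    """Return the highest-scoring sentences of `text`, in their original order."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count words longer than three letters as a crude stand-in for stop-word removal.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    from pathlib import Path

    print(summarize(Path("to_transcribe.txt").read_text(encoding="utf-8")))
```

The word pattern only matches ASCII letters, so this only makes sense for English transcripts. For anything serious you would likely hand the transcript to a proper summarization model instead.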