Without further ado, let’s get to it.
Goal:
Summarize a YouTube video.
Installation
Install Python 3.10. I strongly recommend using pyenv.
Install ffmpeg and yt-dlp. The commands below are for macOS.
# ffmpeg
brew install ffmpeg
# yt-dlp
wget https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp_macos
chmod +x yt-dlp_macos
sudo mv yt-dlp_macos /usr/local/bin/yt-dlp
Install whisper, preferably inside a virtualenv:
pip install -U openai-whisper
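Before moving on, it may be worth checking that the tools are actually reachable. The snippet below is a minimal sketch: the helper name missing_tools is my own, and it only checks that the executables are on PATH, not that they work.

```python
import shutil


def missing_tools(names):
    """Return the names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Tools this guide relies on; an empty list means both were found.
print(missing_tools(["ffmpeg", "yt-dlp"]))
```

If the list is not empty, revisit the install step for the tool it names.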
Usage
Download only the audio track of the video. The format selector ba is shorthand for bestaudio.
yt-dlp -f "ba" -o "to_transcribe.%(ext)s" "https://youtube.com/watch?v=6VhD5QyDEh4"
If a webm file was downloaded, convert it to aac:
ffmpeg -y -i "to_transcribe.webm" "to_transcribe.aac"
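If you'd rather script these two steps, here is one way they could be wrapped in Python. Treat it as a sketch under assumptions: the function names (download_command, convert_command, fetch_audio) are hypothetical, and the commands are the same yt-dlp and ffmpeg invocations shown above, just passed to subprocess.

```python
import subprocess
from pathlib import Path


def download_command(url, stem="to_transcribe"):
    """Build the yt-dlp command from this guide: best audio only."""
    return ["yt-dlp", "-f", "ba", "-o", f"{stem}.%(ext)s", url]


def convert_command(source, target):
    """Build the ffmpeg command from this guide; -y overwrites the target."""
    return ["ffmpeg", "-y", "-i", str(source), str(target)]


def fetch_audio(url, stem="to_transcribe"):
    """Download the audio; convert it to aac if yt-dlp produced a webm file.

    Returns the aac path after a conversion, otherwise None (the download
    is then already in some other format).
    """
    subprocess.run(download_command(url, stem), check=True)
    webm = Path(f"{stem}.webm")
    if webm.exists():
        aac = webm.with_suffix(".aac")
        subprocess.run(convert_command(webm, aac), check=True)
        return aac
    return None
```

Calling fetch_audio("https://youtube.com/watch?v=6VhD5QyDEh4") would run both steps. With check=True, a failing command raises an exception instead of being silently ignored.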
Run whisper:
python -m whisper --model medium to_transcribe.aac
If you have more than 10 GB of RAM or GPU memory, you can change the model from medium to large, but you'll have to wait about twice as long.
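The same step can also be done from Python instead of the command line. The sketch below uses whisper's load_model and transcribe calls. The helper choose_model is my own and simply encodes the 10 GB rule of thumb from this post, so treat the threshold as an assumption rather than an official requirement.

```python
def choose_model(memory_gb):
    """Pick a model name using this guide's rule of thumb: more than 10 GB -> large."""
    return "large" if memory_gb > 10 else "medium"


def transcribe(audio_path, model_name="medium"):
    """Transcribe an audio file with whisper's Python API and return the text."""
    import whisper  # imported lazily so choose_model works without whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]
```

For example, transcribe("to_transcribe.aac", choose_model(8)) would use the medium model. Unlike the command line, this returns the text directly instead of writing to_transcribe.txt.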
Enjoy the result. By the way, I'd suggest bat as a nicer alternative to cat for viewing it:
cat to_transcribe.txt
On a 4-core, 8-thread CPU (i3-10105, 3.7 GHz) with 8 GB of RAM, it took 16 minutes to transcribe the video.
Do what you want with the text: summarize it, or whatever else you need.