YouTube.com Browsing Skill
Provides instructions for programmatically extracting video subtitles and transcripts using yt-dlp on Windows.
This skill documents how to programmatically interact with YouTube, specifically for extracting transcripts.
YouTube Video Transcripts
Use yt-dlp to download subtitle files directly to disk, which avoids the Windows PowerShell output-encoding problem entirely. Do NOT use youtube-transcript-api via CLI piping: on Windows, the piped or redirected output ends up as UTF-16LE/BOM-corrupted files.
- Run the yt-dlp command (no pip install needed if already installed; check with python -m yt_dlp --version):
python -m yt_dlp --write-auto-subs --sub-format json3 --skip-download -o "<OUTPUT_NAME>" <YOUTUBE_URL>
This writes a file like <OUTPUT_NAME>.en.json3 (or .en.vtt if json3 isn't available for that video) directly to the current directory.
- Extract plain text from the json3 file:
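import json
with open('<OUTPUT_NAME>.en.json3', 'r', encoding='utf-8') as f:
    data = json.load(f)
# Collect every segment's text, then collapse the runs of spaces and newlines left by the join.
segs = [seg['utf8'] for ev in data.get('events', []) for seg in ev.get('segs', []) if 'utf8' in seg]
text = ' '.join(' '.join(segs).split())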
- If the file is .vtt instead of .json3, extract plain text with:
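import re
with open('<OUTPUT_NAME>.en.vtt', 'r', encoding='utf-8') as f:
    raw = f.read()
# Remove the WEBVTT header lines, cue timing lines, and inline tags, then collapse whitespace.
raw = re.sub(r'^(?:WEBVTT|Kind:|Language:).*$|^[0-9]{2}:[0-9]{2}:[0-9]{2}.*$|<[^>]+>', '', raw, flags=re.M)
text = ' '.join(raw.split())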
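The steps above can be combined into a few small helpers. This is a minimal sketch, not a definitive implementation: the function names (build_command, find_subtitle_file, json3_to_text, vtt_to_text, read_transcript) are hypothetical, and using sys.executable assumes yt-dlp is installed for the same Python interpreter that runs the script. The VTT helper also drops a line when it repeats the previous one, on the assumption that auto-generated captions repeat text across consecutive cues.

```python
import glob
import json
import re
import sys
from pathlib import Path


def build_command(url, output_name):
    """Return the yt-dlp invocation from step 1 as an argument list."""
    # Assumption: yt-dlp is installed for the interpreter running this script.
    return [sys.executable, "-m", "yt_dlp", "--write-auto-subs",
            "--sub-format", "json3", "--skip-download",
            "-o", output_name, url]


def find_subtitle_file(output_name, directory="."):
    """Return the .json3 file if one exists, else the .vtt file, else None."""
    for ext in ("json3", "vtt"):
        pattern = glob.escape(output_name) + ".*." + ext
        matches = sorted(Path(directory).glob(pattern))
        if matches:
            return matches[0]
    return None


def json3_to_text(data):
    """Join the 'utf8' text of every segment and collapse whitespace."""
    parts = [seg["utf8"]
             for ev in data.get("events", [])
             for seg in ev.get("segs", [])
             if "utf8" in seg]
    return " ".join(" ".join(parts).split())


def vtt_to_text(raw):
    """Strip header lines, cue timings, and inline tags; skip repeated lines."""
    kept = []
    for line in raw.splitlines():
        line = re.sub(r"<[^>]+>", "", line).strip()
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        if re.match(r"\d{2}:\d{2}:\d{2}", line):
            continue  # cue timing line such as "00:00:00.000 --> 00:00:02.000"
        if kept and kept[-1] == line:
            continue  # same text as the previous kept line
        kept.append(line)
    return " ".join(kept)


def read_transcript(path):
    """Read a downloaded subtitle file as UTF-8 and return plain text."""
    path = Path(path)
    raw = path.read_text(encoding="utf-8")
    if path.suffix == ".json3":
        return json3_to_text(json.loads(raw))
    return vtt_to_text(raw)
```

A typical call sequence would be subprocess.run(build_command(url, name), check=True), then find_subtitle_file(name), then read_transcript on the returned path. Passing the command as a list avoids shell quoting, and no output is piped through PowerShell at any point.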
Notes:
- yt-dlp writes the subtitle file through its own file I/O rather than stdout, so the PowerShell stdout encoding pipeline that caused the earlier corruption is never involved.
- If yt-dlp is not installed: pip install yt-dlp
- Tested and confirmed working 2026-04-24.
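To check whether a transcript file saved earlier was affected by the encoding problem described above, its first bytes can be inspected for a byte order mark. The sketch below is illustrative only: detect_bom and read_text_any are hypothetical helper names, and it assumes a corrupted file starts with a UTF-16 BOM, which may not hold for every way the file could have been written.

```python
import codecs
from pathlib import Path

# BOM byte sequences mapped to a Python codec that consumes the BOM on decode.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom(path):
    """Return the codec implied by a leading BOM, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None


def read_text_any(path):
    """Decode using the BOM when one is present, otherwise as plain UTF-8."""
    codec = detect_bom(path) or "utf-8"
    return Path(path).read_text(encoding=codec)
```

Files written by yt-dlp as described in this skill should report no BOM, so detect_bom returning a UTF-16 codec suggests the file came from a piped or redirected command.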