YouTube.com Browsing Skill

Provides instructions for programmatically extracting video subtitles and transcripts using yt-dlp on Windows.

YouTube (youtube.com) Browsing Skill

This skill documents how to programmatically interact with YouTube, specifically for extracting transcripts.

YouTube Video Transcripts

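Use yt-dlp to download subtitle files directly to disk, which avoids the Windows PowerShell encoding problem entirely. Do NOT run youtube-transcript-api from the CLI and pipe or redirect its output to a file: on Windows, PowerShell re-encodes that output as UTF-16LE with a BOM, which corrupts the file for anything that expects UTF-8.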
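A transcript that was already written through a PowerShell redirect is usually only mis-encoded, so it can often be recovered instead of downloaded again. The following is a minimal sketch, assuming the file begins with the UTF-16LE byte-order mark; `transcode_to_utf8` is a hypothetical helper name, not part of any library.

```python
import codecs
from pathlib import Path

def transcode_to_utf8(path):
    """Rewrite a BOM-prefixed UTF-16LE file as UTF-8; return True if converted."""
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF16_LE):
        return False  # no UTF-16LE BOM found, leave the file untouched
    text = data.decode("utf-16")  # the "utf-16" codec consumes the BOM
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Calling `transcode_to_utf8("transcript.txt")` (an example file name) rewrites the file in place, so keep a copy if the original bytes matter.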

  1. Run the yt-dlp command (first confirm it is installed with python -m yt_dlp --version). Quote the URL so that PowerShell does not interpret any & in it:
python -m yt_dlp --write-auto-subs --sub-format json3 --skip-download -o "<OUTPUT_NAME>" "<YOUTUBE_URL>"

This writes a file like <OUTPUT_NAME>.en.json3 (or .en.vtt if json3 isn't available for that video) directly to the current directory.

  2. Extract plain text from the json3 file:
import json
with open('<OUTPUT_NAME>.en.json3', 'r', encoding='utf-8') as f:
    data = json.load(f)
# Collect every text segment, then collapse newlines and repeated spaces
segs = [seg['utf8'] for ev in data.get('events', []) for seg in ev.get('segs', []) if 'utf8' in seg]
text = ' '.join(' '.join(segs).split())
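  3. If the file is .vtt instead of .json3, extract plain text with:
import re
with open('<OUTPUT_NAME>.en.vtt', 'r', encoding='utf-8') as f:
    raw = f.read()
# Drop header lines, cue timing lines, and inline tags, then collapse whitespace
text = re.sub(r'^(?:WEBVTT|Kind:|Language:|[0-9]{2}:[0-9]{2}:[0-9]{2}).*|<[^>]+>', '', raw, flags=re.M)
text = ' '.join(text.split())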
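The two extraction steps can be wrapped as reusable functions. This is a sketch rather than a definitive parser: `json3_to_text` and `vtt_to_text` are hypothetical names. The VTT version assumes the layout that YouTube auto-generated captions typically have, where the header block ends at the first blank line, cues carry no identifier lines, and each cue repeats the previous line of text. It drops those consecutive repeats, which a plain regex substitution leaves in.

```python
import json
import re

TAG = re.compile(r"<[^>]+>")  # inline tags such as <c> and <00:00:01.000>

def json3_to_text(path):
    """Join every caption segment of a json3 file into one normalized string."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    parts = [seg["utf8"]
             for ev in data.get("events", [])
             for seg in ev.get("segs", [])
             if "utf8" in seg]
    # split()/join collapses newline-only segments and doubled spaces
    return " ".join(" ".join(parts).split())

def vtt_to_text(path):
    """Extract caption text from a VTT file, skipping consecutive repeats."""
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()
    # Assumption: everything before the first blank line is the header block
    _, _, body = raw.partition("\n\n")
    kept = []
    for line in body.splitlines():
        if "-->" in line:
            continue  # cue timing line
        line = TAG.sub("", line).strip()
        if not line:
            continue
        if kept and kept[-1] == line:
            continue  # rolling captions repeat the previous line
        kept.append(line)
    return " ".join(kept)
```

Choosing between the two by file extension, for example `path.endswith(".json3")`, keeps the calling code the same whichever format yt-dlp produced.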

Notes:

  • yt-dlp writes the subtitle file itself, so the text never passes through PowerShell's stdout pipeline, which is what re-encodes piped or redirected output.
  • If yt-dlp is not installed: pip install yt-dlp
  • Tested and confirmed working 2026-04-24.
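The command from step 1 can also be launched from Python, which keeps PowerShell out of the path altogether: the arguments are passed as a list, so no shell parses the URL and nothing is redirected. This is a minimal sketch under stated assumptions. The function names are hypothetical, `sys.executable` stands in for `python` so that the same interpreter is used, and the output name is assumed to contain no glob metacharacters such as `[` or `*`.

```python
import subprocess
import sys
from pathlib import Path

def build_command(url, output_name):
    """Argument list for the yt-dlp call from step 1 (no shell involved)."""
    return [sys.executable, "-m", "yt_dlp",
            "--write-auto-subs", "--sub-format", "json3",
            "--skip-download", "-o", output_name, url]

def find_subtitle_file(output_name, directory="."):
    """Return the json3 file if one exists, else a vtt file, else None."""
    for ext in ("json3", "vtt"):
        matches = sorted(Path(directory).glob(f"{output_name}.*.{ext}"))
        if matches:
            return matches[0]
    return None

def download_subtitles(url, output_name, directory="."):
    """Run yt-dlp inside `directory` and return the subtitle file it wrote."""
    # check=True raises CalledProcessError if yt-dlp exits with an error
    subprocess.run(build_command(url, output_name), cwd=directory, check=True)
    return find_subtitle_file(output_name, directory)
```

`download_subtitles` returns `None` when yt-dlp succeeds but writes no subtitle file, for example when the video has no captions, so callers should check the result before opening it.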
