dzivkovi/video-intel
4 stars · Last commit 2026-04-27
README preview
# Video Intel

> **30 seconds to read a mind map vs. 30 minutes to watch the video.**
>
> Scanned 15 videos from a single channel in under 2 minutes, ~$0.15-0.25 each.
>
> Free tier covers 8 hours of YouTube video per day.

Multimodal video intelligence powered by Gemini. Scan YouTube channels, generate thematic mind maps, and produce enriched transcripts that capture what was said AND what was shown on screen.

## Key Principles

- **Multimodal, not transcript-based.** Gemini sees video frames at 1 FPS, reads all on-screen text, and hears audio simultaneously. When a presenter says "as you can see here," the output tells you what was actually shown.
- **Decoupled task prompting.** Transcription (audio) and speaker identification (vision) run as separate tasks within a single prompt to preserve attention quality, a technique borrowed from Laurent Picard's research.
- **Scan-then-triage funnel.** Mind maps are cheap and fast. Read 30-second summaries, then spend transcript budget only on videos worth deep engagement.
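
As a rough illustration of the scan-then-triage funnel, the sketch below ranks already-scanned videos and selects which ones get a full transcript within a fixed budget. It is not taken from the video-intel codebase: `ScannedVideo`, `triage`, `scan_cost_range`, the `relevance` score, and the minutes-based budget are all hypothetical names and assumptions chosen for the example. Only the per-video cost range comes from the figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class ScannedVideo:
    """Hypothetical record for one video after the cheap scan step."""
    video_id: str
    relevance: float  # assumed 0-1 score a reader assigns after the 30-second summary
    minutes: float    # video length, used here as a stand-in for transcript cost


def scan_cost_range(n_videos, low=0.15, high=0.25):
    """Estimated scan cost in dollars, using the per-video range quoted in the README."""
    return round(n_videos * low, 2), round(n_videos * high, 2)


def triage(videos, budget_minutes, min_relevance=0.5):
    """Pick videos for full transcription, most relevant first, within a minutes budget."""
    picked = []
    remaining = budget_minutes
    for video in sorted(videos, key=lambda v: v.relevance, reverse=True):
        if video.relevance < min_relevance:
            break  # everything after this point is below the threshold
        if video.minutes <= remaining:
            picked.append(video)
            remaining -= video.minutes
    return picked


videos = [
    ScannedVideo("a", relevance=0.9, minutes=30),
    ScannedVideo("b", relevance=0.2, minutes=10),
    ScannedVideo("c", relevance=0.7, minutes=45),
    ScannedVideo("d", relevance=0.6, minutes=20),
]

picked = triage(videos, budget_minutes=60)
print([v.video_id for v in picked])  # ['a', 'd']
print(scan_cost_range(15))           # (2.25, 3.75)
```

Here video `c` is relevant but too long for the remaining budget, and `b` falls below the threshold, so only `a` and `d` move on to the transcript stage. A real pipeline might budget in tokens or dollars rather than minutes.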