What is it?
A tool that helps NerdCast podcast listeners find the exact episode and timestamp where a specific topic is discussed or a particular story is told.
Features
- Natural language search across all NerdCast episodes
- Timestamp-specific results pointing to exact moments in episodes
- Participant filtering using '@' followed by the participant's name
- Context-aware search that improves with more detailed queries
- Direct links to the exact moments in episodes
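The '@' participant filter can be sketched as a small query parser that pulls the '@name' tokens out of the query before the remaining text goes to the semantic search. This is a minimal TypeScript sketch, not the project's code: `parseQuery`, `matchesParticipants`, the lower-casing rule and the assumption that a participant name is a single word are all hypothetical.

```typescript
// Hypothetical shape of a parsed query (not taken from the project).
interface ParsedQuery {
  text: string;           // free text that would be embedded and searched
  participants: string[]; // lower-cased names that followed an '@'
}

// Extract every "@name" token and return the rest of the query as search text.
// Assumption: a name is one word (no spaces, no further '@').
function parseQuery(raw: string): ParsedQuery {
  const participants: string[] = [];
  const text = raw
    .replace(/@([^\s@]+)/g, (_match: string, name: string) => {
      participants.push(name.toLowerCase());
      return " ";
    })
    .replace(/\s+/g, " ") // collapse the gaps left by removed tokens
    .trim();
  return { text, participants };
}

// A result passes the filter only if every requested participant is present.
function matchesParticipants(resultParticipants: string[], wanted: string[]): boolean {
  const present = new Set(resultParticipants.map((p) => p.toLowerCase()));
  return wanted.every((name) => present.has(name));
}

const parsed = parseQuery("@Azaghal the story about the broken car");
console.log(parsed.text);         // the story about the broken car
console.log(parsed.participants); // [ 'azaghal' ]
```

One reason to split the query this way is that only the free text needs to be embedded, while the participant names act as an exact filter on the returned chunks.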
Tech Stack
- Next.js
- OpenAI API (GPT-4o mini) for the RAG (retrieval-augmented generation) step
- Whisper medium for high-quality transcription
- Upstash for vector database storage
- Cloudflare for hosting and distribution
- mxbai-embed-large-v1 for embeddings
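The retrieval half of the RAG setup above can be illustrated with a brute-force, in-memory version of what the vector database does: rank the stored chunks by similarity to the query embedding, keep the top k, and pass them to the chat model together with their timestamps. This is a hedged sketch, not the project's implementation: the `Chunk` shape, `topK`, `buildContext`, cosine similarity as the metric and the `[episode @ hh:mm:ss]` context format are assumptions. In the real stack, Upstash performs the search and the vectors come from mxbai-embed-large-v1.

```typescript
// Hypothetical chunk record; in the real project this data lives in Upstash.
interface Chunk {
  episode: string;     // episode title or id
  startSec: number;    // where the chunk starts in the audio, in seconds
  text: string;
  embedding: number[]; // embedding vector for `text`
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k search; a vector database does the same job at scale.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}

// Seconds -> "hh:mm:ss", used to point the listener at the exact moment.
function formatTimestamp(totalSec: number): string {
  const s = Math.floor(totalSec);
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return pad(Math.floor(s / 3600)) + ":" + pad(Math.floor((s % 3600) / 60)) + ":" + pad(s % 60);
}

// Context block sent to the chat model alongside the user's question.
function buildContext(hits: Chunk[]): string {
  return hits
    .map((c) => `[${c.episode} @ ${formatTimestamp(c.startSec)}] ${c.text}`)
    .join("\n");
}

const demo: Chunk[] = [
  { episode: "Episode A", startSec: 3725, text: "story about the broken car", embedding: [1, 0] },
  { episode: "Episode B", startSec: 90, text: "unrelated discussion", embedding: [0, 1] },
];
console.log(buildContext(topK([0.9, 0.1], demo, 1)));
// [Episode A @ 01:02:05] story about the broken car
```

Keeping the timestamp inside each context line lets the model cite where in the episode a passage comes from, which is what the result links need.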
The Backstory
As a long-time NerdCast listener, I often found myself trying to remember which episode contained a specific story or discussion. This is a common problem in the NerdCast community: people remember the content but not where to find it.
The project started when I discovered a dataset of transcribed NerdCast episodes. I built a proof of concept by splitting the transcripts into overlapping chunks, keeping each chunk's timestamps, and storing the chunks in a vector database. When I tested the first search results with ChatGPT, I noticed that transcription errors were degrading them.
To improve accuracy, I re-transcribed every episode with Whisper medium (RIP my GPU) and experimented with different chunk sizes and overlaps until I found the configuration that gave the best results. The result is a tool that helps listeners quickly locate specific content across the vast NerdCast archive.
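Chunking with overlap can be sketched as sliding a window over the timestamped segments Whisper produces, so that every chunk keeps the start time of its first segment. In this hypothetical TypeScript version, `chunkSegments`, `TimedChunk`, and measuring `size` and `overlap` in segments are assumptions; the actual unit and the values that worked best for NerdCast are not stated here.

```typescript
// One transcript segment as produced by Whisper (times in seconds).
interface Segment {
  start: number;
  end: number;
  text: string;
}

interface TimedChunk {
  start: number; // start of the chunk's first segment
  end: number;   // end of the chunk's last segment
  text: string;
}

// Group consecutive segments into windows of `size` segments, where each
// window repeats the last `overlap` segments of the previous one.
function chunkSegments(segments: Segment[], size: number, overlap: number): TimedChunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap;
  const chunks: TimedChunk[] = [];
  for (let i = 0; i < segments.length; i += step) {
    const group = segments.slice(i, i + size);
    chunks.push({
      start: group[0].start,
      end: group[group.length - 1].end,
      text: group.map((s) => s.text.trim()).join(" "),
    });
    if (i + size >= segments.length) break; // this window already reached the end
  }
  return chunks;
}
```

The overlap keeps a story that straddles a chunk boundary intact in at least one chunk, as long as it is no longer than the overlap. Larger chunks give the model more context per match but point to a less precise timestamp, which is the trade-off that experimenting with sizes and overlaps explores.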
The more context you provide in your search, the better the results. You can also filter by participant by typing '@' followed by their name.