A Year in Review: What I've Been Up To (And What I've Learned)
Reflections on growth, working in AI, and walking the dog.
DeepSeek-R1, OpenAI o1 and o3, test-time compute scaling, model post-training — and what the shift toward Reasoning Language Models actually means for the people building agent systems on top of them.
This piece introduces the Pyramid Approach, an agentic knowledge-distillation technique that transforms messy source documents into structured, retrieval-ready knowledge.
A novel approach to lightweight safety classification that outperforms GPT-4o on content safety and prompt injection detection — using fewer than 100 training examples and a 0.5B parameter model. Here's how it works and why it matters.
When AI agents can see and interact with a screen the way a human does, what actually changes? This piece explores the multimodal shift and what it means for how we build and deploy agent systems.
Tool calling is often described as magic. It isn't. This piece unpacks exactly how function calling works under the hood — and why the interplay between tool use and model reasoning is what makes or breaks agentic systems.
GPT-4o mini, Llama 3.1, Mistral NeMo 12B — July 2024 brought a wave of model releases. Here's a clear-eyed guide to what actually changed, what deserves your attention, and what's mostly noise.
Pre-training, fine-tuning, RAG, prompt engineering — these aren't just buzzwords. This piece breaks down the actual mechanics of each approach and helps you choose the right technique for your specific problem.
Leaderboard scores tell you how a model performs on a benchmark. They tell you much less about how it performs on your problem. This piece examines the gap between evaluation and deployment — and what it means for how we assess AI progress.