AI in Practice

Improving Agent Systems & AI Reasoning

DeepSeek-R1, OpenAI o1 and o3, test-time compute scaling, model post-training — and what the shift toward Reasoning Language Models actually means for the people building agent systems on top of them.

February 2, 2025

Research

Introducing Layer Enhanced Classification (LEC)

A novel approach to lightweight safety classification that outperforms GPT-4o on content safety and prompt injection detection, using fewer than 100 training examples and a 0.5B-parameter model. Here's how it works and why it matters.

December 15, 2024

Navigating a Fast-Moving Field

Navigating the Latest GenAI Model Announcements — July 2024

GPT-4o mini, Llama 3.1, Mistral NeMo 12B — July 2024 brought a wave of model releases. Here's a clear-eyed guide to what actually changed, what's worth paying attention to, and what's mostly noise.

July 25, 2024

AI in Practice

Understanding Techniques for Solving GenAI Challenges

Pre-training, fine-tuning, RAG, prompt engineering — these aren't just buzzwords. This piece breaks down the actual mechanics of each approach and helps you choose the right technique for your specific problem.

May 14, 2024

AI in Practice

Are Language Models Benchmark Savants or Real-World Problem Solvers?

Leaderboard scores tell you how a model performs on a benchmark. They tell you much less about how it performs on your problem. This piece examines the gap between evaluation and deployment — and what it means for how we assess AI progress.

February 28, 2024