The Upper Left-Hand Brick

Speculative Decoding in LLMs

An overview of major techniques

I recently gave a talk on Speculative Decoding for the EleutherAI ML Performance Reading Group, focusing on how modern speculative sampling techniques improve LLM inference latency and throughput.

The talk covers the evolution of speculative decoding from both a research and systems perspective, including:

Beyond algorithmic intuition, the talk discusses systems tradeoffs such as acceptance rates, verification cost, batching behavior, KV-cache management, and practical implementations in vLLM and SGLang.
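
To make the acceptance-rate versus verification-cost tradeoff concrete, here is a rough back-of-the-envelope sketch. It uses the standard simplified analysis of speculative sampling, not anything taken from vLLM or SGLang. The assumptions are that each drafted token is accepted independently with probability `alpha`, that the draft model proposes `gamma` tokens per step, and that one draft forward pass costs `draft_cost` relative to one target-model pass. The function and parameter names are illustrative only.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes (simplification) that each of the `gamma` drafted tokens is
    accepted independently with probability `alpha`. The target model always
    contributes one token itself, so the result lies between 1 and gamma + 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the target model's own token.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Idealized wall-clock speedup over plain autoregressive decoding.

    `draft_cost` is the hypothetical cost of one draft-model pass relative to
    one target-model pass. One speculative step costs gamma draft passes plus
    one target verification pass. Batching and KV-cache effects are ignored.
    """
    step_cost = gamma * draft_cost + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


if __name__ == "__main__":
    # Sweep the draft length for a fixed, assumed acceptance rate and cost.
    for gamma in (1, 2, 4, 8):
        print(gamma, round(estimated_speedup(alpha=0.8, gamma=gamma, draft_cost=0.05), 3))
```

Under this model, a low acceptance rate can push the estimate below 1, meaning speculation costs more than it saves. Real deployments also depend on batch size and memory pressure, which this sketch leaves out.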

Talk & slides:
