vLLM Spyre Roadmap — Q3 2025

Features

| Feature | Priority | PRs |
| --- | --- | --- |
| Continuous batching (homogeneous Tkv) | P0 | |
| FP8 model loading | P0 | #316 |
| Embedding model support (V1) | P0 | |
| LoRA support | P1 | |
| Continuous batching (heterogeneous Tkv) | P1 | |
| Prefix caching (full/majority matching) | P1 | |

vLLM Integration

| Feature | Priority | PRs |
| --- | --- | --- |
| Deprecate V0 API | P0 | #241, #344 |
| Use BlockManager for batching | P1 | |
| Replace FMS model loading with vLLM | P2 | |

Testing

| Feature | Priority | PRs |
| --- | --- | --- |
| Continuous batching (homogeneous Tkv) | P0 | |
| Precompiled model loading with continuous batching | P0 | |
| 128K context length support | P0 | |
| FP8 model loading | P0 | #350, #359 |

See vLLM's Q3 2025 roadmap for its upcoming features.