vLLM Spyre Roadmap — Q3 2025
Features

| Feature | Priority | PRs |
| --- | --- | --- |
| Continuous batching (homogeneous Tkv) | P0 | |
| FP8 model loading | P0 | #316 |
| Embedding model support (V1) | P0 | |
| LoRA support | P1 | |
| Continuous batching (heterogeneous Tkv) | P1 | |
| Prefix caching (full/majority matching) | P1 | |

vLLM Integration

| Feature | Priority | PRs |
| --- | --- | --- |
| Deprecate V0 API | P0 | #241, #344 |
| Use BlockManager for batching | P1 | |
| Replace FMS model loading with vLLM | P2 | |

Testing

| Feature | Priority | PRs |
| --- | --- | --- |
| Continuous batching (homogeneous Tkv) | P0 | |
| Precompiled model loading with continuous batching | P0 | |
| 128K context length support | P0 | |
| FP8 model loading | P0 | #350, #359 |

See vLLM's Q3 2025 roadmap for its upcoming features.