Supported Models
The vLLM Spyre plugin relies on model code implemented by the Foundation Model Stack.
Configurations
The following models have been verified to run on vLLM Spyre with the listed configurations.
Decoder Models
Static Batching:
Model | AIUs | Prompt Length (tokens) | New Tokens | Batch Size |
---|---|---|---|---|
Granite-3.3-8b | 4 | 7168 | 1024 | 4 |
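As an illustration of how the static batching row can be read, the sketch below checks whether a batch of requests fits inside the verified shape. It assumes that every prompt must be no longer than the prompt length, that the requested number of generated tokens must not exceed the new-token limit, and that the number of requests must not exceed the batch size. The `StaticShape` class and `fits` function are hypothetical helpers written for this example; they are not part of vLLM Spyre.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StaticShape:
    """One row of the static batching table (hypothetical helper type)."""
    prompt_length: int  # maximum prompt length, in tokens
    new_tokens: int     # maximum number of generated tokens
    batch_size: int     # maximum number of requests in one batch


# Values copied from the Granite-3.3-8b row (4 AIUs).
GRANITE_3_3_8B_STATIC = StaticShape(prompt_length=7168, new_tokens=1024, batch_size=4)


def fits(shape: StaticShape, prompt_lens: List[int], max_new_tokens: int) -> bool:
    """Return True if the batch fits the shape under the assumptions stated above."""
    return (
        0 < len(prompt_lens) <= shape.batch_size
        and max(prompt_lens) <= shape.prompt_length
        and max_new_tokens <= shape.new_tokens
    )
```

Under these assumptions, `fits(GRANITE_3_3_8B_STATIC, [100, 7168], 1024)` is true, while a fifth request or a 7169-token prompt falls outside the verified shape.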
Continuous Batching:
Model | AIUs | Context Length (tokens) | Batch Size |
---|---|---|---|
Granite-3.3-8b | 1 | 3072 | 16 |
Granite-3.3-8b | 4 | 32768 | 32 |
Granite-3.3-8b (FP8) | 1 | 3072 | 16 |
Granite-3.3-8b (FP8) | 4 | 32768 | 32 |
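The continuous batching rows can be used to pick the smallest verified deployment for a workload. The sketch below is a minimal example, assuming that the context length bounds the prompt plus generated tokens of a single sequence and that the batch size bounds the number of concurrent sequences. `VerifiedConfig` and `smallest_verified_config` are hypothetical names introduced for this example only.

```python
from typing import NamedTuple, Optional


class VerifiedConfig(NamedTuple):
    """One row of the continuous batching table (hypothetical helper type)."""
    aius: int
    context_length: int  # assumed: prompt + generated tokens per sequence
    batch_size: int      # assumed: maximum number of concurrent sequences


# Rows copied from the table for Granite-3.3-8b (the FP8 rows list the same limits).
VERIFIED = [
    VerifiedConfig(aius=1, context_length=3072, batch_size=16),
    VerifiedConfig(aius=4, context_length=32768, batch_size=32),
]


def smallest_verified_config(
    prompt_len: int, max_new_tokens: int, concurrent_seqs: int
) -> Optional[VerifiedConfig]:
    """Return the verified row with the fewest AIUs that covers the workload, or None."""
    total_tokens = prompt_len + max_new_tokens
    candidates = [
        cfg
        for cfg in VERIFIED
        if total_tokens <= cfg.context_length and concurrent_seqs <= cfg.batch_size
    ]
    return min(candidates, key=lambda cfg: cfg.aius, default=None)
```

A `None` result means the workload exceeds every verified configuration in the table; it does not by itself mean the workload cannot run.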
Encoder Models
Model | AIUs | Context Length (tokens) | Batch Size |
---|---|---|---|
Granite-Embedding-125m (English) | 1 | 512 | 1 |
Granite-Embedding-125m (English) | 1 | 512 | 64 |
Granite-Embedding-278m (Multilingual) | 1 | 512 | 1 |
Granite-Embedding-278m (Multilingual) | 1 | 512 | 64 |
BAAI/BGE-Reranker (v2-m3) | 1 | 8192 | 1 |
BAAI/BGE-Reranker (Large) | 1 | 512 | 1 |
BAAI/BGE-Reranker (Large) | 1 | 512 | 64 |