State Space Models (SSMs), particularly the Mamba architecture family, are emerging as a serious alternative to transformers for language modeling. Their key advantage: linear scaling with sequence length instead of the quadratic cost of self-attention.
The Technical Shift
Traditional transformers compute attention over every token pair, creating O(n²) complexity that makes very long contexts expensive. SSMs process sequences recurrently with fixed-size hidden states, achieving O(n) complexity. The Mamba architecture adds selective state space mechanisms that allow the model to dynamically filter information, keeping relevant context while discarding noise.
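To make the O(n) shape concrete, here is a minimal sketch of an SSM recurrence with a fixed-size hidden state. This is a deliberate simplification: it uses scalar, input-independent parameters A, B, C, whereas Mamba's selective mechanism makes B, C, and the step size input-dependent. The function name and parameters are illustrative, not from any library.

```python
def ssm_scan(xs, A, B, C):
    """Process a sequence with a fixed-size hidden state.

    xs: list of input scalars; A, B, C: scalars (state decay,
    input projection, output projection). Cost is linear in len(xs)
    because each step touches only the fixed-size state.
    """
    h = 0.0                  # fixed-size hidden state (here one scalar)
    ys = []
    for x in xs:             # one pass over the sequence: O(n)
        h = A * h + B * x    # update state: a decayed running summary
        ys.append(C * h)     # read the output from the state
    return ys

print(ssm_scan([1.0, 0.0, 0.0], A=0.5, B=1.0, C=1.0))
# decayed impulse response: [1.0, 0.5, 0.25]
```

The contrast with attention is that the loop never looks back at earlier tokens directly; everything the model retains must fit in `h`, which is why selectivity (choosing what to write into the state) matters.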
Current Results
Hybrid architectures combining SSM layers with sparse attention layers are showing the most promise. These models match transformer quality on standard benchmarks while offering 3-5x faster inference on long sequences. The training efficiency gap is narrower but still meaningful: SSM-based models reach competitive quality with 20-30% less compute.
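A common way to build such hybrids is to interleave block types, with an attention layer every few SSM layers. The schedule below is a hypothetical illustration of that pattern; the function name, ratio, and layer labels are assumptions, not taken from any published model.

```python
def hybrid_schedule(n_layers, attn_every=4):
    """Return a layer-type list with an attention block every
    `attn_every` layers and SSM blocks everywhere else."""
    return [
        "attention" if (i + 1) % attn_every == 0 else "ssm"
        for i in range(n_layers)
    ]

print(hybrid_schedule(8))
# ['ssm', 'ssm', 'ssm', 'attention', 'ssm', 'ssm', 'ssm', 'attention']
```

The design intuition is that the occasional attention layer provides the precise token-to-token comparison SSM layers lack, while most of the depth keeps linear cost.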
What This Means
For practitioners, this could mean running context windows of 100K+ tokens on hardware that currently struggles with 8K. For researchers, it challenges the assumption that attention is all you need.
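The scale of that gap follows from the complexity classes alone. A back-of-envelope comparison, ignoring constants and assuming attention cost grows with n² token pairs while an SSM scan grows with n steps:

```python
def pair_cost(n):
    """Attention-style cost: every token attends to every token."""
    return n * n

def scan_cost(n):
    """SSM-style cost: one recurrent step per token."""
    return n

# Growing the context from 8K to 100K tokens (12.5x more tokens):
print(pair_cost(100_000) / pair_cost(8_000))  # attention cost: 156.25x
print(scan_cost(100_000) / scan_cost(8_000))  # scan cost: 12.5x
```

Under these assumptions, attention cost grows with the square of the context multiplier, which is why hardware comfortable at 8K struggles far past it while linear-cost models scale proportionally.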
Frequently Asked Questions
Will SSMs replace transformers?
More likely, we will see hybrid architectures that combine both. Pure SSMs still lag slightly on tasks requiring precise long-range retrieval, while transformers remain better at tasks needing explicit token-to-token comparison.
Can I use SSM models with Ollama?
As GGUF-format SSM models become available, Ollama support will follow. The inference runtime requirements are different from transformers, so adapter work is needed.