The article discusses the development of sequence models that can handle much longer sequences and therefore capture more context.
- While traditional attention-based Transformers scale quadratically with sequence length, the authors have developed architectures based on structured state space models (SSMs) that scale nearly linearly, allowing for much longer sequence lengths (see the recurrence-vs-convolution sketch after this list).
- These models, including HiPPO, S4, H3, and Hyena, have shown promising results on various benchmarks and tasks.
- The authors also explore the use of the FFT and learned matrices to improve efficiency and performance (an FFT-convolution sketch also follows below).
- The article concludes with the exciting possibilities that longer-sequence models offer for various applications.
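
As a rough illustration (a toy sketch, not code from the article), the NumPy snippet below runs a small discretized state space model x_k = A·x_{k-1} + B·u_k, y_k = C·x_k as a recurrence, whose cost grows linearly with sequence length, and checks that unrolling it yields an equivalent convolution y = K * u with kernel K_k = C·A^k·B, the view that S4-style models exploit. All matrices, sizes, and values here are made-up toy assumptions.

```python
import numpy as np

# Toy discretized linear SSM:
#   x_k = A x_{k-1} + B u_k,   y_k = C x_k
# Each recurrence step costs O(N^2) for state size N, so a length-L sequence
# costs O(L * N^2): linear in L, unlike the O(L^2) cost of full self-attention.
rng = np.random.default_rng(0)
N, L = 4, 16                      # state size and sequence length (toy values)
A = 0.9 * np.eye(N) + 0.05 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))

u = rng.standard_normal(L)        # input sequence
x = np.zeros((N, 1))              # initial state
y_recurrent = np.zeros(L)
for k in range(L):
    x = A @ x + B * u[k]
    y_recurrent[k] = (C @ x).item()

# The same SSM unrolls into a causal convolution y = K * u with K[k] = C A^k B.
K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(L)])
y_conv = np.array([np.dot(K[:k + 1][::-1], u[:k + 1]) for k in range(L)])

assert np.allclose(y_recurrent, y_conv)
```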
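
The near-linear scaling comes from evaluating that long convolution with the FFT rather than directly. The sketch below (again a toy assumption, not the article's implementation) compares a direct O(L²) causal convolution with an O(L log L) FFT version and checks that they agree; the random kernel `K` simply stands in for whatever learned kernel the model produces.

```python
import numpy as np

def causal_conv_direct(K, u):
    """Direct causal convolution: O(L^2) for a length-L sequence."""
    L = len(u)
    return np.array([np.dot(K[:k + 1][::-1], u[:k + 1]) for k in range(L)])

def causal_conv_fft(K, u):
    """Same convolution via the FFT: O(L log L)."""
    L = len(u)
    n = 2 * L                                   # zero-pad to avoid circular wrap-around
    Kf = np.fft.rfft(K, n=n)
    uf = np.fft.rfft(u, n=n)
    return np.fft.irfft(Kf * uf, n=n)[:L]       # keep only the causal part

rng = np.random.default_rng(0)
L = 1024
K = rng.standard_normal(L)                      # stand-in for a learned kernel
u = rng.standard_normal(L)                      # input sequence

assert np.allclose(causal_conv_direct(K, u), causal_conv_fft(K, u))
```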