Introduction

Today's most popular language models, such as OpenAI’s ChatGPT, Google’s Gemini, and GitHub’s Copilot, are all based on the Transformer architecture. Transformers have a significant drawback, however: their attention mechanism's computational cost grows quadratically with sequence length. For short interactions (such as asking ChatGPT to tell a joke), this is manageable, but for tasks over long inputs (such as summarizing a 100-page document), Transformers can become prohibitively slow. Mamba aims to address this issue by matching or exceeding the quality of similarly sized Transformers while its computational cost scales linearly with sequence length.
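
To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch in Python. The dimensions (model width 768, state size 16) are illustrative choices for this sketch, not taken from any particular model:

```python
def attention_flops(L: int, d: int) -> int:
    # Self-attention materializes an L x L score matrix (QK^T) and
    # multiplies it by V: both steps cost on the order of L^2 * d.
    return 2 * L * L * d

def scan_flops(L: int, d: int, n: int) -> int:
    # A linear recurrence (as in Mamba) touches each of the L tokens
    # once, updating a size-n state per channel: order L * d * n.
    return 2 * L * d * n

# Illustrative sizes: model width 768, state size 16.
for L in (1_000, 10_000, 100_000):
    ratio = attention_flops(L, 768) / scan_flops(L, 768, 16)
    print(f"L={L:>7,}: attention / scan cost ratio ~ {ratio:,.0f}x")
```

The ratio itself grows linearly with L, which is the whole point: at short lengths the quadratic term is affordable, but at 100k tokens it dominates everything else.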

Minimum spanning tree

Triton

Some tutorials

CUDA Learning, Lesson 18: The Blelloch Scan Algorithm and How to Choose a Scan Algorithm, on Bilibili (see the scan sketch after this list)

State Space Duality (Mamba-2) Part I - The Model

A Visual Guide to Mamba and State Space Models - Maarten Grootendorst
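
Since the scan material linked above (the Blelloch lesson in particular) underpins how selective SSMs are parallelized on GPUs, here is a minimal pure-Python sketch of the work-efficient Blelloch exclusive scan. The nested loops are sequential stand-ins for the per-level parallel work a CUDA or Triton kernel would do:

```python
import math

def blelloch_exclusive_scan(a):
    """Work-efficient exclusive prefix sum via up-sweep / down-sweep,
    the tree-based pattern that GPU scan kernels parallelize."""
    n = len(a)
    size = (1 << math.ceil(math.log2(n))) if n > 1 else 1
    x = list(a) + [0] * (size - n)        # pad to a power of two
    # Up-sweep (reduce): build partial sums up a binary tree.
    d = 1
    while d < size:
        for i in range(0, size, 2 * d):   # each i is independent -> parallel
            x[i + 2 * d - 1] += x[i + d - 1]
        d *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    x[size - 1] = 0
    d = size // 2
    while d >= 1:
        for i in range(0, size, 2 * d):   # again independent per i
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] += t
        d //= 2
    return x[:n]

print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# -> [0, 3, 4, 11, 11, 15, 16, 22]
```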

Some Papers

State Space Model (SSM)

(NeurIPS 2020 Spotlight) HiPPO: Recurrent Memory with Optimal Polynomial Projections Paper Code Stars

(ICLR 2022) S4: Efficiently Modeling Long Sequences with Structured State Spaces Paper Code Stars

(ICLR 2023) H3: Hungry Hungry Hippos: Toward Language Modeling with State Space Models Paper Code Stars

(Arxiv 24.05.26) A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models Paper Code Stars

(Arxiv 24.05.27) The Expressive Capacity of State Space Models: A Formal Language Perspective Paper Code Stars

(Arxiv 24.06.12) An Empirical Study of Mamba-based Language Models Paper
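
For orientation before the Mamba papers below, here is a minimal NumPy sketch of the discretized linear recurrence the SSM papers above (HiPPO, S4, H3) build on: h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t. The dimensions are illustrative and the random parameters stand in for learned ones:

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, x):
    """Sequential form of a discretized linear SSM:
    h_t = A_bar @ h_{t-1} + B_bar * x_t,   y_t = C @ h_t.
    Illustrative setup: state size N, one scalar input channel."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t   # state update: one step per token
        ys.append(C @ h)              # readout
    return np.array(ys)

rng = np.random.default_rng(0)
N, L = 16, 128
A_bar = np.diag(rng.uniform(0.5, 0.99, N))   # stable diagonal state matrix
B_bar = rng.normal(size=N)
C = rng.normal(size=N)
y = ssm_scan(A_bar, B_bar, C, rng.normal(size=L))
print(y.shape)  # (128,)
```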

Mamba

(Arxiv 23.12.01) Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper Code Stars

(Arxiv 24.05.31, ICML24, Mamba-2) Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper Code Stars
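
Mamba's key change over the SSMs above is selectivity: the step size Delta and the B and C matrices become functions of each input token, so the model can choose what to remember or forget. The sketch below is a heavily simplified rendition of that recurrence; the W_* projections are hypothetical stand-ins for learned layers, and the real implementation fuses this loop into a hardware-aware parallel scan:

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM recurrence in the spirit of Mamba (S6).
    x: (L, D) input sequence; A: (D, N) fixed state-transition parameters.
    W_delta (D, D), W_B (D, N), W_C (D, N): hypothetical learned projections
    that make the step size and the B/C matrices input-dependent."""
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                           # one size-N state per channel
    ys = np.zeros((L, D))
    for t in range(L):
        delta = np.log1p(np.exp(x[t] @ W_delta))   # softplus step size, (D,)
        B_t = x[t] @ W_B                           # input-dependent B, (N,)
        C_t = x[t] @ W_C                           # input-dependent C, (N,)
        A_bar = np.exp(delta[:, None] * A)         # discretized A, (D, N)
        h = A_bar * h + (delta[:, None] * B_t[None, :]) * x[t][:, None]
        ys[t] = h @ C_t                            # readout, (D,)
    return ys

rng = np.random.default_rng(0)
L, D, N = 64, 8, 16
A = -np.exp(rng.normal(size=(D, N)))   # negative, so exp(delta * A) stays < 1
y = selective_scan(rng.normal(size=(L, D)), A,
                   rng.normal(size=(D, D)) * 0.1,
                   rng.normal(size=(D, N)) * 0.1,
                   rng.normal(size=(D, N)) * 0.1)
print(y.shape)  # (64, 8)
```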

Vision

(Arxiv 24.01.17) Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model Paper Code Stars

(Arxiv 24.01.18) VMamba: Visual State Space Model Paper Code Stars

(Arxiv 24.02.05) Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining Paper Code Stars

(Arxiv 24.02.06) U-shaped Vision Mamba for Single Image Dehazing Paper Code Stars

(Arxiv 24.02.23) MambaIR: A Simple Baseline for Image Restoration with State-Space Model Paper Code Stars

(Arxiv 24.03.15) EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba Paper Code Stars

(Arxiv 24.03.28) RSMamba: Remote Sensing Image Classification with State Space Model Paper Code Stars

(Arxiv 24.05.26) Demystify Mamba in Vision: A Linear Attention Perspective Paper Code Stars

(Arxiv 24.06.04) GrootVL: Tree Topology is All You Need in State Space Model Paper Code Stars