Introducing v0.1-2x2 Stage 002
Neural Mesh rethinks Mixture-of-Experts by arranging experts in a two-dimensional grid in which neighboring experts can communicate directly. Instead of routing each token to isolated experts, Mesh routes tokens dynamically and lets the selected experts exchange information along grid edges, improving collaboration between experts and contextual consistency.
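To make the idea concrete, here is a minimal NumPy sketch of top-k routing on a 2x2 grid followed by an exchange step between selected experts that share a grid edge. It is an illustration under stated assumptions, not the actual Mesh implementation: the names `grid_neighbors` and `mesh_forward`, the tanh experts, the softmax gate, and the `mix` averaging rule are all hypothetical choices made for this example.

```python
import numpy as np

GRID = (2, 2)  # rows, cols; matches the "2x2" in the release name


def grid_neighbors(idx, grid=GRID):
    """Return flat indices of the experts that share a grid edge with `idx`."""
    rows, cols = grid
    r, c = divmod(idx, cols)
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            out.append(rr * cols + cc)
    return out


def mesh_forward(x, router_w, expert_ws, k=2, mix=0.5):
    """Route one token vector to k experts, then mix outputs of adjacent ones.

    Assumption: the exchange is a simple average with selected neighbors,
    weighted by `mix`. The real exchange mechanism is not described in the text.
    """
    logits = router_w @ x                         # one score per expert
    top = np.argsort(logits)[::-1][:k]            # k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over the selected experts

    # Each selected expert processes the token independently first.
    outs = {int(e): np.tanh(expert_ws[e] @ x) for e in top}

    # Exchange step: only selected experts that share a grid edge interact.
    exchanged = {}
    for e, h in outs.items():
        nbrs = [n for n in grid_neighbors(e) if n in outs]
        if nbrs:
            nbr_mean = np.mean([outs[n] for n in nbrs], axis=0)
            h = (1 - mix) * h + mix * nbr_mean
        exchanged[e] = h

    y = sum(g * exchanged[int(e)] for g, e in zip(gates, top))
    return y, [int(e) for e in top]


rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
router_w = rng.normal(size=(4, d))
expert_ws = rng.normal(size=(4, d, d))
y, chosen = mesh_forward(x, router_w, expert_ws, k=2)
```

In this sketch, two selected experts only exchange information when they sit next to each other horizontally or vertically. Diagonal pairs are not connected, and with `k=1` the exchange step does nothing.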
Introducing mesh
This is our first ever model! Allow us to explain how the mesh architecture works in detail.
Top-1 routing: 173M active parameters
Top-2 routing: 242M active parameters (default)
Dense routing: 302M active parameters
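As a quick way to compare the three modes, the short snippet below expresses each active-parameter count as a share of the dense count. It assumes that dense routing activates every expert, so that 302M can serve as the reference; the helper name `active_fraction` is made up for this example.

```python
# Active parameter counts in millions, as listed above.
ACTIVE_PARAMS_M = {"top-1": 173, "top-2": 242, "dense": 302}


def active_fraction(mode, counts=ACTIVE_PARAMS_M):
    """Share of the dense-mode parameter count that is active in `mode`."""
    return counts[mode] / counts["dense"]


for mode, count in ACTIVE_PARAMS_M.items():
    print(f"{mode}: {count}M active, {active_fraction(mode):.1%} of the dense count")
```

Under that assumption, top-1 routing uses roughly 57% of the dense count and the default top-2 routing roughly 80%.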
[Figure: Mesh model routing. A 2x2 grid of experts labeled Expert 1,1; Expert 2,1; Expert 1,2; and Expert 2,2. The animation first activates a single expert in top-1 mode (Expert 1,1; 173M active parameters), then toggles the routing mode to top-2, where a bond forms between two experts, either vertically or horizontally, to illustrate neighbor exchange.]