Introducing v0.1-2x2 Stage 002

Neural Mesh rethinks Mixture-of-Experts (MoE) by arranging experts in a two-dimensional grid in which neighboring experts can communicate directly. Rather than routing each token to isolated experts, Mesh routes tokens dynamically and lets the selected experts exchange information along grid edges, improving collaboration between experts and contextual consistency.
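To make the grid idea concrete, here is a minimal sketch of neighbor exchange on a 2x2 layout. It is an illustration under stated assumptions, not the model's actual implementation: the helper names `bond` and `exchange`, the reading of the two indices as (column, row), and the mixing weight `alpha` are all hypothetical, and the page does not say how experts combine the information they exchange. Here each selected expert simply blends its output with the mean output of the selected experts it shares a grid edge with.

```python
from itertools import combinations

# Hypothetical 2x2 layout. Positions are read as (column, row); that reading
# is an assumption, the page only gives labels such as "Expert 2, 1".
GRID = [(1, 1), (2, 1), (1, 2), (2, 2)]


def bond(a, b):
    """Return 'horizontal', 'vertical', or None for two grid positions."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if (dx, dy) == (1, 0):
        return "horizontal"
    if (dx, dy) == (0, 1):
        return "vertical"
    return None  # same cell or diagonal: no shared grid edge


def exchange(outputs, alpha=0.5):
    """Blend each selected expert's output with the mean of its selected neighbors.

    outputs: dict mapping grid position -> list of floats (that expert's output).
    alpha:   assumed mixing weight; not taken from the source.
    """
    mixed = {}
    for pos, vec in outputs.items():
        neighbors = [v for p, v in outputs.items() if bond(pos, p)]
        if not neighbors:
            mixed[pos] = list(vec)  # no selected neighbor: output unchanged
            continue
        mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        mixed[pos] = [(1 - alpha) * x + alpha * m for x, m in zip(vec, mean)]
    return mixed


# Every pair of experts that shares a grid edge in the 2x2 layout.
edges = {(a, b): bond(a, b) for a, b in combinations(GRID, 2) if bond(a, b)}
```

In this sketch a 2x2 grid has four edges (two horizontal, two vertical), and the two diagonal pairs share no edge, so a diagonal top-2 selection would pass through `exchange` unchanged.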


Introducing mesh

This is our first-ever model! Allow us to explain in detail how the mesh architecture works.

Top-1 routing: 173M active parameters
Top-2 routing: 242M active parameters (default)
Dense routing: 302M active parameters
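The three routing modes listed above can be sketched as top-k gating over the four experts. This is a hedged illustration only: the function name `route` is hypothetical, and renormalizing a softmax over the selected experts is a common MoE convention rather than something this page confirms. The active-parameter figures in `ACTIVE_PARAMS_M` are copied from the list above.

```python
import math

# Active parameters per routing mode, in millions, as stated on this page.
ACTIVE_PARAMS_M = {"top-1": 173, "top-2": 242, "dense": 302}


def route(logits, mode="top-2"):
    """Return one gate weight per expert for a single token.

    logits: router scores, one per expert (four for the 2x2 grid).
    mode:   'top-1', 'top-2', or 'dense' (every expert active).
    """
    k = {"top-1": 1, "top-2": 2, "dense": len(logits)}[mode]
    # Indices of the k highest-scoring experts.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the chosen experts; all others get weight 0.
    peak = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]
```

For example, `route([2.0, 1.0, 0.5, -1.0], "top-1")` puts all of the weight on the first expert, while `"dense"` gives every expert a non-zero weight.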

Figure: Mesh model routing. A 2x2 grid of experts: Expert 1,1; Expert 2,1; Expert 1,2; Expert 2,2. State shown: top-1 routing, active expert 1,1, 173M active parameters.

Animation: an expert is activated, then the routing mode toggles to top-2 and a bond forms between experts (vertical or horizontal) to illustrate neighbor exchange.