🔬 Expert Co-activation Geometry
Research Question: Does OLMoE's router systematically co-activate geometrically complementary expert pairs (low head_sim)?
| Step | Hardware | What it does |
|---|---|---|
| 1 | CPU | Precompute & cache 64×64 head_sim matrices |
| 2 | GPU (≤60s) | Forward pass on WikiText-2, capture routing indices |
| 3 | CPU | Statistical analysis: co-activated vs random |
Step 1 takes roughly 3–5 min per layer. Results are cached to disk, so it runs only once.
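As a rough illustration of Step 3, the comparison between co-activated and random expert pairs could be sketched as below. This is not the notebook's actual analysis code. It assumes `head_sim` is a symmetric experts × experts similarity matrix for one layer and that the captured routing indices form a `(tokens, k)` integer array of distinct experts per token. The function name `coactivation_gap` and its parameters are hypothetical.

```python
import numpy as np

def coactivation_gap(head_sim, topk_idx, n_perm=1000, seed=0):
    """Compare mean head_sim of co-activated expert pairs against random pairs.

    head_sim : (E, E) symmetric similarity matrix (assumed layout).
    topk_idx : (T, k) expert indices chosen by the router for each token
               (assumed to be distinct within a row).
    Returns (observed mean, mean of the null means, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    E = head_sim.shape[0]
    T, k = topk_idx.shape
    # Column positions of every unordered pair inside a token's top-k set.
    iu, ju = np.triu_indices(k, k=1)

    def mean_pair_sim(idx):
        # Look up head_sim for every within-token expert pair, then average.
        return head_sim[idx[:, iu], idx[:, ju]].mean()

    observed = mean_pair_sim(topk_idx)

    # Null model (an assumption): each token draws k distinct experts
    # uniformly at random, ignoring any load imbalance across experts.
    null = np.empty(n_perm)
    for b in range(n_perm):
        rand_idx = rng.random((T, E)).argsort(axis=1)[:, :k]
        null[b] = mean_pair_sim(rand_idx)

    # One-sided test: is the observed similarity lower than chance?
    p_low = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, null.mean(), p_low
```

A small one-sided p-value together with an observed mean below the null mean would be consistent with the router favouring low-`head_sim` pairs. The uniform null is the simplest choice; a null that resamples experts in proportion to their observed routing frequency would control for load imbalance.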
Projection