Do language models need sleep?
A CMU/UMD paper shows that what limits SSM-attention hybrids on deep reasoning isn't memory capacity but the number of passes over the context. A 'sleep' mechanism runs N recurrent passes at window eviction and consolidates the context into fast weights, moving computational depth offline.
Jakub Kontra, Developer
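To make the idea of "N passes consolidating context into fast weights" concrete, here is a toy sketch. It is not the paper's method: it swaps in a plain delta-rule update on a small linear associative memory, and every name in it (`sleep_consolidate`, `recall_error`, `evicted`, the learning rate) is a hypothetical chosen for illustration. The only point it illustrates is that, with the memory size held fixed, repeating passes over the same evicted key/value pairs improves recall when the keys overlap.

```python
# Toy illustration only: a delta-rule associative memory standing in for
# "fast weights". This is an assumption for illustration, not the paper's design.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def recall_error(W, pairs):
    """Mean squared error when reading each value back from its key."""
    total = 0.0
    for k, v in pairs:
        pred = matvec(W, k)
        total += sum((vi - pi) ** 2 for vi, pi in zip(v, pred))
    return total / len(pairs)

def sleep_consolidate(W, evicted, n_passes, lr=0.5):
    """Run n_passes delta-rule sweeps over the evicted (key, value) pairs.

    Returns a new weight matrix; the input W is left unchanged.
    """
    W = [row[:] for row in W]
    for _ in range(n_passes):
        for k, v in evicted:
            pred = matvec(W, k)
            err = [vi - pi for vi, pi in zip(v, pred)]
            # Rank-one correction: W += lr * outer(err, k)
            for i, e in enumerate(err):
                for j, kj in enumerate(k):
                    W[i][j] += lr * e * kj
    return W

# Two unit-norm keys that overlap (not orthogonal), so one pass is not enough.
evicted = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.6, 0.8], [0.0, 1.0]),
]
W0 = [[0.0, 0.0], [0.0, 0.0]]

err_before = recall_error(W0, evicted)
err_1 = recall_error(sleep_consolidate(W0, evicted, n_passes=1), evicted)
err_8 = recall_error(sleep_consolidate(W0, evicted, n_passes=8), evicted)
print(err_before, err_1, err_8)
```

The memory here is the same 2x2 matrix in every case; only the number of passes changes. The recall error falls as passes are added, which is the intuition behind treating pass count, rather than capacity, as the limiting resource.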