Preliminary code release for our paper "A Minimalist Optimizer Design for LLM Pretraining", by Athanasios Glentis, Jiaxiang Li, Andi Han and Mingyi Hong. We introduce our proposed optimizer, SCALE, ...