Hacker News — vinext + Cloudflare Workers

▲Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon (tridao.me)

30 points by jxmorris12 4 days ago | 5 comments

jnwatson 1 day ago [-]

Back in the olden days, I took a course called "Large Scale Scientific Computing". It was mostly about multiplying large matrices. I didn't think this was going to be remotely applicable to anything commercial.

Boy was I wrong.

cs702 1 day ago [-]

A superior alternative to standard Muon and AdamW optimizers for training large models.

Fantastic work, instantly valuable, immediately usable.

A big THANK YOU to the authors:

Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao

ainch 1 day ago [-]

Tri Dao's lab must have saved countless watts with FlashAttention. Great to see them continuing to open-source massive efficiency gains.

spwa4 1 day ago [-]

Ah ... the temptation of the optimizer. It's such a simple algorithm, yet it has far more impact on back-propagation calculations than ... the actual backprop calculation, never mind details like model architecture. So tempting to work on it.

But so very, very, very, very hard to make progress on it. Even at PhD level. Just don't try ...

akoboldfrying 1 day ago [-]

Only read the first section, but this sounds really impressive -- up to 50% off a step that takes up to 17% of training time when using the Muon optimiser, so up to around 7% of basically pure improvement with no downside.
