If I understand the problem correctly, I unfortunately don't think VectorAdam would solve it, since VectorAdam applies a similar uniform scaling to gradients that have a vector structure.
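
To make the "uniform scaling" point concrete, here is a rough NumPy sketch of a VectorAdam-style step as I understand it: the second moment is accumulated from the squared norm of each vector-valued gradient, so all coordinates of one vector share a single scale factor. The function name `vector_adam_step`, the `(n, d)` array layout, and the default hyperparameters are my own assumptions for illustration, not the reference implementation.

```python
import numpy as np

def vector_adam_step(x, g, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One VectorAdam-style update (illustrative sketch, not the official code).

    x, g, m: arrays of shape (n, d), i.e. n vectors of dimension d.
    v:       array of shape (n, 1), one second-moment value per vector.
    t:       1-based step counter used for bias correction.
    """
    # First moment: per-coordinate, same as standard Adam.
    m = beta1 * m + (1 - beta1) * g
    # Second moment: squared norm of each gradient vector, not per-coordinate squares.
    v = beta2 * v + (1 - beta2) * np.sum(g * g, axis=1, keepdims=True)
    # Bias correction, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Every coordinate of a given vector is divided by the same scalar.
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```

On the first step this reduces to moving each vector by roughly `lr` along its normalized gradient direction, which is the uniform per-vector scaling I mean above.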

Nov 1, 2022 · 1:47 AM UTC
