If I understand the problem correctly, I unfortunately don't think VectorAdam would solve it, since VectorAdam applies a similar uniform scaling to gradients that have a vector structure.
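To make the "uniform scaling" point concrete, here's a rough sketch of one VectorAdam-style update for a single vector (e.g. one vertex position). It's an illustration under assumptions, not the reference implementation: the function name `vector_adam_step`, the default hyperparameters, and the placement of `eps` are all mine. The only part that matters is that the second moment is one scalar built from the gradient's squared norm, so every component of the vector gets divided by the same number.

```python
import math

def vector_adam_step(x, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One VectorAdam-style step for a single vector x with gradient g.

    Hypothetical helper for illustration: m is the per-component first
    moment (a list), v is ONE scalar second moment for the whole vector.
    """
    # First moment: tracked per component, as in regular Adam.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    # Second moment: a single scalar from the squared norm of g.
    v = beta2 * v + (1 - beta2) * sum(gi * gi for gi in g)
    # Bias correction (t is the 1-based step count).
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = v / (1 - beta2 ** t)
    # Every component is scaled by the same factor (uniform scaling).
    scale = lr / (math.sqrt(v_hat) + eps)
    x = [xi - scale * mh for xi, mh in zip(x, m_hat)]
    return x, m, v

# First step with gradient (3, 4): the update is parallel to the gradient
# and has length close to lr, instead of lr in each coordinate.
x1, m1, v1 = vector_adam_step([0.0, 0.0], [3.0, 4.0], [0.0, 0.0], 0.0, t=1)
```

So the step direction follows the gradient direction, but its length is still normalized per vector, which is the same kind of scaling as in regular Adam.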
Nov 1, 2022 · 1:47 AM UTC