@ylecun the work Elad posted interleaves fixed convolutions over the sequence with learnable MLPs. So it's not that learning doesn't happen, it's just that it doesn't happen in the part of the model which transmits information along the sequence
Feb 12, 2024 · 4:25 PM UTC
5
