Replying to @jxmnop
The reason this mistake breaks deep ResNets is well known. Lots of papers have shown why, but I think ours (arxiv.org/abs/2002.10444) gives the simplest explanation. In short, deep ResNets are trainable if the activations on the residual branches are much smaller than the activations on the skip connection.
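A rough numerical sketch of that intuition follows. It is not the paper's code, just an illustration under stated assumptions. Take an unnormalized residual stack where each branch is a random linear layer after a ReLU, with He-style initialization so the branch roughly preserves the squared norm of its input. Then each block multiplies the squared norm of the signal by about 1 + alpha², so after L blocks the ratio is about (1 + alpha²)^L. Here `alpha` is a hypothetical scalar on the residual branch. With alpha = 1 the branch is as large as the skip path and the signal explodes. With alpha ≈ 1/sqrt(L) the branches stay small relative to the skip connection and the ratio stays near e. The branch design, initialization, width, and `alpha` are all assumptions chosen for illustration.

```python
import numpy as np


def residual_signal_growth(depth: int, alpha: float, width: int = 512, seed: int = 0) -> float:
    """Return ||x_L||^2 / ||x_0||^2 for a random, unnormalized residual stack.

    Assumed (hypothetical) block: x <- x + alpha * W @ relu(x), where W has
    i.i.d. Gaussian entries with variance 2 / width (He-style), so the branch
    output has roughly the same squared norm as the block input.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    initial = float(x @ x)
    for _ in range(depth):
        # Fresh random weights for each block, as at initialization.
        w = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        branch = w @ np.maximum(x, 0.0)
        # Skip connection carries x unchanged; the branch is scaled by alpha.
        x = x + alpha * branch
    return float(x @ x) / initial


depth = 50
# Branch as large as the skip path: squared norm grows roughly like 2**depth.
print(residual_signal_growth(depth, alpha=1.0))
# Branch downscaled by ~1/sqrt(depth): squared norm grows only by a small factor.
print(residual_signal_growth(depth, alpha=depth ** -0.5))
```

The 1/sqrt(depth) scaling is just one convenient way to keep the branches small. Normalization layers or a learnable zero-initialized branch scale are other ways to reach the same regime at initialization.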

Nov 21, 2024 · 11:19 AM UTC
