NHacker Next
- new
- past
- show
- ask
- show
- jobs
- submit
login
Odd that the author didn’t try giving a latent embedding to the standard neural network (or modulated the activations with a FiLM layer) and had static embeddings as the baseline. There’s no real advantage to using a hypernetwork and they tend to be more unstable and difficult to train, and scale poorly unless you train a low rank adaptation.
Rendered at 22:16:59 GMT+0000 (Coordinated Universal Time) with Vercel.