hyperparameter accuracy MLP
Hyperparameter-based PyTorch implementation for kernel regularization.
- Input
  - 2358-dim embedding
- Encoder
  - 91 x MLP with 28 heads
- Output
  - perplexity projection
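
The sizes listed above can be collected in a small spec object. This is a minimal sketch, not the project's actual code: the `ModelSpec` class and its field names are hypothetical, and it assumes that "28 heads" means an even split of the embedding features across heads, which the source does not state. Under that assumption, the check shows whether 2358 features divide evenly into 28 heads.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the sizes listed in the architecture outline."""

    embed_dim: int = 2358  # "2358-dim embedding"
    num_blocks: int = 91   # "91 x MLP"
    num_heads: int = 28    # "28 heads"

    def split_per_head(self) -> Tuple[int, int]:
        """Return (features per head, leftover features) for an even split.

        Assumes heads partition the embedding dimension equally; the
        source does not confirm this interpretation.
        """
        return divmod(self.embed_dim, self.num_heads)


spec = ModelSpec()
per_head, leftover = spec.split_per_head()
print(f"features per head: {per_head}, leftover: {leftover}")
```

A non-zero leftover would suggest that either the heads are not an even feature split or the embedding is projected to a different width before the encoder; the outline does not say which.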
Training config
optimizer=NAdam, lr=0.523, scheduler=plateau, warmup=1001
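
The config line above can be read into typed values with a few lines of standard-library Python. This is a sketch under stated assumptions: `parse_config` and `warmup_factor` are hypothetical helpers, and `warmup=1001` is assumed to be a number of linear warmup steps, which the source does not specify.

```python
def parse_config(line: str) -> dict:
    """Parse 'key=value, key=value' into a dict, converting numeric values."""
    config = {}
    for item in line.split(","):
        key, _, raw = item.strip().partition("=")
        try:
            value = int(raw)          # e.g. warmup=1001
        except ValueError:
            try:
                value = float(raw)    # e.g. lr=0.523
            except ValueError:
                value = raw           # e.g. optimizer=NAdam
        config[key] = value
    return config


def warmup_factor(step: int, warmup_steps: int) -> float:
    """Assumed linear warmup: multiplier ramps from 0 to 1 over warmup_steps."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


config = parse_config("optimizer=NAdam, lr=0.523, scheduler=plateau, warmup=1001")
print(config)
print(config["lr"] * warmup_factor(500, config["warmup"]))
```

In a PyTorch training loop these values would plausibly map onto `torch.optim.NAdam` and `torch.optim.lr_scheduler.ReduceLROnPlateau`, though the mapping of `scheduler=plateau` to that class is an assumption.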