Brain Box
All Discussions
Why does removing stop words sometimes hurt a sentiment analysis model?
Many models use relative position embeddings instead of absolute position embeddings
Why share the same weights between the embedding layer and the layer just before the softmax?
What are some of the problems with context-based word embeddings?
What’s the difference between count-based and prediction-based word embeddings?
Why do we need word embeddings?
Language modeling is often referred to as unsupervised learning
Why do we say a language model is a density estimator?
RNN & LSTM
Why is it common practice to reduce the learning rate throughout training?
Weight Decay in ML
Why is the squared L2 norm sometimes preferred to the L2 norm for regularization?
Compare batch norm and layer norm
What’s learning rate warmup? Why do we need it?
Explain regularization techniques such as L1 or L2 regularization
The model’s weights fluctuate a lot during training
Training deep learning models using epochs
Gradient descent vs SGD vs mini-batch SGD?
What criteria would you use for early stopping?
Why is the validation loss sometimes lower than the training loss?