Brain Box
All Discussions
Why are GANs so hard to train?
What do GANs converge to?
Why do we say that Bayesian neural networks are natural ensembles?
Pros and cons of Bayesian NN compared to the mainstream neural networks?
How do Bayesian methods differ from the mainstream deep learning approach?
What’s gradual unfreezing? How might it help with transfer learning?
How to build a sentiment classifier with very little labeled data?
Changing the number of heads in multi-headed attention
Why do we need multi-headed attention instead of just one attention head?
Why would you choose a self-attention architecture over RNNs or CNNs?
What’s the motivation for self-attention?
When would an autoencoder be useful?
TF-IDF, Cosine Similarity and Top-Ranked Documents
When to use an n-gram vs. a neural language model?
Does increasing the context length (n-gram) improve the model’s performance?
Problems of using softmax as the last layer for word-level language models
What's the Levenshtein distance?
The pros and cons of BLEU, a popular metric for machine translation
Character-level entropy vs word-level entropy
Case-sensitive or case-insensitive text corpus to train a NER model