Both are techniques for training large language models, and both involve predicting a word in a sequence of words.
- Next token prediction: the model is given a sequence of words with the goal of predicting the next word. For example, given the phrase "Hannah is a ____", the model would try to predict completions such as the following (a code sketch follows this list):
- Hannah is a sister
- Hannah is a friend
- Hannah is a marketer
- Hannah is a comedian
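
To make this concrete, here is a minimal sketch of next token prediction at inference time. It assumes the Hugging Face transformers library and the publicly available gpt2 checkpoint, neither of which is named above; any causal language model would behave the same way.

```python
# Minimal sketch of next token prediction.
# Assumes: Hugging Face transformers + the "gpt2" checkpoint (illustrative choices).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Hannah is a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# The distribution over the *next* token comes from the last position.
next_token_logits = logits[0, -1]
top = torch.topk(next_token_logits, k=5)

# Print the five most likely continuations of the prompt.
for token_id in top.indices:
    print(prompt + tokenizer.decode([token_id.item()]))
```

During training, the same logits are compared against the actual next word at every position, so a single pass over a sentence yields many prediction targets.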
- Masked language modeling: the model is given a sequence of words with the goal of predicting a masked word in the middle. For example, given the phrase "Jacob [MASK] reading", the model would try to fill the mask with completions such as the following (see the sketch after this list):
- Jacob fears reading
- Jacob loves reading
- Jacob enjoys reading
- Jacob hates reading
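
Here is a similar minimal sketch for masked language modeling. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; both are illustrative assumptions rather than details from the text above.

```python
# Minimal sketch of masked language modeling.
# Assumes: Hugging Face transformers + the "bert-base-uncased" checkpoint (illustrative choices).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# tokenizer.mask_token is "[MASK]" for BERT-style models.
text = f"Jacob {tokenizer.mask_token} reading."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# Locate the masked position and rank candidate fillers for it.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top = torch.topk(logits[0, mask_pos], k=5)

# Print the five most likely ways to fill the gap.
for token_id in top.indices:
    print(text.replace(tokenizer.mask_token, tokenizer.decode([token_id.item()])))
```

The key contrast with next token prediction is visible in the indexing: the causal model reads only the words before the gap, while the masked model scores the gap using context from both sides.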