Consider the input sentence: "I am good".
In RNNs, we feed the sentence to the network word by word. That is, first the word "I" is passed as input, next the word "am" is passed, and so on. We feed the sentence word by word so that our network understands the sentence completely.
But the transformer network does not use a recurrence mechanism. So, instead of feeding the sentence word by word, we feed all the words in the sentence to the network in parallel. Feeding the words in parallel helps decrease training time and also helps the network learn long-term dependencies.
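To make the contrast concrete, here is a minimal NumPy sketch of the two feeding styles. The toy embed() lookup, the embedding size d_model, and the placeholder rnn_cell() recurrence are illustrative assumptions, not part of any actual RNN or transformer implementation:

```python
import numpy as np

d_model = 4                                    # assumed embedding size
vocab = {"I": 0, "am": 1, "good": 2}
embedding_table = np.random.rand(len(vocab), d_model)

def embed(word):
    return embedding_table[vocab[word]]        # toy word-to-vector lookup

# RNN-style feeding: one word per time step; the hidden state carries
# information about earlier words forward to later steps
def rnn_cell(x, h):
    return np.tanh(x + h)                      # placeholder recurrence

hidden = np.zeros(d_model)
for word in "I am good".split():
    hidden = rnn_cell(embed(word), hidden)

# Transformer-style feeding: stack all word embeddings into a single input
# matrix and hand them to the network in one step (no loop over time steps)
X = np.stack([embed(w) for w in "I am good".split()])   # shape: (3, d_model)
```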
When we feed the words to the transformer in parallel, however, the network receives no built-in notion of word order (the position of each word in the sentence). Since word order matters for the meaning of the sentence, we have to supply this information to the transformer so that it can understand the sentence.
That is, if we pass the input matrix directly to the transformer, it cannot tell which word comes first and which comes next. So, instead of feeding the input matrix directly, we add to it some information indicating the word order (the position of each word) so that the network can understand the meaning of the sentence. To do this, we introduce a technique called positional encoding. Positional encoding, as the name suggests, is an encoding that indicates the position of a word in a sentence (the word order).
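As a sketch of one common choice, the sinusoidal encoding used in the original transformer paper ("Attention Is All You Need"), we can compute a positional encoding matrix of the same shape as the input matrix and add the two, so that each row carries both the word's embedding and its position. The value d_model = 4 and the random stand-in for the "I am good" input matrix are assumptions for illustration:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # P[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # P[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    P = np.zeros((seq_len, d_model))
    positions = np.arange(seq_len)[:, np.newaxis]               # (seq_len, 1)
    div_terms = 10000 ** (np.arange(0, d_model, 2) / d_model)   # (d_model/2,)
    P[:, 0::2] = np.sin(positions / div_terms)
    P[:, 1::2] = np.cos(positions / div_terms)
    return P

d_model = 4
X = np.random.rand(3, d_model)       # stand-in for the "I am good" input matrix
X_with_position = X + positional_encoding(3, d_model)
```

Because the encoding is added element-wise, two identical words appearing at different positions end up with different rows in X_with_position, which is exactly the order information the transformer was missing.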