Text preprocessing transforms raw text into a more digestible form so that machine learning algorithms can perform better. In tasks such as sentiment analysis, preprocessing steps such as removing stop-words have been found to help improve the accuracy of the machine learning model.
Some common text preprocessing steps are:
- removing HTML tags,
- removing stop-words,
- removing numbers,
- lowercasing all letters,
- lemmatization.
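
The steps above can be sketched as one small pipeline. This is a minimal illustration using only the Python standard library, not a definitive implementation: the `preprocess` function, the `STOP_WORDS` set, and the `LEMMAS` lookup table are all hypothetical names made up for this example. In practice the stop-word list and the lemmatizer would usually come from an NLP library, and HTML would be stripped with a proper parser rather than a regular expression.

```python
import re
from html import unescape

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"i", "a", "an", "the", "is", "are", "was", "were",
              "this", "that", "and", "or", "of", "to", "in", "it"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"movies": "movie", "loved": "love", "actors": "actor"}

TAG_RE = re.compile(r"<[^>]+>")    # rough match for HTML tags
NUMBER_RE = re.compile(r"\d+")     # runs of digits
TOKEN_RE = re.compile(r"[a-z]+")   # runs of lowercase ASCII letters


def preprocess(text):
    """Apply the five steps listed above and return a list of tokens."""
    text = TAG_RE.sub(" ", text)      # remove HTML tags
    text = unescape(text)             # decode entities such as &amp;
    text = NUMBER_RE.sub(" ", text)   # remove numbers
    text = text.lower()               # lowercase all letters
    # Tokenizing on letters only also drops punctuation as a side effect.
    tokens = TOKEN_RE.findall(text)
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop-words
    return [LEMMAS.get(t, t) for t in tokens]            # toy lemmatization


print(preprocess("<p>I LOVED the 2 movies &amp; the actors!</p>"))
# ['love', 'movie', 'actor']
```

The order of the steps matters here: lowercasing happens before the stop-word check so that "The" and "the" are treated alike, and lemmatization runs last so the lookup only sees tokens that survived filtering.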