This project uses Natural Language Processing (NLP) to classify Amazon product reviews based on written text. The goal was to gain consumer insight by leveraging text-based features and machine learning models.
Source: Amazon Product Reviews on Kaggle
Size: 500,000+ reviews
Of our 500,000+ reviews, far more were given a 5-star rating than any other rating. This class imbalance should be kept in mind for the product conclusions.
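As a rough illustration of how the rating skew could be quantified, the sketch below computes the share of each star rating and the accuracy a trivial "always predict the most common rating" rule would get. The function name `rating_shares` and the toy ratings are made up for this example; they are not the project's code or data.

```python
from collections import Counter

def rating_shares(ratings):
    """Return {star: fraction of reviews}, ordered by star value."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {star: counts[star] / total for star in sorted(counts)}

# Made-up toy ratings for illustration only, not the Kaggle data.
toy_ratings = [5, 5, 5, 5, 5, 5, 4, 4, 3, 1]

shares = rating_shares(toy_ratings)
majority_star = max(shares, key=shares.get)

print(shares)
print(f"Always predicting {majority_star} stars would be right "
      f"{shares[majority_star]:.0%} of the time on this toy sample")
```

The majority share is a useful reference point: any model accuracy reported on skewed data is easier to interpret next to it.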
We can see a clear difference in the tone of the phrases for 1-star vs 5-star ratings. However, we can also see some outliers such as 'phone case' and 'screen protector', which name products rather than express an opinion. This gives us insight into what users are focused on in their reviews and can potentially be used to target product improvements.
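One simple way to surface frequent phrases per rating is to count two-word sequences (bigrams) separately for each star group. The sketch below is a minimal version under stated assumptions: the helper `top_bigrams`, the tokenization rule, and the toy reviews are all hypothetical, and the project may have extracted its phrases differently (for example with stop-word removal).

```python
import re
from collections import Counter

def top_bigrams(texts, n=3):
    """Return the n most frequent two-word phrases across the given texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep only runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Pair each token with its successor to form bigrams.
        counts.update(zip(tokens, tokens[1:]))
    return [(" ".join(pair), count) for pair, count in counts.most_common(n)]

# Made-up toy reviews for illustration only.
one_star = ["waste of money, the phone case broke", "total waste of money"]
five_star = ["love it, works great", "works great and the screen protector fits"]

print("1-star:", top_bigrams(one_star, 2))
print("5-star:", top_bigrams(five_star, 1))
```

Comparing the two lists side by side shows both the sentiment-bearing phrases and the product-name phrases that appear regardless of rating.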
I compared sentiment polarity with the actual star rating and trained models to predict the rating from review text. I was able to do this with 81% accuracy. We should keep the rating bias in mind here, and in the future we would opt to test our model on new reviews to get a better measure of accuracy.
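Because the ratings are skewed, a single accuracy number can hide weak performance on the rarer ratings. The sketch below shows one way to put accuracy in context: compare it with a majority-class baseline and look at recall per rating. The helper names and the toy labels are assumptions made for this example; they do not reproduce the project's models or its 81% result.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true rating."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_true):
    """Accuracy of always predicting the most common rating."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)

def per_class_recall(y_true, y_pred):
    """For each rating, the fraction of its reviews predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}

# Made-up toy labels for illustration only.
y_true = [5, 5, 5, 5, 5, 5, 4, 4, 1, 1]
y_pred = [5, 5, 5, 5, 5, 5, 5, 4, 5, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("majority baseline:", majority_baseline(y_true))
print("recall per rating:", per_class_recall(y_true, y_pred))
```

In this toy sample the overall accuracy looks strong, yet only half of the 1-star and 4-star reviews are predicted correctly, which is the kind of gap a held-out set of new reviews would help reveal.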