Digital Commons record: https://digitalcommons.njit.edu/dissertations/1565/

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Statistics-based anomaly detection and correction method for Amazon customer reviews
Author: Chatterjee, Ishani
View Online: njit-etd2021-064
(XIV, 96 pages ~ 2.7 MB pdf)
Department: Department of Electrical and Computer Engineering
Degree: Doctor of Philosophy
Program: Computer Engineering
Document Type: Dissertation
Advisory Committee: Zhou, MengChu (Committee chair)
Ansari, Nirwan (Committee member)
Nguyen, Hieu Pham Trung (Committee member)
Liu, Qing Gary (Committee member)
Yan, Zhipeng (Committee member)
Date: 2021-12
Keywords: Amazon customer review
Anomaly detection
Machine learning
Natural language processing
Outlier detection
Sentiment analysis
Availability: Unrestricted
Abstract:

People nowadays use the Internet to express their assessments, impressions, ideas, and observations about various subjects and products on numerous social networking sites. These sites are a rich source of data for analytics, sentiment analysis, natural language processing, and related tasks. The most critical challenge is interpreting this data and capturing the sentiment behind these expressions. Sentiment analysis is the process of analyzing, processing, summarizing, and drawing inferences from subjective texts that express opinions. Companies use sentiment analysis to understand public opinion, perform market research, analyze brand reputation, understand customer experiences, and study social media influence. Depending on the required granularity, it can be divided into document-level, sentence-level, and aspect-based sentiment analysis.

Normally, the true sentiment of a customer review matches its star rating. There are exceptions, however, in which the star rating of a review contradicts the sentiment expressed in its text; in this work, such reviews are labeled as outliers in a dataset. State-of-the-art anomaly detection methods rely on manual search, predefined rules, or machine learning techniques to detect such instances. This dissertation proposes a statistics-based anomaly detection and correction method (SADCM) that identifies such reviews and corrects their star ratings, improving the performance of sentiment analysis algorithms. Rather than discarding these outliers, the data analysis pipeline preserves and corrects them, so no information is lost.
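The abstract does not specify which statistics SADCM uses. As a purely illustrative sketch of the general detect-and-correct idea, and not the dissertation's actual method, the Python code below assumes each review already has a text sentiment score in [-1, 1] from some classifier, maps star ratings onto the same scale, and flags reviews whose rating-versus-text residual has a large modified z-score (the median and MAD based rule of Iglewicz and Hoaglin, with its conventional 3.5 cutoff). Flagged reviews get a rating re-derived from their text instead of being dropped. The rating-to-sentiment mapping, the threshold, the function names, and the sample data are all hypothetical.

```python
from statistics import median


def expected_sentiment(stars: int) -> float:
    # Assumed linear mapping of a 1-5 star rating onto [-1, 1]:
    # 1 star -> -1.0, 3 stars -> 0.0, 5 stars -> +1.0
    return (stars - 3) / 2


def detect_and_correct(reviews, threshold=3.5):
    """Flag and correct reviews whose star rating disagrees with their text.

    reviews: list of (stars, sentiment) pairs, where sentiment in [-1, 1]
    is assumed to come from some text sentiment classifier.
    Returns one (stars, corrected_stars, is_outlier) tuple per input review,
    so no review is dropped.
    """
    # Residual: how far the text sentiment is from what the rating implies
    residuals = [s - expected_sentiment(r) for r, s in reviews]
    med = median(residuals)
    mad = median(abs(x - med) for x in residuals)

    results = []
    for (stars, sentiment), res in zip(reviews, residuals):
        # Modified z-score; if MAD is 0 there is no spread to measure against
        z = 0.6745 * (res - med) / mad if mad > 0 else 0.0
        is_outlier = abs(z) > threshold
        if is_outlier:
            # Correct rather than discard: re-derive stars from the text sentiment
            corrected = min(5, max(1, round(3 + 2 * sentiment)))
        else:
            corrected = stars
        results.append((stars, corrected, is_outlier))
    return results


# Hypothetical sample: the ninth review has 5 stars but strongly negative text
sample = [(5, 0.9), (4, 0.4), (5, 0.8), (1, -0.9), (2, -0.6),
          (4, 0.6), (3, 0.1), (1, -0.85), (5, -0.9), (4, 0.5)]
results = detect_and_correct(sample)
for row in results:
    print(row)
```

Only the 5-star review with strongly negative text is flagged, and its rating is rewritten to 1 while every review is kept. Using the median and MAD rather than the mean and standard deviation keeps a few extreme reviews from inflating the spread and masking themselves; note that if more than half of the residuals are identical, MAD is zero and this sketch flags nothing.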

This research applies SADCM to datasets of customer reviews for various products that are a) scraped from Amazon.com and b) publicly available. The scraped dataset contains 35,000 Amazon customer reviews, while the publicly available dataset contains 100,000 Amazon customer reviews for multiple products reviewed this year. The research also analyzes these datasets and evaluates the effect of SADCM on the performance of several sentiment analysis algorithms. The results show that SADCM outperforms other state-of-the-art anomaly detection algorithms, achieving higher accuracy and recall on all datasets. The proposed method should thus help businesses that rely on public reviews make better decisions.
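The abstract reports results as accuracy and recall. For readers comparing the two for an anomaly detector, the sketch below computes both from per-review ground-truth outlier labels and detector flags; the function name and the toy labels are hypothetical and are not data from the dissertation, and the abstract does not say how ground-truth labels were obtained.

```python
def accuracy_and_recall(y_true, y_pred):
    """Accuracy and recall for a binary anomaly detector.

    y_true: ground-truth flags (True = review is an outlier), assumed available
    y_pred: flags produced by the detector
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Recall: fraction of true outliers the detector actually caught
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels for eight reviews (hypothetical)
truth = [False, False, True, False, True, False, False, True]
flags = [False, True, True, False, False, False, False, True]
acc, rec = accuracy_and_recall(truth, flags)
print(f"accuracy={acc:.2%} recall={rec:.2%}")  # accuracy=75.00% recall=66.67%
```

Recall matters because outliers are usually rare: a detector that flags nothing can still score high accuracy while catching none of them.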
