Geller, James (Committee chair)
Chun, Soon Ae (Committee member)
Phan, Hai Nhat (Committee member)
Date:
2017-12
Keywords:
Machine learning
Looping predictive method
Availability:
Unrestricted
Abstract:
The topic of this project is an analysis of drug-related tweets. The goal is to build a machine learning model that can distinguish tweets that indicate drug abuse from other tweets that contain the name of a drug but do not describe abuse. The drugs in question may be illegal, such as heroin, or legal with a potential for abuse, such as painkillers. Building a good machine learning model, however, requires a large amount of training data. For each training tweet, a human expert must determine whether it indicates drug abuse or not, which is difficult work. In this project a new “Looping Predictive Method” was developed that generates large training datasets from a small seed set of tweets by repeatedly adding machine-labeled tweets to the human-labeled tweets. With this method, an accuracy improvement of 15.4% was achieved by expanding the training set from an initial 1,075 tweets to 29,908 tweets.
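The loop described in the abstract, in which machine-labeled tweets are repeatedly added to the human-labeled seed set, resembles what is generally called self-training. The following is a minimal sketch of that general idea only, not the thesis's implementation: the abstract does not state the classifier, the confidence rule, or the stopping criterion, so the small naive Bayes model, the `threshold` and `max_rounds` parameters, all function names, and the example tweets are assumptions made up for illustration.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in labeled:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return the model's estimate of P(label == 1 | text), add-one smoothed."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def looping_self_training(seed, unlabeled, threshold=0.9, max_rounds=5):
    """Grow the training set by adding confidently machine-labeled tweets.

    threshold and max_rounds are hypothetical knobs, not values from the thesis.
    """
    labeled = list(seed)   # human-labeled seed tweets are always kept
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            p = predict_proba(model, text)
            if p >= threshold:
                confident.append((text, 1))
            elif p <= 1 - threshold:
                confident.append((text, 0))
            else:
                remaining.append(text)
        if not confident:  # nothing new to add, so stop looping
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), labeled


# Made-up example data, for illustration only (1 = abuse, 0 = no abuse).
seed = [
    ("took too many painkillers to get high", 1),
    ("popping pills to get high again", 1),
    ("doctor prescribed painkillers after surgery", 0),
    ("news report on heroin policy", 0),
]
unlabeled = [
    "got so high on pills last night",
    "new policy report on painkillers",
    "heroin",
]
model, expanded = looping_self_training(seed, unlabeled)
print(len(seed), "->", len(expanded))
```

A stricter `threshold` adds fewer machine-labeled tweets per round but lowers the risk that labeling mistakes are fed back into training; how the thesis handles this trade-off is not stated in the abstract.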