Detecting similar opinion holders for massive sentiment analysis
Abstract
Sentiment analysis is the study of acquiring, extracting and interpreting human opinions, sentiments, attitudes and emotions from both structured and unstructured data sources. Also called opinion mining, the field is becoming crucial for application areas including market research, politics, sociology and economics. Consequently, considerable research effort has been devoted to both its theoretical and practical aspects. This paper aims to develop a supportive framework for sentiment analysis, focusing on the similarity of opinion holders in a massive dataset. We used the Amazon e-commerce review dataset, which spans May 1996 to July 2014 and includes more than 140 million entries. As a preprocessing step, each review is structured and expressed as a quadruple of four dimensions: target entity, opinion holder, sentiment and time. The aim of this study is to find, in real time, opinion holders similar to a given customer on a certain product. We define a new similarity method that spans all the opinions of an individual. The idea behind the similarity calculation is that two different opinion holders rate the same product with the same sentiment factor. The real-time calculation is performed on Hadoop clusters. Performance enhancements and accuracy rates are then discussed.
Keywords: sentiment analysis, opinion mining, big data analytics, Map-Reduce
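The quadruple representation and the similarity idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Opinion`, `sentiment_from_rating` and `similar_holders` are hypothetical, the star-rating thresholds are an assumed mapping, and counting shared (product, sentiment) pairs is only one plausible reading of "rating of the same product with the same sentiment factor". The sketch runs in memory on a few records, whereas the paper performs the calculation on Hadoop clusters.

```python
from collections import defaultdict
from typing import NamedTuple


class Opinion(NamedTuple):
    """One review as a quadruple: target entity, opinion holder, sentiment, time."""
    target: str     # product identifier
    holder: str     # reviewer identifier
    sentiment: str  # "pos", "neu" or "neg"
    time: int       # Unix timestamp of the review


def sentiment_from_rating(stars: float) -> str:
    """Map a star rating to a sentiment label (thresholds are an assumption)."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return "neu"


def similar_holders(opinions, holder, target):
    """Rank holders who share `holder`'s sentiment on `target`.

    The score is the number of (product, sentiment) pairs a candidate shares
    with `holder` across all of their opinions (assumed scoring rule).
    """
    by_holder = defaultdict(set)  # holder -> {(target, sentiment)}
    by_pair = defaultdict(set)    # (target, sentiment) -> {holders}
    for o in opinions:
        by_holder[o.holder].add((o.target, o.sentiment))
        by_pair[(o.target, o.sentiment)].add(o.holder)

    mine = by_holder.get(holder, set())
    # Candidates must agree with `holder` on the requested product.
    candidates = set()
    for pair in mine:
        if pair[0] == target:
            candidates |= by_pair[pair]
    candidates.discard(holder)

    scores = {c: len(mine & by_holder[c]) for c in candidates}
    # Highest agreement first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


SAMPLE = [
    Opinion("B001", "alice", sentiment_from_rating(5), 1),
    Opinion("B002", "alice", sentiment_from_rating(1), 2),
    Opinion("B003", "alice", sentiment_from_rating(4), 3),
    Opinion("B001", "bob", sentiment_from_rating(4), 4),
    Opinion("B002", "bob", sentiment_from_rating(2), 5),
    Opinion("B001", "carol", sentiment_from_rating(5), 6),
    Opinion("B002", "carol", sentiment_from_rating(5), 7),
    Opinion("B001", "dave", sentiment_from_rating(1), 8),
]

if __name__ == "__main__":
    print(similar_holders(SAMPLE, "alice", "B001"))
```

In this toy data, bob and carol both agree with alice on product B001, but bob also agrees on B002 and therefore ranks first; dave rated B001 negatively and is not a candidate. In a MapReduce setting, the grouping by (product, sentiment) pair would presumably correspond to the shuffle key, though the abstract does not give that level of detail.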
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).