Science, Technology, Engineering and Mathematics.
Open Access

IMPROVING RATING PREDICTION ACCURACY THROUGH ADVANCED TEXT SUMMARIZATION AND SENTIMENT ANALYSIS TECHNIQUES


Volume 1, Issue 1, pp. 1-6, 2024

DOI: 10.61784/jtfe3003

Author(s)

Michael Muskan1*, Jude Berner1, Jenny Winmark2

Affiliation(s)

1 School of Computer Science, University of Bristol, Bristol, UK.

2 Department of Computer Science, University of Manchester, Manchester, UK.

Corresponding Author

Michael Muskan

ABSTRACT

In the era of digital content consumption, online reviews have become a cornerstone of consumer decision-making. This study addresses the challenging task of predicting ratings for long-form movie reviews, a problem that has received less attention than the analysis of short reviews. We propose a novel approach that combines advanced text summarization with sentiment analysis to improve rating prediction accuracy. Using an enhanced TextRank algorithm for summarization and Support Vector Machine (SVM) classification for rating prediction, our method demonstrates superior performance in predicting ratings for long movie reviews. The study uses a large dataset from Douban, a popular Chinese social networking service, and shows that summarized reviews can match or exceed the prediction accuracy of full-length reviews. Our findings highlight the effectiveness of integrating sentiment features and position-based weighting into the summarization process, opening new avenues for processing and analyzing long-form user-generated content.
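The abstract does not spell out how the enhanced TextRank works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It builds a sentence graph using the original TextRank overlap similarity (shared words divided by the sum of the log sentence lengths) and ranks sentences with a personalised PageRank whose teleport vector carries a position prior and a sentiment prior. The U-shaped position bonus, the toy lexicon `SENTIMENT_WORDS`, and the weights `beta` and `gamma` are hypothetical assumptions. Tokenization is a simple regex, so Chinese Douban text would first need word segmentation, and the SVM rating classifier that would consume the summaries is not shown.

```python
import math
import re

# Hypothetical toy lexicon; a real system would use a full sentiment dictionary.
SENTIMENT_WORDS = {"great", "brilliant", "moving", "love",
                   "boring", "terrible", "awful", "hate"}


def split_sentences(text):
    """Naive splitter on Western and Chinese sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def overlap_similarity(a, b):
    """TextRank sentence similarity: shared words divided by
    log|a| + log|b|, where |a| and |b| are token counts."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def node_prior(index, n, tokens, beta, gamma):
    """Hypothetical prior combining sentence position and sentiment."""
    # U-shaped position score: 1 for the first and last sentence, 0 in the middle.
    position = abs(2 * index / (n - 1) - 1) if n > 1 else 1.0
    # Fraction of tokens that appear in the toy sentiment lexicon.
    sentiment = (sum(t in SENTIMENT_WORDS for t in tokens) / len(tokens)
                 if tokens else 0.0)
    return (1 + beta * position) * (1 + gamma * sentiment)


def summarize(text, k=2, d=0.85, beta=0.5, gamma=1.0, iters=100, tol=1e-8):
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Symmetric edge weights of the sentence graph (no self-loops).
    w = [[overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    # Normalised priors act as the personalised-PageRank teleport vector.
    prior = [node_prior(i, n, tokens[i], beta, gamma) for i in range(n)]
    total = sum(prior)
    p = [x / total for x in prior]
    scores = p[:]
    for _ in range(iters):
        # Mass held by sentences with no edges is redistributed via p.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        new = []
        for i in range(n):
            inflow = sum(w[j][i] / out_sum[j] * scores[j]
                         for j in range(n) if out_sum[j] > 0)
            new.append((1 - d) * p[i] + d * (inflow + dangling * p[i]))
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


review = ("I love this film. The plot follows a family in Shanghai. "
          "The soundtrack is forgettable. Overall a brilliant and moving film.")
print(summarize(review, k=2))
# ['The plot follows a family in Shanghai.', 'Overall a brilliant and moving film.']
```

In this toy example the neutral plot sentence still outranks "I love this film." because it shares words with two other sentences; the priors only bias the teleport step of PageRank, so graph centrality still matters alongside position and sentiment.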

KEYWORDS

Rating prediction accuracy; Digital content consumption; E-commerce; TextRank algorithm

CITE THIS PAPER

Michael Muskan, Jude Berner, Jenny Winmark. Improving rating prediction accuracy through advanced text summarization and sentiment analysis techniques. Journal of Trends in Financial and Economics. 2024, 1(1): 1-6. DOI: 10.61784/jtfe3003.

All published work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © 2017 - 2024 Science, Technology, Engineering and Mathematics.   All Rights Reserved.