
LLM AND SOCIAL MEDIA FAKE NEWS DETECTION


Volume 2, Issue 2, pp. 9-14, 2025

DOI: https://doi.org/10.61784/its3017

Author(s)

ChenYe Zhao

Affiliation(s)

Xi'an Jiaotong-Liverpool University, Suzhou 215000, Jiangsu, China.

Corresponding Author

ChenYe Zhao

ABSTRACT

The proliferation of fake news on social media has made automatic detection methods a necessity. Traditional approaches often rely on social context that is dynamic or simply unavailable, so this study explores purely text-based classification. We empirically evaluated several classic machine learning models (Logistic Regression, Naive Bayes, Random Forest, Support Vector Machine, and XGBoost) and the large language model DeBERTa on a real-world fake news detection dataset. For the classic models, features were extracted with a TF-IDF vectorizer. The results show that tree-based ensemble methods, especially Random Forest and XGBoost, performed exceptionally well, with accuracy and F1 scores exceeding 99%. In contrast, the DeBERTa model performed only slightly better than random guessing (accuracy of 50.32%), which we attribute to catastrophic overfitting: its 184 million parameters were trained on a relatively small dataset. This highlights a key challenge in applying powerful LLMs to specialized tasks: their performance depends heavily on large amounts of high-quality training data. The findings suggest that, when data is limited, robust classic models can be more effective than complex, data-hungry models.
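As a concrete illustration of the text-based pipeline described above, the following is a minimal sketch of TF-IDF feature extraction feeding the five classic classifiers. It assumes a scikit-learn/XGBoost setup and a hypothetical fake_news.csv file with "text" and 0/1 "label" columns; the hyperparameters are placeholders, not the paper's exact configuration.

```python
# Minimal sketch: TF-IDF features + classic classifiers (illustrative only).
# Dataset file name, column names, and hyperparameters are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier  # assumes the xgboost package is installed

df = pd.read_csv("fake_news.csv")  # hypothetical file: "text" column, 0/1 "label" column
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "SVM": LinearSVC(),
    "XGBoost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
}

for name, clf in models.items():
    # Each model gets its own TF-IDF + classifier pipeline, fit on the training split.
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english", max_features=50000)),
        ("clf", clf),
    ])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, pred):.4f}, F1={f1_score(y_test, pred):.4f}")
```

For comparison, the DeBERTa baseline discussed in the abstract can be approximated with a standard Hugging Face fine-tuning loop. Again, this is only a sketch under assumed settings (microsoft/deberta-v3-base, 3 epochs, 256-token inputs) and does not reproduce the paper's training procedure.

```python
# Rough sketch of fine-tuning DeBERTa-v3-base (~184M parameters in total)
# for binary fake-news classification; all settings are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load the same hypothetical CSV and hold out 20% for evaluation.
ds = load_dataset("csv", data_files="fake_news.csv")["train"]
ds = ds.train_test_split(test_size=0.2, seed=42)

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
ds = ds.map(lambda batch: tok(batch["text"], truncation=True, max_length=256),
            batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="deberta-fakenews",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tok,  # pads batches via the default DataCollatorWithPadding
)
trainer.train()
print(trainer.evaluate())
```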

KEYWORDS

Fake news detection; Text classification; Machine learning; Large language models

CITE THIS PAPER

ChenYe Zhao. LLM and social media fake news detection. Innovation and Technology Studies. 2025, 2(2): 9-14. DOI: https://doi.org/10.61784/its3017.

