TEXT OPTIMIZATION ANALYSIS FOR THE FINANCIAL CORPUS
Volume 1, Issue 1, pp 34-39
Author(s)
Besufikad Enideg Getnet
Affiliation(s)
School of Computer and Information Engineering, Beijing Technology and Business University, Beijing, 100048, China.
Corresponding Author
Besufikad Enideg Getnet
ABSTRACT
A corpus is a useful tool for statistical linguistic analysis, for example to check word occurrences or to validate linguistic rules within a specific linguistic domain. General corpora such as the Google N-Grams Corpus and the American National Corpus are very extensive, but they cannot meet the specific needs of the financial field: specialized financial terms are often missing from them, so text analysis results are not good enough for financial applications. In this paper, we take downloaded financial news as the original corpus and use it as the input to a text classification system. This whole process forms a closed loop that produces an optimized corpus. In a simulation of financial news analysis, we compared stock-trend predictions made with the optimized corpus against those made with the original corpus; the results show that the optimized corpus substantially improves the predictions.
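The abstract does not spell out how the closed loop decides which terms enter the optimized corpus. As a minimal sketch, assuming the loop adds out-of-corpus words that are informative about the stock-trend label, the Python below scores candidate terms by information gain (one standard feature selection method; whether the paper uses it is an assumption) and keeps adding the best ones until none clears a threshold. The function names, the `threshold` and `per_round` parameters, and the toy headlines are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a label distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'
    with respect to the document labels (e.g. stock up/down)."""
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_term = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    p_with = sum(with_term.values()) / len(docs)
    h_class = entropy(list(Counter(labels).values()))
    h_conditional = (p_with * entropy(list(with_term.values()))
                     + (1 - p_with) * entropy(list(without_term.values())))
    return h_class - h_conditional


def optimize_corpus(docs, labels, corpus, threshold=0.1, per_round=2):
    """Hypothetical closed loop: repeatedly add the most label-informative
    out-of-corpus terms until no remaining candidate reaches `threshold`."""
    corpus = set(corpus)
    while True:
        candidates = {term for doc in docs for term in doc} - corpus
        scored = sorted(((information_gain(docs, labels, t), t) for t in candidates),
                        reverse=True)
        new_terms = [t for score, t in scored[:per_round] if score >= threshold]
        if not new_terms:
            return corpus  # converged: the loop adds nothing further
        corpus.update(new_terms)


if __name__ == "__main__":
    # Toy tokenized headlines labeled with the next-day stock direction (invented data).
    docs = [
        {"the", "shares", "rose", "dividend"},
        {"the", "shares", "fell", "writedown"},
        {"the", "profit", "rose", "dividend"},
        {"the", "loss", "fell", "writedown"},
    ]
    labels = ["up", "down", "up", "down"]
    general_corpus = {"shares", "rose", "fell", "profit", "loss"}
    optimized = optimize_corpus(docs, labels, general_corpus)
    print(sorted(optimized - general_corpus))  # → ['dividend', 'writedown']
```

In this toy run the domain terms "dividend" and "writedown" separate the labels perfectly and are added, while "the" carries no label information and stays out. Under this reading, the returned vocabulary would then serve as the feature space for the stock-trend classifier; chi-square or another selection criterion could replace information gain without changing the loop.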
KEYWORDS
Text classification, Optimization of the corpus, Feature selection methods, Prediction of time series data.
CITE THIS PAPER
Besufikad Enideg Getnet. Text optimization analysis for the financial corpus. Journal of Computer Science and Electrical Engineering. 2019, 1(1): 34-39.