People in society are increasingly aware of the benefits of investing in the stock market. This awareness has increased public participation in stock trading. It has been reported to raise stock returns and trading volumes, but also the associated threats and risks: loss of monetary value, time, and information, and environmental degradation. This study investigates the development of a system that uses large amounts of data and analytics to predict stock market returns. Return is defined as the value derived from the change between the buy and sell prices of a selected stock. The US stock market has continued to generate turnover equivalent to between 7% and 17% of GDP, driven by heavy trading. Developed countries have also sustained turnover through increased exposure to multiple economic sectors.
Over the past 20 years, stock prices and market returns have risen, as have business capabilities. Understanding these results requires active and detailed consideration of hidden causal relationships. To make use of big data, researchers have used systems built on large datasets of stock returns with thousands of observations. These datasets contain the entire set of recorded data, including real-time data. The system should use reliable, realistic statistics that reflect real information about the returns of publicly traded stocks. Most studies use well-known stocks of popular companies in the US market. Stock returns from the collected data have been used to evaluate the results of public participation in stock trading (Chapman et al., 2000). The performance of stock businesses must therefore be examined in light of changes in the behavior of people in society. The outcomes predicted by models involve both returns and risks, and some predictions can be misleading. By applying analytics to big data, people can make decisions about restructuring stock holdings and carry out reviews that directly affect market returns.
Purpose of this study
The purpose of this research is to construct an analytical model system that predicts the value of stock business activities using big data analysis and machine learning. The tools used for data analysis are Power BI and Microsoft Excel. The system is designed to apply machine learning techniques and models to improve the accuracy of stock return predictions by taking stock price and sales volume into account. From this point of view, the results of data mining models demonstrate their effectiveness as algorithms for predicting consumer behavior in global business, which affects return on investment. The basis of the system proposed in this study is shown below.
Figure 1: Big data analytics infrastructure
This section describes the analysis and visualization of the data and the development of artifacts that demonstrate its usefulness to an organization. The selected software, Power BI, is used to develop analytical dashboards and reports. These present information that companies can readily apply, built on the infrastructure designed in Part 1. A sample of the data used is shown below as a screenshot.
Figure 1: Sample data used for analysis
The data is real market data taken from www.yahoofinance.com and covers the most recent period at the time of the study, 2020 to 2021. After selection, the data went through several cleaning steps to make it relevant to the business application. First, it was passed through functions and procedures that detect missing data, repeated data, inconsistent data values, and outliers. The data was collected in real time, so it reflects actual US stock market activity over that period.
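The four checks described above (missing data, repeated data, inconsistent values, and outliers) can be sketched in Python as follows. This is a minimal sketch, not the study's own procedure, which was carried out in Power BI and Excel. The column names mirror a typical Yahoo Finance daily export and are assumptions. The consistency rules and the 1.5 times interquartile range (IQR) outlier rule are also assumptions. The function name `clean_rows` is hypothetical.

```python
import math
from statistics import quantiles

# Assumed column names, modelled on a Yahoo Finance daily price export.
PRICE_COLS = ("Open", "High", "Low", "Close", "Adj Close")


def clean_rows(rows):
    """Drop missing, repeated, inconsistent, and outlying rows.

    rows: list of dicts with a "Date" key, the PRICE_COLS keys and "Volume".
    Returns (kept_rows, report), where report counts the rows dropped per check.
    """
    report = {"missing": 0, "duplicate": 0, "inconsistent": 0, "outlier": 0}
    seen_dates = set()
    stage = []
    for row in rows:
        values = [row.get(c) for c in PRICE_COLS + ("Volume",)]
        # 1. Missing data: any absent or NaN field.
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            report["missing"] += 1
            continue
        # 2. Repeated data: keep only the first row seen for each date.
        if row["Date"] in seen_dates:
            report["duplicate"] += 1
            continue
        seen_dates.add(row["Date"])
        # 3. Inconsistent values (assumed rules): Low must not exceed Open/Close,
        #    High must not be below them, and volume cannot be negative.
        if not (row["Low"] <= min(row["Open"], row["Close"])
                and row["High"] >= max(row["Open"], row["Close"])
                and row["Volume"] >= 0):
            report["inconsistent"] += 1
            continue
        stage.append(row)
    # 4. Outliers (assumed rule): volumes outside 1.5 * IQR of the remaining rows.
    if len(stage) >= 4:
        q1, _, q3 = quantiles([r["Volume"] for r in stage], n=4)
        spread = 1.5 * (q3 - q1)
        kept = [r for r in stage if q1 - spread <= r["Volume"] <= q3 + spread]
        report["outlier"] = len(stage) - len(kept)
    else:
        kept = stage
    return kept, report
```

The order matters: duplicates and inconsistent rows are removed before quartiles are computed, so they cannot distort the outlier bounds.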
Accuracy, depth, and robustness are also factors to consider when judging the usability and reliability of big data. The size and distribution of the data are equally important. The data must be reliable and include enough observations to reduce the possibility of bias (Wu, 1997). To improve the validity of the model, the dataset was divided into two sets: a training set and a test set. Using fewer variables in the analytical system improves processing quality and reduces processing time. This is a trade-off, however, because prediction accuracy also decreases as the number of variables decreases.
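The training/test division above can be sketched as a chronological split, which is an assumption: the study does not say how its two sets were formed. For time-ordered price data, splitting by date rather than shuffling keeps later observations out of the training set, so the test set mimics forecasting unseen future days. The 80/20 proportion and the function name `chronological_split` are hypothetical.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split rows into (train, test) by date, without shuffling.

    rows: list of dicts with an ISO-format "Date" string (e.g. "2020-01-31"),
    so that sorting the strings also sorts the dates.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda r: r["Date"])
    cut = int(len(ordered) * train_fraction)
    # Earliest rows train the model; the most recent rows are held out for testing.
    return ordered[:cut], ordered[cut:]
```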
The preparation of data for analysis ends with the selection of the data mining techniques, tools, and algorithms to be applied. Data cleansing includes four main activities: handling missing data, removing repeated records, correcting inconsistent values, and treating outliers.
Based on these results, the available data was used to plot a histogram of closing prices. The study takes this histogram as indicating potential for future growth and the likelihood of higher business returns.
Figure 2: Histogram of stock closing prices
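The histogram in Figure 2 was produced in Power BI. The binning step behind any such chart can be sketched as follows, assuming equal-width bins between the minimum and maximum price. The bin count and the function name `histogram_counts` are hypothetical choices, not taken from the study.

```python
def histogram_counts(values, bins=10):
    """Count values in `bins` equal-width intervals spanning [min, max].

    Returns (edges, counts), where edges has bins + 1 entries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of an extra one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts
```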
This section tests hypotheses about the relationship between various variables and stock returns in global business, using statistical and visual results. The predictor variables in the dataset are the close price, open price, low price, high price, volume, and adjusted close price. Each hypothesis states the proposed relationship between one predictor variable and stock returns.
Using the predictor variables, the hypotheses are stated below.
Alternative Hypothesis H1: Close price is a positive predictor of the return on stock exchange business.
Null Hypothesis H0: Close price is not a positive predictor of the return on stock exchange business.
Alternative Hypothesis H1: Opening price is a positive predictor of the return on stock exchange business.
Null Hypothesis H0: Opening price is not a positive predictor of the return on stock exchange business.
Alternative Hypothesis H1: Low price is a positive predictor of the return on stock exchange business.
Null Hypothesis H0: Low price is not a positive predictor of the return on stock exchange business.
Alternative Hypothesis H1: High price is a positive predictor of the return on stock exchange business.
Null Hypothesis H0: High price is not a positive predictor of the return on stock exchange business.
Alternative Hypothesis H1: Sales volume is a positive predictor of the return on stock exchange business.
Null Hypothesis H0: Sales volume is not a positive predictor of the return on stock exchange business.
Alternative Hypothesis H1: Adjusted close price is a positive predictor of the return on stock exchange business.
Null Hypothesis H0: Adjusted close price is not a positive predictor of the return on stock exchange business.
The dataset of stock returns and sales volumes covers the period 2020 to 2021. The data is stored in a spreadsheet and imported into Power BI for analysis and visualization. The resulting analytics allow companies to make the predictions of stock market return that they need. Stock return is the dependent variable and was generated after the original data was cleaned.
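The dependent variable can be derived per trading day as sketched below. This rests on an assumption. The study defines return as the change between buy and sell prices and later describes it as a function of the close and open prices. The sketch therefore treats the day's Open as the buy price and its Close as the sell price. The function name `add_returns` is hypothetical.

```python
def add_returns(rows):
    """Return copies of rows with a "Return" field: (Close - Open) / Open.

    Assumption: buy at the day's Open and sell at its Close.
    """
    out = []
    for r in rows:
        if r["Open"] <= 0:
            raise ValueError(f"non-positive Open price on {r['Date']}")
        # Simple (not logarithmic) intraday return, as a fraction of the buy price.
        out.append({**r, "Return": (r["Close"] - r["Open"]) / r["Open"]})
    return out
```

For example, a day that opens at 100.0 and closes at 105.0 gets a Return of about 0.05 (5%).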
Analysis of Sales
The analysis shows that sales volume in this dataset ranged from 1 billion to a maximum of 9 billion. Over the period from 2020 to 2021, stock trading involved a high degree of public participation and high sales volume.
The histogram below shows the sales volume over the period.
Figure 3: Histogram of Sales
The gap between the highest and lowest sales volume is an essential finding of the data evaluation (Gregerson, 2019). Sales volume plays the most important role in generating stock business return. The adjusted close (adjCls) price of the stocks was another influential variable in producing stock return. Both the spread of the data and the regression analysis indicate that stock market return grows as sales volume rises.
Figure 4 shows the statistical spread of the adjCls prices across the various stocks.
Figure 4: AdjCls Prices
Figure 4 indicates that 74% of the adjCls prices were greater than the mean of all prices. The study associates these higher adjCls prices with periods of rising sales volume and with the rating of sales levels in global business. This high percentage may reflect positive forecasts of stock price growth in the coming days. It may also reflect the efficiency of some market components, such as the precision of the forecasting models used by risk-takers.
The results of the regression analysis for each predictor variable against stock market return are presented below.
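The 74% figure is the share of observations lying above the sample mean, which can be computed as sketched below. The function name `share_above_mean` is hypothetical. A share well above 50% means a tail of comparatively low prices is pulling the mean down (a left-skewed distribution). Linking those observations to sales volume would need a separate comparison against the volume series.

```python
from statistics import fmean


def share_above_mean(values):
    """Fraction of observations strictly greater than the sample mean."""
    m = fmean(values)
    # Booleans sum as 0/1, giving the count of values above the mean.
    return sum(v > m for v in values) / len(values)
```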
Figure 5: Regression of stock return on Open price
Figure 6: Regression of stock return on High price
Figure 7: Regression of stock return on Low price
Figure 8: Regression of stock return on Adjusted Close price
Figure 9: Regression of stock return on Sales Volume
Figure 10: Regression of stock return on Close price
Discussion of the Hypothesis Tests:
The results of the linear regression analysis carried out on the dataset are presented here. The key elements of the regression analysis are the coefficients of association between stock return and the various stock prices. The significance of each predictor variable is tested using its p-value.
Sales volume gave a positive coefficient (+0.42), indicating that it has a positive effect on return on investment in the stock market and can be used as a positive predictor of stock return. The adjCls price gave a positive coefficient (+0.26), indicating that it is a positive predictor of the return on stock exchange business. The Close price gave a positive coefficient (+0.44), indicating that it has a positive effect on return on investment and can be used to predict positive values of that return. The Low price gave a negative coefficient (-0.24).
This indicates that the Low price has a negative effect on return on investment in the stock market, that is, it is a negative predictor of that return. The High price gave a positive coefficient (+0.66), indicating that it has a positive effect on return on investment and is a positive predictor of it. The Opening price gave a negative coefficient (-0.20), indicating that it has a negative effect on return on investment and is a negative predictor of it.
On the test of significance, every predictor variable gave a p-value above 0.05, the threshold for the 5% significance level (95% confidence). None of the tested relationships was therefore statistically significant, and in every case the null hypothesis could not be rejected.
Failing to reject the null hypothesis in every test means the data do not provide statistically significant evidence that any predictor, including sales volume, is a positive predictor of return on investment in stock business. The signs of the coefficients indicate only the direction of each association in this sample.
Even so, sales volume had one of the larger positive coefficients. This suggests it may play an important role in the stock market as a positive predictor of stock prices and of return on investment in stock business.
Plotting the Probability Spreads
A normal probability plot was produced for each variable, as shown below.
Figure 11: Probability Spread of Sales Volume
The normal probability plots for the remaining predictor variables, over the period covered by this research, are given below.
Figure 12: Probability Spread of Opening Price
Figure 13: Probability Spread of High
Figure 14: Probability Spread of Low
Figure 15: Probability Spread of Close
Figure 16: Probability Spread of AdjCls Price
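Each normal probability plot pairs the sorted observations with the quantiles a normal distribution would predict. Points lying close to a straight line suggest the variable is approximately normally distributed. The sketch below computes those pairs with `statistics.NormalDist.inv_cdf`. The plotting positions (i - 0.5) / n are one common convention and an assumption here. The function name `normal_probability_points` is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Return (theoretical_normal_quantile, observed_value) pairs, sorted."""
    data = sorted(values)
    n = len(data)
    z = NormalDist()  # standard normal, mean 0 and standard deviation 1
    # The i-th smallest value is paired with the normal quantile at (i - 0.5) / n.
    return [(z.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(data, start=1)]
```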
Legal and ethical factors go hand in hand and cannot be separated. As things stand, the proposed system risks violating its stakeholders' rights to privacy and freedom of expression. First, the proposed system intends to use real-time data relating to people in society, as well as disclosed financial details of companies. Because future development of the system may include machine learning, there is a high probability of breaches of data privacy and confidentiality. Legally, an entity that wishes to acquire data must seek the consent of the companies or individuals whose data it intends to use, and failure to meet this consent requirement may lead to lawsuits. To mitigate the risk of non-compliance with legal and ethical requirements, the organization conducting the research needs to be familiar with the legal requirements for managing third-party information and data (Zorzybsky, 1996). The proposed system uses data belonging to another party for its analysis and interpretation. A further concern arising from machine learning is intellectual property infringement. For example, once the research is completed, it may need to be patented to retain rightful ownership of its originality.
During the development of the machine learning models and algorithms, the predictor variables were used to estimate their coefficients of association with return on investment in the stock market, with the aim of anticipating future stock prices and returns. Fluctuations in stock prices, the probability of losses, and the acceptable amount of loss help classify the quality of the analysis, as seen in the forecasted visual output of losses and sales volume.
The outcome of the regression analysis included the forecasted effect of losses.
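The probability of losses and an acceptable amount of loss can be quantified from a series of returns, as sketched below. The probability is the empirical share of days with a negative return. The acceptable loss is expressed as a historical value-at-risk (VaR): the loss not exceeded on the chosen share of days. VaR is a swapped-in, commonly used measure, not one the study names. The 95% level, the no-interpolation quantile rule, and the function name `loss_profile` are assumptions.

```python
def loss_profile(returns, confidence=0.95):
    """Empirical loss probability and a simple historical value-at-risk.

    returns: daily returns as fractions (e.g. -0.02 for a 2% loss).
    Returns (p_loss, var), where var is a positive number for a loss.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    ordered = sorted(returns)
    p_loss = sum(r < 0 for r in ordered) / len(ordered)
    # Take the return at the (1 - confidence) position of the sorted sample,
    # without interpolation, and report it as a positive loss.
    idx = int((1 - confidence) * len(ordered))
    var = -ordered[idx]
    return p_loss, var
```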
By understanding the historical and current values of stocks, the proposed system will be able to calculate and predict future stock prices and future returns on stock trading. This involves organizing the data so as to neutralize the non-systematic (random) component of each observation. To estimate short-term return on investment in the stock market, the system will apply model fitting.
The model is built by splitting the data so that the analytical model can be formed on one part and checked on the other. The two separate datasets are the training data and the test data, defined for every target variable of the forecasting models (Breiman, 2001). After the models are chosen, they are trained on the training data before forecasts are made on the test data. The system uses the probability and impact of losses as the test values of the parameters. The return on investment in stock was computed for every set of data as a function of the close and open prices.
After the system is constructed, the model's performance is evaluated to gauge its precision, and the report analyses the model's results. Model performance is measured as the percentage error rate of the regression model.
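One common reading of a "percentage error rate" for a regression model is the mean absolute percentage error (MAPE), sketched below. The study does not name its exact metric, so this is an assumption. Because daily returns can be exactly zero, where the ratio is undefined, the sketch skips those observations. For near-zero returns, MAPE can become very large. The function name `mean_absolute_percentage_error` is hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent.

    Pairs with actual == 0 are skipped because the ratio is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
```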
According to the data collected from the NYSE, the daily sales volumes of the stocks between 2020 and 2021 were in the billions of US dollars. Observing the price trends in the results supports decision-making by business leaders in the stock market and the wider corporate world, helping them improve returns on stock exchange business and set competitive prices. This research showed that stock business return can be forecast using machine learning algorithms, tools, and methods. It also demonstrated the role of analytics in supporting decision-making and business performance, with a substantial effect on return on investment in the stock market. The results further indicated that combining several attributes of stock business, such as sales volume and adjusted close, increased the accuracy with which investors estimate stock return by about 72%. This study used the accessed dataset to forecast the actual effect of stock prices and sales volume on stock return.
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), pp. 199-231. Institute of Mathematical Statistics. Available at: http://www2.math.uu.se/~thulin/mm/breiman.pdf
Chapman, P. et al. (2000). CRISP-DM 1.0: Step-by-Step Data Mining Guide. Available at: https://pdfs.semanticscholar.org/5406/1a4aa0cb241a726f54d0569efae1c13aab3a.pdf
Gregerson, D. (2019). AI Hierarchy of Needs. SlideShare, June. Available at: https://www.slideshare.net/DylanGregersen/data-science-hierarchy-of-needs (accessed 18-09-2021).
Wu, C. F. J. (1997). Statistics = Data Science? Inaugural lecture for his appointment to the H. C. Carver Professorship, University of Michigan.
Zorzybsky, A. (1996). On Structure. In: Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics, CD-ROM, ed. Charlotte Schuchardt-Read. Englewood, NJ: Institute of General Semantics. Available at: http://esgs.free.fr/uk/art/sands.htm (accessed 20-09-2021).