Predict stock market returns using big data and analytics

Programming, ST2195
University of London, UK

The Design Of The Infrastructure

Business summary

People in society are increasingly aware of the benefits of the stock market. This awareness has increased public participation in stock trading, which has been reported to raise stock returns and trading volumes but also threats and risks, especially losses of money, time, and information, as well as environmental degradation. This study investigates the development of a system that leverages large amounts of data and analytics to predict stock market returns. Return is defined as the value derived from the change between the buy and sell prices of a selected stock. The US stock market continued to generate between 7% and 17% of GDP on the back of heavy selling, while developed countries sustained turnover through increased exposure to multiple economic sectors.
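The definition of return above reduces to a simple formula. The following is a minimal Python sketch, assuming return is expressed as a fraction of the buy price (a simple rather than logarithmic return); the function name `simple_return` is hypothetical and is not part of the system described in this study.

```python
def simple_return(buy_price, sell_price):
    """Return earned between buying and selling, as a fraction of the buy price."""
    if buy_price <= 0:
        raise ValueError("buy_price must be positive")
    return (sell_price - buy_price) / buy_price


# Example: buying at 100.00 and selling at 107.00 gives a 7% return.
print(f"{simple_return(100.0, 107.0):.2%}")
```

A negative value indicates a loss, for example selling at 45.00 a stock bought at 50.00.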

Literature Review

Over the past 20 years, stock prices and market returns have increased, as have business capabilities. Obtaining reliable results requires active and detailed consideration of hidden causal relationships. To exploit big data, earlier systems have used large datasets of stock returns with thousands of observations, covering the entire set of recorded data, including real-time data. Such a system should use reliable, realistic statistics that reflect real information about the returns of publicly traded stocks. Most studies use well-known stocks of popular companies in the US market. Stock returns from the collected data have been used to evaluate the results of public participation in stock trading (Chapman et al., 2000). The performance of stock trading must be examined in light of changes in people's behavior. The outcomes predicted by models involve both returns and risks, and some predictions can be misleading. By applying analytics to big data, people can make decisions about stock price restructuring and carry out reviews that directly affect market returns.

Purpose of this study

The purpose of this research is to construct an analytical model system that predicts the value of stock trading activities using big data analysis and machine learning. The tools used for data analysis include Power BI and Microsoft Excel. The system is designed to use machine learning techniques and models to improve the accuracy of stock return predictions by taking stock price and sales volume into account. From this point of view, the results of data mining models demonstrate their efficiency as algorithms for predicting consumer behavior in global business, which in turn affects return on investment. The basis of the system proposed in this study is shown below.

Figure 1: Big data analytics infrastructure

This section describes the analysis and visualization of the data and the development of artifacts that demonstrate their usefulness to the organization. The selected software, Power BI, will be used to build analytical dashboards and reports that present information companies can readily apply from the infrastructure designed in Part 1. A screenshot of the actual data used as an example is shown below.

Figure 1: Sample data used for analysis

The data is realistic: it was taken from www.yahoofinance.com and covers the most recent period, 2020 to 2021. After selection, the data went through several cleaning steps to make it relevant to the business application. First, it was passed through functions and procedures that flag missing data, repeated data, inconsistent values, and outliers. Because the data was collected in real time, it gives a faithful picture of the US stock market over that period.
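The checks named above (missing, repeated, inconsistent, and outlying data) could be sketched in plain Python as follows. This is an illustrative sketch only, not the procedure actually used in this study: it assumes each daily record is a dictionary with Yahoo Finance style fields (`Date`, `Open`, `High`, `Low`, `Close`, `Adj Close`, `Volume`), and it assumes Tukey's rule (1.5 times the interquartile range) for outliers, since the study does not state which rule was applied. The function name `clean_rows` is hypothetical.

```python
import statistics

PRICE_FIELDS = ("Open", "High", "Low", "Close", "Adj Close")


def clean_rows(rows, outlier_field="Volume"):
    """Hypothetical cleaning pass over daily stock rows stored as dicts."""
    # 1. Missing data: drop any row with an empty (None) field.
    complete = [r for r in rows if all(v is not None for v in r.values())]

    # 2. Repeated data: keep only the first row seen for each date.
    seen, unique = set(), []
    for r in complete:
        if r["Date"] not in seen:
            seen.add(r["Date"])
            unique.append(r)

    # 3. Inconsistent values: prices must be positive, volume non-negative,
    #    and Open/Close must lie inside the day's Low-High range.
    def consistent(r):
        if any(r[f] <= 0 for f in PRICE_FIELDS) or r["Volume"] < 0:
            return False
        low_ok = r["Low"] <= min(r["Open"], r["Close"])
        high_ok = max(r["Open"], r["Close"]) <= r["High"]
        return low_ok and high_ok

    valid = [r for r in unique if consistent(r)]

    # 4. Outliers: drop rows outside Tukey's fences (1.5 * IQR) on one field.
    values = [r[outlier_field] for r in valid]
    if len(values) < 4:
        return valid  # too few points to estimate quartiles sensibly
    q1, _, q3 = statistics.quantiles(values, n=4)
    fence = 1.5 * (q3 - q1)
    return [r for r in valid if q1 - fence <= r[outlier_field] <= q3 + fence]
```

The order matters: duplicates and missing values are removed before the quartiles are estimated, so they cannot distort the outlier fences.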

Accuracy, depth, and strength must also be considered when judging the usability and reliability of big data, as must the size and distribution of the data. Data must be reliable and include enough observations to reduce the possibility of bias (Wu, 1997). To improve the validity of the model, the dataset was split into two sets: a training set and a test set. Processing is made more efficient by avoiding too many variables in the analytical system and by minimizing the time taken to process information. However, prediction accuracy decreases as fewer variables are used.

Preparing the data for analysis ends with selecting the data mining techniques, tools, and algorithms to be applied with artificial intelligence. Data cleansing includes the following main activities:

  • Models should use only quantitative (measurable) data in machine learning processes and algorithms.
  • The second step is to remove all outliers so that the data approaches a normal distribution and bias is reduced.
  • The data must be split into two groups: training data and test data.
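The requirements to keep only quantitative data and to split it into training and test sets can be illustrated as follows. The sketch assumes an 80/20 split (the study does not state the ratio) and splits chronologically rather than at random, a common choice for time series so that the model is never trained on days that come after the days it is tested on. The names `prepare` and `chronological_split` are hypothetical.

```python
NUMERIC_FIELDS = ("Open", "High", "Low", "Close", "Adj Close", "Volume")


def prepare(rows):
    """Sort rows by date, then keep only the quantitative (measurable) fields."""
    # ISO-formatted date strings such as "2021-03-15" sort chronologically.
    ordered = sorted(rows, key=lambda r: r["Date"])
    return [{f: float(r[f]) for f in NUMERIC_FIELDS} for r in ordered]


def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records into training and test sets without shuffling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]
```

Dropping the `Date` string only after sorting keeps the time order that the split relies on.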

Using the available data, we plotted a histogram of market prices, which indicates the potential for future growth and the likelihood of achieving higher business returns.

Figure 2: Histogram of stock closing prices
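A histogram such as Figure 2 simply counts how many closing prices fall into each equal-width price interval. The sketch below shows that counting step in plain Python with a text-based bar display; the function name `histogram`, the number of bins, and the sample prices are hypothetical, and Power BI's own binning may differ.

```python
def histogram(values, bins=5):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical closing prices, for illustration only.
closes = [101.2, 103.5, 99.8, 104.1, 110.0, 108.3, 102.2, 105.0]
edges, counts = histogram(closes, bins=4)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:7.2f} to {right:7.2f} | {'#' * n}")
```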

In this section, hypotheses are used to test the relationship between various variables and stock returns in global business, using statistical and visual results. The predictor variables in the dataset are close price, open price, low price, high price, volume, and adjusted close price. Each hypothesis states the proposed relationship between one predictor variable and stock returns.

Objective Of This Study

Hypotheses

Using the predictor variables, the hypotheses are stated below.

Close

Alternative Hypothesis H1: Close is a positive predictor of stock returns

Null Hypothesis H0: Close is not a positive predictor of stock returns

Open

Alternative Hypothesis H1: Opening Price is a positive predictor of stock returns

Null Hypothesis H0: Opening Price is not a positive predictor of stock returns

Low

Alternative Hypothesis H1: Low is a positive predictor of stock returns

Null Hypothesis H0: Low is not a positive predictor of stock returns

High

Alternative Hypothesis H1: High Price is a positive predictor of stock returns

Null Hypothesis H0: High Price is not a positive predictor of stock returns

Volume

Alternative Hypothesis H1: Sales Volume is a positive predictor of stock returns

Null Hypothesis H0: Sales Volume is not a positive predictor of stock returns

Adjusted Close

Alternative Hypothesis H1: Adjusted Close is a positive predictor of stock returns

Null Hypothesis H0: Adjusted Close is not a positive predictor of stock returns

The dataset of stock returns and sales volume covers the period from 2020 to 2021. The data is stored in a spreadsheet and imported into Power BI for data analytics and visualization. The analytics allow companies to make the necessary predictions of stock market returns. The stock return, the dependent variable, was derived after the original data had been cleaned.

Analysis of Sales

Sales Volume

The analysis shows that sales volume in this dataset ranged from 1 billion to a maximum of 9 billion. Over the period from 2020 to 2021, stock trading saw a high degree of public participation and high sales volume.

The histogram below represents the sales volume recorded during the period.

Figure 3: Histogram of Sales

The gap between the highest and lowest sales volume is an essential finding from the evaluation of the data (Gregerson, 2019). Sales volume plays the most important role in generating returns in the stock market. The adjusted close (adjCls) price of the stocks was identified as the third significant variable in producing the stock return. Both the spread of the data and the regression analysis show that stock market return grows as sales volume rises.

The Development Of A Demonstrable Artefact

AdjCls stock:

Figure 4 below shows the statistical spread of the adjCls prices across the various stocks.

Figure 4: AdjCls Prices

Figure 4 indicates that 74% of the adjCls prices were greater than the mean of all the prices. This suggests that approximately 74% of the adjCls stock prices occur when sales volume, sales amounts, and the level of sales in global business increase. The high percentage may reflect positive forecasts of stock price growth in the coming days and the efficiency of some market components, such as the precision of the forecasting models used by risk-takers.
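The 74% figure is the share of observations lying above the sample mean, which can be computed as shown below. As a side note, a share well above 50% typically indicates a left-skewed distribution, in which a tail of low prices pulls the mean below most observations. The function name `share_above_mean` is hypothetical, and the study's actual calculation in Power BI is not shown.

```python
from statistics import fmean


def share_above_mean(values):
    """Fraction of observations strictly greater than the sample mean."""
    if not values:
        raise ValueError("values must not be empty")
    m = fmean(values)
    return sum(v > m for v in values) / len(values)
```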

Regression

The regression plots for each predictor variable against stock market return are presented below.

Figure 5: Regression results relating stock return to Open price

Figure 6: Regression results relating stock return to High price

Figure 7: Regression results relating stock return to Low price

Figure 8: Regression results relating stock return to Adjusted Close

Figure 9: Regression results relating stock return to Sales Volume

Figure 10: Regression results relating stock return to Close price

Discussion of the Hypothesis Tests:

The results of the linear regression analysis carried out on the dataset are presented above. The key elements of the regression analysis are the coefficients of association between stock return and the various stock prices. The significance of each predictor variable is tested using its p-value.
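The document does not state exactly how the coefficients of association or the p-values were computed. One common approach, assumed here, is the Pearson correlation coefficient r between a predictor and the return, tested with the statistic t = r * sqrt((n - 2) / (1 - r^2)). The sketch below approximates the two-sided p-value with the standard normal distribution, which is reasonable only for large samples such as roughly two years of daily prices; an exact test would use the t distribution with n - 2 degrees of freedom. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    return sxy / math.sqrt(sxx * syy)


def correlation_test(x, y):
    """Return (r, t, approximate two-sided p-value) for H0: no linear association."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r), 0.0  # perfect linear relationship
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Normal approximation to the t distribution: adequate for large n only.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return r, t, p
```

Under this approach, a predictor would be judged significant at the 5% level only when the returned p-value is below 0.05.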

Sales volume gave a positive coefficient (+0.42), indicating that sales volume had a positive effect on return on investment in the stock market; that is, sales volume can be used as a positive predictor of stock return. The adjCls price gave a positive coefficient (+0.26), indicating that the adjCls price is a positive predictor of stock return. The Close price gave a positive coefficient of association (+0.44), indicating that Close had a positive effect on return on investment and can be used to predict positive values of stock return. The Low price gave a negative coefficient (-0.24).

This indicates that Low has a negative effect on return on investment in the stock market; that is, Low is a negative predictor of stock return. The High price gave a positive coefficient (+0.66), indicating that High has a positive effect on return on investment and is a positive predictor of stock return. The Opening price gave a negative coefficient (-0.20), indicating that the Opening price had a negative effect on return on investment and is a negative predictor of stock return.

The Testing Of A Hypothesis

In the significance tests, every predictor variable gave a p-value above 0.05, the threshold at the 95% confidence level. The tests were therefore not statistically significant for any predictor variable, and in every case the null hypothesis could not be rejected.

Failing to reject the null hypothesis in every test means that none of the observed associations, positive or negative, is statistically significant at the 5% level; the signs of the coefficients should therefore be read as indicative rather than conclusive.

Sales Volume

Sales volume plays an important role in the stock market: its positive coefficient suggests that it is a positive predictor of stock prices and of return on investment in the stock market.

Plotting the Probability Spreads

A normal probability plot was produced for each variable, as shown below.

Figure 11: Probability Spread of Sales Volume

The normal probability plots of the remaining predictor variables over the period covered by this research are given below.

Figure 12: Probability Spread of Opening Price

Figure 13: Probability Spread of High

Figure 14: Probability Spread of Low

Figure 15: Probability Spread of Close

Figure 16: Probability Spread of AdjCls Price
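A normal probability plot, as in Figures 11 to 16, places each sorted observation against the standard-normal quantile it would be expected to match if the data were normally distributed; points lying close to a straight line suggest approximate normality. The sketch below computes those plotting points using (i - 0.5) / n plotting positions, which is one of several conventions and an assumption here. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Pair each sorted observation with its expected standard-normal quantile."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n for the i-th smallest observation (1-based).
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))
```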

Legal and ethical factors go hand in hand and cannot be separated. As things currently stand, the rights to privacy and freedom of expression of the proposed system's stakeholders could be violated. First, the proposed system intends to use real-time data relating to people in society and to disclose financial details of companies. Since future development of the system may include machine learning, there is a high probability that data privacy and confidentiality obligations could be breached. Legally, an entity seeking to acquire data for research must obtain the consent of the companies or individuals whose data it intends to use, and failure to comply with this consent requirement may lead to lawsuits. To mitigate the risk of non-compliance with legal and ethical requirements, the company conducting the research needs to be conversant with the legal requirements for managing information and data related to third parties (Zorzybsky, 1996). The proposed system uses data belonging to another party and performs analysis and interpretation on it. An additional concern arising from machine learning is intellectual property infringement. For example, once the research is completed, it may need to be patented to retain rightful ownership of its originality.

During the development of the machine learning models and algorithms, the predictor variables were used to estimate their coefficients of association with return on investment in the stock market, with the aim of stabilizing future stock prices and returns on investment. Fluctuations in stock prices, the probability of losses, and the acceptable amount of loss help classify the quality of the analysis, as seen in the forecasted visual output of losses and sales volume.

The outcome of the regression analysis included the forecasted effect of losses.

Model Fit

By understanding the historical and current values of stocks, the proposed system will be able to calculate and predict future stock prices and future returns on stock trading. This involves classifying the data so as to neutralize the non-systematic component of each observation. To estimate short-term return on investment in the stock market, the system will apply model fitting.

Model building

The model is built by splitting the data so that the analytical model can be formed and then validated. The two separate datasets are the training data and the test data, defined for every target variable of the forecasting models (Breiman, 2001). After the models are chosen, they are trained on the training data and then used to forecast the test data. The system will use the probability and impact of losses as the test values of the parameters. The return on stock investment was computed in every dataset as a function of the close and open prices.
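The training-then-forecasting workflow described above could look like the following for a single predictor. This sketch assumes that the return is the intraday return (Close - Open) / Open and that the model is an ordinary least squares line; the study states only that return is a function of the close and open prices and does not specify the model. All function names, and the default predictor `Volume`, are hypothetical.

```python
from statistics import fmean


def daily_return(row):
    """Intraday return, assumed here to be (Close - Open) / Open."""
    return (row["Close"] - row["Open"]) / row["Open"]


def fit_simple_ols(xs, ys):
    """Ordinary least squares intercept a and slope b for y = a + b * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor has no variance in the training data")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def train_and_forecast(train_rows, test_rows, predictor="Volume"):
    """Fit on the training rows only, then forecast returns for the test rows."""
    a, b = fit_simple_ols(
        [r[predictor] for r in train_rows],
        [daily_return(r) for r in train_rows],
    )
    return [a + b * r[predictor] for r in test_rows]
```

Fitting only on the training rows keeps the test rows unseen, so the forecasts can later be compared honestly with the test rows' actual returns.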

After the system has been constructed, the model's performance is evaluated to gauge its precision on grouping tasks, and the report analyses the model's results. Model performance is measured by the percentage error rate of the regression model.
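The study does not define its percentage error rate. One common choice, assumed here, is the mean absolute percentage error (MAPE). Note that MAPE is undefined when an actual value is zero and becomes unstable when actual values are close to zero, as daily returns often are, so it may be more meaningful when applied to forecast prices. The function name is hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, expressed as a percentage."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)
```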

Conclusion

According to the data collected from the NYSE, daily sales volumes of the stocks between 2020 and 2021 were in the billions of US dollars. Observing the price trend supports decision-making by business leaders in the stock market, and in the wider corporate world, to improve returns on stock exchange trading and produce competitive prices. This research showed that stock returns can be forecast using machine learning algorithms, tools, and methods. It demonstrated the role of analytics in supporting decision-making and business performance, with a substantial effect on return on investment in the stock market. The outcome further demonstrated that combining several attributes of stock trading, such as sales volume and adjusted close, increases the accuracy with which investors estimate stock return by about 72%. This study succeeded in using the accessed dataset to forecast the actual effect of stock prices and sales volume on stock return.

References

Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199-231. Institute of Mathematical Statistics. Available at: http://www2.math.uu.se/~thulin/mm/breiman.pdf

Chapman, P. et al. (2000). CRISP-DM 1.0: Step-by-Step Data Mining Guide. Available at: https://pdfs.semanticscholar.org/5406/1a4aa0cb241a726f54d0569efae1c13aab3a.pdf

Gregerson, D. (2019). AI Hierarchy of Needs. SlideShare, June. Available at: https://www.slideshare.net/DylanGregersen/data-science-hierarchy-of-needs (accessed 18-09-2021).

Wu, C. F. J. (1997). Statistics = Data Science? Inaugural lecture for his appointment to the H. C. Carver Professorship at the University of Michigan.

Zorzybsky, A. (1996). On Structure. In Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics, CD-ROM, ed. Charlotte Schuchardt-Read. Englewood, NJ: Institute of General Semantics. Available at: http://esgs.free.fr/uk/art/sands.htm (accessed 20-09-2021).
