Use the appropriate function to read the data file for this lab into R as a data frame named lab7.data.
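As a minimal sketch of this step, assuming the file is a comma-separated text file with a header row (the assignment does not state the format), reading it could look like the following. The temporary file and its three toy rows are hypothetical stand-ins for the real download.

```r
# Hypothetical stand-in for the real data file: its name, format, and
# contents are assumptions, since the assignment does not state them.
tmp <- tempfile(fileext = ".csv")
writeLines(c("event_ID,num_records,cost_controls",
             "E001,1200,35000",
             "E002,560,18000",
             "E003,9800,72000"), tmp)

# read.csv() returns a data frame and treats the first row as column names.
lab7.data <- read.csv(tmp)

str(lab7.data)   # column names and types
dim(lab7.data)   # number of rows and columns
```

With the real file, dim(lab7.data) is a quick check against the stated number of observations and variables.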
The data set describes several attributes of data breaches across several organizations. It includes 500 observations and 10 variables. The name and description of each variable in the data set are provided below.
1.
Use the best subsets approach to determine which variable(s) would best predict the cost of controls. Please be sure to exclude categorical variables such as event_ID.
NOTE: predictors is a placeholder for the actual predictors in your model. In your answer below, replace every blank, such as [1] and [2], with the correct syntax so that the lines of code run, and put the variable names of your predictors in place of predictors.
bestsubsets = [1]([2] ~ predictors, data = [3], [4])
[5]([6], [7] = "adjr2")
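To illustrate what a best subsets search does, here is a brute-force sketch in base R. It is not the template above filled in: it uses the built-in mtcars data, and the response and candidate predictors are chosen purely for illustration. It fits every combination of the candidates and keeps the one with the highest adjusted R-squared at each model size.

```r
# Brute-force best subsets on the built-in mtcars data (illustration only;
# the variables below are not from the lab data set).
response   <- "mpg"
candidates <- c("wt", "hp", "disp", "qsec")   # numeric predictors only

# Fit every non-empty subset of the candidates and record adjusted R-squared.
results <- do.call(rbind, lapply(seq_along(candidates), function(k) {
  subsets <- combn(candidates, k, simplify = FALSE)
  do.call(rbind, lapply(subsets, function(vars) {
    fit <- lm(reformulate(vars, response = response), data = mtcars)
    data.frame(size       = k,
               predictors = paste(vars, collapse = " + "),
               adjr2      = summary(fit)$adj.r.squared)
  }))
}))

# Keep the single best subset of each size.
best_by_size <- do.call(rbind, lapply(split(results, results$size), function(d) {
  d[which.max(d$adjr2), ]
}))
print(best_by_size)
```

Each row of best_by_size corresponds to one row of a best subsets plot: the strongest model with one predictor, with two predictors, and so on.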
2.
The single best one-variable model includes which of the following variables?
Group of answer choices:
data_type
num_people
num_people_v2
num_records
per_sensitive
per_sensitive_v2
dys_impact
dys_detect
3.
The single best two-variable model includes which of the following variables?
Group of answer choices:
data_type
num_people
num_people_v2
num_records
per_sensitive
per_sensitive_v2
dys_impact
dys_detect
4.
The single best three-variable model includes which of the following variables?
Group of answer choices:
data_type
num_people
num_people_v2
num_records
per_sensitive
per_sensitive_v2
dys_impact
dys_detect
5.
The single best four-variable model includes which of the following variables?
Group of answer choices:
data_type
num_people
num_people_v2
num_records
per_sensitive
per_sensitive_v2
dys_impact
dys_detect
6.
Run five separate regression models that represent the five models shown in the best subsets plot in R. Number your models sequentially from Model 1 to Model 5 based on the number of predictors each includes. Provide the Adjusted R² value for each of your five models below.
Note: Please report the values as displayed in R. Do not round them.
Model 1:
Model 2:
Model 3:
Model 4:
Model 5:
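A minimal sketch of fitting nested models and extracting their adjusted R-squared values, again using the built-in mtcars data with illustrative variables rather than the lab data:

```r
# Two nested models on mtcars (illustration only).
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# summary() stores the adjusted R-squared in the adj.r.squared element.
adj <- c(model1 = summary(m1)$adj.r.squared,
         model2 = summary(m2)$adj.r.squared)
print(adj)

# The coefficient table holds the p-value for each predictor.
print(summary(m2)$coefficients)
```

Comparing the two helps with the model choice: a higher adjusted R-squared is only convincing if the added predictor is itself significant.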
7.
After examining the significance of the predictors in each model and their Adjusted R², which of the following models provides the best fit for predicting the cost of controls?
Group of answer choices:
Model 1
Model 2
Model 3
Model 4
Model 5
Model 6
8.
Evaluate Model 5 for multicollinearity and provide the estimates below.
The highest VIF among your predictors is:
The lowest tolerance among your predictors is:
Note: Please report each of these values as displayed in R. Do not round them.
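VIF and tolerance can be computed directly from their definitions: regress each predictor on the others, take VIF = 1 / (1 - R²), and tolerance = 1 / VIF. The sketch below does this in base R on mtcars with illustrative predictors. Packages such as car also provide a vif() function, which the course may expect instead.

```r
# VIF and tolerance from first principles on mtcars (illustration only).
predictors <- c("wt", "hp", "disp")

vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # R-squared from regressing this predictor on all the other predictors.
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})

tolerance <- 1 / vif   # tolerance is the reciprocal of VIF

print(vif)
print(tolerance)
max(vif)         # highest VIF among the predictors
min(tolerance)   # lowest tolerance among the predictors
```

Because tolerance is the reciprocal of VIF, the predictor with the highest VIF is always the one with the lowest tolerance.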
9.
Based on the results from your calculations for the tolerance, you can conclude that for Model 5 there is:
Group of answer choices:
a potential concern for multicollinearity.
a serious concern for multicollinearity.
no concern for multicollinearity.
10.
Based on the results from your calculations for the VIF, you can conclude that for Model 5 there is:
Group of answer choices:
a concern for multicollinearity.
no concern for multicollinearity.
11.
Use R to generate a correlation matrix for the predictors used in Model 5. Based on your results, the strongest correlation is between which two of the following predictors?
Group of answer choices:
num_people
num_records
per_sensitive
dys_impact
dys_detect
cost_controls
12.
The value of the strongest correlation between your predictors is:
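A minimal sketch of building a correlation matrix and locating its strongest off-diagonal entry, using mtcars columns as illustrative stand-ins for the Model 5 predictors:

```r
# Correlation matrix for a set of numeric predictors (illustration only).
X  <- mtcars[, c("wt", "hp", "disp", "qsec")]
cm <- cor(X)
print(cm)

# Zero out the diagonal so a variable's correlation with itself is ignored.
off <- cm
diag(off) <- 0

# Locate the entry with the largest absolute correlation.
idx  <- which(abs(off) == max(abs(off)), arr.ind = TRUE)[1, ]
pair <- c(rownames(cm)[idx[1]], colnames(cm)[idx[2]])
strongest <- cm[idx[1], idx[2]]

pair
strongest
```

Using the absolute value matters here, since a strong negative correlation signals overlap between predictors just as a strong positive one does.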