INSTRUCTIONS
- The purpose of this assignment is to help you become familiar with the foundations of Data Mining by using descriptive statistics and data visualization in R.
- In this assignment we will use R functions to gain hands-on experience calculating correlations through graphical and numerical methods.
- We will create correlation heatmaps to observe how the correlations are distributed among the variables, and calculate descriptive statistics of the correlation coefficients.
- Please refer to the R code for creating Figure 3 and 7 in Chapter 3 of Data Mining for Business Analytics: Concepts, Techniques, and Applications in R (in this week’s Reading & Resources). You may use the open resources on the publisher's website to recreate the same figures and become familiar with the Chapter 3 contents.
- Then use the attached dataset NewYorkHousing.csv to answer the following questions. Please also open the second attached file for the sample R code, which you can easily revise to generate Figures 3.1 to 3.4 and 3.5 to 3.8 required in this assignment:
- Create a heatmap with values (running the sample R code will produce it).
- Calculate the minimum, maximum, median, and standard deviation of ALL the correlations, excluding the correlations equal to 1 in the diagonal cells of the heatmap. (Hint: use R functions instead of reading the values off the heatmap. summary(cor.mat) gives the min, max, and median, and sd() gives the standard deviation.)
- Create a scatterplot matrix (hint: use ggpairs in R) of MEDV with these predictors: INDUS, CHAS, NOX, RM, AGE, DIS, TAX, and state which predictor has the strongest correlation with MEDV.
- Please copy/paste screen images of your work in R into a Word document for submission. Be sure to provide a narrative for your answers (i.e., do not just copy/paste your answers without explaining what you did or what you found).
- Please make sure you run install.packages("????") before you call library(????); otherwise you will get errors.
- Please include an Introduction, R code with outputs, and Figures with explanations, along with cover and reference pages. A good conclusion to wrap up the assignment is also expected.
- Please follow APA format.
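For the heatmap and descriptive-statistics questions, the following is a minimal sketch rather than the sample code from the attachment. It assumes a small made-up data frame (demo.df, with placeholder columns x1 to x4) in place of NewYorkHousing.csv, and it assumes ggplot2 is the plotting package; the plot step is skipped if ggplot2 is not installed. The names demo.df, off.diag, stats, and cor.long are illustrative only.

```r
# Made-up stand-in data, NOT the assignment dataset.
set.seed(1)
demo.df <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
demo.df$x4 <- demo.df$x1 + rnorm(50, sd = 0.5)  # built to correlate with x1

# Correlation matrix of all numeric columns
cor.mat <- cor(demo.df)

# Keep every off-diagonal cell, which drops the 1s on the diagonal
off.diag <- cor.mat[row(cor.mat) != col(cor.mat)]

# Descriptive statistics of the remaining correlations
stats <- c(min    = min(off.diag),
           max    = max(off.diag),
           median = median(off.diag),
           sd     = sd(off.diag))
print(round(stats, 2))

# Long format (columns Var1, Var2, Freq) for a heatmap with printed values
cor.long <- as.data.frame(as.table(cor.mat))

# Assumption: ggplot2 is the plotting package; skipped when unavailable
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cor.long, aes(x = Var1, y = Var2, fill = Freq)) +
    geom_tile() +
    geom_text(aes(label = round(Freq, 2)))
  print(p)
}
```

Note that summary() applied to a matrix reports its statistics column by column and still includes the diagonal 1s, which is why the sketch pulls the off-diagonal values into a plain vector first. Because the matrix is symmetric, each pair appears twice in off.diag; using cor.mat[upper.tri(cor.mat)] instead keeps each pair once, which leaves the min, max, and median unchanged but shifts the standard deviation slightly.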
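For the scatterplot-matrix question, a numerical check can back up what the plot suggests. The sketch below is an illustration under stated assumptions, not the sample code from the attachment: it uses made-up data with placeholder column names (y, p1, p2, p3), where in the assignment the target would be MEDV and the predictors would be the listed columns. The ggpairs call comes from the GGally package and is skipped if GGally is not installed.

```r
# Made-up stand-in data; column names are placeholders, not the assignment's.
set.seed(42)
n <- 100
demo.df <- data.frame(p1 = rnorm(n), p2 = rnorm(n), p3 = rnorm(n))
demo.df$y <- 2 * demo.df$p2 + rnorm(n)  # y is built to depend on p2

target <- "y"
predictors <- c("p1", "p2", "p3")

# Correlation of each predictor with the target, as a named vector
r <- cor(demo.df[, predictors], demo.df[[target]])[, 1]
print(round(r, 2))

# "Strongest" means largest absolute value, so a strong negative
# correlation counts as much as a strong positive one
strongest <- names(r)[which.max(abs(r))]
print(strongest)

# Scatterplot matrix of the target with the predictors (needs GGally)
if (requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(demo.df[, c(target, predictors)]))
}
```

Comparing absolute values matters here: a predictor with a correlation of -0.7 is more strongly related to the target than one with +0.4, even though it is the smaller number.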
References:
The R Guide (http://cran.fhcrc.org/doc/contrib/Owen-TheRGuide.pdf)