library(tidyverse)
library(corrplot)
### read the data
beyonce <- read.csv("beyonce.csv")
df <- beyonce %>%
  select(c("acousticness", "liveness", "danceability", "loudness", "speechiness", "valence"))
Week 4
Week Layout
Tuesday
On Tuesday we focused primarily on the Chi-Square test. There are a number of resources on this website to walk you through that process, but the main point is that the test is a way of checking whether the counts you observed are significantly different from the counts you expected.
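As a minimal sketch of that idea, the base R function chisq.test compares observed counts with the proportions expected under a null hypothesis. The counts here are made up for illustration (a hypothetical tally of 120 melodic intervals by direction) and do not come from the class data.

```r
# Hypothetical counts, invented for illustration only
observed <- c(ascending = 30, descending = 50, unison = 40)

# Null hypothesis: the three categories are equally likely
expected_props <- c(1/3, 1/3, 1/3)

result <- chisq.test(observed, p = expected_props)

result$expected   # expected counts under the null (40 in each category)
result$statistic  # chi-square statistic: sum((observed - expected)^2 / expected)
result$p.value    # small values suggest the counts differ from expectation
```

If the p-value falls below the chosen threshold (often 0.05), the observed counts are treated as significantly different from the expected ones.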
A discussion of it from David Huron’s Empirical Methods workshop can be found here.
Thursday
On Thursday we spent time with R looking at correlation and regression. There were a couple of goals here:
- To look at how one calculates Pearson’s correlation coefficient.
- To look at correlation, and then regression, in R.
- To build functions in R.
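The first and third goals can be combined in one small sketch: a hand-written function for Pearson's correlation coefficient, checked against the built-in cor. The function name pearson_r and the five data points are assumptions made up for illustration, not the code or data used in class.

```r
# Pearson's r: the sum of the products of deviations from the mean,
# divided by the square root of the product of the summed squared deviations.
pearson_r <- function(x, y) {
  stopifnot(length(x) == length(y))
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Made-up values, for illustration only
loudness <- c(-8.1, -6.5, -5.2, -7.3, -4.9)
valence  <- c(0.31, 0.52, 0.64, 0.40, 0.71)

pearson_r(loudness, valence)
cor(loudness, valence)  # built-in version; should agree with the line above
```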
We did this with data scraped from the Spotify API. The code for this is below.
First, we load our libraries, and use the select function from the tidyverse package to keep only the audio-feature columns we want to compare.
Then we run the cor function to look at correlations, and plot the result with corrplot, using pie charts to show the correlation coefficient of each pair of variables.
x <- cor(df)
round(x, 2)
corrplot(x, method="pie")
The goal at this point was to load data and write a function that would look at correlations between the Spotify variables for any artist. The following function reads the file in.
file_reader <- function(filename){
  file <- read.csv(file = paste0(filename, ".csv"))
  return(file)}
Then we wrote a combined function that reads the file in and runs the correlation inside the function.
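artist_data <- function(artist){
  # read "<artist>.csv" from the working directory
  artist <- read.csv(file = paste0(artist, ".csv"))
  # keep only the audio features we want to correlate
  df <- artist %>%
    select(c("acousticness", "liveness", "danceability", "loudness", "speechiness", "valence"))
  x <- cor(df)
  # print() is needed here: values are not auto-printed inside a function
  print(round(x, 2))
  corrplot(x, method="pie")
}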
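One possible variation, sketched here in base R, returns the rounded correlation matrix so it can be inspected or reused after the call. The name artist_cor, its features and dir arguments, and the simulated CSV file are all illustrative assumptions, not part of the class code.

```r
# Hypothetical helper: read "<artist>.csv" from `dir` and return the
# rounded correlation matrix of the chosen feature columns.
artist_cor <- function(artist, features, dir = ".") {
  path <- file.path(dir, paste0(artist, ".csv"))
  data <- read.csv(path)
  round(cor(data[, features]), 2)
}

# Simulated stand-in for a scraped file, written to a temporary folder
set.seed(1)
fake <- data.frame(loudness = rnorm(50),
                   valence = runif(50),
                   danceability = runif(50))
write.csv(fake, file.path(tempdir(), "example_artist.csv"), row.names = FALSE)

m <- artist_cor("example_artist",
                c("loudness", "valence", "danceability"),
                dir = tempdir())
m
```

Because the function returns a matrix, the result could be passed on to a plotting function afterwards instead of plotting inside the function.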
At this point, we just played around with some regression models, and discussed what the output meant.
summary(lm(valence ~ loudness, data=beyonce))
summary(lm(valence ~ loudness + acousticness, data=beyonce))
summary(lm(valence ~ danceability, data=beyonce))
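To get a feel for the output, it can help to fit a model on data where the true relationship is known. The sketch below uses simulated values rather than the Spotify data, with an intercept of 0.9 and a slope of 0.05 chosen arbitrarily, and pulls out the parts of the summary that matter most.

```r
set.seed(42)

# Simulated data: valence rises with loudness, plus random noise
sim <- data.frame(loudness = runif(100, min = -12, max = -3))
sim$valence <- 0.9 + 0.05 * sim$loudness + rnorm(100, sd = 0.05)

fit <- lm(valence ~ loudness, data = sim)
s <- summary(fit)

coef(fit)        # intercept and slope estimates
s$coefficients   # estimates with standard errors, t values, and p-values
s$r.squared      # proportion of variance in valence explained by loudness
```

With a single predictor, the R-squared value is the square of Pearson's correlation between the two variables, which ties the regression output back to the correlations above.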
### is the data normal?
qqnorm(beyonce$danceability)
hist(beyonce$danceability)
shapiro.test(beyonce$danceability)
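# compare against a normal distribution with the sample's own mean and sd;
# "pnorm" on its own would test against mean 0 and sd 1
ks.test(beyonce$danceability, "pnorm", mean(beyonce$danceability), sd(beyonce$danceability))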
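As a rough guide to reading these tests, the sketch below runs the Shapiro-Wilk test on two simulated samples, one drawn from a normal distribution and one that is strongly skewed. The samples and their parameters are invented for illustration. The null hypothesis is that the data are normal, so a small p-value is evidence against normality.

```r
set.seed(7)

normal_sample <- rnorm(200, mean = 0.6, sd = 0.1)  # drawn from a normal distribution
skewed_sample <- rexp(200, rate = 5)               # strongly right-skewed

# Null hypothesis in both cases: the sample comes from a normal distribution
p_normal <- shapiro.test(normal_sample)$p.value
p_skewed <- shapiro.test(skewed_sample)$p.value

p_normal  # above 0.05 in about 95% of samples that really are normal
p_skewed  # very small: strong evidence against normality
```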