R - Cramer's V with missing values gives different results
My question concerns the calculation of Cramer's V to detect association between categorical variables. I have a dataset with missing values, so I created a fake dataset for illustration with 2 variables, a and b, one of them containing NAs.
    a <- factor(c("m","","f","f","","m","f","f"))
    a2 <- factor(a, levels = c('m','f'), labels = c('male','female'))
    b <- factor(c("y","y","","y","n","n","n","y"))
    b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))
    df <- cbind(a2, b2)
The assocstats function gives me this result for Cramer's V:
    require(vcd)
    > tab <- table(a,b)
    > assocstats(tab)
                        X^2 df P(> X^2)
    Likelihood Ratio 1.7261  4  0.78597
    Pearson          1.3333  4  0.85570

    Phi-Coefficient   : 0.408
    Contingency Coeff.: 0.378
    Cramer's V        : 0.289
Now I want to drop the NA levels:
    a[a==""] <- NA
    a3 <- droplevels(a)
    levels(a3)
    tab <- table(a,b)
    assocstats(tab)
But every time I remove the NAs, the result looks like this:
                         X^2 df P(> X^2)
    Likelihood Ratio 0.13844  2  0.93312
    Pearson              NaN  2      NaN

    Phi-Coefficient   : NaN
    Contingency Coeff.: NaN
    Cramer's V        : NaN
Also, because I have a large dataset, I want to calculate a matrix of Cramer's V results. I found this code here on Stack Overflow, and it seems to work...
    get.v <- function(y) {
      col.y <- ncol(y)
      v <- matrix(ncol = col.y, nrow = col.y)
      for (i in 1:col.y) {
        for (j in 1:col.y) {
          v[i,j] <- assocstats(table(y[,i], y[,j]))$cramer
        }
      }
      return(v)
    }
    get.v(tab)
Only the result is different from that of the assocstats function:
         [,1] [,2] [,3]
    [1,]  1.0  0.5    1
    [2,]  0.5  1.0    1
    [3,]  1.0  1.0    1
This cannot be right, because I get this result every time, even when changing the number of observations... What is wrong with the code?
Conclusion: I don't know which of the results is right. I have a large dataset with a lot of NAs in it. The first assocstats result and the code give different results, although there should be no big difference, because the code only creates a matrix. The second assocstats call gives NaN. I can't detect the errors... Can anyone help me?
You don't have to replace "" with NA if you are using factors: any unique value that you don't define in levels will be converted to NA by factor.
    a <- factor(c("m","","f","f","","m","f","f"))
    a2 <- factor(a, levels = c('m','f'), labels = c('male','female'))
    a
    # [1] m   f f   m f f
    # Levels:  f m
    a2
    # [1] male   <NA>   female female <NA>   male   female female
    # Levels: male female

    b <- factor(c("y","y","","y","n","n","n","y"))
    b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))
    (df <- cbind(a2, b2))
    #      a2 b2
    # [1,]  1  1
    # [2,] NA  1
    # [3,]  2 NA
    # [4,]  2  1
    # [5,] NA  2
    # [6,]  1  2
    # [7,]  2  2
    # [8,]  2  1
Above, you're creating a matrix, which loses the labels that you created with factor. I think you want a data frame:
    (df <- data.frame(a2, b2))
    #       a2   b2
    # 1   male  yes
    # 2   <NA>  yes
    # 3 female <NA>
    # 4 female  yes
    # 5   <NA>   no
    # 6   male   no
    # 7 female   no
    # 8 female  yes

    require('vcd')
    (tab <- table(a2, b2, useNA = 'ifany'))
    #         b2
    # a2       yes no <NA>
    #   male     1  1    0
    #   female   2  1    1
    #   <NA>     1  1    0

    (tab <- table(a2, b2))
    #         b2
    # a2       yes no
    #   male     1  1
    #   female   2  1
You need to explicitly tell table if you want to see NA values in the table. Otherwise, it drops them by default, so you are already "excluding" them when you use assocstats:
    assocstats(tab)
    #                      X^2 df P(> X^2)
    # Likelihood Ratio 0.13844  1  0.70983
    # Pearson          0.13889  1  0.70939
    #
    # Phi-Coefficient   : 0.167
    # Contingency Coeff.: 0.164
    # Cramer's V        : 0.167
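As for the NaN in the question, a likely cause is that assigning NA to the "" entries removes the values but leaves "" behind as an unused factor level, and the table was then built from a rather than from the result of droplevels. The sketch below uses base R only and computes the Pearson statistic by hand instead of calling assocstats; the names tab_unused, tab_dropped, pearson_unused and pearson_dropped are made up for illustration.

```r
# Rebuild the question's data, then set "" to NA as the question did
a <- factor(c("m", "", "f", "f", "", "m", "f", "f"))
b <- factor(c("y", "y", "", "y", "n", "n", "n", "y"))
a[a == ""] <- NA

# The values are gone, but "" is still a level, so table() keeps a zero row
tab_unused <- table(a, b)
rowSums(tab_unused)

# Expected counts under independence; the "" row is all zeros, so
# (observed - expected)^2 / expected evaluates 0 / 0 = NaN for that row
expected <- outer(rowSums(tab_unused), colSums(tab_unused)) / sum(tab_unused)
pearson_unused <- sum((tab_unused - expected)^2 / expected)

# Dropping the unused level removes the zero row and gives a finite statistic
tab_dropped <- table(droplevels(a), b)
expected_dropped <- outer(rowSums(tab_dropped), colSums(tab_dropped)) / sum(tab_dropped)
pearson_dropped <- sum((tab_dropped - expected_dropped)^2 / expected_dropped)
```

Building the factor with explicit levels, as with a2 and b2 above, avoids the problem because no unused level is ever created.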
For get.v, pass the data frame or matrix, not the table:
    get.v <- function(y) {
      col.y <- ncol(y)
      v <- matrix(ncol = col.y, nrow = col.y)
      for (i in 1:col.y) {
        for (j in 1:col.y) {
          v[i,j] <- assocstats(table(y[,i], y[,j]))$cramer
        }
      }
      return(v)
    }
    get.v(df)
    #           [,1]      [,2]
    # [1,] 1.0000000 0.1666667
    # [2,] 0.1666667 1.0000000
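When get.v is given the table instead, it treats each column of counts as if it were a variable, which would explain the 3 x 3 matrix in the question. If labelled output is useful, here is a minimal sketch of the same pairwise idea using base R only. It assumes the usual definition V = sqrt(X^2 / (n * min(r - 1, c - 1))) with the uncorrected Pearson statistic; cramers_v and cramers_v_matrix are hypothetical helper names, not part of vcd.

```r
# Hypothetical helper: Cramer's V from the uncorrected Pearson chi-squared
cramers_v <- function(x, y) {
  tab <- table(x, y)  # NA values are excluded by default
  # chisq.test() warns about small expected counts; only the statistic is used
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * min(dim(tab) - 1))))
}

# Hypothetical wrapper: pairwise matrix labelled with the column names
cramers_v_matrix <- function(data) {
  vars <- names(data)
  v <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
  for (i in vars) {
    for (j in vars) {
      v[i, j] <- cramers_v(data[[i]], data[[j]])
    }
  }
  v
}

# Same example data as above; "" is not a declared level, so it becomes NA
a2 <- factor(c("m", "", "f", "f", "", "m", "f", "f"),
             levels = c("m", "f"), labels = c("male", "female"))
b2 <- factor(c("y", "y", "", "y", "n", "n", "n", "y"),
             levels = c("y", "n"), labels = c("yes", "no"))
df <- data.frame(a2, b2)
v <- cramers_v_matrix(df)
v
```

On this data the off-diagonal entry should agree with the 0.1666667 from get.v(df). Each pair is computed on the rows where both variables are present, so with many NAs different cells of the matrix may be based on different numbers of observations.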