r - Cramer's V with missing values gives different results -


My question concerns the calculation of Cramer's V to detect association between categorical variables. I've got a dataset with missing values, so I created a fake dataset for illustration with two variables, a and b, containing missing values.

    a <- factor(c("m","","f","f","","m","f","f"))
    a2 <- factor(a, levels = c('m','f'), labels = c('male','female'))
    b <- factor(c("y","y","","y","n","n","n","y"))
    b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))
    df <- cbind(a2, b2)

The assocstats function gives me this result for Cramer's V:

    > require(vcd)
    > tab <- table(a,b)
    > assocstats(tab)
                        X^2 df P(> X^2)
    Likelihood Ratio 1.7261  4  0.78597
    Pearson          1.3333  4  0.85570

    Phi-Coefficient   : 0.408
    Contingency Coeff.: 0.378
    Cramer's V        : 0.289

Now I want to drop the NA levels:

    a[a==""] <- NA
    a3 <- droplevels(a)
    levels(a3)
    tab <- table(a,b)
    assocstats(tab)

But every time I remove the NAs, the result looks like this:

                         X^2 df P(> X^2)
    Likelihood Ratio 0.13844  2  0.93312
    Pearson              NaN  2      NaN

    Phi-Coefficient   : NaN
    Contingency Coeff.: NaN
    Cramer's V        : NaN

Also, because I have a large dataset, I would like to calculate a matrix of Cramer's V results. I found this code here on Stack Overflow, which seems to work...

    get.v <- function(y) {
      col.y <- ncol(y)
      v <- matrix(ncol = col.y, nrow = col.y)
      for (i in 1:col.y) {
        for (j in 1:col.y) {
          v[i,j] <- assocstats(table(y[,i], y[,j]))$cramer
        }
      }
      return(v)
    }

    get.v(tab)

Only the result is different from that of the assocstats function:

         [,1] [,2] [,3]
    [1,]  1.0  0.5    1
    [2,]  0.5  1.0    1
    [3,]  1.0  1.0    1

This cannot be right, because I get this result every time, even when changing the number of observations... What is wrong with the code?

Conclusion: I don't know which one of the results is right. I have a large dataset with a lot of NAs in it. The first assocstats result and the code give different results, although the difference is not big, because the code creates a matrix. The second assocstats call gives NaN. I can't detect any errors... Can someone help me?

You don't have to replace "" with NA if you are using factors: any unique value that you don't define in levels will be converted to NA by factor.

    a <- factor(c("m","","f","f","","m","f","f"))
    a2 <- factor(a, levels = c('m','f'), labels = c('male','female'))

    a
    # [1] m   f f   m f f
    # Levels:  f m
    a2
    # [1] male   <NA>   female female <NA>   male   female female
    # Levels: male female

    b <- factor(c("y","y","","y","n","n","n","y"))
    b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))

    (df <- cbind(a2, b2))

    #      a2 b2
    # [1,]  1  1
    # [2,] NA  1
    # [3,]  2 NA
    # [4,]  2  1
    # [5,] NA  2
    # [6,]  1  2
    # [7,]  2  2
    # [8,]  2  1
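The NaN in the question is consistent with an unused level surviving in the table. The sketch below is an assumption about the cause, not something the original output confirms: assigning NA to the "" entries leaves "" as a factor level, and if the table is then built from a instead of the droplevels() result, it keeps an all-zero row. It uses base R's chisq.test in place of assocstats so that it runs without vcd.

```r
a <- factor(c("m","","f","f","","m","f","f"))
b <- factor(c("y","y","","y","n","n","n","y"))

a[a == ""] <- NA        # the values become NA, but "" is still a level
levels(a)

tab_bad <- table(a, b)  # the "" row is all zeros
rowSums(tab_bad)

# Expected counts in an all-zero row are 0, so the Pearson terms are 0/0
chi_bad <- suppressWarnings(chisq.test(tab_bad, correct = FALSE))
unname(chi_bad$statistic)

a3 <- droplevels(a)     # removes the unused "" level
tab_ok <- table(a3, b)  # build the table from a3, not from a
chi_ok <- suppressWarnings(chisq.test(tab_ok, correct = FALSE))
unname(chi_ok$statistic)
```

Note that b still carries its own "" level here; defining the levels explicitly in factor(), as in the answer, handles both variables at once.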

With cbind above, you are creating a matrix, which loses the labels you created with factor. I think you want a data frame:

    (df <- data.frame(a2, b2))

    #       a2   b2
    # 1   male  yes
    # 2   <NA>  yes
    # 3 female <NA>
    # 4 female  yes
    # 5   <NA>   no
    # 6   male   no
    # 7 female   no
    # 8 female  yes

    require('vcd')
    (tab <- table(a2, b2, useNA = 'ifany'))
    #         b2
    # a2       yes no <NA>
    #   male     1  1    0
    #   female   2  1    1
    #   <NA>     1  1    0

    (tab <- table(a2, b2))
    #         b2
    # a2       yes no
    #   male     1  1
    #   female   2  1

You need to explicitly tell table if you want to see NA values in the table. Otherwise, it drops them by default, so you are already "excluding" them when you use assocstats:

    assocstats(tab)

    #                      X^2 df P(> X^2)
    # Likelihood Ratio 0.13844  1  0.70983
    # Pearson          0.13889  1  0.70939
    #
    # Phi-Coefficient   : 0.167
    # Contingency Coeff.: 0.164
    # Cramer's V        : 0.167
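To check these numbers by hand: Cramer's V is sqrt(X^2 / (n * min(r - 1, c - 1))), where X^2 is the Pearson statistic, n the total count, and r and c the number of rows and columns. The helper cramers_v below is a hypothetical name, not part of vcd; it is a minimal base-R sketch that assumes the table has no all-zero rows or columns.

```r
# Hypothetical helper (not from vcd): Cramer's V from a two-way table.
cramers_v <- function(tab) {
  # Uncorrected Pearson chi-squared; warnings about small counts are silenced
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n <- sum(tab)
  k <- min(dim(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

a <- factor(c("m","","f","f","","m","f","f"))
b <- factor(c("y","y","","y","n","n","n","y"))
a2 <- factor(a, levels = c("m","f"), labels = c("male","female"))
b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))

v_with_empty <- cramers_v(table(a, b))    # "" counted as a third category
v_without_na <- cramers_v(table(a2, b2))  # observations with NA are dropped
round(c(v_with_empty, v_without_na), 3)
```

The first value corresponds to the 0.289 from the question, where "" is treated as a real category in a 3 x 3 table; the second corresponds to the 0.167 above, computed on the five complete observations.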

For get.v, pass the data frame or matrix, not the table:

    get.v <- function(y) {
      col.y <- ncol(y)
      v <- matrix(ncol = col.y, nrow = col.y)
      for (i in 1:col.y) {
        for (j in 1:col.y) {
          v[i,j] <- assocstats(table(y[,i], y[,j]))$cramer
        }
      }
      return(v)
    }

    get.v(df)

    #           [,1]      [,2]
    # [1,] 1.0000000 0.1666667
    # [2,] 0.1666667 1.0000000
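As a rough illustration of why get.v(tab) in the question returned a 3 x 3 matrix: the columns of a table hold cell counts, so y[, i] picks out counts and not the original categories. The sketch below uses only base R and the question's original a and b to show what get.v sees in each case.

```r
a <- factor(c("m","","f","f","","m","f","f"))
b <- factor(c("y","y","","y","n","n","n","y"))

tab <- table(a, b)
ncol(tab)                  # one column per level of b, not per variable
tab[, 1]                   # a column of cell counts
table(tab[, 1], tab[, 2])  # cross-tabulates counts, which is not meaningful

df <- data.frame(a, b)
ncol(df)                   # one column per variable
table(df[, 1], df[, 2])    # the actual contingency table of a and b
```

With the data frame, the loop in get.v visits each pair of variables once per ordering, which is why the result is symmetric with 1 on the diagonal.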
