r - Cramer's V with missing values gives different results -

- January 15, 2015

My question concerns the calculation of Cramér's V to detect correlation between categorical variables. I have a dataset with missing values, so I created a fake dataset for illustration with two variables, a and b, which contain missing values (entered as empty strings).

    a  <- factor(c("m","","f","f","","m","f","f"))
    a2 <- factor(a, levels = c('m','f'), labels = c('male','female'))
    b  <- factor(c("y","y","","y","n","n","n","y"))
    b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))
    df <- cbind(a2, b2)

The assocstats function gives me this result for Cramér's V:

    > require(vcd)
    > tab <- table(a, b)
    > assocstats(tab)
                        X^2 df P(> X^2)
    Likelihood Ratio 1.7261  4  0.78597
    Pearson          1.3333  4  0.85570

    Phi-Coefficient   : 0.408
    Contingency Coeff.: 0.378
    Cramer's V        : 0.289

Now I want to drop the NAs from the levels:

    a[a == ""] <- NA
    a3 <- droplevels(a)
    levels(a3)
    tab <- table(a, b)
    assocstats(tab)

But every time I remove the NAs, the result looks like this:

                         X^2 df P(> X^2)
    Likelihood Ratio 0.13844  2  0.93312
    Pearson              NaN  2      NaN

    Phi-Coefficient   : NaN
    Contingency Coeff.: NaN
    Cramer's V        : NaN

Also, because I have a large dataset, I would like to calculate a matrix of Cramér's V results. I found this code here on Stack Overflow, and it seems to work...

    get.v <- function(y) {
      col.y <- ncol(y)
      v <- matrix(ncol = col.y, nrow = col.y)
      for (i in 1:col.y) {
        for (j in 1:col.y) {
          v[i, j] <- assocstats(table(y[, i], y[, j]))$cramer
        }
      }
      return(v)
    }

    get.v(tab)

Only the result is different from that of the assocstats function:

         [,1] [,2] [,3]
    [1,]  1.0  0.5    1
    [2,]  0.5  1.0    1
    [3,]  1.0  1.0    1

This cannot be right, because I get this result every time, even when I change the number of observations... What is wrong with the code?

Conclusion: I don't know which one of the results is right. I have a large dataset with a lot of NAs in it. The first assocstats result and the get.v code give different results, although there should be no big difference, since the code only builds a matrix. The second assocstats call gives NaN. I can't find the errors... Can anybody help me?

You don't have to replace "" with NA if you are using factors: any unique value that you don't define in levels will be converted to NA by factor.

    a <- factor(c("m","","f","f","","m","f","f"))
    a2 <- factor(a, levels = c('m','f'), labels = c('male','female'))

    a
    # [1] m   f f   m f f
    # Levels:  f m
    a2
    # [1] male   <NA>   female female <NA>   male   female female
    # Levels: male female

    b <- factor(c("y","y","","y","n","n","n","y"))
    b2 <- factor(b, levels = c("y","n"), labels = c("yes","no"))

    (df <- cbind(a2, b2))

    #      a2 b2
    # [1,]  1  1
    # [2,] NA  1
    # [3,]  2 NA
    # [4,]  2  1
    # [5,] NA  2
    # [6,]  1  2
    # [7,]  2  2
    # [8,]  2  1

Above, you're creating a matrix, which loses the labels created by factor. I think you want a data frame:

    (df <- data.frame(a2, b2))

    #       a2   b2
    # 1   male  yes
    # 2   <NA>  yes
    # 3 female <NA>
    # 4 female  yes
    # 5   <NA>   no
    # 6   male   no
    # 7 female   no
    # 8 female  yes

    require('vcd')

    # useNA = 'ifany' keeps the NA row and column in the table
    (tab <- table(a2, b2, useNA = 'ifany'))
    #         b2
    # a2       yes no <NA>
    #   male     1  1    0
    #   female   2  1    1
    #   <NA>     1  1    0

    # the default drops every pair that contains an NA
    (tab <- table(a2, b2))
    #         b2
    # a2       yes no
    #   male     1  1
    #   female   2  1

You need to explicitly tell table if you want to see NA values in the table. Otherwise, it drops them by default, so you are already "excluding" them when you use assocstats:

    assocstats(tab)

    #                      X^2 df P(> X^2)
    # Likelihood Ratio 0.13844  1  0.70983
    # Pearson          0.13889  1  0.70939
    #
    # Phi-Coefficient   : 0.167
    # Contingency Coeff.: 0.164
    # Cramer's V        : 0.167
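As for the NaN in the question, one likely cause is that assigning NA to the "" entries does not remove the "" level itself, and the table there was still built from a rather than from the droplevels result. The unused level then shows up as an all-zero row or column, the expected counts for it are 0, and Pearson's statistic becomes 0/0. The following is only a sketch of that effect in base R, using chisq.test as a stand-in for the Pearson line of assocstats; the names tab_unused, tab_clean, pearson_unused and pearson_clean are made up for illustration.

```r
# Toy data from the question; "" marks a missing answer.
a <- factor(c("m", "", "f", "f", "", "m", "f", "f"))
b <- factor(c("y", "y", "", "y", "n", "n", "n", "y"))

# Setting the values to NA does not remove the "" level.
a[a == ""] <- NA
b[b == ""] <- NA
levels(a)

# The unused "" level survives as an all-zero row and column.
tab_unused <- table(a, b)
tab_unused

# Zero margins give expected counts of 0, so Pearson's statistic is 0/0.
# (chisq.test is used here as a stand-in for assocstats; this is an assumption.)
pearson_unused <- suppressWarnings(
  chisq.test(tab_unused, correct = FALSE)$statistic
)

# Dropping the unused levels before tabulating gives a plain 2 x 2 table.
tab_clean <- table(droplevels(a), droplevels(b))
pearson_clean <- suppressWarnings(
  chisq.test(tab_clean, correct = FALSE)$statistic
)

pearson_unused
pearson_clean
```

If this is what happened, building the table from the objects returned by droplevels (or from a2 and b2 as above) should avoid the NaN.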

For get.v, pass a data frame or matrix, not a table:

    get.v <- function(y) {
      col.y <- ncol(y)
      v <- matrix(ncol = col.y, nrow = col.y)
      for (i in 1:col.y) {
        for (j in 1:col.y) {
          v[i, j] <- assocstats(table(y[, i], y[, j]))$cramer
        }
      }
      return(v)
    }

    get.v(df)

    #           [,1]      [,2]
    # [1,] 1.0000000 0.1666667
    # [2,] 0.1666667 1.0000000
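If you want to double-check these numbers without vcd, Cramér's V can be computed by hand as sqrt(X^2 / (n * (min(rows, cols) - 1))), where X^2 is Pearson's statistic and n is the number of complete pairs. The sketch below does that in base R; cramers_v is a hypothetical helper written for this example, not a function from vcd, and it assumes every variable has at least two observed categories.

```r
# Same toy data as above; values not listed in `levels` become NA.
a2 <- factor(c("m", "", "f", "f", "", "m", "f", "f"),
             levels = c("m", "f"), labels = c("male", "female"))
b2 <- factor(c("y", "y", "", "y", "n", "n", "n", "y"),
             levels = c("y", "n"), labels = c("yes", "no"))
df <- data.frame(a2, b2)

# Hypothetical helper: Cramer's V from Pearson's chi-squared, base R only.
cramers_v <- function(x, y) {
  tab <- table(x, y)  # pairs containing an NA are excluded by default
  # correct = FALSE turns off the continuity correction for 2 x 2 tables
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Pairwise matrix over the columns of the data frame.
v <- sapply(df, function(x) sapply(df, function(y) cramers_v(x, y)))
round(v, 4)
```

On this data the off-diagonal value should agree with the 0.1667 that assocstats reports, and the diagonal is 1 because each variable is perfectly associated with itself.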
