Text cleaning in R


I have a single column in R that looks like this:

    path column
    ag.1.4->ao.5.5->iv.9.12->ag.4.35
    ao.11.234->iv.345.455.1.2->ag.9.531

I want to transform it into:

    path column
    ag->ao->iv->ag
    ao->iv->ag

How can I do this?

Thank you.

Here is the full dput of the data:

    structure(list(rank = c(10394749L, 36749879L), count = c(1L, 1L),
        percent = c(0.001011122, 0.001011122),
        path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
                 "ao.legacy payment.not_completed->agent.payment.completed")),
        .Names = c("rank", "count", "percent", "path"),
        class = "data.frame", row.names = c(NA, -2L))

You can use gsub to match a literal . followed by one or more digits (\\.[0-9]+) and replace each match with ''.

    df1$path.column <- gsub('\\.[0-9]+', '', df1$path.column)
    df1
    #           path.column
    #1 ag -> ao -> iv -> ag
    #2       ao -> iv -> ag
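Because gsub replaces every match rather than only the first, the same pattern should also handle steps that carry several numeric groups, as in the strings from the question. A minimal sketch, where the variable names `x`, `cleaned`, and `first_only` are illustrative and not from the original post:

```r
# The two example strings from the question.
x <- c("ag.1.4->ao.5.5->iv.9.12->ag.4.35",
       "ao.11.234->iv.345.455.1.2->ag.9.531")

# gsub() removes every ".digits" group in each string.
cleaned <- gsub("\\.[0-9]+", "", x)
cleaned
# [1] "ag->ao->iv->ag" "ao->iv->ag"

# sub() removes only the first match, so it is not enough here.
first_only <- sub("\\.[0-9]+", "", x[1])
first_only
# [1] "ag.4->ao.5.5->iv.9.12->ag.4.35"
```

This pattern only targets digits, so it leaves suffixes that contain letters untouched.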

Update

For the new dataset df2, where the suffixes contain letters and spaces rather than only digits:

    gsub('\\.[^->]+(?=(->|\\b))', '', df2$path, perl = TRUE)
    #[1] "ao->ao->ao" "ao->agent"

And for the string shown in the OP's post:

    str2 <- c('ag.1.4->ao.5.5->iv.9.12->ag.4.35',
              'ao.11.234->iv.345.455.1.2->ag.9.531')
    gsub('\\.[^->]+(?=(->|\\b))', '', str2, perl = TRUE)
    #[1] "ag->ao->iv->ag" "ao->iv->ag"
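If the lookahead pattern is hard to read, one alternative is to split each path on "->", keep only the part of each step before its first dot, and join the steps again. This is only a sketch of a different technique, not the answer's method; the helper name `strip_step_suffix` is hypothetical, and it assumes "->" appears only as the step separator.

```r
# Hypothetical helper: keep only the prefix (text before the first ".")
# of every step in a "->"-separated path.
strip_step_suffix <- function(paths) {
  # Split each path into its steps on the literal separator "->".
  steps <- strsplit(paths, "->", fixed = TRUE)
  vapply(
    steps,
    # Drop everything from the first "." onward, then rejoin the steps.
    function(s) paste(sub("\\..*$", "", s), collapse = "->"),
    character(1)
  )
}

str2 <- c("ag.1.4->ao.5.5->iv.9.12->ag.4.35",
          "ao.11.234->iv.345.455.1.2->ag.9.531")
paths2 <- c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
            "ao.legacy payment.not_completed->agent.payment.completed")

res_str2 <- strip_step_suffix(str2)
res_paths2 <- strip_step_suffix(paths2)
res_str2
# [1] "ag->ao->iv->ag" "ao->iv->ag"
res_paths2
# [1] "ao->ao->ao" "ao->agent"
```

The split-based version trades a single regex call for a few more lines, but each step is easier to inspect on its own.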

Data

    df1 <- structure(list(path.column = c("ag.1 -> ao.5 -> iv.9 -> ag.4",
        "ao.11 -> iv.345 -> ag.9")), .Names = "path.column",
        class = "data.frame", row.names = c(NA, -2L))

    df2 <- structure(list(rank = c(10394749L, 36749879L), count = c(1L, 1L),
        percent = c(0.001011122, 0.001011122),
        path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
                 "ao.legacy payment.not_completed->agent.payment.completed")),
        .Names = c("rank", "count", "percent", "path"),
        class = "data.frame", row.names = c(NA, -2L))
