Text cleaning in R
I have a single column in R that looks like this:

    path column
    ag.1.4->ao.5.5->iv.9.12->ag.4.35
    ao.11.234->iv.345.455.1.2->ag.9.531

I want to transform it into:

    path column
    ag->ao->iv->ag
    ao->iv->ag

How can I do this?

Thank you.
Here is the full dput of the data:

    structure(list(rank = c(10394749L, 36749879L), count = c(1L, 1L),
        percent = c(0.001011122, 0.001011122),
        path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
                 "ao.legacy payment.not_completed->agent.payment.completed")),
        .Names = c("rank", "count", "percent", "path"),
        class = "data.frame", row.names = c(NA, -2L))
You could use gsub. We match a literal . together with the digits that follow it (\\.[0-9]+) and replace each match with ''.
    df1$path.column <- gsub('\\.[0-9]+', '', df1$path.column)
    df1
    #           path.column
    #1 ag -> ao -> iv -> ag
    #2       ao -> iv -> ag
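As a quick check, the sketch below applies the same digits-only pattern to the strings from the question and to one of the paths from the dput data. The names `x` and `y` are only illustrative. It suggests that the pattern handles nested numeric suffixes, but leaves non-numeric suffixes untouched, which is why the update that follows uses a different pattern.

```r
# Illustrative inputs copied from the question; `x` and `y` are hypothetical names.
x <- c("ag.1.4->ao.5.5->iv.9.12->ag.4.35",
       "ao.11.234->iv.345.455.1.2->ag.9.531")
y <- "ao.legacy payment.not_completed->agent.payment.completed"

# '\\.[0-9]+' matches a literal dot followed by one or more digits.
# gsub() replaces every match, so ".1" and ".4" in "ag.1.4" are removed one by one.
cleaned_x <- gsub("\\.[0-9]+", "", x)
print(cleaned_x)
# [1] "ag->ao->iv->ag" "ao->iv->ag"

# No dot in `y` is followed by a digit, so nothing is removed.
cleaned_y <- gsub("\\.[0-9]+", "", y)
print(identical(cleaned_y, y))
# [1] TRUE
```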
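Update

For the new dataset df2, the suffixes are no longer numeric, so we match a . followed by any run of characters other than - or >, up to the next -> or a word boundary: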
    gsub('\\.[^->]+(?=(->|\\b))', '', df2$path, perl=TRUE)
    #[1] "ao->ao->ao" "ao->agent"
The same pattern also works for the strings shown in the OP's post:
    str2 <- c('ag.1.4->ao.5.5->iv.9.12->ag.4.35',
              'ao.11.234->iv.345.455.1.2->ag.9.531')
    gsub('\\.[^->]+(?=(->|\\b))', '', str2, perl=TRUE)
    #[1] "ag->ao->iv->ag" "ao->iv->ag"
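If the lookahead pattern is hard to read, one possible alternative is to split each path on "->", keep only the part of each step before its first dot, and join the steps again. This is a minimal sketch and not the approach used above; `strip_suffixes` is a hypothetical helper name, and it assumes the step names themselves never contain a dot.

```r
# Hypothetical helper, not part of the original answer.
# Assumes "->" separates steps and that a step name never contains ".".
strip_suffixes <- function(paths, sep = "->") {
  steps_list <- strsplit(paths, sep, fixed = TRUE)
  vapply(steps_list, function(steps) {
    # Drop everything from the first dot to the end of each step.
    paste(sub("\\..*$", "", steps), collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

paths <- c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
           "ao.legacy payment.not_completed->agent.payment.completed")
result <- strip_suffixes(paths)
print(result)
# [1] "ao->ao->ao" "ao->agent"
```

Splitting first keeps the regular expression trivial, at the cost of a per-element loop; on a large column the single gsub call is likely to be faster.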
Data

    df1 <- structure(list(path.column = c("ag.1 -> ao.5 -> iv.9 -> ag.4",
        "ao.11 -> iv.345 -> ag.9")), .Names = "path.column",
        class = "data.frame", row.names = c(NA, -2L))

    df2 <- structure(list(rank = c(10394749L, 36749879L), count = c(1L, 1L),
        percent = c(0.001011122, 0.001011122),
        path = c("ao.legacy payment.not_completed->ao.legacy payment.not_completed->ao.legacy payment.completed",
                 "ao.legacy payment.not_completed->agent.payment.completed")),
        .Names = c("rank", "count", "percent", "path"),
        class = "data.frame", row.names = c(NA, -2L))