R - Add a where condition inside of aggregate function


I have data like this:

 head(data1[,1:5])
            eid             created class_id   min.e.event_time. lead_date
 2610966 284546 2015-03-19 11:21:17       36 2015-03-19 11:21:17      null
 2610972 284554 2015-03-19 12:37:19       36 2015-03-19 12:37:19      null
 2610973 284554 2015-03-19 12:37:19       36 2015-03-19 12:37:19      null
 2610975 284558 2015-03-19 14:18:43       36 2015-03-19 14:18:43      null
 2610976 284558 2015-03-19 14:18:43       36 2015-03-19 14:18:43      null
 2610977 284558 2015-03-19 14:18:43       36 2015-03-19 14:18:43      null

This is an events table, where eid is the user ID. Each line is an instance of a user experiencing an event.

I'd like a count of events for each user:

eid_email <- aggregate(data1$eid, list(data1$eid), function(x) length(x)) 

This appears to work. Great.

But I need to add a condition. I need to count events for each user, as above, but only where event_time is less than lead_date.

When I type help(aggregate), the manual says there is a subset argument that I can use with aggregate(). Can I use this argument in that way?

How can I apply a conditional to the aggregate function? If that's not possible, is there another way?

** str of data1, following a comment **

 str(data1)
 'data.frame':   1906721 obs. of  10 variables:
  $ eid              : int  45 45 45 45 45 45 45 45 45 45 ...
  $ created          : Factor w/ 36204 levels "0000-00-00 00:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ class_id         : int  36 36 36 36 36 36 36 36 36 36 ...
  $ min.e.event_time.: Factor w/ 16175 levels "2013-04-15 11:17:19",..: 10025 10025 10025 10025 10025 10025 10025 10025 10025 10025 ...
  $ lead_date        : Factor w/ 11199 levels "2012-10-11 18:39:12",..: 11199 11199 11199 11199 11199 11199 11199 11199 11199 11199 ...
  $ camp             : int  98713 59020 75796 99195 76986 57986 54062 80420 55078 70800 ...
  $ event_date       : Factor w/ 695747 levels "2008-01-18 12:18:01",..: 71975 27451 45235 72491 48792 24606 20021 52261 32169 57764 ...
  $ event            : Factor w/ 3 levels "click","open",..: 3 3 3 3 3 1 3 2 2 3 ...
  $ message_name     : Factor w/ 2707 levels ""," 2015-03 cad promotion update",..: 1570 2624 1970 1881 1973 1931 1919 1983 2391 2045 ...
  $ subject_lin      : Factor w/ 2043 levels ""," christie office holiday hours",..: 311 952 318 309 495 1450 520 298 1333 750 ...

If you have dplyr installed, you can do the following:

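library(dplyr)

data2 <- data1 %>%
  # Parse the factor timestamps; values that do not match the format
  # (such as "null") become NA
  mutate(event_time_posix = as.POSIXct(min.e.event_time.,
                                       format = "%Y-%m-%d %H:%M:%S",
                                       origin = "1970-01-01")) %>%
  mutate(lead_time_posix = as.POSIXct(lead_date,
                                      format = "%Y-%m-%d %H:%M:%S",
                                      origin = "1970-01-01")) %>%
  # Keep only events that happened before the lead date (NA rows are dropped)
  filter(event_time_posix < lead_time_posix) %>%
  # Count the remaining events per user
  group_by(eid) %>%
  summarize(n = n())

options(dplyr.width = Inf)
print(data2)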
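On the question about the subset argument: it belongs to the formula method of aggregate(), so it can be used once the two timestamp columns have been converted to date-times. The following is a minimal base R sketch, not a tested solution for the full table. The small data frame is made up for illustration (only the column names come from the question), and the new columns event_time and lead_time and the result name counts are assumptions.

```r
# Toy data with the same column names as the question; the values are invented.
data1 <- data.frame(
  eid = c(1, 1, 1, 2, 2, 3),
  class_id = 36L,
  min.e.event_time. = c("2015-03-19 11:21:17", "2015-03-19 12:37:19",
                        "2015-03-19 14:18:43", "2015-03-19 11:21:17",
                        "2015-03-19 12:37:19", "2015-03-19 11:21:17"),
  lead_date = c("2015-03-19 12:00:00", "2015-03-19 12:00:00",
                "2015-03-19 15:00:00", "null",
                "2015-03-19 13:00:00", "null"),
  stringsAsFactors = TRUE
)

# Convert the factor columns to POSIXct; strings that do not match the
# format (here "null") become NA.
fmt <- "%Y-%m-%d %H:%M:%S"
data1$event_time <- as.POSIXct(as.character(data1$min.e.event_time.),
                               format = fmt, tz = "UTC")
data1$lead_time <- as.POSIXct(as.character(data1$lead_date),
                              format = fmt, tz = "UTC")

# Formula method of aggregate(): the subset expression is evaluated inside
# data1, so it can refer to the new columns directly. class_id is used only
# as a column to count; rows where it is NA would be dropped by the default
# na.action.
counts <- aggregate(class_id ~ eid, data = data1, FUN = length,
                    subset = !is.na(lead_time) & event_time < lead_time)
names(counts)[2] <- "n"

print(counts)
```

Users with no qualifying events (eid 3 in the toy data) do not appear in the result at all, which matches the behaviour of the dplyr version above.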
