I have a dataset where each individual (id) has an e_date, and since each individual can have more than one e_date, I’m trying to get the earliest date for each individual. Basically, I would like a dataset with one row per id showing that id’s earliest e_date value.
I’ve used the aggregate function to find the minimum values, created a new variable combining the date and the id, and finally subset the original dataset based on the one containing the minimums, using the new variable. I’ve come to this:
    new <- aggregate(e_date ~ id, data_full, min)
    data_full["comb"] <- NULL
    data_full$comb <- paste(data_full$id, data_full$e_date)
    new["comb"] <- NULL
    new$comb <- paste(new$id, new$e_date)
    data_fixed <- data_full[which(new$comb %in% data_full$comb), ]
The first thing is that the aggregate function doesn’t seem to work at all: it reduces the number of rows, but viewing the data I can clearly see that some ids appear more than once with different e_date values. Plus, the code gives me different results when I convert the date with as.Date instead of keeping its original format (integer). I think the answer is simple, but I’m stuck on this one.
We can use data.table. Convert the ‘data.frame’ to a ‘data.table’ (setDT(data_full)), order it by ‘e_date’ and, grouped by ‘id’, take the first row (head(.SD, 1L)):

    library(data.table)
    setDT(data_full)[order(e_date), head(.SD, 1L), by = id]
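To make the data.table approach easier to check, here is a minimal sketch on made-up data. The table `toy` and its values are purely illustrative assumptions, not taken from the question. It runs the same expression as above and, as an alternative, picks the row with the minimum date inside each group using `which.min`.

```r
library(data.table)

# Hypothetical toy data: three ids, several dates each
toy <- data.table(
  id     = c(1, 1, 2, 2, 2, 3),
  e_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                     "2016-01-05", "2017-02-20", "2013-12-31"))
)

# Sort by date first, then keep the first row of each id
earliest <- toy[order(e_date), head(.SD, 1L), by = id]

# Alternative: within each id, take the row that holds the minimum date
earliest_alt <- toy[, .SD[which.min(e_date)], by = id]

# Both results have one row per id; sort by id to compare them
print(earliest[order(id)])
print(earliest_alt[order(id)])
```

Both versions return one row per id. The groups come back in order of first appearance, so the two results only look the same after sorting by id.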
Or with dplyr: after grouping by ‘id’, arrange by ‘e_date’ (assuming it is of Date class) and get the first row with slice:

    library(dplyr)
    data_full %>%
      group_by(id) %>%
      arrange(e_date) %>%
      slice(1L)
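If the installed dplyr is version 1.0.0 or later, `slice_min()` may express the same idea more directly. The sketch below is an illustration on made-up data, not part of the original answer. The data frame `toy` and its `visit` column are assumptions, and id 2 deliberately has two rows sharing the earliest date to show how ties behave.

```r
library(dplyr)

# Hypothetical toy data; id 2 has two rows with the same earliest date
toy <- data.frame(
  id     = c(1, 1, 2, 2, 2, 3),
  e_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-05",
                     "2016-01-05", "2017-02-20", "2013-12-31")),
  visit  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Exactly one row per id, even when the earliest date is tied
one_per_id <- toy %>%
  group_by(id) %>%
  slice_min(e_date, n = 1, with_ties = FALSE) %>%
  ungroup()

# Default behaviour keeps every row tied for the earliest date
with_ties <- toy %>%
  group_by(id) %>%
  slice_min(e_date, n = 1) %>%
  ungroup()

print(one_per_id)
print(with_ties)
```

Whether tied rows should be kept depends on the data. `with_ties = FALSE` matches the "one row per id" requirement in the question.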
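If we need a base R option, ave can be used:

    # rank within each id; ties.method = "first" keeps exactly one row per id
    data_full[with(data_full, ave(as.numeric(e_date), id,
                                  FUN = function(x) rank(x, ties.method = "first")) == 1), ]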
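As for the code in the question, the subsetting step looks like the culprit rather than aggregate itself. `new$comb %in% data_full$comb` tests each row of `new`, so it is normally TRUE for every element. `which()` then returns 1, 2, ..., nrow(new), which just selects the first rows of `data_full`. A minimal base R sketch of two ways around this follows, on made-up data. The names `toy`, `mins`, `earliest` and `earliest_alt` are illustrative assumptions.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  id     = c(1, 1, 2, 2, 2, 3),
  e_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                     "2016-01-05", "2017-02-20", "2013-12-31")),
  visit  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Same first step as in the question: one minimum date per id
mins <- aggregate(e_date ~ id, data = toy, FUN = min)

# Join the minima back on both keys to recover the full rows;
# this replaces the pasted "comb" helper column
earliest <- merge(toy, mins, by = c("id", "e_date"))

# Alternative without aggregate: sort by id and date,
# then drop every repeated id so only the earliest row remains
sorted <- toy[order(toy$id, toy$e_date), ]
earliest_alt <- sorted[!duplicated(sorted$id), ]

print(earliest)
print(earliest_alt)
```

The merge version keeps every row tied for the earliest date, while the `duplicated` version always keeps exactly one row per id.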