Builds a duration version of a data frame representing panel data.
Usage
add_duration(
  data,
  y,
  unitID,
  tID,
  freq = "month",
  sort = FALSE,
  ongoing = TRUE,
  slice.last = FALSE
)
Arguments
- data
 Data frame representing panel data.
- y
 A binary indicator of the incidence of some event, e.g. a coup.
- unitID
 Name of the variable in the data frame identifying the cross-sectional units, e.g. "country".
- tID
 Name of the variable in the data frame identifying the time unit, preferably as class Date. E.g. "year".
- freq
 Frequency at which units are measured in tID. Currently yearly, monthly, and daily data are supported, i.e. "year", "month", or "day".
- sort
 Sort data by unit and time? Default is FALSE, i.e. return data in original order.
- ongoing
 If TRUE, successive 1's are considered ongoing events and treated as NA after the first 1. If FALSE, successive 1's are all treated as failures.
- slice.last
 Set to TRUE to create a slice of the last time period; used with forecast.spdur. For compatibility with CRISP and ICEWS projects.
Value
Returns the original data frame with 8 additional duration-specific variables:
- failure
 Binary indicator of an event.
- ongoing
 Binary indicator for ongoing events, not counting the initial failure time.
- end.spell
 Binary indicator for the last observation in a spell, either due to censoring or failure.
- cured
 Binary indicator for spells that are coded as cured, or immune from failure. Equal to 1 - atrisk.
- atrisk
 Binary indicator for spells that are coded as at risk for failure. Equal to 1 - cured.
- censor
 Binary indicator for right-censored spells.
- duration
 t, counter for how long a spell has survived without failure.
- t.0
 Starting time for period observed during t, by default equals duration - 1.
Details
This function processes a panel data frame by creating a failure
variable from y and corresponding duration counter, as well as
risk/immunity indicators. Supported time resolutions are year, month, and
day, and input data should be (dis-)aggregated to one of these levels.
The returned data frame should have the same number of rows as the original.
If y is an indicator of the incidence of some event, rather than an
onset indicator, then ongoing spells of failure beyond the initial event are
coded as NA (e.g. 000111 becomes a spell of 0001 NA NA). This is to preserve
compatibility with the base dataset. Note, however, that the order of rows may
differ.
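The recoding of ongoing events can be illustrated with a short base R sketch for a single unit's series. The helper name `code_onsets` is hypothetical and not part of the package, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): keep the first 1 in each run
# of successive 1's and recode the remaining 1's in that run as NA.
code_onsets <- function(y) {
  prev <- c(0, head(y, -1))          # previous period's value (0 before the series starts)
  ifelse(y == 1 & prev == 1, NA, y)  # a 1 preceded by a 1 is an ongoing event
}

y <- c(0, 0, 0, 1, 1, 1)
failure <- code_onsets(y)
failure
#> [1]  0  0  0  1 NA NA
```

The sketch assumes the series is already sorted by time and belongs to one unit; for a full panel the same logic would have to be applied within each unit.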
There cannot be missing values ("NA") in any of the key variables
y, unitID, or tID; they will stop the function.
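A simple way to check for missing values before building the duration data is sketched below in base R. The toy data frame `panel` is made up for illustration, and whether incomplete rows should be dropped or filled in depends on the data at hand.

```r
# Toy panel with one missing value in the event indicator
panel <- data.frame(
  y      = c(0, NA, 1),
  unitID = c(1, 1, 1),
  tID    = c(2000, 2001, 2002)
)

# Count missing values in each key variable
key_vars  <- c("y", "unitID", "tID")
n_missing <- colSums(is.na(panel[, key_vars]))

# One option: keep only rows that are complete on the key variables
panel_complete <- panel[complete.cases(panel[, key_vars]), ]
```

Here `n_missing` reports one missing value in y and none in the other two columns, and `panel_complete` keeps the two complete rows.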
Furthermore, series that start with an event, e.g. (100), are treated as
experiencing failure in the first time period. If those events are in fact
ongoing, e.g. the last year of a war that started before the start time of
the dataset, they should be dropped manually before using
add_duration().
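One possible way to drop such leading events by hand is sketched below in base R. The helper `leading_event` and the toy data are hypothetical, and the sketch assumes rows are sorted by time within each unit. Whether the leading 1's really are ongoing events remains a judgment call for the analyst.

```r
# Hypothetical helper (not part of the package): flag the run of 1's at the
# very start of a single unit's series.
leading_event <- function(y) cumsum(y == 0) == 0

panel <- data.frame(
  y      = c(1, 1, 0, 0, 1,   0, 0, 1),
  unitID = c(1, 1, 1, 1, 1,   2, 2, 2),
  tID    = c(2000:2004, 2000:2002)
)

# Apply the flag within each unit; ave() returns values in the original row order
drop <- ave(panel$y, panel$unitID, FUN = leading_event) == 1
panel_trimmed <- panel[!drop, ]
```

In this toy panel only the first two rows of unit 1 are removed; the later event in unit 1 and the whole of unit 2 are kept.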
t.0 is the starting time of the period of observation at tID.
It is by default set as duration - 1 and currently only serves as a
placeholder to allow future expansion for varying observation times.
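The relationship between the failure indicator, the duration counter, and t.0 can be sketched in base R for a single unit. This is an illustration under simplifying assumptions (one unit, sorted by time, no ongoing periods coded as NA), not the package's actual implementation.

```r
failure <- c(0, 0, 0, 1, 0)

# A new spell starts in the first period and in the period after each failure
spell_id <- cumsum(c(1, head(failure, -1)))

# Count periods within each spell, then derive the default start time
duration <- ave(failure, spell_id, FUN = seq_along)
t.0      <- duration - 1

cbind(failure, duration, t.0)
```

For this series the counter runs 1, 2, 3, 4 up to the failure and restarts at 1 afterwards, so t.0 is 0, 1, 2, 3, 0.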
See also
panel_lag for lagging variables in a panel data frame
before building duration data.
Examples
# Yearly data
data <- data.frame(y=c(0,0,0,1,0), 
                   unitID=c(1,1,1,1,1), 
                   tID=c(2000, 2001, 2002, 2003, 2004))
dur.data <- add_duration(data, "y", "unitID", "tID", freq="year")
#> Warning: Converting to 'Date' class with yyyy-06-30
dur.data
#>   y unitID  tID failure ongoing end.spell cured atrisk censor duration t.0
#> 2 0      1 2000       0       0         0     0      1      0        1   0
#> 3 0      1 2001       0       0         0     0      1      0        2   1
#> 4 0      1 2002       0       0         0     0      1      0        3   2
#> 5 1      1 2003       1       0         1     0      1      0        4   3
#> 1 0      1 2004       0       0         1     1      0      1        1   0
