Check set overlap between two state lists / data frames, e.g. prior to merging them.
compare(
df1,
df2,
state1 = "gwcode",
time1 = "year",
state2 = "gwcode",
time2 = "year"
)
report(x)
df1: data frame
df2: data frame
state1: (character(1)) Name of the country ID variable in df1, default "gwcode"
time1: (character(1)) Name of the time ID variable in df1, default "year"
state2: (character(1)) Name of the country ID variable in df2, default "gwcode"
time2: (character(1)) Name of the time ID variable in df2, default "year"
x: a "state_sets" object produced by compare()
This is a helper for interactively debugging merges of data sets whose state lists may differ slightly. Such differences in case sets can arise, for example, when the two sources use different country codes.
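The kind of overlap check described above can be sketched by hand in base R. This is only an illustration of the idea, not how compare() is implemented; the two small data frames and their country codes are hypothetical and chosen so that the state lists differ slightly.

```r
# Two small hypothetical panels keyed by country code and year
df1 <- data.frame(gwcode = c(2, 20, 260, 265), year = 1980,
                  x1 = c(1, NA, 3, 4))
df2 <- data.frame(gwcode = c(2, 20, 255), year = 1980,
                  x2 = c(5, 6, NA))

# Build a "code-year" key for every row in each data frame
key1 <- paste(df1$gwcode, df1$year, sep = "-")
key2 <- paste(df2$gwcode, df2$year, sep = "-")

# Set overlap between the two case lists
in_both  <- intersect(key1, key2)  # cases that would match in a merge
only_df1 <- setdiff(key1, key2)    # cases a merge would drop from df1
only_df2 <- setdiff(key2, key1)    # cases a merge would drop from df2

# Among the matching cases, count those without any missing values
merged <- merge(df1, df2, by = c("gwcode", "year"))
n_complete <- sum(complete.cases(merged))
```

Here in_both holds the cases present in both data frames, while only_df1 and only_df2 list the cases that an inner merge would silently drop. compare() and report() summarize this kind of information, along with missing-value status, in one step.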
# df1 has all countries in 2018 but some values in x1 are missing
df1 <- state_panel(2018, 2018, partial = "any")
df1$x1 <- round(runif(nrow(df1))*5)
df1$x1[sample.int(nrow(df1), size = 20, replace = FALSE)] <- NA
# df2 is missing some countries and also has missing values in x2
df2 <- state_panel(2018, 2018, partial = "any")
df2 <- df2[sample.int(nrow(df2), size = 150), ]
df2$x2 <- round(runif(nrow(df2))*5)
df2$x2[sample.int(nrow(df2), size = 20, replace = FALSE)] <- NA
comp <- compare(df1, df2)
comp
#> # A tibble: 6 × 5
#>   case_in_df1 case_in_df2 missval_df1 missval_df2     n
#>         <int>       <int> <fct>       <fct>       <int>
#> 1           1           0 0           unknown        42
#> 2           1           0 1           unknown         5
#> 3           1           1 0           0             116
#> 4           1           1 0           1              19
#> 5           1           1 1           0              14
#> 6           1           1 1           1               1
report(comp)
#> 197 total rows
#> 197 rows in df1
#> 150 rows in df2
#>
#> 116 rows match and have no missing values
#> 20-2018, 31-2018, 41-2018, 42-2018, 51-2018, 52-2018, 54-2018, 56-2018, 58-2018, 60-2018, and 106 more
#>
#> 1 rows match but have missing values in both
#> 436-2018
#>
#> 19 rows match but have missing values in df2
#> 53-2018, 70-2018, 80-2018, 93-2018, 130-2018, 135-2018, 160-2018, 210-2018, 232-2018, 452-2018, and 9 more
#>
#> 14 rows match but have missing values in df1
#> 94-2018, 340-2018, 343-2018, 344-2018, 349-2018, 483-2018, 500-2018, 590-2018, 690-2018, 731-2018, and 4 more
#>
#> 42 rows in df1 (no missing values) but not df2
#> 2-2018, 40-2018, 55-2018, 57-2018, 91-2018, 140-2018, 205-2018, 211-2018, 221-2018, 223-2018, and 32 more
#>
#> 5 rows in df1 (with missing values) but not df2
#> 95-2018, 371-2018, 435-2018, 520-2018, 835-2018