Title: | Helpful Functions for Cleaning Surveillance Data |
---|---|
Description: | Helpful functions for cleaning and manipulating surveillance data, especially for creating and validating panel data from individual-level surveillance data. |
Authors: | Richard Aubrey White [aut, cre] |
Maintainer: | Richard Aubrey White <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2023.5.24 |
Built: | 2025-02-09 03:29:51 UTC |
Source: | https://github.com/csids/cstidy |
Attempts to expand the dataset to include more time
A time series is defined as a unique combination of:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
*_id
*_tag
expand_time_to( x, max_isoyear = NULL, max_isoyearweek = NULL, max_date = NULL, ... )
x | An object of type |
max_isoyear | Maximum isoyear |
max_isoyearweek | Maximum isoyearweek |
max_date | Maximum date |
... | Not used. |
csfmt_rts_data_v2, a larger dataset that includes more rows corresponding to more time.
Other csfmt_rts_data: identify_data_structure(), remove_class_csfmt_rts_data(), set_csfmt_rts_data_v1(), set_csfmt_rts_data_v2(), unique_time_series()
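As a rough illustration of what expanding a dataset in time means, the base R sketch below pads each location's daily series with extra rows up to a chosen maximum date. It does not call cstidy: the helper `expand_dates_to`, the toy columns, the placeholder location code "county03", and the choice to fill new rows with NA are all assumptions made for illustration, not the package's implementation.

```r
# Toy daily data for two locations (hypothetical values, not cstidy output)
d <- data.frame(
  location_code = c("norge", "norge", "county03"),
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01")),
  cases_n = c(5L, 7L, 2L)
)

# Hypothetical helper: pad every location's series with daily rows up to max_date
expand_dates_to <- function(x, max_date) {
  max_date <- as.Date(max_date)
  pieces <- lapply(split(x, x$location_code), function(s) {
    last <- max(s$date)
    # Nothing to add if the series already reaches max_date
    if (last >= max_date) return(s)
    extra <- data.frame(
      location_code = s$location_code[1],
      date = seq(last + 1, max_date, by = "day"),
      cases_n = NA_integer_ # new rows carry no observed value
    )
    rbind(s, extra)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

expanded <- expand_dates_to(d, "2021-01-04")
expanded
```

Each location ends up with one row per day through the maximum date, so downstream code can rely on every time series covering the same period.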
Generates some test data
generate_test_data(fmt = "csfmt_rts_data_v2")
fmt | Data format ( |
csfmt_rts_data_v2, a dataset containing fake data.
cstidy::generate_test_data("csfmt_rts_data_v2")
Provides corresponding healed times (deprecated)
heal_time_csfmt_rts_data_v1(x, cols, granularity_time = "date")
x | A vector containing either dates, isoyearweek, or isoyear. |
cols | Columns to restrict the output to. |
granularity_time | date, isoyearweek, or isoyear, depending on the values contained in x. |
data.table, a dataset with time columns corresponding to the values given in x.
Provides corresponding healed times
heal_time_csfmt_rts_data_v2(x, cols, granularity_time = "date")
x | A vector containing either dates, isoyearweek, or isoyear. |
cols | Columns to restrict the output to. |
granularity_time | date, isoyearweek, or isoyear, depending on the values contained in x. |
data.table, a dataset with time columns corresponding to the values given in x.
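The idea of deriving time columns from a date can be sketched in base R with ISO 8601 format codes. This is only an illustration under assumptions: it does not call heal_time_csfmt_rts_data_v2, the object name `healed` is hypothetical, and only a subset of the time columns is derived.

```r
# Two example dates; the second falls in the last ISO week of the previous ISO year
dates <- as.Date(c("2020-01-01", "2021-01-03"))

healed <- data.frame(
  date = dates,
  isoyear = as.integer(format(dates, "%G")), # ISO 8601 week-based year
  isoweek = as.integer(format(dates, "%V")), # ISO 8601 week number
  isoyearweek = format(dates, "%G-%V"),      # e.g. "2020-01"
  calyear = as.integer(format(dates, "%Y")), # calendar year
  calmonth = as.integer(format(dates, "%m")) # calendar month
)

healed
```

For 2021-01-03 the ISO year (2020) differs from the calendar year (2021), which is why ISO-based and calendar-based columns are kept side by side.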
Reduces the data structure of a column inside a dataset into something that describes
identify_data_structure(x, col, ...)

## S3 method for class 'csfmt_rts_data_v2'
identify_data_structure(x, col, ...)

## S3 method for class 'tbl_Microsoft SQL Server'
identify_data_structure(x, col, ...)
x | An object |
col | Column name to hash |
... | Arguments passed to or from other methods |
csfmt_rts_data_structure_hash_v2, a summary object.
Other csfmt_rts_data: expand_time_to(), remove_class_csfmt_rts_data(), set_csfmt_rts_data_v1(), set_csfmt_rts_data_v2(), unique_time_series()
library(magrittr) # provides %>%

cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
This data comes from the Norwegian Surveillance System for Communicable Diseases (MSIS). The date corresponds to when the PCR-test was taken.
nor_covid19_cases_by_time_location_csfmt_rts_v1
A csfmt_rts_data_v1 with 11028 rows and 18 variables:
day/isoweek
nation, county
nor
norge, 11 counties
2020
total
Isoyear of event
Isoweek of event
Isoyearweek of event
Season of event
Seasonweek of event
Calyear of event
Calmonth of event
Calyearmonth of event
Date of event
Number of confirmed covid19 cases
Number of confirmed covid19 cases per 100.000 population
The raw number of cases and cases per 100.000 population are recorded.
This data was extracted on 2022-05-04.
This data was extracted on 2022-05-04.
nor_covid19_icu_and_hospitalization_csfmt_rts_v1
A csfmt_rts_data_v1 with 919 rows and 18 variables:
day/isoweek
nation
nor
norge
2020
total
Isoyear of event
Isoweek of event
Isoyearweek of event
Season of event
Seasonweek of event
Calyear of event
Calmonth of event
Calyearmonth of event
Date of event
Number of new admissions to the ICU with a positive PCR test
Number of new hospitalizations with Covid-19 as the primary cause
Remove class csfmt_rts_data_*
remove_class_csfmt_rts_data(x)
x | data.table |
No return value, called for the side effect of removing the csfmt_rts_data class from x.
Other csfmt_rts_data: expand_time_to(), identify_data_structure(), set_csfmt_rts_data_v1(), set_csfmt_rts_data_v2(), unique_time_series()
library(magrittr) # provides %>%

x <- cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2()
class(x)
cstidy::remove_class_csfmt_rts_data(x)
class(x)
set_csfmt_rts_data_v1 converts a data.table to csfmt_rts_data_v1 by reference.
csfmt_rts_data_v1 creates a new csfmt_rts_data_v1 (not by reference) from either a data.table or data.frame.
set_csfmt_rts_data_v1(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v1(x, create_unified_columns = TRUE, heal = TRUE)
x | The data.table to be converted to csfmt_rts_data_v1 |
create_unified_columns | Do you want it to create unified columns? |
heal | Do you want to impute missing values on creation? |
An extended data.table, which has been modified by reference and returned (invisibly).
No return value, called for side effect of replacing the current data.table with a csfmt_rts_data_v1 in place.
Returns a duplicated csfmt_rts_data_v1.
csfmt_rts_data_v1 contains the smart assignment feature for time and geography.
When one of the variables named before a colon below is assigned using :=, the variables listed after it are automatically imputed.
location_code: granularity_geo, country_iso3
isoyear: granularity_time, isoweek, isoyearweek, season, seasonweek, calyear, calmonth, calyearmonth, date
isoyearweek: granularity_time, isoyear, isoweek, season, seasonweek, calyear, calmonth, calyearmonth, date
date: granularity_time, isoyear, isoweek, isoyearweek, season, seasonweek, calyear, calmonth, calyearmonth
csfmt_rts_data_v1 contains 16 unified columns:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
isoyear
isoweek
isoyearweek
season
seasonweek
calyear
calmonth
calyearmonth
date
Other csfmt_rts_data: expand_time_to(), identify_data_structure(), remove_class_csfmt_rts_data(), set_csfmt_rts_data_v2(), unique_time_series()
set_csfmt_rts_data_v2 converts a data.table to csfmt_rts_data_v2 by reference.
csfmt_rts_data_v2 creates a new csfmt_rts_data_v2 (not by reference) from either a data.table or data.frame.
set_csfmt_rts_data_v2(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v2(x, create_unified_columns = TRUE, heal = TRUE)
x | The data.table to be converted to csfmt_rts_data_v2 |
create_unified_columns | Do you want it to create unified columns? |
heal | Do you want to impute missing values on creation? |
For more details see the vignette:
vignette("csfmt_rts_data_v2", package = "cstidy")
An extended data.table, which has been modified by reference and returned (invisibly).
No return value, called for side effect of replacing the current data.table with a csfmt_rts_data_v2 in place.
Returns a duplicated csfmt_rts_data_v2.
csfmt_rts_data_v2 contains the smart assignment feature for time and geography.
When one of the variables named before a colon below is assigned using :=, the variables listed after it are automatically imputed.
location_code: granularity_geo, country_iso3
isoyear: granularity_time, isoweek, isoyearweek, isoquarter, isoyearquarter, season, seasonweek, calyear, calmonth, calyearmonth, date
isoyearweek: granularity_time, isoyear, isoweek, isoquarter, isoyearquarter, season, seasonweek, calyear, calmonth, calyearmonth, date
date: granularity_time, isoyear, isoweek, isoyearweek, isoquarter, isoyearquarter, season, seasonweek, calyear, calmonth, calyearmonth
csfmt_rts_data_v2 contains 18 unified columns:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
isoyear
isoweek
isoyearweek
isoquarter
isoyearquarter
season
seasonweek
calyear
calmonth
calyearmonth
date
Other csfmt_rts_data: expand_time_to(), identify_data_structure(), remove_class_csfmt_rts_data(), set_csfmt_rts_data_v1(), unique_time_series()
library(magrittr) # provides %>%

# Create some fake data as data.table
d <- cstidy::generate_test_data(fmt = "csfmt_rts_data_v2")
d <- d[1:5]

# Convert to csfmt_rts_data_v2 by reference
cstidy::set_csfmt_rts_data_v2(d, create_unified_columns = TRUE)

# Smart assignment: assigning a time or geography column imputes the related columns
d[1, isoyearweek := "2021-01"]
d

d[2, isoyear := 2019]
d

d[3, date := as.Date("2020-01-01")]
d

d[4, c("isoyear", "isoyearweek") := .(2021, "2021-01")]
d

d[5, c("location_code") := .("norge")]
d

# Investigating the data structure of one column inside a dataset
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()

# Investigating the data structure via summary
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  summary()
Attempts to identify the unique time series that exist in this dataset.
A time series is defined as a unique combination of:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
*_id
*_tag
unique_time_series(x, set_time_series_id = FALSE, ...)
x | An object of type |
set_time_series_id | If TRUE, then |
... | Not used. |
data.table, a dataset that lists all the unique time series in x.
Other csfmt_rts_data: expand_time_to(), identify_data_structure(), remove_class_csfmt_rts_data(), set_csfmt_rts_data_v1(), set_csfmt_rts_data_v2()
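The definition of a time series as a unique combination of identifying columns can be sketched in base R. The example below does not call unique_time_series(): the toy data, the placeholder location code "county03", and the names `fixed_cols`, `key_cols` and `series` are assumptions made for illustration only.

```r
# Toy long-format data: two locations observed on two dates each
# (values and the location code "county03" are placeholders)
d <- data.frame(
  granularity_time = "date",
  granularity_geo = c("nation", "nation", "county", "county"),
  country_iso3 = "nor",
  location_code = c("norge", "norge", "county03", "county03"),
  border = 2020L,
  age = "total",
  sex = "total",
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02")),
  cases_n = c(5L, 7L, 2L, 3L)
)

# Columns that define a time series, plus any *_id or *_tag columns present
fixed_cols <- c(
  "granularity_time", "granularity_geo", "country_iso3",
  "location_code", "border", "age", "sex"
)
key_cols <- c(
  intersect(fixed_cols, names(d)),
  grep("(_id|_tag)$", names(d), value = TRUE)
)

# One row per unique time series; time and value columns are dropped
series <- unique(d[, key_cols, drop = FALSE])
series
```

Here the four observations collapse to two time series, one for the nation and one for the county, because the date and the measured value are not part of the identifying combination.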