---
title: "Introduction"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r echo=FALSE, include=FALSE}
library(data.table)
library(magrittr)
```

# csfmt_rts_data_v2

`csfmt_rts_data_v2` (`vignette("csfmt_rts_data_v2", package = "cstidy")`) is a data format for real-time surveillance.

```{r}
d <- cstidy::generate_test_data()
cstidy::set_csfmt_rts_data_v2(d)

# Looking at the dataset
d[]
```

## Smart assignment

`csfmt_rts_data_v2` does smart assignment for time and geography.

When one of the **variables in bold** is assigned using `:=`, the variables listed under it are automatically imputed.

**location_code**:

- granularity_geo
- country_iso3

**isoyear**:

- granularity_time
- isoweek
- isoyearweek
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
- date

**isoyearweek**:

- granularity_time
- isoyear
- isoweek
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
- date

**date**:

- granularity_time
- isoyear
- isoweek
- isoyearweek
- season
- seasonweek
- calyear
- calmonth
- calyearmonth

```{r}
d <- cstidy::generate_test_data()[1:5]
cstidy::set_csfmt_rts_data_v2(d)

# Looking at the dataset
d[]

# Smart assignment via isoyearweek
# (note how granularity_time, isoyear, isoyearweek, date all change)
d[1, isoyearweek := "2021-01"]
d

# Smart assignment via isoyear
# (note how granularity_time, isoyear, isoyearweek, date all change)
d[2, isoyear := 2019]
d

# Smart assignment via date
# (note how granularity_time, isoyear, isoyearweek, date all change)
d[4:5, date := as.Date("2020-01-01")]
d

# Smart assignment fails when multiple time columns are set at once
d[1, c("isoyear", "isoyearweek") := .(2021, "2021-01")]
d

# Smart assignment of geo columns
d[1, c("location_code") := .("norge")]
d

# Collapsing down to different levels, and healing the dataset
# (so that it can be worked on further with regards to real-time surveillance)
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  cstidy::set_csfmt_rts_data_v2(create_unified_columns = FALSE) %>%
  print()

# Collapsing to different levels, and removing the class csfmt_rts_data_v2
# because the result is going to be used in new output/analyses
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  cstidy::remove_class_csfmt_rts_data() %>%
  print()
```

## Summary

We need a way to easily summarize the data structure of a dataset. Calling `summary()` on a `csfmt_rts_data_v2` dataset does this.

```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  summary()
```

## Identifying data structure of one column

We need a way to easily summarize the data structure of one column inside a dataset. `cstidy::identify_data_structure()` does this for a named column, and the result can be passed to `plot()`.

```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
```
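
To make the idea of a per-column data structure more concrete, the following is a minimal sketch written with plain `data.table`. It is not how `cstidy::identify_data_structure()` is implemented. The toy dataset and the helper `structure_of_column()` are hypothetical and exist only for illustration. For one column, the sketch counts how many rows are present and how many are missing within each combination of `granularity_time` and `granularity_geo`.

```{r}
library(data.table)

# Hypothetical toy data: the column names mirror those used in this vignette,
# but the values are made up for illustration.
toy <- data.table(
  granularity_time = c("isoyearweek", "isoyearweek", "isoyearweek", "date", "date", "date"),
  granularity_geo  = c("nation", "county", "county", "nation", "county", "county"),
  deaths_n         = c(10L, NA, 3L, 2L, NA, NA)
)

# Hypothetical helper (not part of cstidy): for each combination of time and
# geographic granularity, count the rows that hold a value and the rows that
# are missing in the chosen column.
structure_of_column <- function(d, col) {
  d[, .(
    n_rows    = .N,
    n_present = sum(!is.na(.SD[[1L]])),
    n_missing = sum(is.na(.SD[[1L]]))
  ), keyby = .(granularity_time, granularity_geo), .SDcols = col]
}

res <- structure_of_column(toy, "deaths_n")
print(res)
```

A table like this shows at a glance which granularity combinations actually carry data for the column, which is the kind of question the summary tools above are meant to answer.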