---
title: "Introduction"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r echo=FALSE, include=FALSE}
library(data.table)
library(magrittr)
```
# csfmt_rts_data_v2
`csfmt_rts_data_v2` is a data format for real-time surveillance. See `vignette("csfmt_rts_data_v2", package = "cstidy")` for details.
```{r}
d <- cstidy::generate_test_data()
cstidy::set_csfmt_rts_data_v2(d)
# Looking at the dataset
d[]
```
## Smart assignment
`csfmt_rts_data_v2` performs smart assignment of time and geography columns.
When one of the **variables in bold** below is assigned using `:=`, the variables listed under it are imputed automatically.

**location_code**:

- granularity_geo
- country_iso3

**isoyear**:

- granularity_time
- isoweek
- isoyearweek
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
- date

**isoyearweek**:

- granularity_time
- isoyear
- isoweek
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
- date

**date**:

- granularity_time
- isoyear
- isoweek
- isoyearweek
- season
- seasonweek
- calyear
- calmonth
- calyearmonth

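
Several of the imputed time columns follow directly from ISO 8601 and calendar rules. As a rough illustration only (a sketch, not cstidy's actual implementation), the hypothetical helper `derive_time_columns_sketch()` below derives `isoyear`, `isoweek`, `isoyearweek`, `calyear` and `calmonth` from a date using base R `format()` codes. It assumes the `"YYYY-WW"` style of `isoyearweek` used elsewhere in this vignette and does not attempt `season`, `seasonweek` or `calyearmonth`.

```{r}
# Hypothetical helper, NOT part of cstidy: derives a subset of the time
# columns that smart assignment imputes, starting from a vector of dates.
derive_time_columns_sketch <- function(date) {
  date <- as.Date(date)
  data.table::data.table(
    date = date,
    isoyear = as.integer(format(date, "%G")),  # ISO 8601 week-based year
    isoweek = as.integer(format(date, "%V")),  # ISO 8601 week number (1-53)
    isoyearweek = format(date, "%G-%V"),       # e.g. "2020-53"
    calyear = as.integer(format(date, "%Y")),  # calendar year
    calmonth = as.integer(format(date, "%m"))  # calendar month (1-12)
  )
}

# 2020-01-01 falls in ISO week 1 of 2020, while 2021-01-01 still belongs to
# ISO week 53 of 2020, so isoyear and calyear can differ around New Year
derive_time_columns_sketch(c("2020-01-01", "2021-01-01", "2021-01-04"))
```

The middle row shows one reason both `isoyear` and `calyear` are tracked: they diverge for dates close to New Year.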
```{r}
d <- cstidy::generate_test_data()[1:5]
cstidy::set_csfmt_rts_data_v2(d)
# Looking at the dataset
d[]
# Assigning isoyearweek imputes the other time columns
# (note how granularity_time, isoyear, isoyearweek and date all change)
d[1, isoyearweek := "2021-01"]
d
# Assigning isoyear imputes the other time columns
# (note how granularity_time, isoyear, isoyearweek and date all change)
d[2, isoyear := 2019]
d
# Assigning date imputes the other time columns
# (note how granularity_time, isoyear, isoyearweek and date all change)
d[4:5, date := as.Date("2020-01-01")]
d
# Smart assignment does not work when more than one time column is assigned at once
d[1, c("isoyear", "isoyearweek") := .(2021, "2021-01")]
d
# Smart assignment of geo columns (note how granularity_geo and country_iso3 change)
d[1, c("location_code") := .("norge")]
d
# Collapse to a different level and heal the dataset, so that it can be
# worked on further as real-time surveillance data
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  cstidy::set_csfmt_rts_data_v2(create_unified_columns = FALSE) %>%
  print()
# Collapse to a different level and remove the csfmt_rts_data_v2 class,
# because the result will be used in new outputs/analyses
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  cstidy::remove_class_csfmt_rts_data() %>%
  print()
```
## Summary
To get a quick overview of the data structure of a whole dataset, call `summary()` on a `csfmt_rts_data_v2` object.
```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  summary()
```
## Identifying data structure of one column
To summarize the data structure of a single column, pass the column name to `cstidy::identify_data_structure()` and plot the result.
```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
```