The concept behind `org` is straightforward: most analyses have three main sections: code, results, and data.
Each section has unique requirements:
org::initialize_project

This is the main function that sets up your project structure. It takes two or more arguments and saves folder locations in `org::project` for use throughout your analysis:

- `home`: Location of `Run.R` and the `R/` folder (accessible via `org::project$home`)
- `results`: Results folder, inside which date-based subfolders are created (accessible via `org::project$results_today`)
- `...`: Additional folders as needed (e.g., `data_raw`, `data_clean`)

Run.R
This is your main analysis script, which orchestrates the entire workflow. All code sections should be encapsulated in functions in the `R/` folder. You should not have multiple main files, as this creates confusion when returning to your code later. However, you can have versioned files (e.g., `Run_v01.R`, `Run_v02.R`) where later versions supersede earlier ones.

R/ Directory

All analysis functions should be defined in `org::project$home/R`. The `initialize_project` function automatically sources all R scripts in this directory.
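
To make the idea of "sourcing every script in a folder" concrete, here is a minimal base R sketch. It is not the package's implementation: `source_folder()` is a hypothetical helper, and the temporary project and the `clean_data()` function it contains exist only for illustration.

```r
# Hypothetical helper: source every .R file in a folder into an environment.
# This only illustrates the idea; it is not how org itself is implemented.
source_folder <- function(folder, env = globalenv()) {
  scripts <- sort(list.files(folder, pattern = "\\.[Rr]$", full.names = TRUE))
  for (script in scripts) {
    source(script, local = env)
  }
  invisible(scripts)
}

# Throwaway project with a single function file in its R/ folder
home <- file.path(tempdir(), "sourcing_demo")
dir.create(file.path(home, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  "clean_data <- function() data.frame(x = 1:3)",
  file.path(home, "R", "clean_data.R")
)

# Source the folder into a separate environment and call the function
demo_env <- new.env()
sourced <- source_folder(file.path(home, "R"), env = demo_env)
d <- demo_env$clean_data()
```

Keeping one function per file in `R/` means the main script only has to call functions, never define them.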
Here’s a complete example of how to structure your project:
```r
# Initialize the project
org::initialize_project(
  env = .GlobalEnv,
  home = "/git/analyses/2019/analysis3/",
  results = "/dropbox/analyses_results/2019/analysis3/",
  data_raw = "/data/analyses/2019/analysis3/"
)

# Document changes in archived results
txt <- glue::glue("
  2019-01-01:
    Included:
    - Table 1
    - Table 2

  2019-02-02:
    Changed Table 1 from mean -> median
", .trim = FALSE)

org::write_text(
  txt = txt,
  file = fs::path(org::project$results, "info.txt")
)

# Load required packages
library(data.table)
library(ggplot2)

# Run analysis
d <- clean_data() # Accesses data from org::project$data_raw
table_1(d)        # Saves to org::project$results_today
figure_1(d)       # Saves to org::project$results_today
figure_2(d)       # Saves to org::project$results_today
```
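
The analysis functions called in the script above (`clean_data()`, `table_1()`, and so on) live in the `R/` folder and are not shown here. As a rough sketch of how such a function might write into a date-based results folder, the base R code below builds a dated subfolder itself and saves a small table there. Everything in it is illustrative: the temporary folder stands in for `org::project$results_today`, and the `folder` argument, the toy data, and the CSV output are assumptions rather than anything the package prescribes.

```r
# Stand-in for org::project$results_today: <results>/<YYYY-MM-DD>
results <- file.path(tempdir(), "analysis_results")
results_today <- file.path(results, format(Sys.Date(), "%Y-%m-%d"))
dir.create(results_today, recursive = TRUE, showWarnings = FALSE)

# Hypothetical analysis function: one median per column, saved as CSV
table_1 <- function(d, folder) {
  tab <- data.frame(
    variable = names(d),
    median = vapply(d, stats::median, numeric(1))
  )
  out <- file.path(folder, "table_1.csv")
  write.csv(tab, out, row.names = FALSE)
  invisible(out)
}

# Toy data, for illustration only
d <- data.frame(age = c(30, 40, 50), bmi = c(20, 25, 30))
saved <- table_1(d, results_today)
```

Because the output folder is named after the current date, rerunning the analysis on a later day leaves earlier results untouched.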
When writing research articles, you often need multiple versions (initial submission, resubmissions). `org` helps manage this by using date-based versioning:

- `Run.R` to `Run_YYYY_MM_DD_submission_1.R`
- `R/` to `R_YYYY_MM_DD_submission_1/`

This preserves the code that produced results for each submission, ensuring all changes are deliberate and intentional.
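
The naming scheme above can be applied by hand, but a small helper makes it repeatable. The following base R sketch is one possible approach, not part of `org`: `archive_submission()` is a hypothetical function, and it copies the files (rather than renaming them) on the assumption that you want to keep working on `Run.R` and `R/` afterwards.

```r
# Hypothetical helper: snapshot Run.R and R/ under date-stamped names.
archive_submission <- function(home, submission, date = Sys.Date()) {
  tag <- paste0(format(date, "%Y_%m_%d"), "_submission_", submission)
  run_copy <- file.path(home, paste0("Run_", tag, ".R"))
  r_copy <- file.path(home, paste0("R_", tag))

  # Copy the main script, never overwriting an existing snapshot
  file.copy(file.path(home, "Run.R"), run_copy, overwrite = FALSE)

  # Copy every file in R/ into the snapshot folder
  dir.create(r_copy, showWarnings = FALSE)
  scripts <- list.files(file.path(home, "R"), full.names = TRUE)
  file.copy(scripts, r_copy, overwrite = FALSE)

  invisible(c(run = run_copy, r = r_copy))
}

# Usage on a throwaway project in a temporary directory
home <- file.path(tempdir(), "archive_demo")
dir.create(file.path(home, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("# main script", file.path(home, "Run.R"))
writeLines("clean_data <- function() NULL", file.path(home, "R", "clean_data.R"))

archived <- archive_submission(home, submission = 1, date = as.Date("2019-01-01"))
```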
Store your project components in appropriate locations:
```
# Code (GitHub)
git/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/ # org::project$home
    │   │   ├── Run.R
    │   │   └── R/
    │   │       ├── clean_data.R
    │   │       ├── descriptives.R
    │   │       ├── analysis.R
    │   │       └── figure_1.R
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Results (Dropbox)
dropbox/
└── analyses_results/
    ├── 2018/
    │   ├── analysis_1/ # org::project$results
    │   │   ├── 2018-03-12/ # org::project$results_today
    │   │   │   ├── table_1.xlsx
    │   │   │   └── figure_1.png
    │   │   ├── 2018-03-15/
    │   │   └── 2018-03-18/
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/

# Data (Local)
data/
└── analyses/
    ├── 2018/
    │   ├── analysis_1/ # org::project$data_raw
    │   │   └── data.xlsx
    │   └── analysis_2/
    └── 2019/
        └── analysis_3/
```
For projects on a shared network drive without GitHub/Dropbox:
```
project_name/ # org::project$home
├── Run.R
├── R/
│   ├── CleanData.R
│   ├── Descriptives.R
│   ├── Analysis1.R
│   └── Graphs1.R
├── paper/
│   └── paper.Rmd
├── results/ # org::project$results
│   └── 2018-03-12/ # org::project$results_today
│       ├── table1.xlsx
│       └── figure1.png
└── data_raw/ # org::project$data_raw
    └── data.xlsx
```
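
For a single-folder layout like this one, all three locations can be derived from one project root. The sketch below is illustrative only: the temporary `project_root` is an assumption, and the call to `org::initialize_project()` reuses the argument names from the complete example earlier in this document. It is guarded so that the folder setup still runs when `org` is not installed.

```r
# Build the single-folder layout in a temporary location (illustrative only)
project_root <- file.path(tempdir(), "project_name")
folders <- list(
  home     = project_root,
  results  = file.path(project_root, "results"),
  data_raw = file.path(project_root, "data_raw")
)
for (folder in c(unlist(folders), file.path(project_root, "R"))) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# Register the folders with org, if the package is available
if (requireNamespace("org", quietly = TRUE)) {
  org::initialize_project(
    env      = .GlobalEnv,
    home     = folders$home,
    results  = folders$results,
    data_raw = folders$data_raw
  )
}
```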
For projects with limited access:
```
project_name/ # org::project$home
├── Run.R
├── R/
│   ├── clean_data.R
│   ├── descriptives.R
│   ├── analysis.R
│   └── figure_1.R
├── results/ # org::project$results
│   └── 2018-03-12/ # org::project$results_today
│       ├── table_1.xlsx
│       └── figure_1.png
└── data_raw/ # org::project$data_raw
    └── data.xlsx
```
Understanding path components is important:
Component | Name |
---|---|
`/home/richard/test.src` | Absolute (file) path |
`richard/test.src` | Relative (file) path |
`/home/richard/` | Absolute (directory) path |
`./richard/` | Relative (directory) path |
`richard` | Directory |
`test.src` | Filename |
A path specifies a location in a directory structure. A filename contains only the name of the file, and a directory name contains only the name of the directory; neither says where the item is located.
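
These components can be pulled apart and reassembled with base R functions. The short sketch below reuses the example path from the table. `is_absolute()` is a hypothetical helper that assumes POSIX-style paths, where an absolute path starts with `/`.

```r
path <- "/home/richard/test.src"

filename <- basename(path)            # "test.src"
directory_path <- dirname(path)       # "/home/richard"
directory <- basename(directory_path) # "richard"

# Rebuild a relative file path from a directory and a filename
relative_path <- file.path(directory, filename) # "richard/test.src"

# Hypothetical helper, assuming POSIX-style paths
is_absolute <- function(p) startsWith(p, "/")

is_absolute(path)          # TRUE
is_absolute(relative_path) # FALSE
```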