In this repository, you can find the analysis code for the experiments in Chapters 3-5 that examined the role of extreme outcomes.
- To run the analysis code you will need R and RStudio installed on your computer
- You will also need to install CmdStanR by following the instructions here
The easiest way to access the analysis code is to download the `.zip` file. To do this:
- Click the "Code" button and then click the "Download ZIP" option
- This will download a `.zip` file containing the code to your computer
- Extract the contents of the `.zip` file to a directory of your choice
Alternatively, you can open the Terminal or Command Prompt, navigate to the folder where you want to clone this repository, and run the following command: `git clone https://github.com/joelholwerda/thesis-empirical-extremes.git`.
- Open the `00_open_project.Rproj` file. This opens a new session in RStudio and sets the working directory to the correct location
- Open the `01_wrangle_data.R` file. This wrangles the raw data and outputs `.csv` files loaded in the subsequent models and figures
- Click the "Source" button to run the entire script or highlight sections and press Cmd + Enter
- Open and run the `02_model_data.R` file. This uses `brms` to fit the Bayesian models and test the reported hypotheses. Some of these models take several minutes to run. See the "Options" section below for ways to speed up this process
- Open and run the `03_create_figures.R` file. This creates the figures and saves them as `.pdf` files in the `output/figures` folder
- Open and run the `04_alternate_models.R` file. This includes numerous other models that could have been used to analyse our results and examines whether our conclusions were contingent on the reported models
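
If you prefer the R console to the "Source" button, the steps above can be sketched roughly as follows. This is a minimal sketch, not part of the repository: `run_scripts()` is a hypothetical helper, and it assumes the working directory is the repository root (which opening `00_open_project.Rproj` sets for you).

```r
# Assumption: the working directory is the repository root
# (opening 00_open_project.Rproj in RStudio sets this for you).
scripts <- c(
  "01_wrangle_data.R",
  "02_model_data.R",
  "03_create_figures.R",
  "04_alternate_models.R"
)

# Hypothetical helper (not part of the repository): source each script in
# order, stopping with a clear message if a file is not where we expect it.
run_scripts <- function(files, dir = ".") {
  for (file in files) {
    path <- file.path(dir, file)
    if (!file.exists(path)) {
      stop("Cannot find ", path, "; check the working directory.")
    }
    message("Running ", file)
    source(path, echo = FALSE)
  }
  invisible(files)
}

# Uncomment to run the full analysis; the models may take several minutes.
# run_scripts(scripts)
```

Running the scripts one at a time, as described in the list, is still the safer route when you want to inspect intermediate results.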
The `renv` package is used for dependency management. `01_wrangle_data.R` calls `renv::restore()` to install the correct version of each package listed in the `renv.lock` file. More information about the `renv` package can be found here.
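
If you would like to check or restore the packages yourself before running any script, something like the following may help. It is only a sketch: `restore_dependencies()` is a hypothetical wrapper that is not part of the repository, and it assumes `renv.lock` sits in the project directory you pass in.

```r
# Hypothetical wrapper (not part of the repository) around two renv calls.
# Assumption: `project` is the folder that contains renv.lock.
restore_dependencies <- function(project = ".") {
  # Install renv itself from CRAN if it is not available yet.
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  # Report differences between the installed packages and renv.lock.
  renv::status(project = project)
  # Install the package versions recorded in renv.lock.
  renv::restore(project = project)
}

# Uncomment to run from the repository root.
# restore_dependencies()
```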
You can change the following options in the `02_model_data.R` file:
- Setting `options(quick_version = TRUE)` allows faster but less precise parameter estimation by reducing the number of samples taken in the `brms` models. Set to `FALSE` to reproduce published values
- The fitted models are cached in `output/fitted_models`. Setting `options(overwrite_saved_models = TRUE)` ensures that the models are run every time instead of loading a cached version. In order to save time, this should be set to `FALSE` unless changes have been made to the model
- Set `run_diagnostics` to `TRUE` to create additional diagnostic plots for the `brms` models (e.g., rank histograms, posterior predictive checks). Even if `FALSE`, the Stan warnings (e.g., divergences, rhat) will still be displayed if applicable
- The number of cores used for the `brms` models will be the number of available cores minus the value of the `reserved_cores` variable (or one if the number of reserved cores is greater than or equal to the number of available cores)
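
As a rough illustration of the options above, a quick exploratory run might use settings like these. The two `options()` calls are the ones named in the list. The `cores_to_use()` helper is hypothetical and only restates the core-count rule; the code in `02_model_data.R` may compute it differently.

```r
# Settings for a quick exploratory run (not for reproducing published values).
options(quick_version = TRUE)            # fewer samples: faster, less precise
options(overwrite_saved_models = FALSE)  # reuse models cached in output/fitted_models

# Number of cores to keep free for other work.
reserved_cores <- 2

# Hypothetical helper restating the rule described above: available cores
# minus reserved cores, or one if the reserved count is greater than or
# equal to the available count.
cores_to_use <- function(available, reserved) {
  if (reserved >= available) 1L else as.integer(available - reserved)
}

# detectCores() can return NA on some platforms; fall back to a single core.
available_cores <- parallel::detectCores()
if (is.na(available_cores)) available_cores <- 1L

n_cores <- cores_to_use(available_cores, reserved_cores)
```

Set `quick_version` back to `FALSE` before comparing any estimates with the reported values.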
- The `src` folder contains various functions used to run the analysis
- Information about each experiment is stored in the `src/wrangle/exp_info` folder. This is used to import the raw data using `import_csv.R` (csv files) and `import_mat.R` (Matlab files) and perform initial wrangling using `wrangle_all_csv.R` and `wrangle_all_mat.R`
If you have trouble running the code in this repository or have questions, contact me at joeldavidholwerda@gmail.com.