This vignette guides you through using the benchmark function of lpjmlstats in a typical LPJmL development workflow.
Say you have been working on a change in the LPJmL setup, tweaking
some settings in the config or making changes to the source code. A
global run with the new setup has been performed and the output was
successfully written to the output directory. At this point, the natural
question arises: ‘How have the modifications affected the results?’ The
benchmarking system helps find an answer by
systematically comparing the outputs of the modified, currently “under
test,” LPJmL setup against the outputs of a stable “baseline” setup.
Results from different baseline LPJmL versions can be found in the /p/projects/lpjml/benchmark_run_outputs/ directory on the cluster.
This package is a component of the PIAM cluster module. It is recommended to load this module via module load piam (on the HPC 2024: source /p/system/modulefiles/defaults/piam/module_load_piam) before starting the R session, ensuring that all necessary packages are loaded. After loading the PIAM module you may need to temporarily disable personal R packages with liboff to avoid conflicts. Remember to re-enable them with libon when you are finished with the benchmarking.
In this vignette we will evaluate how the results of LPJmL changed, going from version 5.7.1 to version 5.8.6.
Start by loading the package.
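library(lpjmlstats)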
The basic usage of the benchmarking function is simple. Enter the paths of the under test and baseline output directories, along with some meta-information.
benchmark_result <-
  benchmark(
    baseline_dir = "/p/projects/lpjml/benchmark_run_outputs/5.7.1/output", # nolint
    # it is recommended to use absolute paths
    under_test_dirs = "/p/projects/lpjml/benchmark_run_outputs/5.8.6/output", # nolint
    author = "David",
    description = "Evaluate the changes from version 5.7.1 to 5.8.6",
    output_file = "benchmark_vignette.pdf"
  )
This creates a PDF report in the working directory and stores the numerical results in the benchmark_result object. The process usually takes several minutes. It summarizes and compares the LPJmL outputs that are typically most relevant. Note that the console output highlights the currently processed output variable in blue. After the core numerical benchmarking is completed, the function starts to generate the PDF report, which visualizes the numerical results in different plots and tables.
Use ?benchmark to see how the output directory for the PDF report, as well as other function arguments, can be defined. The details section of the function documentation also lists the requirements for the function to work properly.
One of the most important additional arguments that can be passed to the benchmark function is the settings R list. It defines the full protocol of the benchmarking process, including which outputs are considered and how they are processed, compared and visualized. If no settings are passed, the default given in the default_settings list is used. Here is an excerpt of how that list is defined in R/default_settings.R:
default_settings <- list(
  vegc = c(GlobSumTimeAvgTable, GlobSumAnnAvgTimeseries, TimeAvgMap),
  soilc = c(GlobSumTimeAvgTable, GlobSumAnnAvgTimeseries, TimeAvgMap),
  litc = c(GlobSumTimeAvgTable, GlobSumAnnAvgTimeseries, TimeAvgMap),
  ...
)
The settings list reads as follows: each entry corresponds to an output variable. The left-hand side of the equals sign defines which output is processed, and the right-hand side defines how this output is processed.
More specifically, the right-hand side defines which metrics are used to evaluate the output. A metric is a generic pipeline to summarize, compare and visualize output data. It is generic because the same metric can be applied to different outputs. The metrics are the cornerstone of the benchmarking process. See ?Metric for technical details.
The metric GlobSumTimeAvgTable globally sums up all cellular values weighted by reference area and then averages over time. The resulting scalar values are compared and displayed in a table. The metric GlobSumAnnAvgTimeseries also computes a weighted global sum, but only averages within each year (if the output is not already given annually). The resulting time series are displayed in a plot. Finally, TimeAvgMap only averages over time and plots the differences between under test and baseline as maps. Type ?GlobSumTimeAvgTable for details.
It is often necessary to change these settings, for example when new outputs should be evaluated. A minimal sketch is given below; read the benchmark-change-settings vignette for a full tutorial.
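The following sketch only reuses metrics and outputs shown above; that the custom list is passed via an argument named settings is an assumption, check ?benchmark for the exact argument name.
my_settings <- list(
  vegc = c(GlobSumTimeAvgTable, GlobSumAnnAvgTimeseries),
  soilc = c(GlobSumTimeAvgTable, TimeAvgMap)
)
custom_result <- benchmark(
  baseline_dir = "/p/projects/lpjml/benchmark_run_outputs/5.7.1/output", # nolint
  under_test_dirs = "/p/projects/lpjml/benchmark_run_outputs/5.8.6/output", # nolint
  settings = my_settings, # assumed argument name
  author = "David",
  description = "Benchmark with a custom settings list",
  output_file = "benchmark_custom_settings.pdf"
)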
Benchmarking multiple under test runs against a baseline is also possible. For example, it could be interesting to compare the 5.8.1 version with several older versions.
benchmark(
  "/p/projects/lpjml/benchmark_run_outputs/5.8.1/output", # nolint
  list(
    "/p/projects/lpjml/benchmark_run_outputs/5.7.7/output", # nolint
    # note that versions older than 5.7.1 cannot currently be benchmarked
    # due to the missing terr_area output
    "/p/projects/lpjml/benchmark_run_outputs/5.7.1/output" # nolint
  ),
  author = "David",
  description = "Check how newer lpjml versions compare to older versions",
  output_file = "benchmark_vignette_multiple_ut.pdf"
)
Comparing more than two under test runs to the baseline may result in display problems in the PDF report.
At this point you can also give short identifiers for the simulations, which will be used throughout the report:
benchmark(
  list(my_baseline = "/p/projects/lpjml/benchmark_run_outputs/5.8.1/output"), # nolint
  list(
    my_test1 = "/p/projects/lpjml/benchmark_run_outputs/5.7.7/output", # nolint
    # note that versions older than 5.7.1 cannot currently be benchmarked
    # due to the missing terr_area output
    my_test2 = "/p/projects/lpjml/benchmark_run_outputs/5.7.1/output" # nolint
  ),
  ...
)
The PDF report offers a visualization of the numerical benchmarking results. It is structured by the different metrics used in the settings. Each metric has its own section where the results from all outputs using this metric are displayed.
The front page shows some meta information on the benchmarking. Of particular importance is the simulation table, which maps short simulation identifiers to the runs considered in the benchmarking. Usually these identifiers are abbreviations of the simulation names, as defined in the LPJmL config. If these names are not unique, abbreviations of the simulation paths are used instead. The identifiers are employed because the simulation names and paths are often quite long, making it more convenient to refer to the different runs in the report by short abbreviations, especially when comparing the baseline with multiple under test runs.
The benchmarking function also outputs its numerical results as a benchmark result object. Accessing this data is a bit technical, so skip this section if you are only interested in the PDF report.
The benchmark result behaves similarly to a nested R list. The top layer is again structured by the different metrics. The data that is generated by a metric is stored in its var_grp_list attribute. It contains the results for all the variables that the metric evaluated, as a list of so-called variable groups.
Each variable group consists of the processed baseline and (possibly multiple) under test outputs of that variable, as well as the comparisons of all under test outputs against the baseline. See ?Metric for details on how the metrics organize their processed data.
Say we are interested in how the time series of global vegetation carbon has changed in the newer version. The data from the baseline run can be accessed via
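a path that mirrors the structure described above (that the baseline element within the variable group is simply named baseline is an assumption):
# access the processed baseline vegc time series; the element name "baseline"
# is an assumption, check the names of the variable group if in doubt
vegc_old_ts <- benchmark_result$GlobSumAnnAvgTimeseries$var_grp_list$vegc$baseline
vegc_old_ts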
Note that this is an LPJmLData object of lpjmlkit. Evaluating an LPJmLData object in the R console provides a lot of useful meta data about the origin and processing history of the object. Note that subset is TRUE and space aggregation is weighted_sum, which reflects the processing steps done by the benchmarking function.
You will encounter several elements of the lpjmlkit package in lpjmlstats; in fact, lpjmlstats completely builds upon lpjmlkit. This also means that we can employ the functionality of the latter package to work with the benchmark results. For example, to output the first 5 years of vegc_old_ts we use
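the as_tibble export of lpjmlkit (with the globally aggregated data, i.e. a single cell and a single band, the first 5 rows correspond to the first 5 years):
# convert to a tibble and show the first 5 rows; with one (global) cell and
# one band these correspond to the first 5 years
head(lpjmlkit::as_tibble(vegc_old_ts), 5)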
For under test results the simulation identifiers are used to access the data. This is needed to differentiate the under test outputs if multiple runs are benchmarked against a baseline run.
The simulation table that relates the identifiers to the simulation names, as well as other meta data, is also stored with the benchmark result object.
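As a sketch, these entries can be located by exploring the structure of the object with base R (no specific lpjmlstats accessor is assumed here):
# inspect the top level of the benchmark result to locate the simulation
# table and other meta data
str(benchmark_result, max.level = 1)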
Note that the simulation with the new version has the identifier 5.8. (including the trailing dot). Thus, to access the vegc time series of the new LPJmL version we use
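the same access pattern (the object name vegc_new_ts below is only chosen for this example; the backticks are needed because of the dots in the identifier):
# access the processed under test vegc time series via its simulation identifier
vegc_new_ts <- benchmark_result$GlobSumAnnAvgTimeseries$var_grp_list$vegc$under_test$`5.8.`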
To check the difference between the two time series we use
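for example a plain array comparison via lpjmlkit::as_array (both objects were processed by the same metric, so their arrays share the same dimensions):
# convert both LPJmLData objects to plain arrays and subtract them; negative
# values indicate less vegetation carbon in the new version
lpjmlkit::as_array(vegc_new_ts) - lpjmlkit::as_array(vegc_old_ts)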
Observe that the newer version produced less vegetation carbon in the first five years.
The data from the benchmarking object, in combination with the functionality of lpjmlkit, is also useful for comparisons with literature data. Say we have a reference dataset of global soil nitrogen given as a terra object. Again, the tools from lpjmlkit come in handy to convert the benchmark data to terra:
soiln_lpjmldata <- benchmark_result$TimeAvgMap$var_grp_list$soiln$under_test$`5.8.`
soiln_terra <- lpjmlkit::as_terra(soiln_lpjmldata)
soiln_terra
# read in validation data terra
# ...
Another convenient feature of the benchmark data object is that all plot functions used for the report generation can also be called individually. For example, the table of the GlobSumTimeAvgTable metric can be generated via
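the metric's plot method (this mirrors the plot() call shown for the map metric below):
benchmark_result$GlobSumTimeAvgTable$plot()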
This enables terminal-only benchmarking without generating plots or reports. See ?benchmark for how to skip the PDF report generation in the benchmarking.
Sometimes individual plots of the benchmarking may be needed in another context. The following chunk generates the vegc plot.
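As a sketch, the vegc difference map of the TimeAvgMap metric is taken here; the plots of the other metrics can be extracted analogously.
vegc_map <- benchmark_result$TimeAvgMap$plot()[[1]] # vegc is the first output of the settings
vegc_map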
With the exception of the table, all plots generated in the benchmarking are ggplot objects, which enables modifications of all visual plot elements. For example a different title can be added to the litter carbon plot with the following code.
litc_map <- benchmark_result$TimeAvgMap$plot()[[3]] # litc is the third output of the settings
litc_map + ggplot2::ggtitle("Difference in litter carbon; lpjml 5.8.6 - lpjml 5.7.1")
Maybe the preceding discussion convinced you that working with the benchmark object is fun and useful. It provides a lot of interfaces to further processing steps and functionality. It may thus be valuable to save the object for later analysis. This last step of the tutorial can be done for example with
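base R's saveRDS() (the file name is just an example):
saveRDS(benchmark_result, "benchmark_result.rds")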
This file can also be used to recreate the benchmark report. See
?create_pdf_report
for details.