LPJmL Data 💾 is an lpjmlkit module built around the data class
LPJmLData. It facilitates reading and processing of LPJmL inputs and
outputs by combining the raw data with available meta data (from meta
files, header files or manual specification), which avoids a large overhead.
It further enables easy subsetting, transformations and basic statistics
of the data and allows export to common data formats.
Example data files can be downloaded from https://doi.org/10.5281/zenodo.12915168
LPJmL Data first requires reading LPJmL input or output data into R by applying the read_io function (1). The returned object is of class LPJmLData (2), for which basic statistics can be calculated (3). Its inner data can be modified (4) or exported (5) to common data formats.
read_io is a generic function to read LPJmL input and output files. It
currently supports four different file formats, “meta”, “clm”, “cdf” and
“raw”:

“meta”: set "output_metafile" : true in your LPJmL run
configuration to generate output files in “meta” format. LPJmL input
files can also be created in “meta” format.

“clm”: set "fmt" : "clm" in your LPJmL run configuration.
Some optional meta data (e.g. band_names) need to be
specified manually, while the basic information about file structure is
derived automatically from the file header.

# Read monthly runoff data with header.
runoff <- read_io("./output/runoff.clm",
                  # If the clm version is lower than 4, set nbands and nstep
                  # manually so that the month dimension is recognized
                  # correctly.
                  nbands = 1,
                  nstep = 12,
                  # Useful additional information that is not needed to read
                  # the data.
                  variable = "runoff",
                  descr = "monthly runoff",
                  unit = "mm/month")

“cdf”: set "fmt" : "cdf" in
your LPJmL run configuration. In theory, all meta data (such as
band_names) can be extracted from cdf files. However, there
is no universal NetCDF standard that everyone adheres to, so
dimensions (such as latitude, longitude and time)
are not always named the same. In such cases lpjmlkit tries to guess
the correct dimension, but this can fail. If a cdf file with leap days is
read, February 29 is dropped so that the data conform
to LPJmL, where every year has 365 days. Timestamps are taken
directly from the NetCDF file, so they might differ from LPJmL defaults
(e.g. end-of-month instead of mid-month).

# Read yearly npp output file.
npp <- read_io("./output/npp.nc") |>
  # To convert from lat-lon into the LPJmL grid, attach a grid and transform.
  lpjmlkit::add_grid(grid_file) |>
  lpjmlkit::transform(to = "cell")

“raw”: set "fmt" : "raw" in your LPJmL run configuration. These
files include no meta data about their structure or contents and should
therefore be combined with the "output_metafile" : true
setting to generate a corresponding “meta” file. Otherwise, all meta
data need to be specified by the user. Historically, some LPJmL input
files use “raw” format.

# Read monthly runoff raw data.
runoff <- read_io("./output/runoff.bin",
                  # Specify all meta data if they differ from the function
                  # default values.
                  ...)
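
The leap-day handling described for “cdf” files can be pictured with a small base R sketch. It does not use lpjmlkit and is not the package's implementation; the dates and values are made-up toy data, and all variable names are hypothetical.

```r
# Hypothetical daily time axis for a leap year (toy data, not read from a file).
dates <- seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day")
values <- seq_along(dates)  # one toy value per day

# Identify February 29 and drop it so that the year has 365 days.
is_leap_day <- format(dates, "%m-%d") == "02-29"
dates_365 <- dates[!is_leap_day]
values_365 <- values[!is_leap_day]

length(dates)      # 366 days in the leap year 2000
length(dates_365)  # 365 days after dropping February 29
```

Note that the values of the dropped day are discarded, not redistributed, which matches the "simply deleted" behaviour described in the text.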
read_io returns an object of the R6 class LPJmLData with two
main attributes, $data and $meta:

$data, of class base::array, returns
the data array with default dimensions “cell”, “time” and “band”.

runoff$data
# , , band = 1
#
# time
# cell 1901-01-31 1901-02-28 1901-03-31 1901-04-30
# 0 2.427786e+02 1.265680e+02 2.279087e+02 2.027685e+02
# 1 4.189225e-14 1.032507e-16 0.000000e+00 0.000000e+00
# 2 3.860512e-14 0.000000e+00 0.000000e+00 0.000000e+00
# 3 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
# 4 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
# 5 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

$meta, of class LPJmLMetaData, returns the corresponding meta data
(e.g. runoff$meta$unit).

runoff$meta
# $sim_name "lu_cf"
# $source "LPJmL C Version 5.3.001"
# $history "./LPJmL_internal/bin/lpjml ./configurations/config_lu_cf.json"
# $variable "runoff"
# $descr "monthly runoff"
# $unit "mm/month"
# $nbands 1
# $nyear 111
# $firstyear 1901
# $lastyear 2011
# $nstep 12
# $timestep 1
# $ncell 67420
# $firstcell 0
# $cellsize_lon 0.5
# $cellsize_lat 0.5
# $datatype "float"
# $scalar 1
# $order "cellseq"
# $bigendian FALSE
# $format "raw"
# $filename "runoff.bin"
To get an overview of the data, LPJmLData supports the
usage of the base functions length(), dim(),
dimnames(), summary() and
plot(). More methods may be added in the
future.
# Self print; also via print(runoff).
runoff
# $meta |>
# .$sim_name "lu_cf"
# .$variable "runoff"
# .$descr "monthly runoff"
# .$unit "mm/month"
# .$nbands 1
# .$nyear 111
# .$nstep 12
# .$timestep 1
# .$ncell 67420
# .$cellsize_lon 0.5
# .$cellsize_lat 0.5
# Note: not printing all meta data, use $meta to get all.
# $data |>
# dimnames() |>
# .$cell "0" "1" "2" "3" ... "67419"
# .$time "1901-01-31" "1901-02-28" "1901-03-31" "1901-04-30" ... "2011-12-31"
# .$band "1"
# $summary()
# 1
# Min. : 0.0000
# 1st Qu.: 0.0619
# Median : 4.4320
# Mean : 28.7658
# 3rd Qu.: 27.5627
# Max. :2840.9602
# Note: summary is not weighted by grid area.
# Return the dimension lengths of the $data array; dimnames() is also available.
dim(runoff)
# cell time band
# 67420 1332 1
# Plot as maps or time series, depending on the dimensions.
plot(runoff)
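
The dimension lengths returned by dim(runoff) follow from the meta data printed earlier. The following base R sketch recomputes them from those printed values; the list meta is a hand-typed stand-in for illustration, not an LPJmLMetaData object.

```r
# Values copied from the runoff$meta printout (hand-typed stand-in).
meta <- list(ncell = 67420, nyear = 111, nstep = 12, nbands = 1)

# One time step per month and year: 111 years * 12 steps = 1332 time steps.
expected_dim <- c(cell = meta$ncell,
                  time = meta$nyear * meta$nstep,
                  band = meta$nbands)
expected_dim
```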
Each LPJmLData object comes with a bundle of methods to modify its
state: add_grid(), transform() and
subset().
add_grid() adds a
$grid attribute (as an LPJmLData object) to the object,
providing the spatial reference (longitude and latitude) for each
cell.

# Object-oriented (R6 class) notation (assigning grid directly to runoff)
runoff$add_grid()
# Common R notation (overwriting the original object)
runoff <- add_grid(runoff)
# Use the read_io arguments if a grid file cannot be detected automatically.
runoff <- add_grid(runoff, "./output/grid.clm")

transform() the $data dimensions. Transforming the spatial dimensions
relies on the $grid attribute (see add_grid above). A
transformation changes only the structure of the data, not its
contents.

# Transform into lon and lat dimensions. If add_grid has not been executed
# before, it is called implicitly.
runoff <- transform(runoff, to = "lon_lat")
runoff
# [...]
# $data |>
# dimnames() |>
# .$lat "-55.75" "-55.25" "-54.75" "-54.25" ... "83.75"
# .$lon "-179.75" "-179.25" "-178.75" "-178.25" ... "179.75"
# .$time "1901-01-31" "1901-02-28" "1901-03-31" "1901-04-30" ... "2011-12-31"
# .$band "1"
# [...]
# Transform into year and month dimensions (day not available for monthly
# runoff)
runoff <- transform(runoff, to = "year_month_day")
runoff
# [...]
# $data |>
# dimnames() |>
# .$lat "-55.75" "-55.25" "-54.75" "-54.25" ... "83.75"
# .$lon "-179.75" "-179.25" "-178.75" "-178.25" ... "179.75"
# .$month "1" "2" "3" "4" ... "12"
# .$year "1901" "1902" "1903" "1904" ... "2011"
# .$band "1"
# [...]
# Transform back to original dimensions.
runoff <- transform(runoff, to = c("cell", "time"))
runoff
# [...]
# $data |>
# dimnames() |>
# .$cell "0" "1" "2" "3" ... "67419"
# .$time "1901-01-31" "1901-02-28" "1901-03-31" "1901-04-30" ... "2011-12-31"
# .$band "1"
# [...]

subset() the $data. Use the $data dimensions as keys and names or indices
as values to subset $data. The $meta data are adjusted
according to the subset. Applying a subset changes the contents of the
data and cannot be reversed.

# Subset by dimnames (character string).
runoff <- subset(runoff, time = "1991-05-31")
runoff
# $meta |>
# .$nyear 1
# .$ncell 67420
# .$subset TRUE
# [...]
# Note: not printing all meta data, use $meta to get all.
# $data |>
# dimnames() |>
# .$cell "0" "1" "2" "3" ... "67419"
# .$time "1991-05-31"
# .$band "1"
# [...]
# Subset by indices
runoff <- subset(runoff, cell = 28697:28700)
runoff
# $meta |>
# .$nyear 1
# .$ncell 4
# .$subset TRUE
# [...]
# Note: not printing all meta data, use $meta to get all.
# $data |>
# dimnames() |>
# .$cell "28696" "28697" "28698" "28699"
# .$time "1991-05-31"
# .$band "1"
# [...]
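
In the output above, the index subset cell = 28697:28700 returns the cells named "28696" to "28699": indices count positions from 1, whereas cell names start at "0". A minimal base R sketch of the same distinction, using a made-up toy vector in place of the "cell" dimension:

```r
# Toy vector with zero-based cell names, mimicking the "cell" dimension.
values <- c(10, 20, 30, 40, 50, 60)
names(values) <- as.character(0:5)

# Numeric values select by position, character values select by name.
by_index <- values[2:3]
by_name <- values[c("2", "3")]

names(by_index)  # "1" "2"
names(by_name)   # "2" "3"
```

Passing cell names as character strings is therefore the safer choice when specific cell IDs are meant.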
Finally, LPJmLData objects can be exported into common R data formats:
array, tibble, raster and
terra.
More export methods can be added in the future.
as_array() exports $data
as an array. In addition to simply returning the $data
element of an LPJmLData object, as_array provides
functionality to subset and aggregate $data. Subsetting
is conducted before aggregation.

# Export as an array with subset of first 6 time steps and aggregation along
# the dimension cell (mean).
as_array(runoff,
subset = list(time = 1:6),
aggregate = list(cell = mean))
# band
# time 1
# 1901-01-31 19.49611
# 1901-02-28 20.28368
# 1901-03-31 27.93595
# 1901-04-30 36.90505
# 1901-05-31 39.38885
# 1901-06-30 32.80252

as_tibble() exports $data
as a tibble
object, providing the same additional subsetting and aggregation
functionality as as_array.

# Export as a tibble with subset of first 6 time steps
as_tibble(runoff, subset = list(time = 1:6))
# # A tibble: 404,520 × 4
# cell time band value
# <fct> <fct> <fct> <dbl>
# 1 0 1901-01-31 1 184.
# 2 1 1901-01-31 1 0
# 3 2 1901-01-31 1 0
# 4 3 1901-01-31 1 0
# 5 4 1901-01-31 1 0
# 6 5 1901-01-31 1 0
# 7 6 1901-01-31 1 0
# 8 7 1901-01-31 1 0
# 9 8 1901-01-31 1 0
# 10 9 1901-01-31 1 0
# # … with 404,510 more rows

as_raster() /
as_terra() export $data as a
raster
or a terra object
(successor of raster), providing the same additional subsetting and
aggregation functionality as as_array(). as_raster()
returns a RasterLayer for a single data field and a RasterBrick if the
result contains more than one band or more than one time step.

# Export the first time step as a RasterLayer object from the raster package.
as_raster(runoff, subset = list(time = 1))
# class : RasterLayer
# dimensions : 280, 720, 201600 (nrow, ncol, ncell)
# resolution : 0.5, 0.5 (x, y)
# extent : -180, 180, -56, 84 (xmin, xmax, ymin, ymax)
# crs : +proj=longlat +datum=WGS84 +no_defs
# source : memory
# names : runoff
# values : -1.682581e-13, 671.8747 (min, max)
# Export the first time step as a terra SpatRaster object.
as_terra(runoff, subset = list(time = 1))
# class : SpatRaster
# dimensions : 280, 720, 1 (nrow, ncol, nlyr)
# resolution : 0.5, 0.5 (x, y)
# extent : -180, 180, -56, 84 (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326)
# source : memory
# name : runoff
# min value : -1.682581e-13
# max value : 6.718747e+02
# unit : mm/month
# Export the first 4 time steps as a RasterBrick object.
as_raster(runoff, subset = list(time = 1:4))
# class : RasterBrick
# dimensions : 280, 720, 201600, 4 (nrow, ncol, ncell, nlayers)
# resolution : 0.5, 0.5 (x, y)
# extent : -180, 180, -56, 84 (xmin, xmax, ymin, ymax)
# crs : +proj=longlat +datum=WGS84 +no_defs
# source : memory
# names : X1901.01.31, X1901.02.28, X1901.03.31, X1901.04.30
# min values : -1.682581e-13, -1.750495e-13, -2.918900e-13, -1.516298e-13
# max values : 671.8747, 785.2363, 828.2853, 987.4359
# Export the first 4 time steps as a terra SpatRaster object.
as_terra(runoff, subset = list(time = 1:4))
# class : SpatRaster
# dimensions : 280, 720, 4 (nrow, ncol, nlyr)
# resolution : 0.5, 0.5 (x, y)
# extent : -180, 180, -56, 84 (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326)
# source : memory
# names : 1901-01-31, 1901-02-28, 1901-03-31, 1901-04-30
# min values : -1.682581e-13, -1.750495e-13, -2.918900e-13, -1.516298e-13
# max values : 6.718747e+02, 7.852363e+02, 8.282853e+02, 9.874359e+02
# unit : mm/month, mm/month, mm/month, mm/month
# time (days) : 1901-01-31 to 1901-04-30
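
The raster dimensions and extent in the printouts above are consistent with the cell-center coordinates shown after the "lon_lat" transformation. The base R sketch below rebuilds them from the first and last coordinate names and the cell size; it assumes that the coordinates are cell centers on a regular 0.5° grid and does not call lpjmlkit, raster or terra.

```r
cellsize <- 0.5

# Cell-center coordinates, taken from the first and last dimnames shown above.
lat <- seq(-55.75, 83.75, by = cellsize)
lon <- seq(-179.75, 179.75, by = cellsize)

n_row <- length(lat)  # 280
n_col <- length(lon)  # 720

# The extent refers to the outer cell edges, half a cell beyond the centers.
grid_extent <- c(xmin = min(lon) - cellsize / 2,
                 xmax = max(lon) + cellsize / 2,
                 ymin = min(lat) - cellsize / 2,
                 ymax = max(lat) + cellsize / 2)
grid_extent
```

The product n_row * n_col gives the 201600 raster cells reported, which is more than the 67420 LPJmL cells because the raster also covers cells without data.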
More helpful functionality included with LPJmL Data:
read_meta() reads meta information from meta and
header files as LPJmLMetaData objects.
LPJmLMetaData objects are usually attached to an
LPJmLData object but can also be used to gain information
about an LPJmL input or output file without reading the data.
LPJmLMetaData objects can be exported with
as_list() and as_header() to create header objects
or write header files.
read_header(), write_header(),
get_headersize(), get_datatype() provide
low-level interaction with LPJmL input and output files primarily in
“clm” format.
npp <- read_io(filename = "./output/npp.bin.json",
               subset = list(year = as.character(1970:2011)))

# Transform "time" into "year" and "month" dimensions.
npp$transform(to = "year_month_day")

# Plot time series with aggregated cell and month dimensions. Note that spatial
# aggregation across cells is not area-weighted.
plot(npp,
     aggregate = list(cell = mean, month = sum))

# Also available as data array.
global_npp_trend <- as_array(npp,
                             aggregate = list(cell = mean, month = sum))
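
What the combination of a "year_month_day" transformation and aggregate = list(cell = mean, month = sum) amounts to can be sketched on a toy array in base R. This is only an illustration under simplifying assumptions (three cells, two years, monthly steps in chronological order); it is not how lpjmlkit implements the aggregation, and all object names are hypothetical.

```r
n_cell <- 3
n_year <- 2

# Toy "cell" x "time" matrix with 24 monthly time steps.
x <- array(seq_len(n_cell * 12 * n_year), dim = c(n_cell, 12 * n_year))

# Split "time" into "month" and "year"; the values themselves are unchanged.
x_ym <- array(x,
              dim = c(n_cell, 12, n_year),
              dimnames = list(cell = as.character(0:2),
                              month = as.character(1:12),
                              year = c("1901", "1902")))

# Sum over months per cell and year, then average across cells
# (unweighted, like the spatial aggregation noted above).
annual_per_cell <- apply(x_ym, c("cell", "year"), sum)
global_trend <- colMeans(annual_per_cell)
global_trend
```

Because the reshaping only changes the structure, x_ym["0", "1", "1902"] is the same value as x[1, 13].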
runoff <- read_io(filename = "./output/runoff.bin.json",
                  subset = list(year = as.character(2002:2011)))

# Usage of the pipe operator |> (%>% via package magrittr for R version < 4.1)
runoff |>
  # Transform the time and space dimensions ...
  transform(to = c("year_month_day", "lon_lat")) |>
  # ... to subset summer months as well as northern hemisphere (positive)
  # latitudes.
  subset(month = 6:9,
         lat = as.character(seq(0.25, 83.75, by = 0.5))) |>
  # For plotting, sum up the summer months and average over the years.
  plot(aggregate = list(year = mean, month = sum))
# Coordinates for cells around Potsdam.
coordinates <- tibble::tibble(lat = as.character(c(52.25, 52.400922, 53.25)),
                              lon = as.character(c(12.75, 13.03638, 12.75)))

# Complete pipe notation, from reading to plotting data.
read_io(
  filename = "./cftfrac.bin.json",
  subset = list(year = as.character(2000:2018))
) |>
  transform(to = "lon_lat") |>
  # Special case for subsetting of lat and lon pairs
  subset(coords = coordinates) |>
  # Mean across spatial dimensions
  plot(aggregate = list(lon = mean, lat = mean))
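LPJmLData and LPJmLMetaData objects are
closed environments, each of an R6 class, that function as a data
container. Assigning such an object to a new name (e.g. my_copy <-
lpjml_data) does not copy it; to obtain an independent copy, clone it
explicitly (R6 objects provide $clone() for this by default).
Otherwise, my_copy and lpjml_data point to
the same environment, and any subsetting or transformation methods
applied to my_copy will also affect
lpjml_data.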
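
The shared-environment behaviour can be reproduced in base R without lpjmlkit or R6. In the sketch below a plain environment stands in for an LPJmLData object (an assumption for illustration only), and the explicit copy is made with base functions, not with the object's own methods.

```r
# A plain environment as a hypothetical stand-in for an R6 data container.
lpjml_data <- new.env()
lpjml_data$data <- c(1, 2, 3)

# Plain assignment only creates a second name for the same environment.
my_alias <- lpjml_data
my_alias$data <- my_alias$data[1:2]
length(lpjml_data$data)  # 2: the original was modified as well

# An explicit copy lives in a new environment and is independent.
my_copy <- as.environment(as.list(lpjml_data, all.names = TRUE))
my_copy$data <- my_copy$data[1]
length(lpjml_data$data)  # still 2
```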
Do not try to manually overwrite either the $data or
any $meta data attributes within LPJmLData
objects. Doing so is either not possible or can corrupt the integrity of the
object. Methods surrounded by double underscores
($.__<method>__) or attributes surrounded by
underscores ($._<attribute>_) are only for low-level
package development and should not be used for regular data
handling.
When performance is important, choose R6 method notation
runoff$transform(to = "lon_lat") over common R notation
transform(runoff, to = "lon_lat").
The “meta” format is only supported by recent LPJmL versions.
When comparing output data from older versions (< LPJmL version 5.3) with
LPJmL 5.3 output data, it can be useful to combine meta files
("output_metafile" : true) with the header file format
("fmt": "clm"), which has been supported since LPJmL
version 4, to simplify processing pipelines.