Title: | Tools for data.table objects in the REMIND context |
---|---|
Description: | Helper functions for REMIND-related tasks with data.table objects, e.g., interpolation and (dis-)aggregation. |
Authors: | Alois Dirnaichner [aut, cre] |
Maintainer: | Alois Dirnaichner <[email protected]> |
License: | GPL-3 |
Version: | 0.6.0 |
Built: | 2024-10-15 04:00:24 UTC |
Source: | https://github.com/pik-piam/rmndt |
Helper functions for REMIND-related tasks with data.table objects, e.g., interpolation and (dis-)aggregation.
Maintainer: Alois Dirnaichner [email protected]
Aggregate values in a data.table object using a mapping. If no weight is given, the value for the aggregated categories is the sum of the parts. Otherwise, the weight is used to calculate a weighted average accross the parts.
aggregate_dt( data, mapping, fewcol = "region", yearcol = "year", manycol = "iso", datacols = "data", valuecol = "value", weights = NULL, weightcol = "weight" )
aggregate_dt( data, mapping, fewcol = "region", yearcol = "year", manycol = "iso", datacols = "data", valuecol = "value", weights = NULL, weightcol = "weight" )
data |
a magpie object. |
mapping |
a mapping between the aggregated categories in the data and ISO3 countrycodes. *All* regions in 'data' have to be part of the mapping. |
fewcol |
name of the column containing aggregated categories. Default is "region". |
yearcol |
name of the column containing time step info. Default is "year". |
manycol |
name of the column containing dis-aggregated categories. Default is "iso". |
datacols |
index columns that label categories which have to be treated seperately when aggregating with a weight. |
valuecol |
name of the column with the value to aggregate, default is 'value'. |
weights |
table with weights for a (weighted average) aggregation, the name of the column with the aggregated categories has to be 'manycol'. If columns (other than the column with the aggregated category) of the 'weights' coincide with columns of the data, the respective columns are considered when joining. |
weightcol |
column with the weights for aggregation, default is 'weight'. |
Internal function to apply the weights and perform some checks.
apply_weights( data, mapping, weights, fewcol, manycol, valuecol, datacols, weightcol )
apply_weights( data, mapping, weights, fewcol, manycol, valuecol, datacols, weightcol )
data |
a data.table. |
mapping |
a mapping between the aggregated categories and their parts. *All* aggregated categories in 'data' have to be part of the mapping. |
weights |
table with weights for disaggregation, the name of the column with the aggregated categories has to be 'manycol'. If columns (other than the column with the aggregated category) of the 'weights' coincide with columns of the data, the respective columns are considered when joining. |
fewcol |
name of the column containing aggregated categories. Default is "region". |
manycol |
name of the column containing dis-aggregated categories. Default is "iso". |
valuecol |
name of the column with the actual value to disaggregate, default is 'value'. |
datacols |
index columns that label categories which have to be treated seperately when dis-aggregating with a weight. |
weightcol |
column with the weights for the dis-aggregation, default is 'weight'. |
Similar to, but not quite like, 'stats::approx'. Does only support constant extrapolation and linear interpolation. The resulting 'data.table' only contains the range provided by 'xdata' along 'xcol'. Without extrapolation, 'xcol' in the resulting 'data.table' may not cover the range given by 'xdata'.
approx_dt( dt, xdata, xcol, ycol, idxcols = NULL, keepna = FALSE, extrapolate = FALSE )
approx_dt( dt, xdata, xcol, ycol, idxcols = NULL, keepna = FALSE, extrapolate = FALSE )
dt |
a data.table. |
xdata |
the range to interpolate to. This is the range the result will have along the dimension 'xcol'. |
xcol |
name of the column for interpolation. |
ycol |
name of the column that contains the value to be interpolated. |
idxcols |
columns that identify a row (besides xcol), i.e., the remaining index dimensions. |
keepna |
keep NA values for rows that can not be interpolated (since they are outside of [min(xcol), max(xcol)]), default is FALSE. |
extrapolate |
use the closest values to fill 'ycol' outside of the interpolation domain, default is FALSE. This will also work if there is only one value along 'ycol', i.e., no interpolation is taking place. |
a data.table with the range given by 'xdata' along 'xcol'. Columns not given in 'idxcols' will be kept but NAs will appear on extrapolated and interpolated rows.
dt <- as.data.table(ChickWeight) ## delete all values but 1 dt[Chick == 1 & Time > 0, weight := NA] ## delete all values but 2 dt[Chick == 2 & Time > 2, weight := NA] ## extrapolation from 1 value approx_dt(dt, 0:21, "Time", "weight", idxcols=c("Chick", "Diet"), extrapolate = TRUE)[Chick == 1] ## extrapolation and interpolation approx_dt(dt, 0:21, "Time", "weight", idxcols=c("Chick", "Diet"), extrapolate = TRUE)[Chick == 2] ## column not in idxcols approx_dt(dt, 0:21, "Time", "weight", idxcols="Chick", extrapolate = TRUE)[Chick == 2] dt <- as.data.table(ChickWeight) ## interpolation only approx_dt(dt, 0:21, "Time", "weight", idxcols=c("Chick", "Diet"))[Chick == 2]
dt <- as.data.table(ChickWeight) ## delete all values but 1 dt[Chick == 1 & Time > 0, weight := NA] ## delete all values but 2 dt[Chick == 2 & Time > 2, weight := NA] ## extrapolation from 1 value approx_dt(dt, 0:21, "Time", "weight", idxcols=c("Chick", "Diet"), extrapolate = TRUE)[Chick == 1] ## extrapolation and interpolation approx_dt(dt, 0:21, "Time", "weight", idxcols=c("Chick", "Diet"), extrapolate = TRUE)[Chick == 2] ## column not in idxcols approx_dt(dt, 0:21, "Time", "weight", idxcols="Chick", extrapolate = TRUE)[Chick == 2] dt <- as.data.table(ChickWeight) ## interpolation only approx_dt(dt, 0:21, "Time", "weight", idxcols=c("Chick", "Diet"))[Chick == 2]
Disaggregate data in a data.table object using a mapping. If no weights are given, the value for the aggregated categories is used on the disaggregated ones. If a weight is given, the values from the aggregated categories are distributed according to the weights.
disaggregate_dt( data, mapping, fewcol = "region", manycol = "iso", valuecol = "value", datacols = "data", weights = NULL, weightcol = "weight" )
disaggregate_dt( data, mapping, fewcol = "region", manycol = "iso", valuecol = "value", datacols = "data", weights = NULL, weightcol = "weight" )
data |
a data.table. |
mapping |
a mapping between the aggregated categories and their parts. *All* aggregated categories in 'data' have to be part of the mapping. |
fewcol |
name of the column containing aggregated categories. Default is "region". |
manycol |
name of the column containing dis-aggregated categories. Default is "iso". |
valuecol |
name of the column with the actual value to disaggregate, default is 'value'. |
datacols |
index columns that label categories which have to be treated seperately when dis-aggregating with a weight. |
weights |
table with weights for disaggregation, the name of the column with the aggregated categories has to be 'manycol'. If columns (other than the column with the aggregated category) of the 'weights' coincide with columns of the data, the respective columns are considered when joining. |
weightcol |
column with the weights for the dis-aggregation, default is 'weight'. |
Load a magpie object as data.table object with given colnames. Replaces years by numeric values, removing the leading y.
magpie2dt( data, regioncol = NULL, yearcol = NULL, datacols = NULL, valcol = "value" )
magpie2dt( data, regioncol = NULL, yearcol = NULL, datacols = NULL, valcol = "value" )
data |
a magpie object. |
regioncol |
name of the column containing REMIND regions, default is "region". |
yearcol |
name of the column containing the year, default is "year". |
datacols |
the names of the data dimension(s) of the magpie object. If no value is given, the name provided in the magpie object is used. |
valcol |
column to host actual value, default is "value" |
## Not run: require(magpie) dt <- magpie2dt(population_magpie) ## End(Not run)
## Not run: require(magpie) dt <- magpie2dt(population_magpie) ## End(Not run)
REMIND style output files are semi-colon separated CSVs with a trailing semi-colon at the end of each row. The following structure is assumed: Columns "Model", "Scenario", "Region", "Variable", "Unit" and an arbitrary number of year colums (convertable to numeric).
readMIF(mif)
readMIF(mif)
mif |
A REMIND output file (.MIF) |
## Not run: dt <- readMIF("REMIND_generic_default.mif") ## End(Not run)
## Not run: dt <- readMIF("REMIND_generic_default.mif") ## End(Not run)
A random REMIND FE trajectory.
REMIND_FinalEnergy
REMIND_FinalEnergy
A data frame with 2011 rows and 6 columns:
REMIND time step
REMIND-12 region
Secondary energy identifier
Final energy identifier
Conversion technology identifier
The data column, EJ/yr
REMIND GDP Trajectories for ISO countries
REMIND_GDP
REMIND_GDP
A data frame with 9462 rows and 4 columns:
ISO3 country code
SSP GDP variant, this is gdp_SSP2 in this case
GDP in US$2010
Year
The mapping between REMIND-12 regions and ISO countries
REMIND_RegionMap
REMIND_RegionMap
A data frame with 249 rows and 3 columns:
Name of the country
ISO3 country code
REMIND region name
This assumes a *long* format with a single value column, dcasts the data.table to wide format, executes the calulation(s), melts back to long format and returns the resulting data.table with the additional column(s).
varcalc_dt(dt, varcol, valcol, expr, ...)
varcalc_dt(dt, varcol, valcol, expr, ...)
dt |
data.table, long format |
varcol |
name of the column with the variable |
valcol |
name of the column with the value |
expr |
vector of expressions to be handed to j in data.table, as strings, e.g., "a := b/c" |
... |
other arguments are passed on to the data.table call where 'expr' is evaluated. Most likely you want to pass the 'by=' parameter for group-by calls, see examples. |
Note that the data.table should have at least three columns, i.e., the variable, the value and one id column.
mt_dt <- as.data.table(mtcars, keep.rownames = TRUE) ## to long mt1 <- melt(mt_dt, id.vars=c("rn", "cyl")) varcalc_dt(mt1, "variable", "value", c("`spec. hp` := wt/hp", "wsum := sum(wt)"), by="cyl")
mt_dt <- as.data.table(mtcars, keep.rownames = TRUE) ## to long mt1 <- melt(mt_dt, id.vars=c("rn", "cyl")) varcalc_dt(mt1, "variable", "value", c("`spec. hp` := wt/hp", "wsum := sum(wt)"), by="cyl")
Note that these files are semi-colon separated CSVs with a trailing semi-colon at the end of each entry. Required columns are "Model", "Scenario", "Region", "Variable", "Unit" and an arbitrary number of year colums (should be convertable to numeric).
writeMIF(dt, destination, append = FALSE, ...)
writeMIF(dt, destination, append = FALSE, ...)
dt |
a data.table in the correct format. |
destination |
path to the resulting MIF file |
append |
append to an existing MIF file? |
... |
other parameters are passed on to data.table::fwrite |
NAs are written to the file as "N/A".
## Not run: writeMIF(dt, "REMIND_generic_default.mif") ## End(Not run)
## Not run: writeMIF(dt, "REMIND_generic_default.mif") ## End(Not run)