Fitting curves to data (in magclass format) and plotting the curves to nice graphs and pdf

Purpose and Functionality

With this package you can fit a curve to your data. It is a wrapper around the optim() function from R’s stats package and offers an easy-to-use interface to data in magclass format. Use the main function emulator() of this package to fit a curve of the form you can provide. In this example we use y = a + b * x ^c, where x and y are the data you provide and a,b, and c are the coefficients the emulator() function will calculate using the optim() function from R’s stats package. The `emulator()´ function generates a separate fit for each region and each year. The resulting curves together with the raw data will be plotted and saved to figures and compiled in a pdf. See at the bottom how to define your own functional form.

Input data

The raw input data for which the `emulator()´ function will calculate the fits needs to be provided in the magclass format. It needs to have at least three variables:

The emulator() function requires the data to be of a particular format. There are two alternative formats allowed. The following Examples section explains the differences and gives examples how to apply the emulator function.

Examples

In some cases MAgPIE gets infeasible in certain years. Since the data from this year and the years after is not meaningful the emulator() function offers the option to ignore these data. To do so it needs information about which years are infeasible. Our example data already contains this information for the region GLO since MAgPIE does solve all regions at once and not individually. However, the emulator() function performs the fits for each year and each region individually. Therefore, we need to transfer the global modelstatus from GLO to all regions.

library(remulator)
library(magclass)
emudata[,,"Modelstatus (-)"] <- emudata["GLO",,"Modelstatus (-)"]

Alternative A: Calculate emulator for one single case consisting of all scenarios present in the data

This requires the data to have the structure as displayed below: region, year, scenario, model, variable.

str(emudata)
# Formal class 'magpie' [package "magclass"] with 1 slot
#   ..@ .Data: num [1:11, 1:11, 1:438] 0.228 NA NA NA 6.439 ...
#   .. ..- attr(*, "dimnames")=List of 3
#   .. .. ..$ region                 : chr [1:11] "AFR" "CPA" "EUR" "FSU" ...
#   .. .. ..$ year                   : chr [1:11] "y2005" "y2015" "y2025" "y2035" ...
#   .. .. ..$ scenario.model.variable: chr [1:438] "CDL_base-base-1.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" "CDL_base-base-10.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" "CDL_base-base-11.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" "CDL_base-base-12.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" ...

Note: the data has no “sample” dimension (see alternative B), just scenario, model, and variable. All scenarios in your data will be lumped together and the fit will be performed on all data points from all scenarios.

When calling the emulator() funtion you need to provide

  • the input data set,
  • the name of the variables (present in your data) that will be treated as x and y in the fit, and
  • the name of the variable that holds the model status.

There are more arguments to the emulator() function, all of which have defaults (please see ?emulator). All results will be saved to output_path (which is “emulator” by default). Optionally you can pass further arguments to the optim() function, such as lower or upper bounds on the fit coefficients. The emulator() function performs the following steps:

  • it sorts out data of infeasible years and all years following an infeasible year
  • for each region and each year it fits a curve of the form y = a + b * x ^c, where x and y are the data you provide and a,b, and c are the coefficients the emulator() function will calculate using the optim() function from R’s stats package
  • using the fit coefficients from above it will calculate continous curves for plotting
  • it plots the curves and saves them to png files
  • it additionally compiles the pngs to a pdf file (if calc_pdf=TRUE)
emulator(data=emudata,
         name_x="Primary Energy|Biomass|Energy Crops (EJ/yr)",
         name_y="Price|Primary Energy|Biomass (US$2005/GJ)",
         name_modelstat="Modelstatus (-)",
         treat_as_feasible = c(2,7),
         output_path = "single_case",
         lower=c(0,0,1))

Alternative B: Calculate emulator for multiple cases. Distinguish cases by scenarios present in the data.

If your data contains the additional dimension named “sample”, the emulator() function will distinguish the data by the scenario names and perform the fit based on the data points of all samples per scenario. Thus, there will be fits for each scenario. This requires the data to have the following structure including a “sample” dimension

# Formal class 'magpie' [package "magclass"] with 1 slot
# ..@ .Data: num [1:11, 1:11, 1:21462] 1.013 0.119 0.0419 0.1572 0.3047 ...
# .. ..- attr(*, "dimnames")=List of 3
# .. .. ..$ region                        : chr [1:11] "AFR" "CPA" "EUR" "FSU" ...
# .. .. ..$ year                          : chr [1:11] "y2005" "y2015" "y2025" "y2035" ...
# .. .. ..$ scenario.sample.model.variable: chr [1:21462] "CDL_base-base.1.MAgPIE.Emissions|BC|Land Use (Mt BC/yr)" 

We use the data from above that has no native sample dimension and add the missing sample dimension by separating the numbers from the scenario names and putting them into the new “sample” dimension. This results in two scenarios with 73 samples each.

Magclass objects distinguish dimensions by dots in the variable names. Thus, we add a new dimension to magclass object by inserting a dot. In this case we do this by replacing the dash in the scenario name with a dot, e.g. from “-63” to “.63” (see the gsub command below).

library(magclass)
# Add dimension by replacing dashes with dots
getNames(emudata,dim="scenario") <- 
  gsub("-([0-9]{1,2}$)",".\\1",getNames(emudata,dim="scenario"))
# Naming the dimensions
getSets(emudata) <- c("region","year","scenario","sample","model","variable")

We call the emulator() function the same way (except that for our example we choose a different output directory). The function will automatically detect the additional sample dimension and perform all procedures for each scenario:

emulator(data=emudata,
         name_x="Primary Energy|Biomass|Energy Crops (EJ/yr)",
         name_y="Price|Primary Energy|Biomass (US$2005/GJ)",
         name_modelstat="Modelstatus (-)",
         treat_as_feasible = c(2,7),
         output_path = "multi_case",
         lower=c(0,0,1))

The outputs from alternative A and B are different: the output of A (single case) consists of only one scenario named “default”. Since in B the fits have been performed separately for the two scenarios the output folder contains two sub folders, one for each scenario.

How to define your own function to be fitted

By default the function that is fitted to the data points is y = a + b * x ^c. You can define your own function and pass it to the emulator() function. Please use the following syntax to define your function: each fit coefficient is given by param[[i]] with i being the ordinal number of the coefficient (starting with 1). The definition of the default function y = a + b * x ^c looks as follows:

beautiful_shape <- function(param,x)return(param[[1]] + param[[2]] * x ^param[[3]])

A linear fit would look like

beautiful_shape <- function(param,x)return(param[[1]] + param[[2]] * x)

Then pass your function to the emulator() function. NOTE: If you provide your own function you need to provide an initial value for each of the fit coefficients using the parameter initial_values, in this case with two parameters initial_values=c(0,1)!

emulator(data=emudata,
         name_x="Primary Energy|Biomass|Energy Crops (EJ/yr)",
         name_y="Price|Primary Energy|Biomass (US$2005/GJ)",
         name_modelstat="Modelstatus (-)",
         treat_as_feasible = c(2,7),
         userfun=beautiful_shape,
         initial_values=c(0,1),
         output_path = "multi_case",
         lower=c(0,0))

Advanced

Replace flat fits

In some cases the fitting algorithm fails and generates flat fits. To replace these flat fits with fits from other years use the replace_flat_fits() function. The example below shows how to apply the function:

setwd("~/Documents/0_GIT/magpie4-model/")
library(magclass)
library(remulator)
replace_flat_fits("output/emulator/SSP2-BASE-Base/linear/data_postfit_SSP2-BASE-Base.Rdata")
replace_flat_fits("output/emulator/SSP2-26/linear/data_postfit_SSP2-26.Rdata")
replace_flat_fits("output/emulator/SSP2-Ref/linear/data_postfit_SSP2-Ref.Rdata")

Compare supplycurves

Use the plot_compare_supplycurves() function to compare multiple supplycurves. See example below:

setwd("~/Documents/0_GIT/magpie4-model/output/emulator")
plot_compare_supplycurves(folders=c("SSP2-Ref","SSP2-26","SSP2-26-statfood"),pdfname = "comparison.pdf")