---
title: "Fitting curves to data (in magclass format) and plotting the curves to nice graphs and pdf"
author: "David Klein"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Fitting curves to data (in magclass format) and plotting the curves to nice graphs and pdf}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

# Purpose and Functionality
With this package you can fit a curve to your data. It is a wrapper around the `optim()` function from R's stats package and offers an easy-to-use interface to data in magclass format. Use the main function `emulator()` of this package to fit a curve of the form you can provide. In this example we use `y = a + b * x ^c`, where `x` and `y` are the data you provide and `a`,`b`, and `c` are the coefficients the `emulator()` function will calculate using the `optim()` function from R's stats package. The `emulator()´ function generates a separate fit for each region and each year. The resulting curves together with the raw data will be plotted and saved to figures and compiled in a pdf. See at the bottom how to define your own functional form.

# Input data
The raw input data for which the `emulator()´ function will calculate the fits needs to be provided in the magclass format. It needs to have at least three variables:

* data that will be used for `x` and `y` in the fit
* a variable that provides the codes of the GAMS model status (<https://www.gams.com/24.8/docs/userguides/mccarl/modelstat_tmodstat.htm>). Data from infeasible years and their successors will be excluded from the fit.

The `emulator()` function requires the data to be of a particular format. There are two alternative formats allowed. The following Examples section explains the differences and gives examples how to apply the emulator function.

# Examples

In some cases MAgPIE gets infeasible in certain years. Since the data from this year and the years after is not meaningful the `emulator()` function offers the option to ignore these data. To do so it needs information about which years are infeasible. Our example data already contains this information for the region `GLO` since MAgPIE does solve all regions at once and not individually. However, the `emulator()` function performs the fits for each year and each region individually. Therefore, we need to transfer the global modelstatus from `GLO` to all regions.

```{r, echo=TRUE, eval=FALSE}
library(remulator)
library(magclass)
emudata[,,"Modelstatus (-)"] <- emudata["GLO",,"Modelstatus (-)"]
```

## Alternative A: Calculate emulator for one single case consisting of all scenarios present in the data
This requires the data to have the structure as displayed below: region, year, scenario, model, variable.

```{r, echo = TRUE, eval=FALSE}
str(emudata)
# Formal class 'magpie' [package "magclass"] with 1 slot
#   ..@ .Data: num [1:11, 1:11, 1:438] 0.228 NA NA NA 6.439 ...
#   .. ..- attr(*, "dimnames")=List of 3
#   .. .. ..$ region                 : chr [1:11] "AFR" "CPA" "EUR" "FSU" ...
#   .. .. ..$ year                   : chr [1:11] "y2005" "y2015" "y2025" "y2035" ...
#   .. .. ..$ scenario.model.variable: chr [1:438] "CDL_base-base-1.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" "CDL_base-base-10.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" "CDL_base-base-11.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" "CDL_base-base-12.MAgPIE.Primary Energy|Biomass|Energy Crops (EJ/yr)" ...
```

Note: the data has no "sample" dimension (see alternative B), just scenario, model, and variable. All scenarios in your data will be lumped together and the fit will be performed on all data points from all scenarios. 

When calling the `emulator()` funtion you need to provide

* the input data set, 
* the name of the variables (present in your data) that will be treated as `x` and `y` in the fit, and 
* the name of the variable that holds the model status. 

There are more arguments to the `emulator()` function, all of which have defaults (please see `?emulator`). All results will be saved to `output_path` (which is "emulator" by default). Optionally you can pass further arguments to the `optim()` function, such as lower or upper bounds on the fit coefficients. The `emulator()` function performs the following steps:

* it sorts out data of infeasible years and all years following an infeasible year
* for each region and each year it fits a curve of the form `y = a + b * x ^c`, where `x` and `y` are the data you provide and `a`,`b`, and `c` are the coefficients the `emulator()` function will calculate using the `optim()` function from R's stats package
* using the fit coefficients from above it will calculate continous curves for plotting
* it plots the curves and saves them to png files
* it additionally compiles the pngs to a pdf file (if `calc_pdf=TRUE`)

```{r, echo = TRUE, eval=FALSE}
emulator(data=emudata,
         name_x="Primary Energy|Biomass|Energy Crops (EJ/yr)",
         name_y="Price|Primary Energy|Biomass (US$2005/GJ)",
         name_modelstat="Modelstatus (-)",
         treat_as_feasible = c(2,7),
         output_path = "single_case",
         lower=c(0,0,1))
```

## Alternative B: Calculate emulator for multiple cases. Distinguish cases by scenarios present in the data.
If your data contains the additional dimension named "sample", the `emulator()` function will distinguish the data by the scenario names and perform the fit based on the data points of all samples per scenario. Thus, there will be fits for each scenario. This requires the data to have the following structure including a "sample" dimension

```{r, echo = TRUE}
# Formal class 'magpie' [package "magclass"] with 1 slot
# ..@ .Data: num [1:11, 1:11, 1:21462] 1.013 0.119 0.0419 0.1572 0.3047 ...
# .. ..- attr(*, "dimnames")=List of 3
# .. .. ..$ region                        : chr [1:11] "AFR" "CPA" "EUR" "FSU" ...
# .. .. ..$ year                          : chr [1:11] "y2005" "y2015" "y2025" "y2035" ...
# .. .. ..$ scenario.sample.model.variable: chr [1:21462] "CDL_base-base.1.MAgPIE.Emissions|BC|Land Use (Mt BC/yr)" 
```

We use the data from above that has no native sample dimension and add the missing sample dimension by separating the numbers from the 
scenario names and putting them into the new "sample" dimension. This results in two scenarios with 73 samples each.

Magclass objects distinguish dimensions by dots in the variable names. Thus, we add a new dimension to magclass object by inserting a dot. In this case we do this by replacing the dash in the scenario name with a dot, e.g. from "-63" to ".63" (see the gsub command below).

```{r, echo = TRUE,eval=FALSE}
library(magclass)
# Add dimension by replacing dashes with dots
getNames(emudata,dim="scenario") <- 
  gsub("-([0-9]{1,2}$)",".\\1",getNames(emudata,dim="scenario"))
# Naming the dimensions
getSets(emudata) <- c("region","year","scenario","sample","model","variable")
```

We call the `emulator()` function the same way (except that for our example we choose a different output directory). The function will automatically detect the additional sample dimension and perform all procedures for each scenario:

```{r, echo = TRUE, eval=FALSE}
emulator(data=emudata,
         name_x="Primary Energy|Biomass|Energy Crops (EJ/yr)",
         name_y="Price|Primary Energy|Biomass (US$2005/GJ)",
         name_modelstat="Modelstatus (-)",
         treat_as_feasible = c(2,7),
         output_path = "multi_case",
         lower=c(0,0,1))
```

The outputs from alternative A and B are different: the output of A (single case) consists of only one scenario named "default". Since in B the fits have been performed separately for the two scenarios the output folder contains two sub folders, one for each scenario.

## How to define your own function to be fitted

By default the function that is fitted to the data points is `y = a + b * x ^c`. You can define your own function and pass it to the `emulator()` function. Please use the following syntax to define your function: each fit coefficient is given by `param[[i]]` with `i` being the ordinal number of the coefficient (starting with 1). The definition of the default function `y = a + b * x ^c` looks as follows:

```{r, echo = TRUE, eval=FALSE}
beautiful_shape <- function(param,x)return(param[[1]] + param[[2]] * x ^param[[3]])
```

A linear fit would look like

```{r, echo = TRUE, eval=FALSE}
beautiful_shape <- function(param,x)return(param[[1]] + param[[2]] * x)
```

Then pass your function to the `emulator()` function. NOTE: If you provide your own function you need to provide an initial value for each of the fit coefficients using the parameter `initial_values`, in this case with two parameters  `initial_values=c(0,1)`!

```{r, echo = TRUE, eval=FALSE}
emulator(data=emudata,
         name_x="Primary Energy|Biomass|Energy Crops (EJ/yr)",
         name_y="Price|Primary Energy|Biomass (US$2005/GJ)",
         name_modelstat="Modelstatus (-)",
         treat_as_feasible = c(2,7),
         userfun=beautiful_shape,
         initial_values=c(0,1),
         output_path = "multi_case",
         lower=c(0,0))
```

# Advanced

## Replace flat fits

In some cases the fitting algorithm fails and generates flat fits. To replace these flat fits with fits from other years use the `replace_flat_fits()` function. The example below shows how to apply the function:

```{r, eval=FALSE}
setwd("~/Documents/0_GIT/magpie4-model/")
library(magclass)
library(remulator)
replace_flat_fits("output/emulator/SSP2-BASE-Base/linear/data_postfit_SSP2-BASE-Base.Rdata")
replace_flat_fits("output/emulator/SSP2-26/linear/data_postfit_SSP2-26.Rdata")
replace_flat_fits("output/emulator/SSP2-Ref/linear/data_postfit_SSP2-Ref.Rdata")

```

## Compare supplycurves

Use the `plot_compare_supplycurves()` function to compare multiple supplycurves. See example below:

```{r, eval=FALSE}
setwd("~/Documents/0_GIT/magpie4-model/output/emulator")
plot_compare_supplycurves(folders=c("SSP2-Ref","SSP2-26","SSP2-26-statfood"),pdfname = "comparison.pdf")
```