3.1. The cfg-file
The main model settings need to be specified in a configuration file (cfg-file).
The file looks like this:
    [general]
    input_dir=./path/to/input_data
    output_dir=./path/to/store/output
    # 1: all data. 2: leave-one-out model. 3: single variable model. 4: dubbelsteenmodel
    # Note that only 1 supports sensitivity_analysis
    model=1
    verbose=True

    [settings]
    # start year
    y_start=2000
    # end year
    y_end=2012

    [PROJ_files]
    # cfg-files
    proj_nr_1=./path/to/projection/settings_proj.cfg

    [pre_calc]
    # if nothing is specified, the XY array will be stored in output_dir
    # if XY already pre-calculated, then provide path to npy-file
    XY=

    [extent]
    shp=folder/with/polygons.shp

    [conflict]
    # either specify path to file or state 'download' to download latest PRIO/UCDP dataset
    conflict_file=folder/with/conflict_data.csv
    min_nr_casualties=1
    # 1=state-based armed conflict. 2=non-state conflict. 3=one-sided violence
    type_of_violence=1,2,3

    [climate]
    shp=folder/with/climate_zones.shp
    # define either one or more classes (use abbreviations!) or specify nothing for not filtering
    zones=
    code2class=folder/with/classification_codes.txt

    [data]
    # specify the path to the nc-file, whether the variable shall be log-transformed (True, False), and which statistical function should be applied
    # these three settings need to be separated by a comma
    # NOTE: variable name here needs to be identical with variable name in nc-file
    # NOTE: only statistical functions supported by rasterstats are valid
    precipitation=folder/with/precipitation_data.nc,False,mean
    temperature=folder/with/temperature_data.nc,False,mean
    population=folder/with/population_data.nc,True,sum

    [machine_learning]
    # choose from: MinMaxScaler, StandardScaler, RobustScaler, QuantileTransformer
    scaler=QuantileTransformer
    # choose from: NuSVC, KNeighborsClassifier, RFClassifier
    model=RFClassifier
    train_fraction=0.7
    # number of repetitions
    n_runs=10
All paths for input_dir, output_dir, and in [PROJ_files] are relative to the location of the cfg-file.
Spaces should be avoided in the cfg-file, except in lines commented out with ‘#’.
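As a minimal sketch of how such a file can be read, the snippet below parses a shortened cfg-file with Python's standard configparser module and resolves the two directories relative to the location of the cfg-file. This is not CoPro's own loading code; the function name read_settings and the location /home/user/run are hypothetical and only serve the illustration.

```python
import configparser
import os

# Shortened version of the example cfg-file shown above.
EXAMPLE_CFG = """
[general]
input_dir=./path/to/input_data
output_dir=./path/to/store/output
model=1
verbose=True

[settings]
y_start=2000
y_end=2012
"""


def read_settings(cfg_text, cfg_location):
    """Parse cfg text and resolve the directories relative to cfg_location.

    Hypothetical helper for illustration only, not part of CoPro.
    """
    config = configparser.RawConfigParser(allow_no_value=True)
    config.optionxform = str  # keep keys such as 'XY' case-sensitive
    config.read_string(cfg_text)

    def resolve(path):
        # paths in [general] are relative to the location of the cfg-file
        return os.path.normpath(os.path.join(cfg_location, path))

    return {
        "input_dir": resolve(config.get("general", "input_dir")),
        "output_dir": resolve(config.get("general", "output_dir")),
        "model": config.getint("general", "model"),
        "verbose": config.getboolean("general", "verbose"),
        "y_start": config.getint("settings", "y_start"),
        "y_end": config.getint("settings", "y_end"),
    }


settings = read_settings(EXAMPLE_CFG, "/home/user/run")
print(settings["input_dir"])
```

Resolving the paths once, right after parsing, keeps the rest of a script independent of the directory it was started from.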
3.2. The sections
Here, the different sections are explained briefly.
input_dir: (relative) path to the directory where the input data is stored. This requires all input data to be stored in one main folder; sub-folders are possible.
output_dir: (relative) path to the directory where output will be stored.
If the folder does not exist yet, it will be created.
CoPro will automatically create the sub-folders _REF for output from the reference run and _PROJ for output from the (various) projection runs.
model: the type of simulation to be run can be specified here. Currently, four different models are available:
1. ‘all data’: all variable values are used to fit the model and predict results.
2. ‘leave one out’: each variable is left out once, resulting in n runs with n-1 variables each, n being the number of variables. This model can be used to identify the relative influence of one variable within the variable set.
3. ‘single variables’: each variable is used as sole predictor once. With this model, the explanatory power of each variable on its own can be assessed.
4. ‘dubbelsteen’: the relation between variables and conflict is removed by shuffling the binary conflict data randomly. By doing so, the lower bound of model performance can be estimated.
All model types except ‘all_data’ will be deprecated in a future release.
verbose: if True, additional messages will be printed.
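The model types described above differ mainly in which variables enter a run. The following sketch lists the predictor sets per model number and shows the random shuffling used by the ‘dubbelsteen’ model. It is an illustration under assumptions, not CoPro's implementation; the function names variable_sets and shuffle_conflict are hypothetical.

```python
import random


def variable_sets(variables, model):
    """Return the list of predictor sets to run for a given model number.

    Hypothetical helper; model numbers follow the cfg-file comment
    (1: all data, 2: leave-one-out, 3: single variable, 4: dubbelsteen).
    """
    variables = list(variables)
    if model in (1, 4):
        # all variables are used; model 4 differs only in the shuffled target
        return [variables]
    if model == 2:
        # one set per left-out variable
        return [[v for v in variables if v != left_out] for left_out in variables]
    if model == 3:
        # each variable acts as sole predictor once
        return [[v] for v in variables]
    raise ValueError(f"unknown model type: {model}")


def shuffle_conflict(y, seed=None):
    """Return a randomly shuffled copy of the binary conflict data."""
    rng = random.Random(seed)
    shuffled = list(y)
    rng.shuffle(shuffled)
    return shuffled


names = ["precipitation", "temperature", "population"]
leave_one_out = variable_sets(names, 2)
single = variable_sets(names, 3)
shuffled = shuffle_conflict([0, 1, 0, 0, 1], seed=1)
```

Shuffling keeps the share of conflict and non-conflict samples unchanged and only breaks their link to the variable values.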
y_start: the start year of the reference run.
y_end: the end year of the reference run.
The period between y_start and y_end will be used to train and test the model.
y_proj: the end year of the projection run.
The period between y_end and y_proj will be used to make annual projections.
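To make the two periods concrete, the sketch below derives the years of the reference run and of a projection run from the three settings. It assumes that both bounds are inclusive and that the projection starts in the year after y_end; neither assumption is confirmed by this documentation. The function name simulation_years is hypothetical.

```python
def simulation_years(y_start, y_end, y_proj):
    """Return (reference_years, projection_years) as lists of integers.

    Assumptions: bounds are inclusive, projections start at y_end + 1.
    """
    if not y_start <= y_end < y_proj:
        raise ValueError("expected y_start <= y_end < y_proj")
    reference = list(range(y_start, y_end + 1))
    projection = list(range(y_end + 1, y_proj + 1))
    return reference, projection


reference, projection = simulation_years(2000, 2012, 2050)
print(len(reference), len(projection))
```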
The [PROJ_files] section is a key section. Here, one (slightly different) cfg-file per projection needs to be provided. This way, multiple projection runs can be defined from within the “main” cfg-file.
The convention is that the projection name is the key and the path to the projection's cfg-file is the value. For example, the projections “SSP1” and “SSP2” would be defined with the keys SSP1 and SSP2, each pointing to its own cfg-file.
A cfg-file for a projection is shorter than the main cfg-file used as command line argument and looks like this:
    [general]
    input_dir=./path/to/input_data
    verbose=True

    [settings]
    # year for which projection is to be made
    y_proj=2050

    [data]
    # specify the path to the nc-file, whether the variable shall be log-transformed (True, False), and which statistical function should be applied
    # these three settings need to be separated by a comma
    # NOTE: variable name here needs to be identical with variable name in nc-file
    # NOTE: only statistical functions supported by rasterstats are valid
    precipitation=folder/with/precipitation_data.nc,False,mean
    temperature=folder/with/temperature_data.nc,False,mean
    population=folder/with/population_data.nc,True,sum
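The following sketch shows one way the projections listed in the main cfg-file could be collected, assuming the projection name is the key, as with the proj_nr_1 entry in the main cfg-file above. The paths ./projections/ssp1.cfg and ./projections/ssp2.cfg, the location /home/user/run and the function name projection_cfgs are hypothetical; this is not CoPro's own code.

```python
import configparser
import os

# Hypothetical [PROJ_files] section with two projections.
MAIN_CFG = """
[PROJ_files]
SSP1=./projections/ssp1.cfg
SSP2=./projections/ssp2.cfg
"""


def projection_cfgs(cfg_text, cfg_location):
    """Map each projection name to the resolved path of its cfg-file."""
    config = configparser.RawConfigParser()
    config.optionxform = str  # keep 'SSP1' instead of lower-casing it to 'ssp1'
    config.read_string(cfg_text)
    return {
        name: os.path.normpath(os.path.join(cfg_location, path))
        for name, path in config.items("PROJ_files")
    }


projections = projection_cfgs(MAIN_CFG, "/home/user/run")
for name, path in projections.items():
    print(name, "->", path)
```

Setting optionxform matters here because configparser lower-cases keys by default, which would change the projection names.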
XY: if the XY-data was already pre-computed in a previous run and stored as npy-file, it can be specified here and will be loaded from file to save time.
If nothing is specified, the model will save the XY-data by default to the output directory as an npy-file.
shp: the provided shape-file defines the boundaries for which the model is applied.
At the same time, it also defines at which aggregation level the output is determined.
The shp-file can contain multiple polygons covering the study area. Their size defines the output aggregation level. It is also possible to provide only one polygon, but model behaviour is not well tested for this case.
conflict_file: path to the csv-file containing the conflict dataset.
Alternatively, specify download to have the latest conflict dataset (currently version 20.1) downloaded and used as input.
min_nr_casualties: minimum number of reported casualties required for a conflict to be considered in the model.
type_of_violence: the types of violence to be considered can be specified here.
Multiple values can be specified, separated by commas. Types of violence are:
1. ‘state-based armed conflict’: a contested incompatibility that concerns government and/or territory where the use of armed force between two parties, of which at least one is the government of a state, results in at least 25 battle-related deaths in one calendar year.
2. ‘non-state conflict’: the use of armed force between two organized armed groups, neither of which is the government of a state, which results in at least 25 battle-related deaths in a year.
3. ‘one-sided violence’: the deliberate use of armed force by the government of a state or by a formally organized group against civilians which results in at least 25 deaths in a year.
CoPro currently only works with UCDP data.
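The two filters of this section can be pictured with the sketch below, which selects rows from a small csv table by minimum number of casualties and by type of violence. It assumes the column names best (best estimate of fatalities) and type_of_violence as used in the UCDP GED dataset; the four rows are made up for illustration and the function name select_conflicts is hypothetical. This is not CoPro's own selection code.

```python
import csv
import io

# Made-up rows for illustration; column names are assumed from UCDP GED.
SAMPLE_CSV = """id,year,type_of_violence,best
1,2005,1,30
2,2006,2,0
3,2007,3,4
4,2008,1,12
"""


def select_conflicts(csv_text, min_nr_casualties, types_of_violence):
    """Keep rows with enough casualties and a requested type of violence."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if int(row["best"]) >= min_nr_casualties
        and int(row["type_of_violence"]) in types_of_violence
    ]


# the cfg-file value '1,2' becomes the list [1, 2]
types = [int(t) for t in "1,2".split(",")]
selected = select_conflicts(SAMPLE_CSV, min_nr_casualties=1, types_of_violence=types)
print([row["id"] for row in selected])
```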
shp: the provided shape-file defines the areas of the different Köppen-Geiger climate zones.
zones: abbreviations of the climate zones to be considered in the model.
Can either be ‘None’ or one or multiple abbreviations.
code2class: path to the file that converts the abbreviations to the class-numbers used in the shp-file.
The code2class-file should not be altered!
In the [data] section, all variables to be used in the model need to be provided.
The paths are relative to input_dir.
Only netCDF-files with annual data are supported.
The main convention is that the key in the cfg-file is identical to the variable name in the nc-file.
For example, if the nc-file contains the variable precipitation, the corresponding key in the cfg-file must also be precipitation.
CoPro furthermore requires information on whether the values sampled from a file should be log-transformed.
In addition, a statistical function can be defined that is applied when sampling from the file per polygon of the shp-file.
CoPro makes use of the zonal_stats function available within rasterstats.
To determine the log-scaled mean value of precipitation per polygon, the following notation is required:
    precipitation=folder/with/precipitation_data.nc,True,mean
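As a rough illustration of how such entries can be taken apart, the sketch below splits each value of the [data] section into its three comma-separated settings. It is not CoPro's own parser; the function name parse_data_entries and the dictionary keys path, log_transform and stat are hypothetical.

```python
import configparser

DATA_SECTION = """
[data]
precipitation=folder/with/precipitation_data.nc,True,mean
population=folder/with/population_data.nc,True,sum
"""


def parse_data_entries(cfg_text):
    """Split every [data] entry into path, log-transform flag and statistic."""
    config = configparser.RawConfigParser()
    config.optionxform = str  # variable names must match the nc-file exactly
    config.read_string(cfg_text)
    entries = {}
    for name, value in config.items("data"):
        parts = [part.strip() for part in value.split(",")]
        if len(parts) != 3:
            raise ValueError(f"{name}: expected 'path,True|False,statistic'")
        path, log_flag, stat = parts
        if log_flag not in ("True", "False"):
            raise ValueError(f"{name}: log flag must be True or False")
        entries[name] = {
            "path": path,
            "log_transform": log_flag == "True",
            "stat": stat,
        }
    return entries


entries = parse_data_entries(DATA_SECTION)
print(entries["precipitation"])
```

Validating the number of settings and the flag early gives a clear error message before any nc-file is opened.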
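scaler: the scaling algorithm used to scale the variable values to comparable scales.
Currently supported are MinMaxScaler, StandardScaler, RobustScaler, and QuantileTransformer.
model: the machine learning algorithm to be applied.
Currently supported are NuSVC, KNeighborsClassifier, and RFClassifier.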
train_fraction: the fraction of the XY-data to be used to train the model.
The remaining data (1-train_fraction) will be used to predict and evaluate the model.
n_runs: the number of classifiers to use.
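The interplay of train_fraction and n_runs can be sketched with the standard library alone: the samples are shuffled, split into a training and a test part, and the split is repeated n_runs times. This is a simplified stand-in, not the splitting routine CoPro uses; the function names split_samples and repeated_splits and the seed handling are assumptions.

```python
import random


def split_samples(samples, train_fraction, seed=None):
    """Shuffle samples and split them into (train, test) lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must lie between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def repeated_splits(samples, train_fraction, n_runs, seed=0):
    """Return one (train, test) split per run, each with its own seed."""
    return [
        split_samples(samples, train_fraction, seed=seed + run)
        for run in range(n_runs)
    ]


splits = repeated_splits(range(10), train_fraction=0.7, n_runs=10)
train, test = splits[0]
print(len(train), len(test))
```

Repeating the split with different seeds shows how much the evaluation depends on which samples end up in the test part.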