4. Output¶
4.1. Output folder structure¶
All output is stored in the output folder as specified in the configuration file (cfg-file) under [general].
[general]
output_dir=./path/to/store/output
By default, CoPro creates two sub-folders: _REF and _PROJ. In the latter, another sub-folder is created per projection defined in the cfg-file.
In the example below, these would be the folders /_PROJ/SSP1 and /_PROJ/SSP2.
[PROJ_files]
SSP1=/path/to/ssp1.cfg
SSP2=/path/to/ssp2.cfg
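The mapping from cfg-file to output folders can be illustrated with the standard-library configparser; this is a minimal sketch using the section and key names from the example above (the paths are placeholders, and resetting optionxform is only needed to preserve the upper-case projection names):

```python
# Sketch: parsing a CoPro-style cfg-file with the standard library.
import configparser

cfg_text = """
[general]
output_dir=./path/to/store/output

[PROJ_files]
SSP1=/path/to/ssp1.cfg
SSP2=/path/to/ssp2.cfg
"""

cfg = configparser.ConfigParser()
cfg.optionxform = str  # keep key case, so "SSP1" stays "SSP1"
cfg.read_string(cfg_text)

output_dir = cfg["general"]["output_dir"]
# one sub-folder per projection will appear under <output_dir>/_PROJ
projection_names = list(cfg["PROJ_files"])
print(output_dir, projection_names)
```

With this cfg-file, CoPro would create the sub-folders _REF, _PROJ/SSP1, and _PROJ/SSP2 under output_dir.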
4.2. List of output files¶
Important
Not all model types provide all of the output mentioned below. If the ‘leave-one-out’ or ‘single variable’ model is selected, only the metrics are stored to a csv-file.
4.2.1. _REF¶
In addition to the output files listed below, the cfg-file is automatically copied to the _REF folder.
selected_polygons.shp
: Shapefile containing all polygons remaining after the selection procedure.
selected_conflicts.shp
: Shapefile containing all conflict points remaining after the selection procedure.
XY.npy
: NumPy-array containing geometry, ID, and scaled data of sample (X) and target data (Y).
Can be provided in the cfg-file to save time in a subsequent run; the file can be loaded with numpy.load().
raw_output_data.npy
: NumPy-array containing each single prediction made in the reference run.
Will contain multiple predictions per polygon. File can be loaded with numpy.load().
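Loading these arrays back into Python is a plain numpy.save()/numpy.load() round-trip; the sketch below uses an in-memory buffer and a toy two-column array as a stand-in for the actual file (the real array layout should be inspected after loading):

```python
# Sketch: saving and re-loading a NumPy array, as done for XY.npy and
# raw_output_data.npy. The toy array here only stands in for the real file.
import io

import numpy as np

buf = io.BytesIO()  # stand-in for e.g. "_REF/raw_output_data.npy"
np.save(buf, np.array([[1, 0.3], [2, 0.7]]))
buf.seek(0)

raw = np.load(buf)  # for the real file: np.load("path/to/raw_output_data.npy")
print(raw.shape)
```

If the stored array contains non-numeric entries such as geometry objects, numpy.load() additionally needs allow_pickle=True.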
evaluation_metrics.csv
: Various evaluation metrics determined per repetition of the split-sample test.
File can e.g. be loaded with pandas.read_csv().
feature_importance.csv
: Importance of each model variable in making projections.
This is a property of RF Classifiers and thus only obtainable if an RF Classifier is used.
permutation_importance.csv
: Mean permutation importance per model variable.
Computed with sklearn.inspection.permutation_importance.
ROC_data_tprs.csv
and ROC_data_aucs.csv
: True-positive rates and area-under-curve values, respectively, per repetition of the split-sample test.
Files can e.g. be loaded with pandas.read_csv() and used later to plot the ROC-curve.
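Re-using these files for a mean ROC-curve boils down to averaging over the repetition columns; a minimal sketch with toy data (the column layout of the csv-files is an assumption, so check it with pandas.read_csv() first):

```python
# Sketch: averaging per-repetition ROC data, as stored in
# ROC_data_tprs.csv and ROC_data_aucs.csv. Toy frames stand in for
# the real files; in a real run you would use
#   tprs = pd.read_csv("_REF/ROC_data_tprs.csv")
#   aucs = pd.read_csv("_REF/ROC_data_aucs.csv")
import pandas as pd

tprs = pd.DataFrame({"rep_0": [0.0, 0.6, 1.0], "rep_1": [0.0, 0.8, 1.0]})
aucs = pd.DataFrame({"auc": [0.78, 0.84]})

mean_tpr = tprs.mean(axis=1)   # average ROC-curve over repetitions
mean_auc = aucs["auc"].mean()  # average area-under-curve
print(mean_auc)
```

The mean curve (mean_tpr against the common false-positive-rate grid) can then be plotted with any plotting library.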
output_for_REF.geojson
: GeoJSON-file containing resulting conflict risk estimates per polygon based on out-of-sample projections of _REF run.
4.2.1.1. Conflict risk per polygon¶
At the end of all model repetitions, the resulting raw_output_data.npy file contains multiple out-of-sample predictions per polygon.
By aggregating results per polygon, the model output can be assessed spatially, as stored in output_for_REF.geojson.
The main output metrics are calculated per polygon and saved to output_per_polygon.shp:
nr_predictions: the number of predictions made;
nr_correct_predictions: the number of correct predictions made;
nr_observed_conflicts: the number of observed conflict events;
nr_predicted_conflicts: the number of predicted conflicts;
min_prob_1: minimum probability of conflict in all repetitions;
probability_of_conflict (POC): probability of conflict averaged over all repetitions;
max_prob_1: maximum probability of conflict in all repetitions;
fraction_correct_predictions (FOP): ratio of the number of correct predictions over the total number of predictions made;
chance_of_conflict: ratio of the number of conflict predictions over the total number of predictions made.
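The aggregation from raw predictions to these per-polygon metrics can be sketched with pandas; the column names (ID, y_test, y_pred, y_prob_1) and the toy data are assumptions for illustration, not the actual layout of raw_output_data.npy:

```python
# Sketch: aggregating raw per-prediction output to per-polygon metrics.
import pandas as pd

# toy stand-in for the raw output: one row per out-of-sample prediction
raw = pd.DataFrame({
    "ID":       [1, 1, 1, 2, 2],                  # polygon identifier
    "y_test":   [1, 1, 0, 0, 0],                  # observed conflict (0/1)
    "y_pred":   [1, 0, 0, 0, 1],                  # predicted conflict (0/1)
    "y_prob_1": [0.9, 0.4, 0.2, 0.1, 0.6],        # predicted probability of conflict
})
raw["correct"] = (raw["y_test"] == raw["y_pred"]).astype(int)

per_poly = raw.groupby("ID").agg(
    nr_predictions=("y_pred", "size"),
    nr_correct_predictions=("correct", "sum"),
    nr_observed_conflicts=("y_test", "sum"),
    nr_predicted_conflicts=("y_pred", "sum"),
    min_prob_1=("y_prob_1", "min"),
    probability_of_conflict=("y_prob_1", "mean"),
    max_prob_1=("y_prob_1", "max"),
    fraction_correct_predictions=("correct", "mean"),
    chance_of_conflict=("y_pred", "mean"),
)
print(per_poly)
```

Each row of per_poly then corresponds to one polygon, matching the metrics listed above.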
4.2.2. _PROJ¶
For each projection, CoPro creates one output file per projection year.
output_in_<YEAR>.geojson
: GeoJSON-file containing model output per polygon, averaged over all classifier instances, per YEAR of the projection.
The number of instances is set with n_runs in the [machine_learning] section.
4.2.2.1. Conflict risk per polygon¶
During the projection run, each classifier instance produces its own output per YEAR.
CoPro merges these outputs into one output_in_<YEAR>.geojson file.
As there are no observations available for the projection period, the output metrics differ from the reference run:
nr_predictions: the number of predictions made, i.e. the number of classifier instances;
nr_predicted_conflicts: the number of predicted conflicts;
min_prob_1: minimum probability of conflict in all outputs of classifier instances;
probability_of_conflict (POC): probability of conflict averaged over all outputs of classifier instances;
max_prob_1: maximum probability of conflict in all outputs of classifier instances;
chance_of_conflict: ratio of the number of conflict predictions over the total number of predictions made.
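The merge across classifier instances follows the same grouping idea as in the reference run, only without the observation-based metrics; a minimal sketch with toy data (column names and n_runs=3 are assumptions for illustration):

```python
# Sketch: merging per-instance projection output into one per-polygon
# frame for a single YEAR, without observation-based metrics.
import pandas as pd

# toy stand-in: one prediction per polygon per classifier instance
preds = pd.DataFrame({
    "ID":       [1, 1, 1, 2, 2, 2],            # polygon identifier
    "y_pred":   [1, 1, 0, 0, 0, 0],            # predicted conflict (0/1)
    "y_prob_1": [0.7, 0.8, 0.4, 0.2, 0.1, 0.3],  # predicted probability of conflict
})

proj = preds.groupby("ID").agg(
    nr_predictions=("y_pred", "size"),
    nr_predicted_conflicts=("y_pred", "sum"),
    min_prob_1=("y_prob_1", "min"),
    probability_of_conflict=("y_prob_1", "mean"),
    max_prob_1=("y_prob_1", "max"),
    chance_of_conflict=("y_pred", "mean"),
)
print(proj)
```

Written out per polygon and joined with the polygon geometries, such a frame corresponds to the content of output_in_<YEAR>.geojson.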