# Evaluation

Glue Factory is designed for simple and tight integration between training and evaluation. All benchmarks are designed around one principle: only evaluate on cached results. This enforces reproducible baselines. Therefore, we first export model predictions for each dataset (export), and evaluate the cached results in a second pass (evaluation).

## Running an evaluation

We currently provide evaluation scripts for MegaDepth-1500, HPatches, and ETH3D. You can run them with:

```bash
python -m gluefactory.eval.<benchmark_name> --conf "a name in gluefactory/configs/ or path" --checkpoint "and/or a checkpoint name"
```

Each evaluation run is assigned a tag, which can (optionally) be customized from the command line with `--tag <your_tag>`.

To overwrite an experiment, add `--overwrite`. To only overwrite the results of the evaluation loop, add `--overwrite_eval`. We perform config checks to warn the user about non-conforming configurations between runs.

The following files are written to `outputs/results/<benchmark_name>/<tag>/`:

```
conf.yaml  # the config which was used
predictions.h5  # cached predictions
results.h5  # results for each data point in eval, in the format <metric_name>: List[float]
summaries.json  # aggregated results for the entire dataset, <agg_metric_name>: float
<plots>  # some benchmarks add plots as png files here
```

Some benchmarks additionally write plots (add `--plot` to the command line).
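
If you want to post-process a finished run, the cached files can be read directly. The sketch below only assumes the layout described above; the benchmark and tag names are placeholders.

```python
import json
from pathlib import Path

import h5py

# Placeholder benchmark and tag; adapt to your own run.
run_dir = Path("outputs/results/megadepth1500/my_tag")

# Aggregated metrics over the whole dataset.
summaries = json.loads((run_dir / "summaries.json").read_text())
print(summaries)

# Per-data-point metrics, stored as one array per metric name.
with h5py.File(run_dir / "results.h5", "r") as f:
    for metric_name in f.keys():
        print(metric_name, f[metric_name][:5])  # first five data points
```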

## Configuration

Each evaluation has three main configurations:

```yaml
data:
    ...  # how to load the data. The user can overwrite this only during "export"; the defaults are used in "evaluation".
model:
    ...  # the model configuration: this is only required for "export".
eval:
    ...  # configuration for the "evaluation" loop, e.g. pose estimators and RANSAC thresholds.
```

The default configurations can be found in the respective evaluation scripts, e.g. MegaDepth1500.
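
To inspect the defaults programmatically, something along these lines should work (a sketch: the class and its `default_conf` attribute are assumptions based on how the benchmarks in `gluefactory/eval/` are organized, so verify against the script you care about):

```python
from omegaconf import OmegaConf

# Assumed import path and attribute name; check gluefactory/eval/megadepth1500.py.
from gluefactory.eval.megadepth1500 import MegaDepth1500

print(OmegaConf.to_yaml(OmegaConf.create(MegaDepth1500.default_conf)))
```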

To run an evaluation with a custom config, we expect it to be in the following format (example):

```yaml
model:
    ...  # <your model configs>
benchmarks:
    <benchmark_name1>:
        data:
            ...  # <your data configs for "export">
        model:
            ...  # <your benchmark-specific model configs>
        eval:
            ...  # <your evaluation configs, e.g. pose estimators>
    <benchmark_name2>:
        ...  # <same structure as above>
```

The configs are then merged in the following order (taking megadepth1500 as an example):

```yaml
data:
    default < custom.benchmarks.megadepth1500.data
model:
    default < custom.model < custom.benchmarks.megadepth1500.model
eval:
    default < custom.benchmarks.megadepth1500.eval
```

You can then use the command line to further customize this configuration.
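
The precedence can be illustrated with OmegaConf, which the Glue Factory configs are based on. The values below are made up and the snippet only mirrors the merge semantics, not the internal code:

```python
from omegaconf import OmegaConf

# Made-up default and custom eval configs for megadepth1500.
default_eval = OmegaConf.create({"estimator": "opencv", "ransac_th": 1.0})
custom = OmegaConf.create(
    {"benchmarks": {"megadepth1500": {"eval": {"ransac_th": 0.5}}}}
)

# Later arguments win: default < custom.benchmarks.megadepth1500.eval.
merged_eval = OmegaConf.merge(default_eval, custom.benchmarks.megadepth1500.eval)
print(OmegaConf.to_yaml(merged_eval))  # estimator: opencv, ransac_th: 0.5
```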

## Robust estimators

Gluefactory offers a flexible interface to state-of-the-art robust estimators for points and lines. You can configure the estimator in the benchmarks with the following config structure:

```yaml
eval:
    estimator: <estimator_name>  # poselib, opencv, pycolmap, ...
    ransac_th: 0.5  # run evaluation on a fixed threshold
    # or
    ransac_th: [0.5, 1.0, 1.5]  # test on multiple thresholds, autoselect the best
    <extra configs for the estimator, e.g. max iters, ...>
```

For convenience, most benchmarks convert `eval.ransac_th=-1` to a default range of thresholds.

> [!NOTE]
> Gluefactory follows the corner convention of COLMAP, i.e. the top-left corner of the top-left pixel is (0, 0).

## Visualization

We provide a powerful, interactive visualization tool for our benchmarks, based on matplotlib. You can run the visualization (after running the evaluations) with:

```bash
python -m gluefactory.eval.inspect <benchmark_name> <experiment_name1> <experiment_name2> ...
```

This prints the summaries of each experiment on the respective benchmark and visualizes the data as a scatter plot, where each point is the result of one experiment on a specific data point of the dataset.

- Clicking on one of the data points opens a new frame showing the prediction on this specific data point for all listed experiments.
- You can customize the x/y axes from the navigation bar or by pressing `x` or `y`.
- Hitting `diff_only` computes the difference between `<experiment_name1>` and all other experiments.
- Hovering over a point draws lines to the results of the other experiments on the same data point.
- You can switch the visualization (matches, keypoints, ...) from the navigation bar or by pressing `shift+r`.
- Pressing `t` prints a summary of the evaluation on this data point.
- The left and right arrow keys cycle between data points; `shift+left` opens an extra window.

When working on a remote machine (e.g. over SSH), the plots can be forwarded to the browser with the option `--backend webagg`. Note that you need to refresh the page every time you load a new figure (e.g. when clicking on a scatter point). This part requires some more work, and we would highly appreciate any contributions!