# Evaluation

Glue Factory is designed for simple and tight integration between training and evaluation. All benchmarks are designed around one principle: only evaluate on cached results. This enforces reproducible baselines. Therefore, we first export model predictions for each dataset (export), and evaluate the cached results in a second pass (evaluation).

## Running an evaluation

We currently provide evaluation scripts for MegaDepth-1500, HPatches, and ETH3D. You can run them with:

```bash
python -m gluefactory.eval.<benchmark_name> --conf "a name in gluefactory/configs/ or path" --checkpoint "and/or a checkpoint name"
```

Each evaluation run is assigned a tag, which can (optionally) be customized from the command line with `--tag <your_tag>`.

To overwrite an experiment, add `--overwrite`. To only overwrite the results of the evaluation loop, add `--overwrite_eval`. We perform config checks to warn the user about non-conforming configurations between runs.

The following files are written to `outputs/results/<benchmark_name>/<tag>/`:

```
conf.yaml  # the config which was used
predictions.h5  # cached predictions
results.h5  # results for each data point in eval, in the format <metric_name>: List[float]
summaries.json  # aggregated results for the entire dataset, <agg_metric_name>: float
<plots>  # some benchmarks add plots as png files here
```

Some benchmarks additionally write plots (add `--plot` to the command line).
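
If you want to post-process a finished run, the cached files can be read directly. The sketch below only assumes the layout described above; the benchmark and tag names are placeholders.

```python
import json
from pathlib import Path

import h5py

# Placeholder benchmark and tag; adapt to your own run.
run_dir = Path("outputs/results/megadepth1500/my_tag")

# Aggregated metrics over the whole dataset.
summaries = json.loads((run_dir / "summaries.json").read_text())
print(summaries)

# Per-data-point metrics, stored as one array per metric name.
with h5py.File(run_dir / "results.h5", "r") as f:
    for metric_name in f.keys():
        print(metric_name, f[metric_name][:5])  # first five data points
```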

## Configuration

Each evaluation has three main configurations:

```yaml
data:
    ...  # how to load the data. The user can overwrite this only during "export"; the defaults are used in "evaluation".
model:
    ...  # the model configuration: this is only required for "export".
eval:
    ...  # configuration for the "evaluation" loop, e.g. pose estimators and RANSAC thresholds.
```

The default configurations can be found in the respective evaluation scripts, e.g. MegaDepth1500.
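
To inspect the defaults programmatically, something along these lines should work (a sketch: the class and its `default_conf` attribute are assumptions based on how the benchmarks in `gluefactory/eval/` are organized, so verify against the script you care about):

```python
from omegaconf import OmegaConf

# Assumed import path and attribute name; check gluefactory/eval/megadepth1500.py.
from gluefactory.eval.megadepth1500 import MegaDepth1500

print(OmegaConf.to_yaml(OmegaConf.create(MegaDepth1500.default_conf)))
```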

To run an evaluation with a custom config, we expect it to be in the following format (example):

```yaml
model:
    ...  # <your model configs>
benchmarks:
    <benchmark_name1>:
        data:
            ...  # <your data configs for "export">
        model:
            ...  # <your benchmark-specific model configs>
        eval:
            ...  # <your evaluation configs, e.g. pose estimators>
    <benchmark_name2>:
        ...  # <same structure as above>
```

The configs are then merged in the following order (taking megadepth1500 as an example):

```yaml
data:
    default < custom.benchmarks.megadepth1500.data
model:
    default < custom.model < custom.benchmarks.megadepth1500.model
eval:
    default < custom.benchmarks.megadepth1500.eval
```

You can then use the command line to further customize this configuration.
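
The precedence can be illustrated with OmegaConf, which the Glue Factory configs are based on. The values below are made up and the snippet only mirrors the merge semantics, not the internal code:

```python
from omegaconf import OmegaConf

# Made-up default and custom eval configs for megadepth1500.
default_eval = OmegaConf.create({"estimator": "opencv", "ransac_th": 1.0})
custom = OmegaConf.create(
    {"benchmarks": {"megadepth1500": {"eval": {"ransac_th": 0.5}}}}
)

# Later arguments win: default < custom.benchmarks.megadepth1500.eval.
merged_eval = OmegaConf.merge(default_eval, custom.benchmarks.megadepth1500.eval)
print(OmegaConf.to_yaml(merged_eval))  # estimator: opencv, ransac_th: 0.5
```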

## Robust estimators

Gluefactory offers a flexible interface to state-of-the-art robust estimators for points and lines. You can configure the estimator in the benchmarks with the following config structure:

```yaml
eval:
    estimator: <estimator_name>  # poselib, opencv, pycolmap, ...
    ransac_th: 0.5  # run evaluation on a fixed threshold
    # or
    ransac_th: [0.5, 1.0, 1.5]  # test on multiple thresholds, autoselect the best
    <extra configs for the estimator, e.g. max iters, ...>
```

For convenience, most benchmarks convert `eval.ransac_th=-1` to a default range of thresholds.

> [!NOTE]
> Gluefactory follows the corner convention of COLMAP, i.e. the top-left corner of the top-left pixel is (0, 0).

## Visualization

We provide a powerful, interactive visualization tool for our benchmarks, based on matplotlib. You can run the visualization (after running the evaluations) with:

```bash
python -m gluefactory.eval.inspect <benchmark_name> <experiment_name1> <experiment_name2> ...
```

This prints the summaries of each experiment on the respective benchmark and visualizes the data as a scatter plot, where each point is the result of one experiment on a specific data point of the dataset.

- Clicking on one of the data points opens a new frame showing the prediction on this specific data point for all listed experiments.
- You can customize the x/y axes from the navigation bar or by pressing `x` or `y`.
- Hitting `diff_only` computes the difference between `<experiment_name1>` and all other experiments.
- Hovering over a point draws lines to the results of the other experiments on the same data point.
- You can switch the visualization (matches, keypoints, ...) from the navigation bar or by pressing `shift+r`.
- Pressing `t` prints a summary of the evaluation on this data point.
- The left and right arrow keys cycle between data points; `shift+left` opens an extra window.

When working on a remote machine (e.g. over SSH), the plots can be forwarded to the browser with the option `--backend webagg`. Note that you need to refresh the page every time you load a new figure (e.g. when clicking on a scatter point). This part requires some more work, and we would highly appreciate any contributions!