# Evaluation

Glue Factory is designed for simple and tight integration between training and evaluation. All benchmarks are designed around one principle: only evaluate on cached results. This enforces reproducible baselines. Therefore, we first export model predictions for each dataset (`export`) and then evaluate the cached results in a second pass (`evaluation`).

### Running an evaluation

We currently provide evaluation scripts for [MegaDepth-1500](../gluefactory/eval/megadepth1500.py), [HPatches](../gluefactory/eval/hpatches.py), and [ETH3D](../gluefactory/eval/eth3d.py). You can run them with:

```bash
python -m gluefactory.eval.<benchmark_name> --conf "a name in gluefactory/configs/ or path" --checkpoint "and/or a checkpoint name"
```
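
For example, to export and evaluate the bundled SuperPoint + LightGlue configuration on MegaDepth-1500 (the tag is arbitrary and only labels the run):

```bash
# Run the MegaDepth-1500 benchmark with the stock superpoint+lightglue config;
# the tag below is illustrative, any name works.
python -m gluefactory.eval.megadepth1500 --conf superpoint+lightglue --tag sp+lg_baseline
```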

Each evaluation run is assigned a `tag`, which can (optionally) be customized from the command line with `--tag <your_tag>`.

To overwrite an experiment, add `--overwrite`. To only overwrite the results of the evaluation loop, add `--overwrite_eval`. We perform config checks to warn the user about non-conforming configurations between runs.

The following files are written to `outputs/results/<benchmark_name>/<tag>`:

```yaml
conf.yaml        # the config which was used
predictions.h5   # cached predictions
results.h5       # results for each data point in eval, in the format <metric_name>: List[float]
summaries.json   # aggregated results for the entire dataset, <agg_metric_name>: float
<plots>          # some benchmarks add plots as png files here
```
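
Since `summaries.json` is plain JSON, a quick way to check the aggregated numbers of a run (using the illustrative tag from the example above) is:

```bash
# Pretty-print the aggregated metrics of a cached run
python -m json.tool outputs/results/megadepth1500/sp+lg_baseline/summaries.json
```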

Some benchmarks additionally output plots (add `--plot` to the command line).

<details>
<summary>[Configuration]</summary>

Each evaluation has 3 main configurations:

```yaml
data:
    ... # how to load the data. The user can overwrite this only during "export"; the defaults are used in "evaluation".
model:
    ... # model configuration: this is only required for "export".
eval:
    ... # configuration for the "evaluation" loop, e.g. pose estimators and RANSAC thresholds.
```

The default configurations can be found in the respective evaluation scripts, e.g. [MegaDepth-1500](../gluefactory/eval/megadepth1500.py).

To run an evaluation with a custom config, we expect it to be in the following format ([example](../gluefactory/configs/superpoint+lightglue.yaml)):

```yaml
model:
    ... # <your model configs>
benchmarks:
    <benchmark_name1>:
        data:
            ... # <your data configs for "export">
        model:
            ... # <your benchmark-specific model configs>
        eval:
            ... # <your evaluation configs, e.g. pose estimators>
    <benchmark_name2>:
        ... # <same structure as above>
```
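
As a concrete sketch (the benchmark names follow the evaluation scripts above; the estimator and threshold values are illustrative, not tuned recommendations), such a config could look like:

```yaml
model:
    ... # <your model configs, e.g. extractor + matcher>
benchmarks:
    megadepth1500:
        eval:
            estimator: opencv
            ransac_th: 1.0
    hpatches:
        eval:
            estimator: opencv
            ransac_th: [0.5, 1.0, 1.5]
```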

The configs are then merged in the following order (taking megadepth1500 as an example):

```yaml
data:
    default < custom.benchmarks.megadepth1500.data
model:
    default < custom.model < custom.benchmarks.megadepth1500.model
eval:
    default < custom.benchmarks.megadepth1500.eval
```

You can then use the command line to further customize this configuration.
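
For example, assuming your checkout accepts OmegaConf-style `key=value` overrides after the named arguments, an evaluation-only setting can be changed without editing the config file:

```bash
# Assumes dot-list overrides are supported on the command line; otherwise edit the config file instead
python -m gluefactory.eval.megadepth1500 --conf superpoint+lightglue eval.estimator=opencv eval.ransac_th=1.0
```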

</details>

### Robust estimators

Glue Factory offers a flexible interface to state-of-the-art [robust estimators](../gluefactory/robust_estimators/) for points and lines. You can configure the estimator in the benchmarks with the following config structure:

```yaml
eval:
    estimator: <estimator_name> # poselib, opencv, pycolmap, ...
    ransac_th: 0.5 # run the evaluation with a fixed threshold
    # or
    ransac_th: [0.5, 1.0, 1.5] # test multiple thresholds and autoselect the best
    <extra configs for the estimator, e.g. max iters, ...>
```

For convenience, most benchmarks convert `eval.ransac_th=-1` to a default range of thresholds.
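
A minimal sketch combining both conveniences (the choice of `poselib` is an assumption and requires that backend to be installed):

```yaml
eval:
    estimator: poselib  # assumes the poselib backend is available
    ransac_th: -1       # sweep the benchmark's default threshold range
```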

> [!NOTE]
> Glue Factory follows the corner convention of COLMAP, i.e. the top-left corner of the top-left pixel is (0, 0).

### Visualization

We provide a powerful, interactive visualization tool for our benchmarks, based on matplotlib. You can run the visualization (after running the evaluations) with:

```bash
python -m gluefactory.eval.inspect <benchmark_name> <experiment_name1> <experiment_name2> ...
```
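
For example, to compare two cached runs on MegaDepth-1500 (the experiment names are typically the tags used during evaluation; the ones below are illustrative):

```bash
# Compare two previously evaluated experiments on the same benchmark
python -m gluefactory.eval.inspect megadepth1500 sp+lg_baseline sp+lg_retrained
```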

This prints the summaries of each experiment on the respective benchmark and visualizes the data as a scatter plot, where each point is the result of an experiment on a specific data point in the dataset.

<details>

- Clicking on one of the data points opens a new frame showing the prediction on this specific data point for all experiments listed.
- You can customize the x / y axis from the navigation bar or by pressing `x` or `y`.
- Hitting `diff_only` computes the difference between `<experiment_name1>` and all other experiments.
- Hovering over a point shows lines to the results of other experiments on the same data.
- You can switch the visualization (matches, keypoints, ...) from the navigation bar or by pressing `shift+r`.
- Pressing `t` prints a summary of the eval on this data point.
- Hitting the `left` or `right` arrow keys cycles through the data points. `shift+left` opens an extra window.

When working on a remote machine (e.g. over ssh), the plots can be forwarded to the browser with the option `--backend webagg`. Note that you need to refresh the page every time you load a new figure (e.g. when clicking on a scatter point). This part requires some more work, and we would highly appreciate any contributions!
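
For instance (reusing the illustrative tag from above):

```bash
# Forward the interactive figures to a browser when no local display is available
python -m gluefactory.eval.inspect megadepth1500 sp+lg_baseline --backend webagg
```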

</details>