The increasing accuracy reported for metric monocular depth estimation (MMDE) models has led to growing interest from the automotive domain. However, current model evaluations do not provide deeper insight into model performance, in particular with respect to safety-critical or unseen classes.
We propose a novel metric that combines three components: a class-wise component, an edge and corner image feature component, and a global consistency-retaining component. Classes are further weighted by their distance in the scene and by their criticality for automotive applications.
In the evaluation, we demonstrate the benefits of our metric through comparison with classical metrics, class-wise analysis, and the retrieval of critical situations. The results show that our metric provides deeper insight into model results while fulfilling safety-critical requirements.
The top row shows the original image and its segmentation mask from the German Outdoor and Offroad Dataset (GOOSE). The bottom row presents the depth maps predicted by the highest-ranking models from our evaluation, as per our metric (left) and the Mean Absolute Error (right).
We introduce a novel depth estimation metric designed for comprehensive scene evaluation. This metric operates at three distinct levels of granularity, which are divided into individual components:

- a class-wise component,
- an edge and corner image feature component,
- a global consistency component.
Although the individual weighting can depend on the specific scenario, we propose the overall combination of components as
\[ L = \gamma \cdot \text{E}_{\text{class}} + \gamma \cdot \text{E}_{\text{feature}} + \gamma \cdot \text{E}_{\text{global}} \] with \( \gamma = 1 \), allowing a near-metric offset evaluation while incorporating the class and distance weightings.
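As a reference sketch (Python is assumed throughout; the function and parameter names are ours, not from the paper), the combination is a plain weighted sum of the three component errors, which are computed as described in the sections below:

```python
def combined_error(e_class, e_feature, e_global, gamma=1.0):
    """Overall error L = gamma * E_class + gamma * E_feature + gamma * E_global.

    With gamma = 1 the result stays close to a metric offset while
    keeping the class and distance weightings of the components.
    """
    return gamma * e_class + gamma * e_feature + gamma * e_global
```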
This component measures the metric error of each object class individually, such as cars, trucks, buildings, and poles. This approach provides detailed insight into how different models handle various object classes and improves the understanding of model performance in previously uncommon scenarios.
The importance of classes can vary highly between frames and situations. Since we work with classification masks, a single mask may encompass multiple car instances at varying distances within the scene. Treating these instances identically across different scenes complicates the interpretation of the metric. Therefore, weighting the classes is necessary.
Consequently, we propose a distance-based intra-class weighting \( w_{\text{dist}} \), based on the distances within each scene. We define this as:
\[ w_{\text{dist}} = \frac{d_{\text{class}} - \min(D_{\text{classes}})}{\max(D_{\text{classes}}) - \min(D_{\text{classes}})} \]
\[ \text{with } d_{\text{class}} = d_{\text{scene-max}} - d_{\text{class-min}} \]
where \( d_{\text{scene-max}} \) denotes the maximum distance within the entire scene and \( d_{\text{class-min}} \) the minimum distance within a class.
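To illustrate, a minimal NumPy sketch of this intra-class weighting, assuming per-class boolean masks and a metric ground-truth depth map (the helper name `intra_class_weights` is ours):

```python
import numpy as np

def intra_class_weights(depth_gt, class_masks):
    """Distance-based intra-class weights w_dist for a single scene.

    depth_gt:    (H, W) metric ground-truth depth map.
    class_masks: dict mapping class name -> (H, W) boolean mask.
    """
    d_scene_max = depth_gt.max()
    # d_class = d_scene-max - d_class-min for every class present in the scene
    d_class = {c: d_scene_max - depth_gt[m].min()
               for c, m in class_masks.items() if m.any()}
    d_min, d_max = min(d_class.values()), max(d_class.values())
    span = d_max - d_min
    # Min-max normalisation over all class distances of the scene
    return {c: (d - d_min) / span if span > 0 else 1.0
            for c, d in d_class.items()}
```

Note that classes starting closer to the capture point yield a larger \( d_{\text{class}} \) and therefore a larger weight.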
Since the class importance heavily relies on the use case at hand, the specific weighting of the classes can be chosen individually. As our focus is the use of MMDE models in automotive applications, and automotive safety in particular, we provide an in-depth weight setup for this domain.
To this end, we leverage accident data and use the distribution of first accident opponents, sourced from the German In-Depth Accident Study (GIDAS) database. The following table shows this distribution, which we use to weight the class importance.
| Main Class | Sub Class | Distribution |
|---|---|---|
| Car-to-Vehicle | | 62.06% |
| | Car | 50.04% |
| | Motorcycle | 7.38% |
| | Truck & Van & Bus | 3.73% |
| | Trains | 0.63% |
| | Other Motorized Vehicle | 0.27% |
| Car-to-VRU | | 30.00% |
| | Bicycles | 21.95% |
| | Pedestrian | 8.05% |
| Car-to-Object | | 7.94% |
| | Pole / Tree | 3.24% |
| | Guardrail | 1.17% |
| | Ditch / Embankment | 1.07% |
| | Road / Terrain | 1.04% |
| | Other Object | 0.75% |
| | Wall / Bridge | 0.56% |
| | Bush / Fence | 0.11% |
The final class-based component is calculated using the MAE, the intra-class weight \( w_{\text{dist}} \), and the inter-class weight \( w_{\text{class}} \):
\[ \text{E}_{\text{class}} = \sum_{c=1}^{C} w_{\text{class}} \cdot w_{\text{dist}} \cdot \text{MAE}(I_c) \]
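A minimal sketch of this weighted aggregation, reusing `intra_class_weights` from above; the inter-class weights `w_class` (e.g., derived from the GIDAS distribution above) are passed in as a dict, and all helper names are ours:

```python
import numpy as np

def class_error(depth_pred, depth_gt, class_masks, w_class):
    """E_class: per-class MAE scaled by inter- and intra-class weights."""
    w_dist = intra_class_weights(depth_gt, class_masks)  # from the sketch above
    e_class = 0.0
    for c, mask in class_masks.items():
        if not mask.any() or c not in w_class:
            continue
        # MAE(I_c): mean absolute depth error restricted to the class mask
        mae_c = np.abs(depth_pred[mask] - depth_gt[mask]).mean()
        e_class += w_class[c] * w_dist[c] * mae_c
    return e_class
```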
The resulting error \( \text{E}_{\text{class}} \) incorporates both how important a class is in general and how relevant it is in the respective image situation.

Another important factor for a high-quality depth map is the preservation of fine details in the prediction. These details serve multiple purposes, such as better differentiation between individual objects or capturing unique, and often relevant, shape changes such as trailer hitches or opened doors on cars.
To extract potentially relevant features, we apply several classical methods to the unmasked input image. We implement multiple corner detection algorithms, e.g., Harris, given the proven robustness of corner features for computer vision tasks such as feature matching. To further evaluate class-specific differences between the models in question, we mask the edge depth map with the previously defined classes.
The importance of edge features depends on the distance to the capture point; hence, the class-masked feature errors are also scaled by \( w_{\text{dist}} \):
\[ \text{E}_{\text{feature}} = \sum_{c=1}^{C} w_{\text{class}} \cdot w_{\text{dist}} \cdot \text{MAE}(I_{cf}) \]
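A sketch of this feature component, using OpenCV's Canny edge and Harris corner detectors as stand-ins for the detector set described above; the thresholds are our assumptions, not values from the paper:

```python
import cv2
import numpy as np

def feature_error(image_bgr, depth_pred, depth_gt, class_masks, w_class, w_dist):
    """E_feature: MAE restricted to edge/corner pixels, per class, weighted."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Edge and corner responses on the unmasked input image
    edges = cv2.Canny(gray, 100, 200) > 0
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    feature_mask = edges | (harris > 0.01 * harris.max())
    e_feature = 0.0
    for c, mask in class_masks.items():
        m_cf = feature_mask & mask  # I_cf: class-masked feature pixels
        if not m_cf.any() or c not in w_class:
            continue
        mae_cf = np.abs(depth_pred[m_cf] - depth_gt[m_cf]).mean()
        e_feature += w_class[c] * w_dist[c] * mae_cf
    return e_feature
```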
As we aim for a comprehensive evaluation, we further examine the global consistency of the generated depth map. This also covers situations in which no labels or masks for certain objects are provided, as well as global scaling issues not captured by the other components. Therefore, we simply calculate \( \text{E}_{\text{global}} \) as the MAE between the predicted and ground-truth depth.
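A sketch completing the three components, with a usage example under the same assumptions as the previous snippets:

```python
import numpy as np

def global_error(depth_pred, depth_gt):
    """E_global: plain MAE over the full depth map."""
    return np.abs(depth_pred - depth_gt).mean()

# Putting the pieces together for a single scene:
# w_dist = intra_class_weights(depth_gt, masks)
# L = combined_error(class_error(depth_pred, depth_gt, masks, w_class),
#                    feature_error(image, depth_pred, depth_gt, masks, w_class, w_dist),
#                    global_error(depth_pred, depth_gt))
```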
We compare classical error metrics against our metric on over 25 GOOSE dataset scenes. While both provide comprehensive insights into model performance, ours offers a more nuanced interpretation.
Model | Variant | MAE | RMSE | Abs-Rel | Ours |
---|---|---|---|---|---|
AdaBins | KITTI | 13.3 | 25.21 | 0.33 | 20.65 |
DepthAnything V2 | ViT L | 8.39 | 16.56 | 0.3 | 14.47 |
EcoDepth | - | 10.25 | 20.51 | 0.28 | 17.43 |
Marigold | - | 12.70 | 20.38 | 0.65 | 17.72 |
Metric3D V2 | ViT G2 | 6.47 | 14.44 | 0.2 | 11.57 |
PatchFusion | DA V1 ViT L | 15.05 | 24.33 | 0.55 | 23.32 |
UniDepth V1 | ConvNext L | 8.26 | 16.7 | 0.24 | 14.19 |
UniDepth V2 | ViT L | 8.57 | 20 | 0.27 | 14.24 |
ZoeDepth | NYU + KITTI | 9.51 | 19.32 | 0.27 | 16.22 |
| Model | Authors | GitHub | Paper |
|---|---|---|---|
| AdaBins | Shariq Farooq Bhat, Ibraheem Alhashim, Peter Wonka | https://github.com/shariqfarooq123/AdaBins | https://arxiv.org/pdf/2011.14141 |
| DepthAnything V2 | Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao | https://github.com/DepthAnything/Depth-Anything-V2 | https://arxiv.org/pdf/2406.09414 |
| EcoDepth | Suraj Patni, Aradhye Agarwal, Chetan Arora | https://github.com/aradhye2002/ecodepth | https://arxiv.org/pdf/2403.18807 |
| Marigold | Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler | https://github.com/prs-eth/marigold | https://arxiv.org/pdf/2312.02145 |
| Metric3D V2 | Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen | https://github.com/YvanYin/Metric3D | https://arxiv.org/pdf/2404.15506 |
| PatchFusion | Zhenyu Li, Shariq Farooq Bhat, Peter Wonka | https://github.com/zhyever/PatchFusion | https://arxiv.org/pdf/2312.02284 |
| UniDepth | Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu | https://github.com/lpiccinelli-eth/UniDepth | https://arxiv.org/pdf/2403.18913 |
| ZoeDepth | Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller | https://github.com/isl-org/ZoeDepth | https://arxiv.org/pdf/2302.12288 |
@article{ca_mmde,
title={Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective},
author={Tim Bader and Leon Eisemann and Adrian Pogorzelski and Namrata Jangid and Attila-Balazs Kis},
year={2024},
url={https://arxiv.org/abs/2409.04086},
}