Introducing a Class-Aware Metric for Monocular Depth Estimation:
An Automotive Perspective

1Dr. Ing. h.c. F. Porsche AG, Stuttgart, Germany
2Porsche Engineering Group GmbH, Weissach, Germany
3Institute for Applied AI, Stuttgart Media University, Stuttgart, Germany
4University of Freiburg, Freiburg, Germany
5Ulm University, Ulm, Germany

Overview

The increasing accuracy reported for metric monocular depth estimation (MMDE) models has led to growing interest from the automotive domain. However, current model evaluations provide little insight into model performance, particularly with respect to safety-critical or unseen classes.

We propose a novel metric built from three components: a class-wise component, an edge- and corner-based image feature component, and a global consistency component. Classes are further weighted by their distance within the scene and by their criticality for automotive applications.

In our evaluation, we demonstrate the benefits of the metric through comparison with classical metrics, class-wise analysis, and the retrieval of critical situations. The results show that our metric provides deeper insights into model results while fulfilling safety-critical requirements.

Gallery


The top row shows the original image and its segmentation mask from the German Outdoor and Offroad Dataset (GOOSE). The bottom row presents the depth maps predicted by the highest-ranking models in our evaluation, according to our metric (left) and the Mean Absolute Error (MAE, right).

How it works

We introduce a novel depth estimation metric designed for comprehensive scene evaluation. The metric operates at three distinct levels of granularity, each captured by a dedicated component:

  1. Class-Based Component \( E_{\text{class}} \): Enables insights into the model's performance across a variety of classes, including potential out-of-distribution classes.

  2. Feature Component \( E_{\text{feature}} \): Employs techniques such as edge or corner detection filters to evaluate the model's ability to accurately represent object features.

  3. Global Consistency Component \( E_{\text{global}} \): Integrates standard depth estimation evaluation methods to ensure overall consistency.

Although the individual weights can depend on the specific scenario, we propose the overall combination of components as

\[ L = \gamma_1 \cdot E_{\text{class}} + \gamma_2 \cdot E_{\text{feature}} + \gamma_3 \cdot E_{\text{global}} \]

with \( \gamma_1 = \gamma_2 = \gamma_3 = 1 \), allowing a near-metric offset evaluation while incorporating the class and distance weightings.
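
As a minimal sketch, assuming the three component errors have already been computed, the combination reduces to a weighted sum; the function name, signature, and defaults below are illustrative, not the authors' reference code:

```python
def combined_metric(e_class: float, e_feature: float, e_global: float,
                    gammas: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum L = g1 * E_class + g2 * E_feature + g3 * E_global."""
    g1, g2, g3 = gammas
    return g1 * e_class + g2 * e_feature + g3 * e_global
```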



1. Class-Based Component

This component measures the metric depth error for each object class individually, such as cars, trucks, buildings, and poles. This provides detailed insight into how different models handle the various object classes and improves the understanding of model performance in uncommon or previously unseen scenarios.


Intra-Class Weighting

The importance of a class can vary strongly between frames and situations. Since we operate on classification masks rather than instance masks, a single mask may encompass multiple car instances at varying distances within the scene. Treating these instances identically across different scenes complicates the interpretation of the metric; weighting within each class is therefore necessary.

Consequently, we propose a distance-based intra-class weight \( w_{\text{dist}} \), computed from the distances within each scene. We define it as:

\[ w_{\text{dist}} = \frac{d_{\text{class}} - \min(D_{\text{classes}})}{\max(D_{\text{classes}}) - \min(D_{\text{classes}})} \]

\[ \text{with } d_{\text{class}} = d_{\text{scene-max}} - d_{\text{class-min}} \]

where \( d_{\text{scene-max}} \) denotes the maximum distance within the entire scene, \( d_{\text{class-min}} \) the minimum distance within a class, and \( D_{\text{classes}} \) the set of \( d_{\text{class}} \) values over all classes present in the scene.
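
A minimal sketch of this weighting in Python, assuming the distances are taken from the ground-truth depth map and that each class is given as a boolean pixel mask (both assumptions on our part):

```python
import numpy as np

def intra_class_weights(depth: np.ndarray, masks: dict[str, np.ndarray]) -> dict[str, float]:
    """Distance-based intra-class weights w_dist, following the formulas above."""
    d_scene_max = depth.max()
    # d_class = d_scene-max - d_class-min: classes closer to the camera get larger values
    d_class = {c: d_scene_max - depth[m].min() for c, m in masks.items() if m.any()}
    lo, hi = min(d_class.values()), max(d_class.values())
    span = (hi - lo) or 1.0  # guard against scenes containing a single class
    return {c: (d - lo) / span for c, d in d_class.items()}
```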


Automotive Inter-Class Weighting

Since class importance depends heavily on the use case at hand, the specific weighting of the classes can be chosen individually. As our focus is the use of MMDE models in automotive applications, and in automotive safety in particular, we provide a corresponding in-depth weight setup.
To this end, we leverage accident data and use the distribution of first accident opponents, sourced from the German In-Depth Accident Study (GIDAS) database. The following table shows this distribution, which we use to weight the class importance.

Main Class       Sub Class                  Distribution
Car-to-Vehicle                                    62.06%
                 Car                              50.04%
                 Motorcycle                        7.38%
                 Truck & Van & Bus                 3.73%
                 Trains                            0.63%
                 Other Motorized Vehicle           0.27%
Car-to-VRU                                        30.00%
                 Bicycles                         21.95%
                 Pedestrian                        8.05%
Car-to-Object                                      7.94%
                 Pole / Tree                       3.24%
                 Guardrail                         1.17%
                 Ditch / Embankment                1.07%
                 Road / Terrain                    1.04%
                 Other Object                      0.75%
                 Wall / Bridge                     0.56%
                 Bush / Fence                      0.11%
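
One straightforward way to turn this distribution into inter-class weights \( w_{\text{class}} \) is to normalize the sub-class shares so they sum to one. The sketch below assumes this normalization; the class names are ours and must be mapped to the label set of the dataset at hand (e.g., GOOSE):

```python
# GIDAS first-accident-opponent shares (in %) from the table above.
GIDAS_DISTRIBUTION = {
    "car": 50.04, "motorcycle": 7.38, "truck_van_bus": 3.73, "train": 0.63,
    "other_motorized_vehicle": 0.27, "bicycle": 21.95, "pedestrian": 8.05,
    "pole_tree": 3.24, "guardrail": 1.17, "ditch_embankment": 1.07,
    "road_terrain": 1.04, "other_object": 0.75, "wall_bridge": 0.56,
    "bush_fence": 0.11,
}
_total = sum(GIDAS_DISTRIBUTION.values())
W_CLASS = {name: share / _total for name, share in GIDAS_DISTRIBUTION.items()}
```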


Component Result

The final class-based component is calculated using MAE, the intra-class weight \( w_{\text{dist}} \), and the inter-class weight \( w_{\text{class}} \).

\[ E_{\text{class}} = \sum_{c=1}^{C} w_{\text{class}} \cdot w_{\text{dist}} \cdot \text{MAE}(I_c) \]

where \( I_c \) denotes the depth-map pixels belonging to class \( c \). This yields an error \( E_{\text{class}} \) that incorporates both how important a class is in general and how relevant it is in the given image situation.
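
Under the same assumptions as above (per-class boolean masks and precomputed weight dictionaries), a sketch of this component could look as follows; the helper names are ours:

```python
import numpy as np

def class_component(pred: np.ndarray, gt: np.ndarray,
                    masks: dict[str, np.ndarray],
                    w_class: dict[str, float],
                    w_dist: dict[str, float]) -> float:
    """E_class: per-class MAE weighted by inter- and intra-class weights."""
    e_class = 0.0
    for c, mask in masks.items():
        if mask.any() and c in w_class and c in w_dist:
            mae_c = float(np.abs(pred[mask] - gt[mask]).mean())  # MAE(I_c)
            e_class += w_class[c] * w_dist[c] * mae_c
    return e_class
```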


2. Local Feature Component

Another important factor for a high-quality depth map is the preservation of fine details in the prediction. These details serve multiple purposes, such as better differentiation between individual objects or capturing distinctive, and often relevant, shape changes such as trailer hitches or opened car doors.

To extract potentially relevant features, we apply several classical methods to the unmasked input image. We implement multiple corner detection algorithms, e.g., Harris, given the proven robustness of corner features for computer vision tasks such as feature matching. To further evaluate class-specific differences between the models under evaluation, we mask the edge depth map with the previously defined classes.

The importance of edge features depends on their distance to the capture point, so they are likewise scaled by \( w_{\text{dist}} \):

\[ E_{\text{feature}} = \sum_{c=1}^{C} w_{\text{class}} \cdot w_{\text{dist}} \cdot \text{MAE}(I_{cf}) \]

where \( I_{cf} \) denotes the feature (edge and corner) pixels belonging to class \( c \).
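
A sketch of this component using OpenCV's Canny edge and Harris corner detectors; the detector choice, parameters, and thresholds below are illustrative, not values from the paper:

```python
import cv2
import numpy as np

def feature_component(image: np.ndarray, pred: np.ndarray, gt: np.ndarray,
                      masks: dict[str, np.ndarray],
                      w_class: dict[str, float],
                      w_dist: dict[str, float]) -> float:
    """E_feature: depth MAE restricted to edge/corner pixels, per class."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200) > 0                     # edge pixels
    harris = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
    corners = harris > 0.01 * harris.max()                    # corner pixels
    features = edges | corners
    e_feature = 0.0
    for c, mask in masks.items():
        sel = mask & features                                 # I_cf: feature pixels of class c
        if sel.any() and c in w_class and c in w_dist:
            e_feature += w_class[c] * w_dist[c] * float(np.abs(pred[sel] - gt[sel]).mean())
    return e_feature
```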



3. Global Consistency Component

As we aim for a comprehensive evaluation, we further examine the global consistency of the generated depth map. This also covers situations in which no labels or masks are provided for certain objects, as well as global scaling issues not captured by the other components. For this, we simply calculate \( E_{\text{global}} \) as the MAE between the predicted and the ground-truth depth.
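
In code, this component reduces to a single MAE; the validity mask below (useful when the ground truth is sparse, e.g., from LiDAR) is our addition:

```python
import numpy as np

def global_component(pred: np.ndarray, gt: np.ndarray) -> float:
    """E_global: plain MAE between predicted and ground-truth depth."""
    valid = np.isfinite(gt) & (gt > 0)  # skip invalid or missing ground-truth pixels
    return float(np.abs(pred[valid] - gt[valid]).mean())
```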

Benchmark

We compare our metric against classical error metrics (MAE, RMSE, Abs-Rel; lower is better for all, including ours) on over 25 GOOSE dataset scenes. While both provide comprehensive insights into model performance, ours offers a more nuanced interpretation.

Model              Variant        MAE     RMSE    Abs-Rel   Ours
AdaBins            KITTI          13.30   25.21   0.33      20.65
DepthAnything V2   ViT-L           8.39   16.56   0.30      14.47
EcoDepth           -              10.25   20.51   0.28      17.43
Marigold           -              12.70   20.38   0.65      17.72
Metric3D V2        ViT-G2          6.47   14.44   0.20      11.57
PatchFusion        DA V1 ViT-L    15.05   24.33   0.55      23.32
UniDepth V1        ConvNeXt-L      8.26   16.70   0.24      14.19
UniDepth V2        ViT-L           8.57   20.00   0.27      14.24
ZoeDepth           NYU + KITTI     9.51   19.32   0.27      16.22



Citation

@article{ca_mmde,
  title={Introducing a Class-Aware Metric for Monocular Depth Estimation: An Automotive Perspective},
  author={Tim Bader and Leon Eisemann and Adrian Pogorzelski and Namrata Jangid and Attila-Balazs Kis},
  year={2024},
  url={https://arxiv.org/abs/2409.04086},
}