ABSTRACT
Accurate metric depth enables more realistic user interactions in mobile augmented reality (AR), such as object placement and occlusion detection. In practice, however, metrically accurate depth estimation remains difficult to achieve. We evaluated four state-of-the-art (SOTA) monocular depth estimation models on the newly introduced ARKitScenes dataset and observed clear performance gaps on this real-world mobile dataset. We categorize the challenges into hardware-, data-, and model-related challenges and propose promising future directions, including (i) exploiting more hardware-related information from the mobile device's camera and other available sensors, (ii) capturing high-quality data that reflects real-world AR scenarios, and (iii) designing model architectures that utilize this new information.
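Performance gaps of the kind described above are typically quantified with the standard monocular depth metrics used across the literature: absolute relative error (AbsRel), root-mean-square error (RMSE), and the threshold accuracy δ < 1.25. The following is a minimal sketch of such an evaluation; the function name `depth_metrics`, the flat per-pixel depth lists, and the valid-depth range are illustrative assumptions, not part of any specific model's or dataset's API.

```python
import math

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=10.0):
    """Standard depth-evaluation metrics over paired per-pixel depths (meters).

    pred, gt: flat sequences of predicted and ground-truth depths.
    Pixels whose ground truth falls outside (min_depth, max_depth) are
    masked out, mirroring common evaluation protocols.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction is within a 1.25x factor of truth.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}

# A prediction with a uniform 1.5x scale error: AbsRel = 0.5, delta1 = 0.0,
# illustrating how a relative-depth model can fail metrically despite
# producing a plausible-looking depth map.
print(depth_metrics(pred=[3.0, 3.0, 3.0], gt=[2.0, 2.0, 2.0]))
```

A uniform scale error of this sort is exactly the failure mode that separates relative from metric depth estimation, which is why scale-sensitive metrics such as AbsRel and δ < 1.25 dominate mobile AR evaluations.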
Index Terms: Mobile AR Depth Estimation: Challenges & Prospects