ABSTRACT
Accurate metric depth enables more realistic user interactions in mobile augmented reality (AR), such as object placement and occlusion detection. In practice, however, metrically accurate depth estimation remains difficult to achieve. We evaluated four state-of-the-art (SOTA) monocular depth estimation models on the newly introduced ARKitScenes dataset and observed clear performance gaps on this real-world mobile dataset. We categorize the challenges into hardware-, data-, and model-related challenges and propose promising future directions, including (i) exploiting more hardware-related information from the mobile device's camera and other available sensors, (ii) capturing high-quality data that reflects real-world AR scenarios, and (iii) designing model architectures that utilize this new information.
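Performance gaps of the kind described above are typically quantified with the standard monocular depth metrics used across the literature: absolute relative error (AbsRel), root-mean-square error (RMSE), and the threshold accuracy δ < 1.25. The following is a minimal sketch of such an evaluation; the function name `depth_metrics`, the flat per-pixel depth lists, and the valid-depth range are illustrative assumptions, not part of any specific model's or dataset's API.

```python
import math

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=10.0):
    """Standard depth-evaluation metrics over paired per-pixel depths (meters).

    pred, gt: flat sequences of predicted and ground-truth depths.
    Pixels whose ground truth falls outside (min_depth, max_depth) are
    masked out, mirroring common evaluation protocols.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction is within a 1.25x factor of truth.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}

# A prediction with a uniform 1.5x scale error: AbsRel = 0.5, delta1 = 0.0,
# illustrating how a relative-depth model can fail metrically despite
# producing a plausible-looking depth map.
print(depth_metrics(pred=[3.0, 3.0, 3.0], gt=[2.0, 2.0, 2.0]))
```

A uniform scale error of this sort is exactly the failure mode that separates relative from metric depth estimation, which is why scale-sensitive metrics such as AbsRel and δ < 1.25 dominate mobile AR evaluations.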
Index Terms: Mobile AR Depth Estimation: Challenges & Prospects