Monocular depth estimation is a fundamental problem underlying many tasks. Recent convolutional neural network (CNN)-based methods achieve high accuracy by using very deep networks and complex architectures to exploit different cues and features. This added complexity, however, makes the models more vulnerable and harder to train to convergence. Moreover, recent depth estimation methods designed for indoor environments are impractical for outdoor environments. In this work, we aim to develop a simple deep network structure that improves the effectiveness of depth estimation. We apply a dual attention module, which can be inserted into any type of network, to improve representational power. We also propose a training strategy that combines transfer learning and ordinal regression to improve training convergence. Even with a simple end-to-end encoder-decoder architecture, we achieve state-of-the-art performance on NYU Depth v2 and KITTI, two of the largest datasets for indoor and outdoor depth estimation, respectively.
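The abstract does not specify how ordinal regression is applied to depth. One common formulation, popularized by DORN (Fu et al., 2018), is assumed below; it is not confirmed to be this paper's exact scheme. Depth is discretized into log-spaced bins ("spacing-increasing discretization"), and one binary classifier per interior threshold predicts whether the pixel is deeper than that threshold. The function names `sid_thresholds`, `encode_ordinal` and `decode_ordinal` are hypothetical, as are the 0.5 decision rule and the bin-midpoint decoding. The sketch uses plain Python for a single pixel.

```python
import math
from typing import List, Sequence


def sid_thresholds(alpha: float, beta: float, num_bins: int) -> List[float]:
    """Hypothetical helper: num_bins + 1 bin edges spaced uniformly in log-depth
    between alpha and beta (spacing-increasing discretization, assumed DORN-style)."""
    if not (0 < alpha < beta) or num_bins < 1:
        raise ValueError("need 0 < alpha < beta and num_bins >= 1")
    log_a, log_b = math.log(alpha), math.log(beta)
    return [math.exp(log_a + (log_b - log_a) * i / num_bins) for i in range(num_bins + 1)]


def encode_ordinal(depth: float, edges: Sequence[float]) -> List[int]:
    """Ordinal target: one bit per interior edge, 1 if the depth lies beyond that edge."""
    return [1 if depth > t else 0 for t in edges[1:-1]]


def decode_ordinal(probs: Sequence[float], edges: Sequence[float]) -> float:
    """Count the classifiers voting 'deeper' (p >= 0.5, an assumed rule) to get
    the bin index, then return that bin's midpoint as the depth estimate."""
    label = sum(1 for p in probs if p >= 0.5)
    return 0.5 * (edges[label] + edges[label + 1])


if __name__ == "__main__":
    edges = sid_thresholds(1.0, 81.0, 4)  # roughly [1, 3, 9, 27, 81]
    print(encode_ordinal(10.0, edges))  # bits for edges 3, 9, 27
    print(decode_ordinal([0.9, 0.8, 0.1], edges))  # midpoint of the bin [9, 27]
```

Decoding hard targets with `decode_ordinal(encode_ordinal(d, edges), edges)` returns the midpoint of the bin that contains `d`. Because of the log spacing, bins are narrower at short range, where a given absolute error matters more.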