Automatic mispronunciation detection and diagnosis are two integral components of a computer-assisted pronunciation training (CAPT) system. Together, they help second-language (L2) learners pinpoint erroneous pronunciations in a given utterance and thereby improve their spoken proficiency. In this chapter, we first briefly review recent trends and developments in mispronunciation detection and diagnosis that build on state-of-the-art automatic speech recognition (ASR) methods, especially those using deep neural network (DNN)-based acoustic models. We then present an effective training approach that estimates the DNN-based acoustic models used in the mispronunciation detection process by optimizing an objective directly linked to the ultimate performance evaluation metric. We also investigate the extent to which the subsequent mispronunciation diagnosis process can benefit from these specifically trained acoustic models. To this end, we recast mispronunciation diagnosis as a classification problem and derive a set of indicative features for it. Finally, we evaluate the performance merits of this approach through a series of experiments on a Mandarin Chinese mispronunciation detection and diagnosis task.