With unprecedented volumes of multimedia data associated with spoken documents now publicly available, spoken document retrieval (SDR) has become an important research area over the past few decades. Recently, representation learning has emerged as an active research topic in many machine learning applications, owing largely to its excellent performance. In the context of natural language processing, the pioneering work dates back to word embedding methods. However, learning paragraph (or sentence and document) representations is better suited to tasks such as information retrieval and document summarization. Nevertheless, to the best of our knowledge, relatively little work has applied paragraph embedding methods to SDR. Motivated by these observations, this paper proposes a novel paragraph embedding method, named the locality-preserving essence vector (LPEV) model. LPEV is designed with two aspects in mind. First, the model aims not only to distill the most representative information from a paragraph but also to discard general background information. Second, inspired by the local-invariance principle celebrated in manifold learning, LPEV also preserves semantic locality in the learned low-dimensional embedding space, producing more informative and discriminative vector representations of paragraphs. A series of empirical SDR experiments conducted on the TDT-2 (Topic Detection and Tracking) collection demonstrates the efficacy of our SDR methods compared with existing strong baselines.
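The abstract does not spell out the locality-preserving term, but the local-invariance principle it invokes is commonly realized as a graph-Laplacian penalty: paragraphs that are close in the original representation space should map to nearby points in the embedding space. The sketch below is a generic illustration of that idea, not the paper's actual LPEV objective; the function names, the Gaussian affinity, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian-kernel affinities between paragraph vectors
    (a common, hypothetical choice for the neighborhood graph)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-affinity
    return W

def locality_penalty(Z, W):
    """Locality-preserving penalty: 0.5 * sum_ij W_ij ||z_i - z_j||^2,
    computed via the graph Laplacian identity tr(Z^T L Z), L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W  # combinatorial graph Laplacian (positive semidefinite)
    return float(np.trace(Z.T @ L @ Z))

# Toy data: 6 "paragraph" vectors in an 8-d original space,
# with a stand-in 2-d embedding (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Z = X[:, :2]

W = gaussian_affinity(X)
penalty = locality_penalty(Z, W)
```

Minimizing such a penalty jointly with a reconstruction-style objective (distilling paragraph content while discarding background) is one standard way manifold-regularized embedding models encourage semantically close paragraphs to stay close after dimensionality reduction.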