We address an important issue of fully low-cost and low-complex video compression for use in resource-extremely limited sensors/devices. Conventional motion estimation-based video compression or distributed video coding (DVC) techniques all rely on the high-cost mechanism, namely, sensing/sampling and compression are disjointedly performed, resulting in unnecessary consumption of resources. That is, most acquired raw video data will be discarded in the (possibly) complex compression stage. In this paper, we propose a dictionary learning-based distributed compressive video sensing (DCVS) framework to "directly" acquire compressed video data. Embedded in the compressive sensing (CS)-based single-pixel camera architecture, DCVS can compressively sense each video frame in a distributed manner. At DCVS decoder, video reconstruction can be formulated as an l1-minimization problem via solving the sparse coefficients with respect to some basis functions. We investigate adaptive dictionary/basis learning for each frame based on the training samples extracted from previous reconstructed neighboring frames and argue that much better basis can be obtained to represent the frame, compared to fixed basis-based representation and recent popular "CS-based DVC" approaches without relying on dictionary learning.