With the growth of DIY movements and the maker economy, demand for low-volume automation (LVA) applications is expected to rise. To address this demand, this paper tackles a learning from demonstration (LfD) problem in which a robot is taught, through demonstrated actions, to operate a coffee maker. The system uses the YOLO deep learning architecture to recognize objects such as cups and coffee capsules. With a Kinect RGB-D camera, the robot obtains the coordinates of these objects and the corresponding motion trajectories. By integrating these two techniques, the robot recognizes the demonstrated actions and builds an action database comprising sub-actions such as moving a cup, pouring coffee, and triggering the coffee machine. Finally, the robot performs the manipulation task by following the order of the demonstrated actions. The result is a vision-based LfD system that allows the robot to learn from human demonstrations and act accordingly.