With the growth of the DIY and maker movements, demand for low-volume automation applications is expected to rise. Various approaches have been proposed to move from low-level coding toward more intuitive methods of programming robots to perform tasks. In this article, a vision-based learning from demonstration (LfD) system is proposed, in which a mimicking robot learns from human demonstrations to manipulate a coffee maker. Using two cameras as sensing devices, the proposed LfD system integrates object detection and multiaction recognition to build an action base: object detection is achieved with the You Only Look Once (YOLO) deep learning architecture, while multiaction recognition is carried out by an Inflated 3D ConvNet (I3D) followed by a proposed statistically fragmented approach that enhances the action recognition results. Based on the sequential order of actions in the action base, tasks demonstrated by a human can be reproduced by the mimicking robot. To validate the proposed approach, manipulation of a coffee maker is adopted as an example, and the proposed system is shown to reproduce tasks demonstrated by a human under different circumstances, without requiring the demonstrator to have any knowledge of computer programming or robotics.
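The abstract describes a pipeline in which per-clip action predictions are aggregated into an ordered action base that the robot then replays. The details of the statistically fragmented approach are not given here, so the following is only a minimal sketch under assumed simplifications: each demonstration clip yields several noisy per-fragment labels (standing in for I3D outputs), a majority vote picks the clip's action, and consecutive duplicates are collapsed to form the ordered action base. All function names and the example labels are hypothetical.

```python
from collections import Counter

def fragment_vote(clip_predictions):
    """Aggregate noisy per-fragment labels by majority vote.
    (A stand-in for the paper's statistically fragmented approach,
    whose exact statistics are not specified in the abstract.)"""
    return Counter(clip_predictions).most_common(1)[0][0]

def build_action_base(fragmented_predictions):
    """Collapse per-clip voted labels into an ordered action sequence,
    merging consecutive repeats so each demonstrated action appears once."""
    actions = []
    for clip in fragmented_predictions:
        label = fragment_vote(clip)
        if not actions or actions[-1] != label:
            actions.append(label)
    return actions

# Hypothetical demonstration: three clips, each with noisy fragment labels.
demo = [
    ["open_lid", "open_lid", "pour_water"],
    ["pour_water", "pour_water", "pour_water"],
    ["press_button", "open_lid", "press_button"],
]
print(build_action_base(demo))  # -> ['open_lid', 'pour_water', 'press_button']
```

The robot would then execute pre-taught motion primitives in the order given by the returned action base; that execution layer is outside the scope of this sketch.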