This paper presents a novel pipelined architecture for fast competitive learning (CL). The architecture serves as a hardware accelerator in a system on programmable chip (SOPC) to reduce computational time. It adopts a novel codeword swapping scheme so that neuron competition processes for different training vectors can operate concurrently. The neuron updating process is based on a hardware divider that uses simple table lookup operations. The divider performs finite-precision calculation, which reduces area cost at the expense of a slight degradation in training performance. Experimental results show that the CPU time of the proposed architecture is lower than that of other implementations running the CL training program, whether in software alone or with the support of custom hardware.
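To illustrate the two steps named above, neuron competition and table-lookup-based updating, the following is a minimal software sketch and not the proposed hardware design. It assumes a winner-take-all rule with a learning rate of 1/count, where count is the number of wins of the neuron, and approximates that reciprocal with a fixed-point lookup table. The precision `FRAC_BITS`, the table depth `TABLE_SIZE`, and the function names `nearest`, `update`, and `train` are all hypothetical choices made for illustration. The pipelining and codeword swapping of the architecture are not modeled.

```python
FRAC_BITS = 8      # assumed fixed-point precision of the reciprocal table
TABLE_SIZE = 256   # assumed table depth; larger win counts are clamped

# RECIP[n] approximates 1/n as an integer scaled by 2**FRAC_BITS (index 0 unused).
RECIP = [0] + [round((1 << FRAC_BITS) / n) for n in range(1, TABLE_SIZE + 1)]


def nearest(codebook, x):
    """Neuron competition: index of the codeword closest to x (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def update(codeword, x, count):
    """Move the winner toward x by roughly (x - c) / count via table lookup."""
    r = RECIP[min(count, TABLE_SIZE)]
    # Multiply by the scaled reciprocal, then shift back: finite-precision division.
    return [c + (((v - c) * r) >> FRAC_BITS) for c, v in zip(codeword, x)]


def train(vectors, codebook):
    """One sequential pass of competitive learning over integer training vectors."""
    counts = [0] * len(codebook)
    for x in vectors:
        w = nearest(codebook, x)
        counts[w] += 1
        codebook[w] = update(codebook[w], x, counts[w])
    return codebook, counts
```

For example, `train([[10, 10], [12, 12], [200, 200]], [[0, 0], [255, 255]])` assigns the first two vectors to the first neuron and the third to the second. The multiply-and-shift truncates the quotient, so a larger `FRAC_BITS` trades table width for accuracy, which mirrors the trade-off between area cost and training performance described above.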