Digital Signal Processor (DSP) has been widely used in processing video and audio streaming data. Due to the huge amount of streaming data, increasing throughput is the key issue in designing DSP architecture. One way to increase the throughput of a DSP is to increase the instruction level parallelism. To increase the instruction level parallelism, many architectures have been proposed and can be classified into two main approaches, the superscaler and the VLIW architectures. Among the hardware architectures, the VLIW attracts a lot of attention due to its simple hardware complexity. However, the VLIW architecture suffers from the explosion of instruction memories due to the overhead of instruction grouping. In this paper, we propose a novel DSP architecture which contains three pipelines and performs dynamic instruction grouping by hardware. The experimental results show that our architecture can reduce 13% of memory requiremnt on average while maintaining the same performance.