Recent advances in AI have been driven largely by the Transformer architecture, a key component of large models in fields such as language, vision, audio, and biology. However, the compute cost of the Transformer’s attention mechanism grows quadratically with sequence length, which limits its use on long sequences. Even sophisticated models like GPT-4 are constrained by this limitation.
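To make the scaling concrete, here is a minimal sketch that counts the entries in the attention score matrix for a single head. The function name and the sequence lengths are chosen for illustration only. Optimized kernels can avoid storing the full matrix, but the amount of computation still grows with the square of the sequence length.

```python
def attention_score_entries(seq_len: int, num_heads: int = 1) -> int:
    """Entries in the (seq_len x seq_len) score matrices of one attention layer."""
    return num_heads * seq_len * seq_len


if __name__ == "__main__":
    # Illustrative sequence lengths; doubling the length quadruples the count.
    for n in (2_048, 32_768, 131_072):
        print(f"{n:>7} tokens -> {attention_score_entries(n):,} score entries per head")
```

Going from 32k to 128k tokens multiplies the sequence length by 4 and the number of score entries by 16.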
Breakthrough with StripedHyena
To address these challenges, Together Research recently open-sourced StripedHyena, a language model with a new architecture optimized for long contexts. StripedHyena handles contexts of up to 128k tokens and, according to Together Research, improves on the Transformer architecture in both training and inference performance. It is presented as the first alternative architecture to match the best open-source Transformer models on both short and long contexts.
Hybrid Architecture of StripedHyena
StripedHyena uses a hybrid architecture that combines multi-head, grouped-query attention with gated convolutions arranged in Hyena blocks. This design differs from traditional decoder-only Transformer models. Within the Hyena blocks, decoding runs in constant memory because the convolutions can be represented as state-space models or as truncated filters. The result is lower latency, faster decoding, and higher throughput compared to Transformers.
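The idea of representing a convolution as a state-space model can be illustrated with a toy scalar example. This is only a sketch under simplifying assumptions, not StripedHyena's implementation: the parameters a, b, and c, the one-dimensional state, and both function names are hypothetical, and the gating step is omitted. A causal convolution with the filter h[k] = c * a^k * b produces the same outputs as a recurrence that keeps a single running state, so each new token can be decoded without revisiting the history.

```python
def conv_decode(u, a, b, c):
    """Causal convolution with filter h[k] = c * a**k * b.

    Each output re-reads the whole input history, so the work and the
    memory needed per step grow with the position t.
    """
    out = []
    for t in range(len(u)):
        y = sum(c * (a ** k) * b * u[t - k] for k in range(t + 1))
        out.append(y)
    return out


def recurrent_decode(u, a, b, c):
    """Same outputs from the state-space form:
    state_t = a * state_{t-1} + b * u_t,  y_t = c * state_t.

    Only one scalar of state is carried between steps.
    """
    state = 0.0
    out = []
    for u_t in u:
        state = a * state + b * u_t
        out.append(c * state)
    return out


if __name__ == "__main__":
    inputs = [1.0, 2.0, -1.0, 0.5]
    print(conv_decode(inputs, 0.9, 1.0, 0.5))
    print(recurrent_decode(inputs, 0.9, 1.0, 0.5))
```

The two functions agree up to floating-point rounding. The recurrent form is what makes constant-memory decoding possible, in contrast to attention, whose key-value cache grows with every generated token.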
Training and Efficiency Gains
StripedHyena trains faster than traditional Transformers end to end on sequences of 32k, 64k, and 128k tokens, with speedups of 30%, 50%, and over 100%, respectively. It is also more memory efficient, cutting memory usage during autoregressive generation by more than 50% compared to Transformers.
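A rough model of the key-value cache helps explain where generation memory goes in an attention-based model. The sketch below uses the standard cache size formula (keys plus values, per attention layer, per key-value head, per token). Every configuration value in it is hypothetical and chosen for illustration; none is StripedHyena's actual configuration, and the small fixed-size state of non-attention layers is ignored.

```python
def kv_cache_bytes(seq_len, attn_layers, kv_heads, head_dim, bytes_per_value=2):
    """Key-value cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    seq_len = 131_072  # 128k tokens
    # Hypothetical model: 32 layers, 8 key-value heads, head dimension 128.
    all_attention = kv_cache_bytes(seq_len, attn_layers=32, kv_heads=8, head_dim=128)
    # Hypothetical hybrid: only half of the layers are attention layers.
    hybrid = kv_cache_bytes(seq_len, attn_layers=16, kv_heads=8, head_dim=128)
    print(f"all-attention: {all_attention / 2**30:.1f} GiB")
    print(f"hybrid:        {hybrid / 2**30:.1f} GiB")
```

Under these assumptions the cache scales linearly with both the sequence length and the number of attention layers, so replacing attention layers with constant-memory layers reduces it proportionally. Grouped-query attention lowers it further by reducing the number of key-value heads.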
Comparative Performance with Attention Mechanism
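The Hyena operator underlying StripedHyena substantially narrows the quality gap with attention at scale, reaching similar perplexity and downstream performance with a smaller computational budget. That result was reported for Hyena without hybridizing it with attention, whereas StripedHyena itself interleaves attention with Hyena blocks.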
Applications Beyond Language Processing
The approach also extends to image recognition. Researchers have tested Hyena operators as a replacement for attention in Vision Transformers (ViT) and reported comparable accuracy on image classification with the ImageNet-1k dataset.
StripedHyena represents a significant step forward in AI architecture, offering a more efficient alternative to the Transformer model, especially in handling long sequences. Its hybrid structure and enhanced performance in training and inference make it a promising tool for a wide range of applications in language and vision processing.