NVIDIA has introduced DoRA (Weight-Decomposed Low-Rank Adaptation), a fine-tuning method positioned as a higher-performing alternative to the widely used Low-Rank Adaptation (LoRA). According to the NVIDIA Technical Blog, DoRA improves both the learning capacity and the training stability of LoRA without adding any inference overhead.
Advantages of DoRA
DoRA has demonstrated performance improvements across a range of large language models (LLMs) and vision language models (VLMs). For instance, on commonsense reasoning tasks, DoRA outperformed LoRA by +3.7 points on Llama 7B and +4.4 points on Llama 3 8B. DoRA also showed better results on multi-turn benchmarks, image/video-text understanding, and visual instruction tuning tasks.
The method was accepted as an oral paper at ICML 2024.
Mechanics of DoRA
DoRA decomposes the pretrained weight into a magnitude component and a directional component, then fine-tunes both. The directional component is adapted with LoRA, which keeps the number of trainable parameters small. After training, DoRA merges the fine-tuned components back into the pretrained weight, so inference incurs no additional latency.
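The decomposition and merge described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than NVIDIA's implementation: it assumes the formulation W' = m * (W0 + BA) / ||W0 + BA||_c, where the norm is taken per column, and the function names, matrix shapes, and the random perturbation standing in for training are all hypothetical.

```python
import numpy as np

def column_norms(w):
    """L2 norm of each column, returned with shape (1, in_features)."""
    return np.linalg.norm(w, axis=0, keepdims=True)

def dora_merged_weight(w0, magnitude, lora_b, lora_a):
    """Merged weight m * (W0 + B A) / ||W0 + B A||_c (assumed formulation)."""
    directional = w0 + lora_b @ lora_a      # LoRA update applied to the direction
    return magnitude * directional / column_norms(directional)

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 4, 2                 # toy sizes, chosen for illustration

w0 = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
magnitude = column_norms(w0)                # trainable, initialized from W0
lora_b = np.zeros((d_out, rank))            # usual LoRA init: B starts at zero
lora_a = rng.normal(size=(rank, d_in))

# At initialization the merged weight reproduces the pretrained weight.
w_init = dora_merged_weight(w0, magnitude, lora_b, lora_a)

# Stand-in for training: perturb B and rescale the magnitude vector.
lora_b = rng.normal(scale=0.1, size=(d_out, rank))
magnitude = magnitude * 1.05
w_merged = dora_merged_weight(w0, magnitude, lora_b, lora_a)

# Inference uses a single matrix with the original shape.
x = rng.normal(size=(3, d_in))
y = x @ w_merged.T
```

Because the merged matrix has the same shape as the pretrained weight, the forward pass is an ordinary matrix multiply, which is why the method adds no inference cost.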
Visualizations of the magnitude and directional differences between DoRA and pretrained weights reveal that DoRA makes substantial directional adjustments with minimal changes in magnitude, closely resembling full fine-tuning (FT) learning patterns.
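One way to quantify such magnitude and directional differences is sketched below. The definitions are assumptions made for illustration: the magnitude change is taken as the mean absolute difference of column norms, and the directional change as the mean of one minus the cosine similarity between corresponding columns. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def magnitude_and_direction_change(w_pre, w_tuned):
    """Return (mean |norm change|, mean (1 - cosine)) over columns."""
    norm_pre = np.linalg.norm(w_pre, axis=0)
    norm_tuned = np.linalg.norm(w_tuned, axis=0)
    delta_m = np.mean(np.abs(norm_tuned - norm_pre))
    cosine = np.sum(w_pre * w_tuned, axis=0) / (norm_pre * norm_tuned)
    delta_d = np.mean(1.0 - cosine)
    return float(delta_m), float(delta_d)

rng = np.random.default_rng(0)
w_pre = rng.normal(size=(6, 4))             # stand-in for a pretrained weight

# Pure rescaling: magnitude changes, direction does not.
scaled_m, scaled_d = magnitude_and_direction_change(w_pre, 2.0 * w_pre)

# Direction-only change: perturb, then restore the original column norms.
perturbed = w_pre + 0.5 * rng.normal(size=w_pre.shape)
redirected = (perturbed * np.linalg.norm(w_pre, axis=0)
              / np.linalg.norm(perturbed, axis=0))
dir_m, dir_d = magnitude_and_direction_change(w_pre, redirected)
```

Under these definitions, the pattern described above (large directional adjustment, small magnitude change) corresponds to the second case: a directional value well above zero paired with a magnitude value near zero.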
Performance Across Models
In various performance benchmarks, DoRA consistently outperforms LoRA. For example, in large language models, DoRA significantly enhances commonsense reasoning abilities and conversation/instruction-following capabilities. In vision language models, DoRA shows superior results in image-text and video-text understanding, as well as visual instruction tuning tasks.
Large Language Models
On commonsense reasoning benchmarks and multi-turn benchmarks, DoRA achieved higher average scores than LoRA across the evaluated datasets.
Vision Language Models
DoRA also outperforms LoRA on vision language model tasks such as image-text understanding, video-text understanding, and visual instruction tuning, with higher average scores across multiple benchmarks.
Compression-Aware LLMs
DoRA can be integrated into the QLoRA framework, enhancing the accuracy of low-bit pretrained models. Collaborative efforts with Answer.AI on the QDoRA project showed that QDoRA outperforms both FT and QLoRA on Llama 2 and Llama 3 models.
Text-to-Image Generation
DoRA also applies to text-to-image personalization with DreamBooth, where it yields significantly better results than LoRA on challenging datasets such as 3D Icon and Lego sets.
Implications and Future Applications
Because DoRA is compatible with LoRA and its variants, it is positioned to become a default choice for fine-tuning AI models. Its efficiency and effectiveness make it a useful tool for adapting foundation models to a range of applications, including NVIDIA Metropolis, NVIDIA NeMo, NVIDIA NIM, and NVIDIA TensorRT.
For more detailed information, visit the NVIDIA Technical Blog.