Exploring Quantum-Based Adapter Methods and Reparameterization for Transformer Models
By Shaurya, Joshua
Recent deep learning research has made the transformer architecture ubiquitous: it underpins state-of-the-art models such as GPT, Whisper, DALL-E, Google's ViT, and Facebook's DETR, among many others. Much of these models' success, however, comes from their adaptability to downstream tasks. Traditional fine-tuning retrains all of a model's weights, which is prohibitively expensive when millions or even billions of parameters are involved. To avoid full fine-tuning, researchers developed Low-Rank Adaptation (LoRA), which greatly reduces the number of trainable parameters. We propose a novel variant of LoRA built around a quantum circuit module that applies U3 transformations to inputs treated as normalized quantum superposition state vectors. In doing so, we aim to achieve higher accuracy, or faster convergence to comparable accuracy, than other state-of-the-art fine-tuning methods. Our method can be applied as parameter-efficient fine-tuning to any pre-existing deep learning architecture built on weight matrices, including large language models, diffusion models, and more. This research can empower others to build task-specific deep learning models efficiently while avoiding expensive full training. In future work, we propose exploring other gates and operators, as well as photonic operations that are not necessarily spin-based.
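As a concrete reference point, standard LoRA freezes the pretrained weight matrix W and learns a low-rank update ΔW = BA, so the adapted layer computes y = Wx + (α/r)BAx with only A and B trained. Below is a minimal sketch of such a layer in PyTorch; the class name, initialization, and hyperparameter values (r, alpha) are illustrative defaults, not details of our implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update.

    Computes y = W x + (alpha / r) * B A x, where W is frozen and only
    A (r x in_features) and B (out_features x r) are trained.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Standard LoRA init: small Gaussian for A, zeros for B,
        # so the adapter starts as an exact no-op.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Usage: wrap e.g. an attention projection; only A and B receive gradients.
layer = LoRALinear(nn.Linear(768, 768), r=4)
y = layer(torch.randn(2, 768))
```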
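To make the quantum circuit module concrete, the NumPy sketch below illustrates the basic primitive the abstract names: normalizing an input so it can be read as a quantum state vector, then applying a parameterized U3 rotation. The pairing of consecutive features into two-amplitude states and the parameter layout are simplifying assumptions for illustration, not the full circuit design.

```python
import numpy as np

def u3(theta: float, phi: float, lam: float) -> np.ndarray:
    """Single-qubit U3 gate: the general parameterized single-qubit rotation."""
    return np.array([
        [np.cos(theta / 2),                    -np.exp(1j * lam) * np.sin(theta / 2)],
        [np.exp(1j * phi) * np.sin(theta / 2),  np.exp(1j * (phi + lam)) * np.cos(theta / 2)],
    ])

def quantum_adapter(x: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Treat x as amplitudes of a normalized state and rotate feature pairs with U3.

    Assumption (illustrative only): consecutive pairs of features form one
    two-amplitude state; params holds one (theta, phi, lam) triple per pair.
    """
    norm = np.linalg.norm(x)
    state = x.astype(complex) / norm          # normalize to a unit state vector
    out = np.empty_like(state)
    for i, (theta, phi, lam) in enumerate(params):
        pair = state[2 * i: 2 * i + 2]
        out[2 * i: 2 * i + 2] = u3(theta, phi, lam) @ pair  # unitary, norm-preserving
    return out * norm                          # restore the original scale

x = np.random.randn(8)
params = np.random.uniform(0, np.pi, size=(4, 3))  # one (theta, phi, lam) per pair
y = quantum_adapter(x, params)
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))  # U3 preserves the norm
```

Because each U3 gate is unitary, the transformation preserves the norm of the state, which is what lets the adapter reinterpret activations as quantum superposition states without distorting their overall scale.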