Revolutionary On-Device AI Training Method Developed

Researchers from MIT, the MIT-IBM Watson AI Lab, and others have developed a technique to efficiently adapt deep-learning models to new sensor data directly on edge devices, like smartphones, which typically lack the memory and computational power needed for such training. The method is called PockEngine.

PockEngine: Enhancing On-Device Machine Learning

PockEngine determines which parts of a large machine-learning model need updating to improve accuracy, and it performs most of that computation ahead of time, while the model is being prepared for deployment, rather than at runtime. This minimizes computational overhead and speeds up the fine-tuning process.

The researchers found that PockEngine significantly improved on-device training speeds, performing up to 15 times faster on some hardware platforms, without causing any decrease in model accuracy. Moreover, PockEngine’s fine-tuning method enabled a popular AI chatbot to answer complex questions more accurately.

The Inner Workings of PockEngine

Deep-learning models are based on neural networks, comprising many interconnected layers of nodes that process data to make a prediction. When the model runs, a process known as inference, input data passes from layer to layer until the prediction emerges at the end.
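As a rough illustration (not the researchers' code), the sketch below shows inference with a tiny two-layer network in PyTorch; the layer sizes are made up for the example. Data simply flows forward through each layer until a prediction comes out.

```python
import torch
import torch.nn as nn

# A tiny two-layer network: inference just pushes data forward, layer by layer.
model = nn.Sequential(
    nn.Linear(16, 32),  # layer 1: 16 input features -> 32 hidden units
    nn.ReLU(),
    nn.Linear(32, 4),   # layer 2: 32 hidden units -> 4 output classes
)

x = torch.randn(1, 16)           # one input sample
with torch.no_grad():            # inference only: no gradients are stored
    prediction = model(x)        # data passes layer to layer; the final output is the prediction
print(prediction.argmax(dim=1))  # predicted class
```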

During training and fine-tuning, the model undergoes a process known as backpropagation: the output is compared to the correct answer, and the model is then run in reverse, with each layer updated so that the output moves closer to the correct answer. Because the intermediate results of the forward pass must be kept around for this reverse pass, fine-tuning demands far more memory than inference.
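A minimal sketch of that loop in PyTorch (illustrative only; the data shapes and hyperparameters are assumptions): the output is compared to the correct answer with a loss function, gradients flow back through every layer, and each layer's weights are updated.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 16)               # a small batch of new sensor data (made-up shapes)
target = torch.randint(0, 4, (8,))   # the correct answers

output = model(x)                    # forward pass: activations are stored for backpropagation
loss = loss_fn(output, target)       # compare the output to the correct answer
loss.backward()                      # run the model "in reverse": gradients for every layer
optimizer.step()                     # update each layer to move the output closer to the answer
optimizer.zero_grad()
```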

PockEngine capitalizes on the fact that not all layers of a neural network matter equally for improving accuracy. The system fine-tunes each layer, one at a time, on a given task and measures the resulting accuracy improvement. In this way it identifies each layer's contribution and automatically determines the percentage of each layer that needs to be fine-tuned.
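The sketch below only conveys the idea of that per-layer search under simple assumptions; the fine_tune_briefly and evaluate_accuracy helpers are hypothetical stand-ins for a short training run and a validation pass, and PockEngine's actual procedure is more sophisticated.

```python
import copy
import torch.nn as nn

def layer_sensitivity(model: nn.Module, fine_tune_briefly, evaluate_accuracy):
    """Score each layer by the accuracy gained when only that layer is fine-tuned.

    `fine_tune_briefly(model)` and `evaluate_accuracy(model)` are hypothetical
    helpers standing in for a brief training run and a validation pass.
    """
    baseline = evaluate_accuracy(model)
    scores = {}
    for name, layer in model.named_children():
        if not any(p.requires_grad for p in layer.parameters()):
            continue                                   # skip layers with nothing to train
        trial = copy.deepcopy(model)                   # fresh copy for each trial
        for pname, param in trial.named_parameters():
            param.requires_grad = pname.startswith(f"{name}.")  # unfreeze this layer only
        fine_tune_briefly(trial)                       # briefly fine-tune the unfrozen layer
        scores[name] = evaluate_accuracy(trial) - baseline      # accuracy improvement
    return scores  # informs which layers (and what fraction of them) to update on-device
```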

Paring Down Models for Efficiency

Conventionally, the backpropagation graph is generated during runtime, which involves a great deal of computation. PockEngine does this during compile time, while the model is being prepared for deployment. It removes unnecessary layers or pieces of layers, creating a pared-down graph of the model to be used during runtime. It then performs other optimizations on this graph to further improve efficiency.
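PockEngine's graph pruning happens inside its own compiler, but a loose analogy can be sketched in PyTorch (assumptions: the layers to tune were chosen ahead of deployment, and the LAYERS_TO_TUNE set is hypothetical). By freezing everything outside the selected layers before the model ships, the pruned-away parts of the backward graph are simply never built or executed at runtime.

```python
import torch
import torch.nn as nn

# Decided ahead of time ("compile time"), e.g. from the sensitivity scores above.
LAYERS_TO_TUNE = {"2"}   # hypothetical: only the last layer will be updated on-device

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Pare down the training graph before deployment: frozen layers get no gradient
# computation, so that work never happens on the edge device at runtime.
for name, param in model.named_parameters():
    param.requires_grad = name.split(".")[0] in LAYERS_TO_TUNE

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01
)

x, target = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(model(x), target)
loss.backward()   # the backward pass only computes gradients for the unfrozen layer
optimizer.step()
```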

When PockEngine was applied to deep-learning models on different edge devices, it performed on-device training up to 15 times faster, without any drop in accuracy. It also significantly reduced the amount of memory required for fine-tuning.

Practical Applications of PockEngine

When applied to the large language model Llama-V2, PockEngine handled the fine-tuning process efficiently, and the model learned to interact with users. For instance, Llama-V2 models fine-tuned using PockEngine answered complex questions correctly, while models that were not fine-tuned failed to do so.

In the future, the researchers plan to use PockEngine to fine-tune even larger models designed to process text and images together. The technology is expected to help address the growing efficiency challenges posed by the adoption of large AI models across many industries. It holds promise for edge applications that incorporate larger models, as well as for lowering the cost of maintaining and updating large AI models in the cloud.