Tokyo Tech researchers have created a novel accelerator chip called “Hiddenite” that achieves state-of-the-art accuracy in the calculation of sparse “hidden neural networks” with lower processing overhead.
The Hiddenite chip dramatically decreases external memory access for improved computational efficiency by using the proposed on-chip model construction, which is a combination of weight creation and “supermask” expansion.
Deep neural networks (DNNs) are a complex machine learning architecture for artificial intelligence (AI) that requires a large number of parameters to learn to predict outcomes. However, DNNs can be “pruned,” reducing the computational load and model size.
The “lottery ticket hypothesis” took the machine learning community by storm a few years ago. It states that a randomly initialized DNN contains subnetworks that, once trained, can reach accuracy comparable to that of the full original DNN.
The larger the network, the more such “lottery tickets” it contains for successful optimization. As a result, “pruned” sparse neural networks can attain accuracies comparable to those of more complex “dense” networks, lowering overall processing cost and power consumption.
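The cost advantage of sparsity can be illustrated with a quick count of multiply-accumulate (MAC) operations. This is a generic sketch, not the paper's method: a dense layer needs one MAC per weight, while a pruned layer only needs MACs for the surviving connections (the 10% keep-rate below is an arbitrary illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(1)
dense = rng.standard_normal((512, 512))   # dense weight matrix
mask = rng.random((512, 512)) < 0.1       # keep roughly 10% of connections
sparse = dense * mask                     # pruned (sparse) weight matrix

# A dense matrix-vector product needs one MAC per weight; the pruned
# network only needs MACs for the nonzero connections that remain.
dense_macs = dense.size
sparse_macs = int(np.count_nonzero(sparse))
```

With roughly 90% of the weights zeroed out, the sparse layer needs about a tenth of the MACs of the dense one, which is where the processing and power savings come from.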
The hidden neural network (HNN) algorithm locates such subnetworks by applying AND logic (where the output is high only when all the inputs are high) to the initialized random weights and a binary mask called a “supermask.” The supermask, determined by the top-k percent highest scores, marks the chosen connections as 1 and the unselected ones as 0.
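The supermask construction described above can be sketched as follows. This is a minimal illustration under the assumptions stated in the article (frozen random weights, learned per-connection scores, top-k% selection); function and variable names are hypothetical.

```python
import numpy as np

def supermask(scores, k_percent):
    """Binary supermask: 1 for the top-k% highest scores, 0 otherwise."""
    k = max(1, int(scores.size * k_percent / 100))
    threshold = np.sort(scores.flatten())[-k]   # k-th highest score
    return (scores >= threshold).astype(np.int8)

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4))   # frozen random weights (never trained)
scores = rng.standard_normal((4, 4))    # learned importance scores
mask = supermask(scores, 25)            # keep the top 25% of connections
effective = weights * mask              # AND-like selection: weight kept only where mask is 1
```

Training then updates only the scores; the weights stay at their random initialization, which is what makes the on-chip weight regeneration described later possible.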
From a software standpoint, the HNN reduces computational cost. Computing neural networks efficiently, however, also requires hardware advances. Traditional DNN accelerators are fast, but they do not account for the power consumed by external memory access.
Professors Jaehoon Yu and Masato Motomura of Tokyo Institute of Technology (Tokyo Tech) have invented a novel accelerator chip called “Hiddenite” that can calculate hidden neural networks with dramatically reduced power consumption.
“Reducing the external memory access is the key to reducing power consumption. Currently, achieving high inference accuracy requires large models. But this increases external memory access to load model parameters. Our main motivation behind the development of Hiddenite was to reduce this external memory access,” explains Prof. Motomura.
Their research will be presented at the International Solid-State Circuits Conference (ISSCC) 2022, an international conference displaying the pinnacles of integrated circuit achievement.
Hidden Neural Network Inference Tensor Engine (Hiddenite) is the first HNN inference chip. Hiddenite’s architecture has three advantages for reducing external memory access and increasing energy efficiency. The first is on-chip weight generation, which regenerates the weights with a random number generator.
This eliminates the need to store the weights in external memory and access them there. The second benefit is “on-chip supermask expansion,” which reduces the amount of supermask data that the accelerator must load.
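The weight-regeneration idea can be sketched in software. This is an assumption-laden illustration, not the chip's actual generator: because the HNN never updates its weights, a small stored seed is enough to reproduce the full random weight tensor on demand, so the tensor itself never has to live in external memory.

```python
import numpy as np

def regenerate_weights(seed, shape):
    """Recreate the frozen random weights from a small stored seed.

    Storing only the seed (a few bytes) stands in for storing the
    entire weight tensor in external memory, since the same seed
    deterministically reproduces the same weights every time.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

# Two independent "regenerations" of the same layer yield identical weights.
w1 = regenerate_weights(seed=42, shape=(256, 256))
w2 = regenerate_weights(seed=42, shape=(256, 256))
```

The hardware analogue uses an on-chip random number generator rather than a software PRNG, but the principle is the same: determinism of the frozen weights trades memory traffic for cheap on-chip recomputation.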
The chip’s third enhancement is a high-density four-dimensional (4D) parallel processor that maximizes data re-use during computation, further improving efficiency.
“The first two factors are what set the Hiddenite chip apart from existing DNN inference accelerators,” reveals Prof. Motomura. “Moreover, we also introduced a new training method for hidden neural networks, called ‘score distillation,’ in which the conventional knowledge distillation weights are distilled into the scores because hidden neural networks never update the weights. The accuracy using score distillation is comparable to the binary model while being half the size of the binary model.”
The team used the Taiwan Semiconductor Manufacturing Company’s (TSMC) 40nm process technology to design, build, and measure a prototype chip based on the Hiddenite architecture.
The chip measures only 3mm x 3mm and can perform 4,096 MAC (multiply-and-accumulate) operations simultaneously. It achieves a state-of-the-art computing efficiency of up to 34.8 trillion, or tera, operations per second (TOPS) per watt of power while halving the amount of model data transferred compared to binarized networks.
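For context on what a 4,096-MAC array implies, here is a back-of-envelope throughput calculation. The clock frequency below is a placeholder assumption purely for illustration; the article reports only the MAC count and the measured 34.8 TOPS/W efficiency, not the operating frequency or power.

```python
# Peak throughput of a MAC array: each MAC counts as two operations
# (one multiply plus one accumulate), all 4,096 issued per cycle.
macs = 4096
ops_per_mac = 2
clock_hz = 200e6            # ASSUMED 200 MHz, for illustration only

peak_tops = macs * ops_per_mac * clock_hz / 1e12

# Efficiency in TOPS/W would then follow by dividing this peak (or the
# achieved) throughput by the measured power draw of the chip.
```

At the assumed 200 MHz, the array's peak is about 1.6 TOPS; the reported 34.8 TOPS/W figure is an efficiency (throughput per watt), a separate quantity that depends on measured power.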
These results, as well as their successful demonstration in an actual silicon chip, are sure to usher in a new era of machine learning, paving the way for faster, more efficient, and ultimately more environmentally friendly computing.