Accelerating Machine Learning

Now that organizations have experienced some artificial intelligence success, IT teams are under pressure to scale AI projects more quickly. Experts offer advice on how to accelerate AI adoption and results across the enterprise.

Machine learning is similar to erosion. Data is thrown at a mathematical model like grains of sand across a rocky terrain. Some of those grains simply float along with little or no consequence. However, some of them leave their imprint by testing, hardening, and ultimately reshaping the landscape in response to inherent patterns and fluctuations that emerge over time.

Effective? Yes. Efficient? Not so much.

Rick Blum, the Robert W. Wieseman Professor of Electrical and Computer Engineering at Lehigh University, is working to improve the efficiency of distributed learning techniques, which are becoming increasingly important in modern artificial intelligence (AI) and machine learning (ML). In essence, his goal is to hurl far fewer grains of data while maintaining the overall impact.

It’s difficult to pin down a single definition of machine learning (ML), because different people will give you different answers. Nvidia defines it as “the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” McKinsey & Company agrees with Nvidia, stating that ML is “based on algorithms that can learn from data without relying on rules-based programming.” Stanford suggests that ML is “the science of getting computers to act without being explicitly programmed.”

In the paper “Distributed Learning With Sparsified Gradient Differences,” published in a special ML-focused issue of the IEEE Journal of Selected Topics in Signal Processing, Blum and collaborators propose the use of “Gradient Descent method with Sparsification and Error Correction,” or GD-SEC, to improve the communications efficiency of machine learning conducted in a “worker-server” wireless architecture.

Accelerating the pace of machine learning

Regardless of the definition you use, the goal of machine learning at its most basic level is to adapt to new data independently and make decisions and recommendations based on thousands of calculations and analyses. This is accomplished by imbuing artificial intelligence machines or systems with the ability to learn from the data that is fed to them. Human intervention is minimal as the systems learn, identify patterns, and make decisions. Machines, in theory, improve accuracy and efficiency while eliminating (or greatly reducing) the possibility of human error.
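
As a concrete, purely illustrative example of that idea, the short Python/NumPy sketch below "learns" the slope and intercept of a noisy line directly from data, with no hand-written rules. The data, learning rate, and step count are arbitrary choices made for illustration and are not drawn from any of the sources quoted here.

```python
# A minimal sketch: learning a linear relationship from data alone,
# with no hand-written rules. All numbers here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y is roughly 3*x + 2, plus noise.
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0   # model parameters, learned from the data
lr = 0.1          # learning rate

for step in range(500):
    y_pred = w * x + b
    err = y_pred - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")   # close to the true 3 and 2
```

Blum's concern is what happens when training like this is distributed across many devices that must communicate over a network.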

“Distributed optimization problems appear in a variety of scenarios that typically rely on wireless communications,” Blum says. “The fundamental challenges are latency, scalability, and privacy.”

“To solve this problem, various distributed optimization algorithms have been developed,” he continues, “and one primary method is to use classical GD in a worker-server architecture. In this environment, after aggregating data from all workers, the central server updates the model’s parameters and broadcasts the updated parameters back to the workers. The fact that each worker must transmit all of its data all of the time, however, limits overall performance. When training a deep neural network, this can be on the order of 200 MB from each worker device at each iteration. This communication step has the potential to be a significant bottleneck in overall performance, particularly in federated learning and edge AI systems.”
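
To make that bottleneck concrete, here is a rough, single-process simulation (in Python with NumPy) of the classical worker-server loop the quote describes. Every name, dimension, and constant below is made up for illustration; this is not code from Blum's paper.

```python
# A simplified simulation of classical GD in a worker-server architecture:
# every worker sends its full gradient to the server on every round.
import numpy as np

rng = np.random.default_rng(1)
dim = 10_000        # model size; real deep networks are far larger
num_workers = 8
lr = 0.01

# Each worker holds its own local data (here, a random least-squares problem).
worker_data = [
    (rng.normal(size=(50, dim)) * 0.01, rng.normal(size=50))
    for _ in range(num_workers)
]

def local_gradient(params, X, y):
    """Gradient of the local least-squares loss at the current parameters."""
    residual = X @ params - y
    return X.T @ residual / len(y)

params = np.zeros(dim)
for round_ in range(10):
    # Each worker transmits a full dim-length gradient every round --
    # the communication cost the article describes.
    grads = [local_gradient(params, X, y) for X, y in worker_data]
    params -= lr * np.mean(grads, axis=0)  # server aggregates and updates
    # The server then broadcasts the updated params back to all workers.
```

In this baseline, every round costs one full-length gradient transmission per worker, which is exactly the overhead GD-SEC aims to cut.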

Through the use of GD-SEC, Blum explains, communication requirements are significantly reduced. The technique employs a data-compression approach in which each worker sets small-magnitude gradient components to zero — the signal-processing equivalent of not sweating the small stuff. The worker then transmits only the remaining non-zero components to the server. In other words, meaningful, usable data are the only packets launched at the model.
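
The sketch below shows the general flavor of that idea: magnitude-based sparsification with a locally accumulated remainder, so components zeroed out in one round are not simply lost. The selection rule (top-k), function name, and constants are assumptions made for illustration; the published GD-SEC method additionally works on gradient differences and has its own error-correction update, as described in the paper.

```python
# A sketch of the sparsify-and-correct idea: a worker zeroes out small
# gradient components, sends only the significant ones, and keeps the
# suppressed remainder locally so it can be folded back in later rounds.
import numpy as np

def sparsify_with_error_feedback(grad, residual, keep_fraction=0.01):
    """Return (sparse_message, new_residual) for one worker in one round."""
    corrected = grad + residual                      # fold in previously suppressed mass
    k = max(1, int(keep_fraction * corrected.size))  # keep only the largest components
    threshold = np.partition(np.abs(corrected), -k)[-k]
    mask = np.abs(corrected) >= threshold
    message = np.where(mask, corrected, 0.0)         # what actually gets transmitted
    new_residual = corrected - message               # remainder carried to the next round
    return message, new_residual

# Example for a single worker and one round (toy gradient):
rng = np.random.default_rng(2)
grad = rng.normal(size=10_000)
residual = np.zeros_like(grad)
message, residual = sparsify_with_error_feedback(grad, residual)
print("nonzero entries sent:", np.count_nonzero(message), "of", grad.size)
```

With a keep fraction of one percent, the worker transmits roughly a hundredth of the values it would send under classical GD, while the residual term preserves the information it held back.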

“Current methods create a situation in which each worker has an expensive computational cost; GD-SEC is relatively cheap because only one GD step is required at each round,” Blum says.

The vast amount of available data, affordable data storage, and increasingly inexpensive and powerful processing have fueled the growth of ML. Many industries are now developing more robust models capable of analyzing larger and more complex data sets while providing faster, more accurate results at massive scale. Organizations can use machine learning tools to identify profitable opportunities and potential risks more quickly.