Software runs the world. It controls automobile engines, iPhones, and nuclear weapons. Yet programmers are in short supply almost everywhere. Wouldn’t it be convenient if anyone could simply describe what they want a program to do, and a computer could translate that description into lines of code?
According to a recent study, an artificial intelligence (AI) system called AlphaCode brings that goal one step closer. Developed by DeepMind, a subsidiary of Alphabet (Google’s parent company), the technology may someday assist experienced programmers, but it is unlikely to replace them.
“It’s very impressive, the performance they’re able to achieve on some pretty challenging problems,” says Armando Solar-Lezama, head of the computer-assisted programming group at the Massachusetts Institute of Technology.
AlphaCode builds on Codex, a system introduced in 2021 by the nonprofit research lab OpenAI that had previously set the standard for AI-written code. OpenAI had already created GPT-3, a “large language model” trained on billions of words from digital books, Wikipedia articles, and other pages of internet text, and adept at interpreting and mimicking human writing. The lab developed Codex by fine-tuning GPT-3 on more than 100 gigabytes of code from GitHub, the online software repository. Given a plain-language description of what a program must accomplish, such as counting the vowels in a string of text, Codex can generate working code. But it performs poorly on more challenging problems.
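For a sense of the kind of task Codex handles well, a minimal Python solution to the vowel-counting prompt might look like the sketch below (the function name and details are illustrative, not Codex’s actual output):

    def count_vowels(text: str) -> int:
        """Count the vowels in a string of text."""
        return sum(1 for ch in text.lower() if ch in "aeiou")

    print(count_vowels("AlphaCode"))  # prints 4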
AlphaCode’s designers focused on solving exactly those difficult problems. Like the Codex researchers, they began by feeding a large language model many gigabytes of code from GitHub, merely to familiarize it with coding syntax and conventions. They then trained it to translate problem descriptions into code, using tens of thousands of problems gathered from programming competitions. A problem might, for instance, ask for a program that counts the number of binary strings (sequences of ones and zeros) of length n that contain no consecutive zeros.
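To give a sense of the difficulty, one standard answer to that example problem is a short dynamic program, as in the Python sketch below (an illustration of the task itself, not code produced by AlphaCode):

    def count_no_consecutive_zeros(n: int) -> int:
        """Count binary strings of length n with no two consecutive zeros."""
        # Track strings by their final digit: a 1 may follow anything,
        # but a 0 may only follow a 1.
        end_one, end_zero = 1, 1  # the length-1 strings "1" and "0"
        for _ in range(n - 1):
            end_one, end_zero = end_one + end_zero, end_one
        return end_one + end_zero

    print(count_no_consecutive_zeros(4))  # 8 of the 16 length-4 strings qualify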
Faced with a fresh problem, AlphaCode generates candidate solutions (in Python or C++) and weeds out the bad ones. Whereas researchers had previously used models like Codex to generate tens or hundreds of candidates, DeepMind had AlphaCode generate more than 1 million.
To filter them, AlphaCode first keeps only the roughly 1% of programs that pass the example test cases accompanying each problem. To narrow the field further, it clusters the survivors according to the similarity of their outputs on model-generated inputs. Then, working from the largest cluster down, it submits one program from each cluster, one at a time, until it either hits on a successful one or reaches 10 submissions (about the maximum humans make in the competitions). Drawing submissions from different clusters lets it test a range of programming strategies. That step, says Kevin Ellis, a computer scientist at Cornell University who works on AI coding, is the most novel part of AlphaCode’s process.
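In outline, that selection procedure works roughly as follows; this Python sketch is a simplification of the strategy the paper describes, and every name in it (including the run helper, which stands in for executing a program on an input) is hypothetical:

    from collections import defaultdict

    def select_submissions(candidates, example_tests, generated_inputs, run, limit=10):
        # Step 1: keep only candidates that pass the problem's example
        # test cases (in AlphaCode's experiments, roughly 1% survive).
        survivors = [p for p in candidates
                     if all(run(p, inp) == out for inp, out in example_tests)]

        # Step 2: cluster survivors by their outputs on extra, model-generated
        # inputs; programs whose outputs agree are likely equivalent.
        clusters = defaultdict(list)
        for p in survivors:
            signature = tuple(run(p, inp) for inp in generated_inputs)
            clusters[signature].append(p)

        # Step 3: submit one program per cluster, largest cluster first,
        # stopping at the submission budget (about 10 in the contests).
        ranked = sorted(clusters.values(), key=len, reverse=True)
        return [cluster[0] for cluster in ranked[:limit]]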
After training, AlphaCode solved 34% of the problems assigned to it, DeepMind reports this week in Science. (On comparable benchmarks, Codex achieved only a single-digit success rate.)
To further test its abilities, DeepMind entered AlphaCode in online coding competitions. In contests with at least 5000 participants, the system outperformed 45.7% of the programmers. The researchers also compared its programs with those in its training database and found no significant duplication of code or logic. Ellis found the inventiveness surprising.
“It continues to be impressive how well machine-learning methods do when you scale them up,” he says. The results are “stunning,” adds Wojciech Zaremba, a co-founder of OpenAI and co-author of their Codex paper.
Coding AI could have applications beyond winning competitions, says Yujia Li, a computer scientist at DeepMind and co-author of the paper. It could handle routine software tasks, freeing developers to work at a higher, more abstract level, or it could help non-programmers create simple applications.
David Choi, another study author at DeepMind, imagines running the model in reverse: translating code into explanations of what it’s doing, which could benefit programmers trying to understand others’ code. “There are a lot more things you can do with models that understand code in general,” he says.
For now, DeepMind aims to reduce the system’s errors. Li says that even when AlphaCode produces a working program, it sometimes makes mistakes, such as declaring a variable it never uses.
Other hurdles remain. AlphaCode requires tens of billions of trillions of operations per problem, computing power that only the largest tech companies have. And the problems it solved, drawn from online programming competitions, were narrow and self-contained. Real-world programming, Solar-Lezama says, often requires managing large code packages spread across many locations, which demands a more holistic understanding of the software.
The study also raises the longer-term risk of software that iteratively improves itself. Some experts warn that such self-improvement could lead to a superintelligent AI that takes over the world. Although that scenario may seem remote, researchers still want the field of AI coding to put guardrails and built-in checks and balances in place.
“Even if this kind of technology becomes supersuccessful, you would want to treat it the same way you treat a programmer within an organization,” Solar-Lezama says. “You never want an organization where a single programmer could bring the whole organization down.”