In Accounting, ChatGPT Is Still No Match for Humans

Human accountants have several advantages, including critical thinking, professional judgment, and the ability to apply accounting principles to unusual situations. They can navigate complex scenarios, interpret regulations with discretion, and provide strategic advice tailored to a company’s specific needs. Furthermore, when making financial decisions, human accountants can take into account broader business and economic contexts.

On accounting assessments, ChatGPT competed against students and came up short: students scored an overall average of 76.7%, compared to 47.4% for ChatGPT.

GPT-4, OpenAI’s newest AI chatbot product, was released last month. According to OpenAI, the bot, which uses machine learning to generate natural language text, scored in the 90th percentile on the bar exam, passed 13 of 15 AP exams, and earned a nearly perfect score on the GRE Verbal test.

Inquiring minds at BYU and 186 other universities were curious to see how OpenAI’s technology performed on accounting exams, so they tested the original version, ChatGPT. According to the researchers, although the technology still has work to do in the realm of accounting, it is a game changer that will improve the way everyone teaches and learns.

“When this technology first came out, everyone was worried that students could now use it to cheat,” said lead study author David Wood, a BYU accounting professor. “However, there have always been opportunities to cheat. So, for us, we’re focusing on what we can do with this technology now that we couldn’t do before to improve the teaching and learning processes for faculty and students. It was eye-opening to try it out.”

Since its debut in November 2022, ChatGPT has become the fastest-growing technology platform ever, reaching 100 million users in under two months. In response to intense debate about how models like ChatGPT should factor into education, Wood decided to recruit as many professors as possible to see how the AI fared against actual university accounting students.

His co-author recruiting pitch on social media went viral: 327 co-authors from 186 educational institutions in 14 countries took part in the study, contributing 25,181 classroom accounting exam questions. They also enlisted BYU undergraduates (including Wood’s daughter, Jessica) to feed ChatGPT another 2,268 textbook test bank questions. The questions ranged in difficulty and type (true/false, multiple choice, short answer, etc.) and covered accounting information systems (AIS), auditing, financial accounting, managerial accounting, and tax.

ChatGPT is still no match for humans when it comes to accounting

Although ChatGPT performed admirably, the students outperformed it. Students scored an overall average of 76.7%, compared to 47.4% for ChatGPT. ChatGPT beat the student average on 11.3% of questions, doing particularly well on AIS and auditing. However, the AI bot performed worse on tax, financial, and managerial assessments, possibly because it struggles with the mathematical processes those subjects require.

ChatGPT did better on true/false questions (68.7% correct) and multiple-choice questions (59.5%) than on short-answer questions, where it scored between 28.7% and 39.1%. Higher-order questions were generally more difficult for ChatGPT to answer. Indeed, ChatGPT would occasionally provide authoritative written descriptions for incorrect answers, or answer the same question in different ways.

“It’s not perfect; you’re not going to be using it for everything,” said Jessica Wood, currently a freshman at BYU. “Trying to learn solely by using ChatGPT is a fool’s errand.”

The researchers also uncovered some other fascinating trends through the study, including:

ChatGPT doesn’t always recognize when it is doing math and makes nonsensical errors such as adding two numbers in a subtraction problem, or dividing numbers incorrectly.
ChatGPT often provides explanations for its answers, even if they are incorrect. Other times, ChatGPT’s descriptions are accurate, but it will then proceed to select the wrong multiple-choice answer.
ChatGPT sometimes makes up facts. For example, when providing a reference, it generates a real-looking reference that is completely fabricated. The work and sometimes the authors do not even exist.

Having said that, the authors fully expect GPT-4 to improve exponentially on the accounting questions posed in their study, as well as the issues raised above. What they find most promising is how the chatbot can help improve teaching and learning, such as the ability to design and test assignments or be used to draft portions of a project.

“It’s an opportunity to reflect on whether we are teaching value-added information or not,” said study coauthor and BYU accounting professor Melissa Larson. “This is a disruption, and we need to figure out what we’re going to do next. Of course, I’ll continue to have TAs, but this will force us to use them in new ways.”