Bot-assisted Brainstorming

Experts in electronic nanomaterials are exploring how recent breakthroughs in artificial intelligence (AI) and machine learning (ML) could improve scientific brainstorming and ideation. To that end, they have built a chatbot that is knowledgeable in the kind of science they work on.

A scientist has just finished writing a paper. She recognizes that her work could benefit from a different perspective. Did she miss something? Perhaps there is an application of her research that she hasn’t considered. A second set of eyes would be ideal, but even the most helpful collaborators may not have time to read all of the background literature needed to catch up.

Kevin Yager leads the electronic nanomaterials group at the Center for Functional Nanomaterials (CFN), a U.S. Department of Energy (DOE) Office of Science User Facility at DOE’s Brookhaven National Laboratory. He has been exploring how recent advances in AI and ML could aid scientific brainstorming and ideation, and to that end he has developed a chatbot with knowledge of the kinds of science he works on.

Rapid advances in AI and ML have produced programs that can write creative text and useful software code, and these general-purpose chatbots have recently captured the public’s interest. Because they are built on huge language models trained on broad, diverse text, however, existing chatbots lack in-depth knowledge of specialized scientific subdomains. Yager’s bot is versed in areas of nanomaterial science that other bots are not because it uses a document-retrieval strategy. The details of this effort, and how other scientists might use this AI partner in their own research, were recently published in Digital Discovery.

Rise of the Robots

CFN has long been researching new ways to use AI/ML to accelerate nanomaterial discovery. These tools already help researchers quickly identify, catalog, and select samples, automate experiments, control equipment, and discover new materials. Esther Tsai, a scientist in CFN’s electronic nanomaterials group, is developing an AI companion to help speed up materials research experiments at the National Synchrotron Light Source II (NSLS-II), another DOE Office of Science User Facility at Brookhaven Lab.

CFN has done a great deal of work on AI/ML that helps drive experiments through automation, controls, robotics, and analysis, but a program adept with scientific literature was something researchers had explored much less. Being able to quickly document, understand, and communicate information about an experiment can help in many ways, from breaking down language barriers to saving time by summarizing larger bodies of work.

Watching Your Language

Building a specialized chatbot requires domain-specific text: language drawn from the areas the bot is meant to focus on. In this case, that text is scientific articles. Domain-specific material helps the AI model understand new vocabulary and definitions and introduces it to cutting-edge scientific concepts. Most importantly, this curated set of documents lets the AI model ground its reasoning in reliable facts.

AI models are trained on existing text to mimic genuine human language, which lets them learn the structure of language, memorize many facts, and develop rudimentary reasoning. Rather than retraining the AI model on nanoscience content, Yager gave it the ability to search for relevant information in a handpicked set of articles. Providing it with a library of documents was only half the battle: to use that text correctly and effectively, the bot also needed a way to identify the context relevant to each question.

“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible sounding but untrue things,” explained Yager. “This has been a core issue to resolve for a chatbot used in research as opposed to one doing something like writing poetry. We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”

Embedding is a data-processing method that converts words and phrases into numerical values. The resulting “embedding vector” quantifies the meaning of the text. When a user asks the chatbot a question, the question is also sent to the ML embedding model to calculate its vector. This vector is used to search a pre-computed database of text chunks from scientific papers that were embedded in the same way. The bot then uses the text snippets it finds that are semantically related to the question to build a more complete understanding of the context.
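
As a rough illustration of this retrieval step, the Python sketch below embeds a few text chunks ahead of time, embeds a question the same way, and ranks the chunks by cosine similarity. It rests on clearly labeled assumptions: the functions `embed`, `build_index`, and `search` are hypothetical, and the word-count vectors they produce are a crude stand-in for a learned ML embedding model, which would capture meaning rather than just shared words. The example chunks are made-up placeholders, not text from the Digital Discovery paper, and this is not Yager’s implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    # Hypothetical stand-in for an ML embedding model: a word-count vector
    # over a fixed vocabulary. Words outside the vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: 1.0 when vectors point the same way; 0.0 when they
    # share no nonzero dimensions (here, no shared words).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> tuple[list[str], list[list[float]]]:
    # Pre-computed step: fix a vocabulary and embed every chunk once, up front.
    vocab = sorted({word for chunk in chunks for word in tokenize(chunk)})
    return vocab, [embed(chunk, vocab) for chunk in chunks]


def search(question: str, chunks: list[str], vocab: list[str],
           vectors: list[list[float]], k: int = 2) -> list[str]:
    # Embed the question with the same model, then rank chunks by similarity.
    q = embed(question, vocab)
    scored = sorted(
        ((cosine(q, vec), chunk) for vec, chunk in zip(vectors, chunks)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Keep the top-k chunks that share at least something with the question.
    return [chunk for score, chunk in scored[:k] if score > 0]


# Made-up placeholder chunks standing in for excerpts from scientific papers.
chunks = [
    "Block copolymers self-assemble into nanoscale patterns.",
    "Synchrotron X-ray scattering probes nanoscale structure.",
    "Solvent vapor annealing can change block copolymer morphology.",
]
vocab, vectors = build_index(chunks)
results = search("How do block copolymers form nanoscale patterns?", chunks, vocab, vectors)
print(results)
```

Cosine similarity is a common choice for comparing embedding vectors because it compares direction rather than length, so a long chunk is not favored just for containing more words. A production system would swap in a real embedding model and a proper vector store, but the flow is the same: embed the documents once, embed the query, and rank by similarity.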

The user’s question and the retrieved text snippets are combined into a “prompt” and sent to a large language model (a massive program that generates text modeled on natural human language), which writes the final response. The embedding step helps ensure that the material being retrieved is relevant to the user’s question, and because the language model is supplied with text chunks from a body of trusted documents, the chatbot can give factual, sourced answers.
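
To make the prompt-assembly step more concrete, here is a hedged Python sketch of how a question and retrieved chunks might be combined into a single prompt, with each chunk numbered and labeled by its source so the model can cite it. The function name `build_prompt`, the instruction wording, and the source labels in the example are hypothetical; the article does not describe the actual prompt the chatbot uses, and the call to the language model itself is left out rather than tied to any particular API.

```python
def build_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Combine a question with retrieved (source, text) chunks into one prompt.

    Hypothetical template: the wording is illustrative, not the published bot's.
    """
    # Number each chunk and tag it with its source so answers can cite it.
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved, start=1)
    )
    # Instruct the model to stay within the supplied sources, one common way
    # to discourage hallucinated facts and citations.
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources by number, and say so if they do not contain the answer.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Example usage with made-up placeholder sources.
prompt = build_prompt(
    "How do block copolymers form nanoscale patterns?",
    [
        ("example_paper_A", "Block copolymers self-assemble into nanoscale patterns."),
        ("example_paper_B", "Solvent vapor annealing can change block copolymer morphology."),
    ],
)
print(prompt)
```

Numbering the sources gives the model a compact way to cite them, and ending with an open “Answer:” line leaves the response for the model to complete.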

“The program needs to be like a reference librarian,” said Yager. “It needs to heavily rely on the documents to provide sourced answers. It needs to be able to accurately interpret what people are asking and be able to effectively piece together the context of those questions to retrieve the most relevant information. While the responses may not be perfect yet, it’s already able to answer challenging questions and trigger some interesting thoughts while planning new projects and research.”

Bots Empowering Humans

CFN is developing AI/ML systems as tools that free up human researchers’ time for more complex and interesting problems while computers automate tedious tasks in the background. This new way of working raises many open questions, and those questions are the starting point for the important conversations scientists are now having about how to ensure AI/ML is used safely and ethically.

“A domain-specific chatbot like this could relieve a scientist’s workload in a variety of ways. Classifying and organizing documents, summarizing publications, highlighting relevant information, and getting up to speed in a new topical area are just a few potential applications,” Yager observed. “I’m interested to see where this all goes. Three years ago, we could never have envisioned where we are now, and I’m excited to see where we will be three years from now.”