AI Tools Automate Extraction of Experimental Data from Scientific Papers to Accelerate Materials Discovery

By Editorial Staff

TL;DR

NIMS researchers developed LLM tools that accelerate materials database construction, helping scientists discover new functional materials faster than manual data collection allows.

The Starrydata project uses LLMs to extract structured data from scientific papers, automating the conversion of complex information into organized databases for materials property analysis.

By digitizing and sharing experimental data globally, this research accelerates materials development for sustainable technologies, potentially improving energy efficiency and environmental solutions worldwide.

Researchers are using LLMs such as OpenAI's GPT to extract experimental data from scientific papers, turning the untapped data in the millions of papers published to date into searchable databases that can reveal empirical trends in materials science.

Materials scientists developing new functional materials for technologies like smartphones and automobiles struggle to predict material properties. Theoretical models alone cannot provide reliable predictions because of the complex relationships between composition, synthesis methods, and resulting properties. A research team led by Dr. Yukari Katsura at Japan's National Institute for Materials Science (NIMS) has developed two artificial intelligence tools that use large language models (LLMs) to automate the extraction of experimental data from scientific papers, dramatically accelerating the construction of materials property databases.

The tools are designed to streamline data collection for Starrydata, a materials property database launched in 2015 that previously relied on manual data extraction from papers. "Graphs in the millions of papers published to date contain valuable experimental data collected by past researchers, and much of it remains untapped," says Dr. Katsura. The research was recently published in the journal Science and Technology of Advanced Materials: Methods at https://doi.org/10.1080/27660400.2025.2590811.

The first tool, Starrydata Auto-Suggestion for Sample Information, is already integrated into the Starrydata2 web system and uses OpenAI's GPT via API to read paper text and suggest candidate entries for data fields pre-designed for specific materials domains. When users paste text from a paper's abstract or experimental methods section, the system automatically displays candidate entries in English below each input field.
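The suggestion step described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the team's implementation: the field names, the prompt wording, and the helper functions build_prompt and parse_suggestions are all hypothetical, and the call to the language model is replaced by a hard-coded example reply so the sketch runs without network access.

```python
import json

# Hypothetical field names. The real Starrydata2 fields are designed per
# materials domain and are not listed in the article.
SAMPLE_FIELDS = ["composition", "synthesis_method", "sample_form"]


def build_prompt(paper_text, fields=SAMPLE_FIELDS):
    """Build an instruction asking a model for candidate entries per field."""
    field_list = ", ".join(fields)
    return (
        "Read the following excerpt from a materials science paper. "
        f"Return a JSON object with the keys: {field_list}. "
        "Each value must be a list of candidate entries in English; "
        "use an empty list when the text gives no information.\n\n"
        f"Excerpt:\n{paper_text}"
    )


def parse_suggestions(reply, fields=SAMPLE_FIELDS):
    """Turn a model reply into {field: [candidates]}, tolerating bad output."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    suggestions = {}
    for field in fields:
        value = data.get(field, [])
        if isinstance(value, str):
            value = [value]          # accept a bare string as one candidate
        elif not isinstance(value, list):
            value = []               # ignore anything that is not a list
        suggestions[field] = [str(v) for v in value]
    return suggestions


# Stand-in for a model reply (assumed shape, made-up content).
example_reply = (
    '{"composition": ["Bi2Te3"], '
    '"synthesis_method": "spark plasma sintering", "extra": 1}'
)
suggestions = parse_suggestions(example_reply)
for field, candidates in suggestions.items():
    print(f"{field}: {candidates}")
```

Because every field always comes back as a list, possibly empty, a web form could show the candidates below each input field and leave the final choice to the person entering the data.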

The second tool, Starrydata Auto-Summary GPT, was built with ChatGPT's custom GPT feature. It deconstructs entire open-access paper PDFs uploaded by users and automatically summarizes all descriptions of figures, tables, and samples appearing in them as structured data in JSON format. The resulting data can be viewed as easy-to-read tables in a web browser. Although this data is not currently incorporated directly into the Starrydata database, it dramatically accelerates data collectors' work of locating target information and entering data.
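To make the idea of a JSON summary viewed as a table concrete, here is a minimal Python sketch. The JSON layout, the key names, and the sample values are assumptions made for illustration; the article does not give the actual schema produced by Starrydata Auto-Summary GPT, and the helpers figures_for_sample and to_table are hypothetical.

```python
import json

# Made-up summary in an assumed layout; not output from the real tool.
example_summary = """
{
  "figures": [
    {"id": "Fig. 1", "description": "Seebeck coefficient vs temperature",
     "samples": ["S1", "S2"]},
    {"id": "Fig. 2", "description": "Electrical resistivity vs temperature",
     "samples": ["S1"]}
  ],
  "samples": [
    {"id": "S1", "composition": "Bi2Te3"},
    {"id": "S2", "composition": "Bi2Te2.7Se0.3"}
  ]
}
"""


def figures_for_sample(summary, sample_id):
    """List the figure ids whose data include the given sample."""
    return [
        fig["id"]
        for fig in summary.get("figures", [])
        if sample_id in fig.get("samples", [])
    ]


def to_table(rows, columns):
    """Render a list of dicts as a plain-text table with aligned columns."""
    widths = [
        max([len(col)] + [len(str(row.get(col, ""))) for row in rows])
        for col in columns
    ]

    def fmt(cells):
        padded = (str(cell).ljust(w) for cell, w in zip(cells, widths))
        return " | ".join(padded).rstrip()

    lines = [fmt(columns), "-+-".join("-" * w for w in widths)]
    lines += [fmt([row.get(col, "") for col in columns]) for row in rows]
    return "\n".join(lines)


summary = json.loads(example_summary)
sample_table = to_table(summary["samples"], ["id", "composition"])
print(sample_table)
print(figures_for_sample(summary, "S1"))
```

Linking each figure to the samples it covers is what would let a data collector jump from a sample of interest to the figures that hold its measurements.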

Dr. Katsura notes that many publishers prohibit artificial intelligence use on paper PDFs, so the team is currently developing the system to target open-access papers. The tools represent a significant advancement because LLMs can perform flexible information extraction that considers background knowledge and context, enabling automation of converting complex information sources like scientific papers into structured data.

The implications for materials science and related industries are substantial. Large-scale datasets of experimental data built this way could give researchers a bird's-eye view of the data as a source of inspiration, and could allow machine learning models to predict properties from empirical trends. "A paper is a logical structure assembled to convey the author's claims, but by deconstructing it and returning it to the form of experimental data, other researchers can also use it for their own research," explains Dr. Katsura.

So far, Starrydata has built databases for specific materials science fields such as thermoelectric materials and magnets. As an open dataset usable for new materials development, it is beginning to be used by leading researchers worldwide. The team aims to raise broader awareness of the potential of large-scale experimental data and to establish data collection from papers as a recognized form of research within the scientific community. This development could significantly accelerate materials discovery cycles across industries that depend on advanced materials, from electronics to automotive to energy.

Curated from NewMediaWire
