by Thor Olavsrud

Senior Writer

NASA accelerates science with gen AI-powered search

Case Study

Jan 15, 2024 · 5 mins

Aerospace and Defense Industry, Artificial Intelligence, CIO

Aiming to give scientists better access to the vast quantity of science data it collects, NASA has created a Science Discovery Engine that leverages generative AI to deliver contextual results.

When you generate and collect as much data as the US National Aeronautics and Space Administration (NASA) does, finding just the right data set for a research project can be a problem.

With seven operating centers, nine research facilities, and more than 18,000 staff, the agency continually generates an overwhelming amount of data, which it stores in more than 30 science data repositories across five topical areas: astrophysics, heliophysics, biological and physical sciences, earth science, and planetary science. Overall, the agency houses more than 88,000 datasets and 715,000 documents across 128 data sources. Its earth science data alone is expected to hit 250 petabytes by 2025. Given that scale and complexity, scientists need more than domain expertise to navigate it all.

“It requires researchers to know which repository to go to and what that repository has,” says Kaylin Bugbee, NASA data scientist at Marshall Space Flight Center in Huntsville, Ala. “You have to be both science literate and data literate.”

In 2019, NASA’s Science Mission Directorate (SMD) released a report, based on a series of interviews with scientists, that made it clear they needed a centralized search capability to find the right data. The SMD’s mission is to engage with the US science community, sponsor scientific research, and use aircraft, balloon, and spaceflight programs for investigations in Earth orbit, in the Solar System, and beyond. Recognizing that giving scientists and researchers access to its data was fundamental to that purpose, SMD responded to the report by developing its Open Source Science Initiative (OSSI), an effort to make publicly funded scientific research transparent, inclusive, accessible, and reproducible. The mission of the OSSI is a commitment to the open sharing of software, data, and knowledge (including algorithms, papers, documents, and ancillary information) as early as possible in the scientific process.

“It really came from the scientists and scientific community, and it also aligns with our broader SMD priority of enabling interdisciplinary science,” Bugbee says. “That’s where new discoveries are made.”

To facilitate that mission, the agency is now turning to a combination of neural nets and generative AI to put those vast amounts of data at scientists’ fingertips.

Restoring order

A key element of OSSI is the Science Discovery Engine (SDE), a centralized search and discovery capability for all of NASA’s open science data and information, powered by Sinequa’s enterprise search platform.

“Until the SDE was created, you couldn’t go to a single place to search for our open data and documentation,” Bugbee says. “Now it serves as a single search capability for our open science data.”

New York-based Sinequa, which got its start more than two decades ago with a semantic search engine, focuses on leveraging AI and large language models (LLMs) to deliver contextual search information. It has since integrated Microsoft’s Azure OpenAI Service with its own neural search capabilities to power the platform.

Specifically, Sinequa’s neural search capability uses a combination of keyword and vector search to discover information, while its GPT summarizes the information gathered into rapidly digestible and reusable formats. It also allows scientists to use natural language to ask deeper questions and refine the search or the response. The SDE understands nearly 9,000 different scientific terms, with that number expected to grow as the AI learns.
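To make the idea of combining keyword and vector search concrete, here is a minimal Python sketch of hybrid ranking. It is not Sinequa’s or NASA’s implementation: the sample documents, the function names, the `alpha` blending weight, and the character-trigram "embedding" (a toy stand-in for a learned neural embedding) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; the real SDE indexes far more content.
DOCS: dict[str, str] = {
    "helio": "solar wind measurements from heliophysics missions",
    "earth": "sea surface temperature from earth science satellites",
    "astro": "exoplanet transit light curves from astrophysics surveys",
}


def tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def keyword_score(query: str, doc: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the document."""
    q_terms = set(tokens(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokens(doc))) / len(q_terms)


def toy_embed(text: str) -> Counter:
    """Character-trigram counts: a toy stand-in for a learned embedding."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query: str, docs: dict[str, str], alpha: float = 0.5):
    """Rank documents by alpha * keyword score + (1 - alpha) * vector score."""
    q_vec = toy_embed(query)
    scored = [
        (doc_id,
         alpha * keyword_score(query, text)
         + (1 - alpha) * cosine(q_vec, toy_embed(text)))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in hybrid_search("solar wind", DOCS):
        print(f"{doc_id}: {score:.3f}")
```

The keyword component rewards exact term matches, while the vector component can still surface a document when the query uses a related word form such as "heliophysical" that never appears verbatim. A production system would replace the trigram vectors with embeddings from a trained model and tune the blend between the two signals.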

Bugbee and her interdisciplinary team, which includes scientists with expertise in data stewardship and informatics, as well as developers and AI and ML experts, worked closely with stakeholders to understand their needs, and also with NASA’s Office of the CIO and Sinequa to build a proof of concept.

“They helped us set up the environment we needed,” she explains. “We had to have an open capability, so we had some special architectural needs.”

Bugbee says one of her team’s biggest challenges in getting everything up and running was how dispersed content was across the NASA ecosystem. Her team spent about a year trying to understand the information landscape, the data, and the metadata schemas.

“All of the contextual information that really brings richness to the data — things like code and GitHub, or algorithm documentation that describes how the data was developed — that kind of content is spread over a number of web pages and it’s been an effort to curate and identify where all those things reside,” she says.

Cleared for launch

Bugbee is no stranger to data management and data stewardship. She cut her teeth in the field working to improve metadata quality in Data.gov and on President Obama’s Climate Data Initiative. But working on the SDE really drove home the importance of good curation workflow: the processes for principled and controlled data creation, maintenance, and management.

“If I could go back in time, I’d have a more robust curation workflow built in from the beginning,” she says. “We kind of used the out-of-the-box approach to start and it worked for a time, but to really get the results we wanted, we needed that curation workflow.”

While the SDE is still in beta, Bugbee says her team has received a great deal of positive feedback from scientists to date, and the plan is to deliver a more fully operational system later this year. Already the team has implemented a new user interface that allows users to filter by topics before they begin their search.

Thor Olavsrud covers data analytics, business intelligence, and data science for CIO.com. He resides in New York.
