
Accelerating medicine’s AI race, Google is releasing a version of its generative language model to health care customers who will begin testing its ability to perform specific tasks in medical and research settings, STAT has learned.

The AI tool, known as Med-PaLM 2, will be distributed to a select group of Google’s cloud computing customers over the next several months, with the goal of assessing its ability to accurately and safely sift through and summarize vast stores of medical information.


The move will intensify competition with GPT-4, the model built by OpenAI that has triggered a flood of speculation about the ability of generative AI tools to help workers in many industries, including health care, do their jobs faster and better. Microsoft, a major investor in OpenAI, unveiled plans Wednesday to embed generative AI tools into its own health care computing services.

In a field that takes a deliberative approach to innovation, the effort to seize on the rapid advances in technology is at once exciting and unsettling, health AI experts said.

“It’s hard to keep track of the time scales here — ChatGPT came out in December,” said Andrew Beam, an assistant professor of biomedical informatics at Harvard Medical School. Even though research has accelerated rapidly since then, he said, there’s a need for deeper evaluations to understand the problems presented by these tools, and not just their possibilities.


“We need lots of people kicking the tires on these models to understand when they’re safe to use, when they work well and when they don’t work well,” Beam said.

Google’s limited release of Med-PaLM 2 is its first tentative step in that direction. Unlike ChatGPT, the model is specifically trained on a health care vocabulary and designed to work in that domain. In the limited testing phase, a subset of hospitals, health plans, drug companies and other organizations will be allowed to experiment with the tool on various tasks. Google executives said provisions will be made, in tightly controlled circumstances, to allow customers to expose the AI to patients’ private health information. That will allow them to assess its utility on the type of data that clinicians routinely use in real health care settings.

But the executives said they also expect the experimentation to be far broader and enable a variety of users to leverage publicly available data in new ways. “We don’t come to the table with some preconceived notion of what are the highest-value and best use cases for a piece of technology,” said Greg Corrado, a senior research scientist at Google. “We have ideas, but it’s really their business to understand what are the first things we should work on. We want to move as fast as is safely possible.”

Google declined to name the organizations that will be involved in the testing program, but emphasized that they will be selected to assess the model’s usefulness on a range of tasks. The possible uses of the technology in health care are vast. It could be used to analyze disparate data to help diagnose patients with complex diseases, or match patients to certain treatments or clinical trials. It could help clinicians with a multitude of administrative tasks that now must be done manually, including filling out medical records, summarizing medical evidence, and writing reports when patients receive test results or get discharged from the hospital.

If it can do some or all of those things reliably, Med-PaLM 2 and its AI peers stand to significantly change health care, and maybe even make its providers and researchers more effective and less susceptible to human weaknesses such as fatigue, bias, and distraction. But Google’s executives emphasized that achieving such ambitious goals is a long and uncertain road that starts — not ends — with demonstrations such as answering questions drawn from medical board exams.

“In the health care setting, there are still many unknowns with generative AI,” said Aashima Gupta, global director of health care at Google Cloud. “For the broader rollout to work, all these questions need to be answered in the enterprise setting,” she said, referring to the business and clinical environments where users will test the model’s reliability and usefulness. Gupta said the generative AI tooling will be made available through the company’s cloud computing offerings, with the results of early testing to inform the development of specific products.

Part of the effort to assess the model will also involve developing standard datasets on which Google’s AI could be compared with GPT and other models. Corrado emphasized that Google will seek to support such independent inquiries and exploration. “We’re not going to stop the science,” he said. “My view is that this is going to remain a wide open and rapidly evolving field for years to come, and I hope the things that are genuine scientific innovations will continue to show up and be shared.”

As the tools get rolled out for commercial uses, experts said transparency is especially important when it comes to using AI to automate services.

“The thing that worries me most is that there is no actual accountability for these instruments,” said Howard Forman, a radiologist and professor of health care economics at the Yale School of Management. He noted that generative AI tools have been shown to make up information or even cite non-existent sources in medical literature — the kind of errors that, if acted upon, could cause patients to be physically or financially harmed.

That’s what makes it imperative to rigorously and openly evaluate the tools not just within corporate computing environments, but in peer-reviewed publications where findings can be questioned and tested.

“If we start to rely on machines to be creating text without reference, it starts to become worrisome,” Forman said. “You have to ask, what is the basis for this? I would be really cautious about using this if nobody can be held accountable for it.”

This story is part of a series examining the use of artificial intelligence in health care and practices for exchanging and analyzing patient data. It is supported with funding from the Gordon and Betty Moore Foundation.
