The Grand Experiment
The tech leader set out to see whether it could develop an AI system that correctly diagnoses patients who present with a complaint or set of symptoms. According to Wired, the team at Microsoft used 304 case studies from the New England Journal of Medicine to devise a test called the Sequential Diagnosis Benchmark (SDBench). A language model broke each case down into the step-by-step process a doctor would follow to reach a diagnosis.
Microsoft’s researchers then used the MAI-DxO system to query several leading AI models (such as OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, Meta’s Llama, and xAI’s Grok) in a way that “loosely mimics several human experts working together,” according to Wired. The experiment demonstrated that MAI-DxO outperformed human doctors—and not just by a little. The Microsoft system achieved an accuracy rate of 80%, versus 20% for actual physicians.
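For readers who want a feel for what a "sequential" diagnosis test involves, here is a minimal Python sketch. It is not Microsoft's code or the actual SDBench format: the case, the test names, the prices, and the decision rule are all invented for illustration. It only shows the general shape of the idea, in which a diagnostician orders one test at a time, pays for each, and is scored on both correctness and total cost.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A toy case: hidden test results plus the correct diagnosis (all invented)."""
    findings: dict[str, str]  # test name -> result revealed when ordered
    diagnosis: str


@dataclass
class Encounter:
    """Tracks what has been ordered so far and the running cost."""
    case: Case
    prices: dict[str, int]
    known: dict[str, str] = field(default_factory=dict)
    spent: int = 0

    def order(self, test: str) -> str:
        # Reveal one result and charge for it; tests with no finding read "normal".
        self.spent += self.prices.get(test, 0)
        result = self.case.findings.get(test, "normal")
        self.known[test] = result
        return result


def run_case(case, prices, policy, max_steps=5):
    """Ask the policy for one action at a time until it commits to a diagnosis."""
    enc = Encounter(case, prices)
    for _ in range(max_steps):
        action, value = policy(enc.known)
        if action == "diagnose":
            return value == case.diagnosis, enc.spent
        enc.order(value)
    return False, enc.spent  # ran out of steps without a diagnosis


def simple_policy(known):
    # Hypothetical rule: start with the cheap test, escalate only if needed.
    if "blood_panel" not in known:
        return "order", "blood_panel"
    if known["blood_panel"] == "high white cells":
        return "diagnose", "infection"
    if "ct_scan" not in known:
        return "order", "ct_scan"
    return "diagnose", "unknown"


prices = {"blood_panel": 50, "ct_scan": 1200}  # made-up prices
case = Case({"blood_panel": "high white cells"}, "infection")
correct, cost = run_case(case, prices, simple_policy)
print(correct, cost)
```

In this toy run the cheap blood panel is enough to settle the case, so the expensive scan is never ordered. A policy that ordered every test up front could reach the same answer at a much higher cost, which is why a benchmark of this kind can track spending alongside accuracy.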
Whole Lotta Shaking Goin’ On
As if the results of the Microsoft experiment were not seismic enough, the study found that the system also reduced costs by 20%. It did so by identifying and selecting less expensive tests and procedures. According to Dominic King, a Microsoft vice president, “Our model performs incredibly well, both getting to the diagnosis and getting to that diagnosis very cost effectively.”
If these results can be replicated and integrated at scale in hospitals, clinics, and other treatment centers, the outcomes could be tectonic. AI is already being used in the American healthcare industry, but how far will it go, and to what extent will doctors and facilities trust it? According to some, this type of approach opens up new possibilities, which makes it all the more alluring to prospective customers. Here’s Wired on the new advantages of the Microsoft system:
The new Microsoft research differs from previous work in that it more accurately replicates the way human physicians diagnose disease—by analyzing symptoms, ordering tests, and performing further analysis until a diagnosis is reached. Microsoft describes the way that it combined several frontier AI models as “a path to medical superintelligence,” in a blog post about the project.
However, consumers should carefully consider potential issues involved with this project. David Sontag, a scientist at MIT and cofounder of Layer Health (a startup that builds medical AI tools), has indicated that Microsoft’s findings should be received with some measure of caution because doctors in the study were asked not to use any additional tools to help with their diagnosis. This means that the study “may not be a reflection of how they operate in real life.” Sontag adds that “it remains to be seen whether the AI system would significantly reduce costs in practice. The doctors involved in the study may have taken into account factors that the AI could not, such as a patient’s tolerance for a procedure or the availability of a particular medical instrument.”
Despite the reasonable reservations of some, AI is an unstoppable locomotive heading steadily in our direction. There is no doubt that, while these systems are not yet perfect, they are being perfected as time goes by. At some point, they may replace a good deal of what doctors do for patients currently. “Oh, Ms. Smith, the digital doctor will see you now!”