There are more AI health tools than ever—but how well do they work?

Photo: MIT Tech Review
More than 900 artificial-intelligence-based medical devices have already received FDA clearance, yet the rapid growth of new tools is not matched by transparency about their actual effectiveness. While these technologies promise a revolution in diagnostics and patient care, the medical community is raising alarms over the lack of unified standards for evaluating their performance in clinical settings.

The primary issue is that many algorithms are trained on limited datasets, which can lead to errors when analyzing cases of people from different ethnic backgrounds or lifestyles. For users worldwide, this poses the risk of adopting solutions that perform excellently in laboratory tests but fail in a diverse reality. In response to these challenges, initiatives such as the Coalition for Health AI (CHAI) are emerging, aiming to create a network of independent laboratories that test tools for safety and bias. Rigorous post-market surveillance requirements are becoming crucial for building trust.

Today, the development of digital health requires not only innovative algorithms but, above all, evidence that the technology actually improves treatment outcomes rather than generating additional information noise. Effective validation is the only way for AI to stop being a technological curiosity and become a cornerstone of modern medicine.
In recent weeks, technology giants have made a sharp turn toward digital medicine, challenging traditional healthcare models. Earlier this month, Microsoft launched Copilot Health, a new, dedicated space within the Copilot app that allows users to connect their medical records directly to the AI system. Just a few days earlier, Amazon announced that its proprietary Health AI tool, based on large language models (LLMs) and previously reserved exclusively for One Medical members, would be made available to a wider audience.
However, this sudden offensive raises a fundamental question: are algorithms ready to serve as medical advisors? While the promise of instant access to test result analysis is tempting, the tech industry is still grappling with the issue of accuracy and patient safety in the face of hallucinations from generative models.
Microsoft and Amazon's Ecosystem Enters the Doctor's Office
Microsoft's strategy with Copilot Health rests on integrating scattered data. After consenting to record access, a user can ask the system specific questions about their treatment history, laboratory results, or post-operative recommendations. This is an attempt to clear the barrier previously posed by hermetic medical jargon: the AI is meant to act as a translator that can search hundreds of pages of documentation in seconds to answer whether a given blood parameter has stayed within the normal range over recent years.
Amazon's Health AI, by contrast, focuses on a hybrid model. The tool, which originated in the One Medical ecosystem, uses LLMs to help patients navigate the healthcare system. Key features of these solutions include:
- Direct integration with Electronic Health Records (EHR).
- The ability to generate summaries of medical visits in natural language.
- Automatic monitoring of trends in patient test results.
- Assistance in scheduling appointments and managing prescriptions.
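Neither vendor has published implementation details, but the trend-monitoring idea in the list above can be sketched in a few lines. The data shapes, the `LabResult` type, and the reference ranges below are all hypothetical illustrations, not either company's actual API:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LabResult:
    taken: date
    analyte: str
    value: float
    unit: str

# Hypothetical reference ranges; a real system would pull these
# from the issuing laboratory's metadata, not hard-code them.
REFERENCE_RANGES = {"HbA1c": (4.0, 5.6)}  # percent

def flag_trend(results: list[LabResult], analyte: str) -> list[str]:
    """Return human-readable flags for results outside the reference range,
    in chronological order."""
    low, high = REFERENCE_RANGES[analyte]
    flags = []
    for r in sorted(results, key=lambda r: r.taken):
        if r.analyte != analyte:
            continue
        if not (low <= r.value <= high):
            flags.append(f"{r.taken.isoformat()}: {r.value} {r.unit} outside {low}-{high}")
    return flags
```

The point of the sketch is the structure, not the thresholds: a monitoring feature is essentially a sorted scan over typed records against per-analyte reference ranges.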
Reliability of Algorithms Under Expert Scrutiny
Despite the impressive pace at which new features are shipping, the medical community remains skeptical about the autonomy of these tools. The core problem is that the LLMs underlying Copilot Health and Health AI do not "understand" medicine in any biological sense; they merely predict the most probable sequence of words based on training data. In a medical context, an error in interpreting units of measurement or a missed critical drug interaction could have catastrophic consequences.
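The units-of-measurement hazard mentioned above is concrete: a blood glucose reading of "100" is normal if the unit is mg/dL but alarming if read as mmol/L. A minimal, hypothetical normalizer (not any vendor's code) shows the defensive pattern of refusing to guess:

```python
# Molar-mass conversion factor for glucose: 1 mmol/L = 18.0182 mg/dL.
MGDL_PER_MMOLL = 18.0182

def glucose_to_mgdl(value: float, unit: str) -> float:
    """Normalize a glucose reading to mg/dL, or fail loudly on unknown units."""
    unit = unit.strip().lower()
    if unit == "mg/dl":
        return value
    if unit == "mmol/l":
        return value * MGDL_PER_MMOLL
    # Refusing is safer than silently assuming a unit.
    raise ValueError(f"unrecognized unit: {unit!r}")
```

An LLM that free-associates over text has no such hard failure mode, which is precisely the concern raised by clinicians.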
Analysts point out that existing AI tools in health have performed best in narrow specialties, such as radiology or histopathological image analysis, where the algorithm has a specific, visual task. Attempting to create a general "health assistant" that handles the entire complexity of the human body is a completely different scale of difficulty. The challenge lies in ensuring these systems can admit to a lack of knowledge instead of generating convincing-sounding but incorrect advice.
Medical Data as the New Fuel for AI
The entry of LLM technology into the health sphere also changes how we perceive data privacy. For Copilot Health to be effective, it must operate on the most sensitive patient information. Although Microsoft and Amazon declare the highest security standards and compliance with regulations such as HIPAA, the scale of data collected raises concerns about its secondary use.
Our analysis suggests we are facing a new paradigm: the "democratization of medical data." Until now, the patient was a passive recipient of information, often unable to understand their own documentation. Tools like Health AI give patients a measure of control, but simultaneously shift onto them the responsibility for verifying what the algorithm suggests. The risk is that users may begin to trust the AI more than their doctors, especially in regions where access to specialists is difficult or expensive.
Technological Limitations and the Trust Barrier
The current generation of AI health assistants still struggles with a lack of situational context. An algorithm sees a bare test result but knows nothing about the patient's lifestyle, stress levels, or the minor symptoms that an experienced doctor picks up during an interview. Therefore, even though these tools are "more accessible than ever," their role should remain strictly auxiliary: an interface to the data, not a source of diagnosis.
Key limitations of current AI systems in medicine include:
- Propensity for hallucinations: Creating non-existent medical facts or misinterpreting laboratory norms.
- Lack of empathy and ethics: AI cannot convey difficult information in a way tailored to the patient's psychological state.
- Dependence on data quality: If the medical records in the system are incomplete or contain errors, the AI will duplicate these errors in its analyses.
The introduction of Copilot Health and the expanded availability of Amazon's tool mark a breakthrough moment that will permanently change the patient-technology relationship. The true test for these solutions, however, will not be user counts but their ability to actually improve treatment outcomes without generating additional risk. The industry must develop "safety valve" mechanisms that correct model responses in real time, before they reach the patient. In the near future, we will watch AI become an indispensable, though still supervised, filter between humans and the complicated world of medicine.
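One plausible shape for such a "safety valve" is a post-generation check that grounds the model's draft in the patient's actual record before anything is displayed. The function below is a deliberately crude illustration of the pattern, not a real guardrail: it only cross-checks numeric values, and the record format is invented:

```python
import re

def extract_numbers(text: str) -> set[str]:
    """Pull all numeric tokens (integers and decimals) out of a string."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def safety_valve(draft: str, record_text: str) -> str:
    """Pass the draft through only if every number it cites appears in the
    patient's record; otherwise fall back to a safe refusal."""
    unsupported = extract_numbers(draft) - extract_numbers(record_text)
    if unsupported:
        return ("I couldn't verify some values against your record; "
                "please review the original document or ask your clinician.")
    return draft
```

Production systems would need far richer grounding (units, dates, drug names, negations), but the architectural idea is the same: the model's output is treated as untrusted until checked against the source data.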