Agentic AI in Medicine: How Autonomous AI Is Moving Beyond Diagnosis

Artificial intelligence has already demonstrated its value in narrow healthcare applications, from medical imaging analysis to clinical documentation support. However, a new generation of AI systems is beginning to move beyond these isolated tasks. Recent research published in Nature highlights how agentic AI may soon play a much larger role in patient care by handling everything from diagnosis and treatment recommendations to long-term management planning.

Two newly introduced systems, MIRA and AIME, offer a glimpse into what this next phase of medical AI could look like. While both remain research tools operating in simulated environments, they represent a significant step toward autonomous clinical decision-making.

Moving Beyond Diagnostic Assistance

Most healthcare AI systems today function as support tools. They help physicians identify abnormalities, summarize records, or answer clinical questions, but the final responsibility for patient management remains with human clinicians.

MIRA and AIME take a different approach. Rather than assisting with a single task, both systems were designed to manage larger portions of the patient journey.

MIRA focused on emergency medicine. The system gathered patient histories, reviewed physical exam findings, ordered diagnostic tests, recommended medications, selected procedures, and determined whether patients should be admitted to the hospital. Researchers evaluated the model using 500 real emergency department cases and compared its performance against board-certified physicians.

AIME focused on outpatient care. The system followed patients across three separate visits, developing management plans and adjusting recommendations over time. Unlike traditional diagnostic models, AIME emphasized longitudinal care and treatment planning rather than a single clinical encounter.

Together, these studies demonstrate how agentic AI is expanding beyond diagnosis and entering areas traditionally reserved for physicians.

MIRA’s Performance in Emergency Care

Among the most notable findings was MIRA’s diagnostic accuracy.

Across the study population, MIRA achieved an overall diagnostic accuracy of 87.8%, compared with 78.1% among board-certified physicians, a gap of 9.7 percentage points. The results were particularly notable in conditions such as appendicitis and pancreatitis, where the system frequently matched or exceeded physician performance.

Researchers also found that MIRA aligned with clinical guidelines more consistently than physicians. The model selected appropriate medications with a reported accuracy of 99.8% while maintaining strong performance in areas such as fluid management and procedural recommendations.

Interestingly, MIRA ordered more laboratory tests but fewer imaging studies than the physicians did. The authors suggested that the absence of financial incentives may have contributed to these differences in resource utilization.

The system also demonstrated resilience when exposed to adversarial prompts, language barriers, patient anxiety, and other challenging interactions.

AIME and Longitudinal Patient Management

While MIRA focused on acute care, AIME addressed a different challenge: ongoing patient management.

The system used two distinct AI agents. One interacted conversationally with patients, while the second performed deeper clinical reasoning and developed treatment plans. Researchers evaluated AIME across three outpatient visits spanning multiple medical specialties.
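
To make the two-agent split more concrete, here is a minimal sketch of how a conversational agent and a reasoning agent could hand work to each other across visits. It is not AIME's actual implementation: the class names (DialogueAgent, ReasoningAgent, VisitRecord), the run_visits helper, and the placeholder planning logic are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VisitRecord:
    """What was gathered and decided during one outpatient visit."""
    visit_number: int
    patient_messages: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)


class DialogueAgent:
    """Hypothetical conversational agent: collects what the patient says."""

    def collect(self, visit_number: int, messages: list[str]) -> VisitRecord:
        return VisitRecord(visit_number=visit_number, patient_messages=list(messages))


class ReasoningAgent:
    """Hypothetical reasoning agent: turns a visit plus prior history into a plan."""

    def plan(self, record: VisitRecord, history: list[VisitRecord]) -> list[str]:
        # Placeholder logic only. A real system would call a language model
        # and consult clinical guidelines at this point.
        steps = [f"review {len(record.patient_messages)} patient statement(s)"]
        if history:
            steps.append(f"reassess plan from visit {history[-1].visit_number}")
        else:
            steps.append("draft initial management plan")
        return steps


def run_visits(visits: list[list[str]]) -> list[VisitRecord]:
    """Run each visit through both agents, carrying history forward."""
    dialogue, reasoning = DialogueAgent(), ReasoningAgent()
    history: list[VisitRecord] = []
    for number, messages in enumerate(visits, start=1):
        record = dialogue.collect(number, messages)
        record.plan = reasoning.plan(record, history)
        history.append(record)
    return history
```

Calling run_visits with three lists of patient messages returns three records, and each later record's plan refers back to the visit before it. The point of the separation is that the patient-facing conversation and the slower clinical reasoning can be developed and evaluated independently.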

Overall, the system performed at a level comparable to board-certified primary care physicians. In several categories, including treatment precision and guideline adherence, physician reviewers rated AIME’s recommendations more favorably than those generated by clinicians.

AIME also excelled in medication management. Using a newly developed benchmark called RxQA, the system demonstrated strong performance in selecting appropriate medications, dosing strategies, treatment duration, and follow-up recommendations.

One of the most notable aspects of the study was the use of hundreds of fully integrated clinical guidelines to support decision-making. This allowed the model to provide highly detailed and structured treatment plans.

The Limits of Today’s Medical AI

Despite these impressive results, both studies have important limitations.

Neither MIRA nor AIME operated in real clinical environments. Both relied entirely on text-based interactions and did not incorporate factors such as body language, tone of voice, medical imaging interpretation, or many of the subtle cues physicians use when evaluating patients.

The datasets were also relatively controlled. Real-world medicine often involves incomplete information, conflicting histories, communication barriers, and complex social factors that are difficult to simulate.

In AIME’s case, patient interactions occurred over only three visits separated by a few days. That is a far shorter window than typical outpatient care, where follow-up can extend over months or years.

These limitations mean that the findings should be viewed as promising research rather than evidence that autonomous AI is ready to replace clinicians.

The Challenge of Guideline-Driven Care

One of the most interesting findings from both studies was the systems’ exceptional adherence to clinical guidelines.

At first glance, this appears to be a major advantage. Both models consistently generated highly structured recommendations and demonstrated strong alignment with established standards of care.

However, medicine is not simply the application of guidelines.

Clinical decision-making often requires consideration of patient preferences, financial constraints, prior experiences, fears, and individual circumstances. Many guidelines themselves are based on expert consensus rather than definitive evidence.

As a result, perfect guideline adherence may not always translate into optimal patient care. The art of medicine lies in balancing evidence-based recommendations with the unique needs of each individual patient.

What Comes Next for Agentic AI in Medicine?

The systems evaluated in these studies represent only an early stage of development.

Future agentic AI platforms may incorporate specialized agents dedicated to laboratory testing, medical imaging, wearable sensors, genomics, environmental exposures, and other clinical domains. Rather than relying on a single model, healthcare AI could evolve into networks of collaborating agents that collectively support patient care.
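
One way to picture such a network is a coordinator that routes each finding to a registered specialist. The sketch below is purely illustrative and does not describe any system from the studies: the Coordinator class, the lab_agent and imaging_agent functions, and the rule of escalating unhandled domains to a clinician are all assumptions.

```python
from typing import Callable

# Each specialist is modeled as a plain function from a finding to a note.
Specialist = Callable[[str], str]


def lab_agent(finding: str) -> str:
    """Hypothetical laboratory specialist."""
    return f"lab agent: interpret '{finding}'"


def imaging_agent(finding: str) -> str:
    """Hypothetical imaging specialist."""
    return f"imaging agent: interpret '{finding}'"


class Coordinator:
    """Hypothetical router that sends each finding to the matching specialist."""

    def __init__(self) -> None:
        self._specialists: dict[str, Specialist] = {}

    def register(self, domain: str, agent: Specialist) -> None:
        self._specialists[domain] = agent

    def consult(self, findings: dict[str, str]) -> list[str]:
        notes = []
        for domain, finding in findings.items():
            agent = self._specialists.get(domain)
            if agent is None:
                # Assumed policy: anything no agent covers goes to a human.
                notes.append(f"no agent for '{domain}': escalate to clinician")
            else:
                notes.append(agent(finding))
        return notes
```

In this design, adding a new domain such as genomics or wearable data means registering one more function rather than retraining a single large model, and anything the network cannot handle falls back to a clinician by default.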

Importantly, the language models used in these studies are already being surpassed by newer generations of AI systems. Continued improvements in reasoning, memory, multimodal analysis, and clinical integration are likely to expand capabilities even further.

The key question is no longer whether AI can support medical decision-making. Increasingly, the question is how much of the clinical workflow AI may eventually be able to manage.

A Future Built on Collaboration

While MIRA and AIME suggest that agentic AI can improve diagnosis, treatment planning, and communication, the future of healthcare will likely depend on collaboration rather than replacement.

The most important studies have yet to be conducted. Researchers will ultimately need to compare three approaches: AI-only care, physician-only care, and hybrid physician-AI models.

For now, these systems offer an early look at how autonomous AI may reshape medicine. Although real-world adoption remains years away, the trajectory is becoming increasingly clear. Agentic AI is evolving from a clinical assistant into a potential partner in patient care.