PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

1 University of Washington 2 Microsoft Research
3 David Geffen School of Medicine, UCLA 4 Allen Institute for AI
Equal Contribution

PathFinder demonstrates an AI-driven multi-agent framework for medical decision-making, showcasing how it navigates data, collects evidence, and generates interpretable diagnoses through collaborative agents.

Key Takeaways

  • Proposed PathFinder, a multi-modal, multi-agent AI framework for medical decision-making, designed to navigate complex data, collect evidence, and provide interpretable diagnoses.
  • Integrates four AI agents—Triage, Navigation, Description, and Diagnosis—collaborating for efficient and interpretable diagnostics.
  • Achieves state-of-the-art performance on the M-Path Skin Biopsy dataset, on which it was trained and evaluated: 74% accuracy, 9 percentage points above the average pathologist and 8 points above the best existing AI model.
  • Enhances transparency and explainability, with AI-generated descriptions comparable to GPT-4o, supporting pathologist validation.

Abstract

Diagnosing diseases through histopathology whole slide images (WSIs) is fundamental in modern pathology but is challenged by the gigapixel scale and complexity of WSIs. Trained histopathologists overcome this challenge by navigating the WSI, looking for relevant patches, taking notes, and compiling them to produce a final holistic diagnosis. Traditional AI approaches, such as multiple instance learning and transformer-based models, fall short of such a holistic, iterative, multi-scale diagnostic procedure, limiting their adoption in the real world. We introduce PathFinder, a multi-modal, multi-agent framework that emulates the decision-making process of expert pathologists. PathFinder integrates four AI agents (the Triage Agent, Navigation Agent, Description Agent, and Diagnosis Agent) that collaboratively navigate WSIs, gather evidence, and provide comprehensive diagnoses with natural language explanations. The Triage Agent classifies the WSI as benign or risky; if risky, the Navigation and Description Agents iteratively focus on significant regions, generating importance maps and descriptive insights of sampled patches. Finally, the Diagnosis Agent synthesizes the findings to determine the patient's diagnostic classification. Our experiments show that PathFinder outperforms state-of-the-art methods in skin melanoma diagnosis by 8% while offering inherent explainability through natural language descriptions of diagnostically relevant patches. Qualitative analysis by pathologists shows that the Description Agent’s outputs are of high quality and comparable to GPT-4o. PathFinder is also the first AI-based system to surpass the average performance of pathologists in this challenging melanoma classification task by 9%, setting a new record for efficient, accurate, and interpretable AI-assisted diagnostics in pathology.

Method Overview

We present PathFinder, a multi-modal, multi-agent AI framework for medical decision-making. PathFinder emulates expert diagnostic workflows by integrating multiple AI agents to iteratively analyze data, collect evidence, and generate interpretable diagnoses. Overall, PathFinder operates as follows:

  • Triage Agent assesses medical data to determine initial risk classification.
  • Navigation Agent identifies and prioritizes key regions for further analysis, refining its focus iteratively.
  • Description Agent generates natural language descriptions of diagnostically relevant findings.
  • Diagnosis Agent synthesizes collected evidence to provide a final classification with an explainable decision-making process.

This agentic approach enables efficient, transparent, and human-like diagnostic reasoning, surpassing traditional AI methods in accuracy and interpretability.
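The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `run_pipeline`, the `Evidence` record, the `(x, y)` patch identifier, the `max_steps` budget, and the callable signatures are all assumptions introduced here, and the toy agents at the bottom are stand-ins for what would be learned models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Patch = Tuple[int, int]  # hypothetical patch identifier: (x, y) position in the slide


@dataclass
class Evidence:
    """One sampled patch together with its natural language description."""
    patch: Patch
    description: str


def run_pipeline(
    wsi: object,
    triage: Callable[[object], bool],
    navigate: Callable[[object, List[Evidence]], Optional[Patch]],
    describe: Callable[[object, Patch], str],
    diagnose: Callable[[List[Evidence]], str],
    max_steps: int = 5,  # assumed iteration budget, not a value from the paper
) -> Tuple[str, List[Evidence]]:
    """Chain the four agents: triage, then navigate/describe in a loop, then diagnose."""
    # Triage: stop early when the slide is judged benign.
    if not triage(wsi):
        return "benign", []

    evidence: List[Evidence] = []
    for _ in range(max_steps):
        # Navigation sees the evidence gathered so far and proposes the next patch.
        patch = navigate(wsi, evidence)
        if patch is None:  # nothing left worth inspecting
            break
        # Description turns the selected patch into text.
        evidence.append(Evidence(patch, describe(wsi, patch)))

    # Diagnosis synthesizes all collected descriptions into a final label.
    return diagnose(evidence), evidence


# Toy stand-in agents so the sketch runs end to end.
def toy_navigate(wsi, evidence):
    seen = {e.patch for e in evidence}
    remaining = [p for p in wsi["patches"] if p not in seen]
    return remaining[0] if remaining else None


label, notes = run_pipeline(
    {"patches": [(0, 0), (512, 0)]},
    triage=lambda wsi: True,
    navigate=toy_navigate,
    describe=lambda wsi, patch: f"patch at {patch}",
    diagnose=lambda evidence: f"class based on {len(evidence)} patches",
)
print(label)  # class based on 2 patches
```

Passing the accumulated evidence back to the navigation callable is one simple way to model the iterative refinement of focus; the returned evidence list is what would let a pathologist audit the final label.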

PathFinder pipeline

Results

We conducted comprehensive experiments to evaluate PathFinder, examining different architectures for each agent component. PathFinder achieves 74% accuracy, surpassing both human experts (65%) and the best previous state-of-the-art method (66%).

M-Path diagnosis results

Majority voting performance for whole slide image (WSI) diagnosis on the M-Path dataset.
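For readers unfamiliar with majority voting, the sketch below shows the general idea under the assumption that several candidate labels are available for one slide (for example, from repeated runs). The function name `majority_vote`, the tie-breaking rule, and the label strings are illustrative choices made here; the page does not specify how the votes are produced or how ties are handled.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one label to vote")
    counts = Counter(labels)
    # Counter preserves first-seen order and max() returns the first maximum,
    # so equal counts are resolved in favor of the earliest label.
    return max(counts, key=counts.get)


# Hypothetical per-run labels for a single slide.
votes = ["class_b", "class_a", "class_b", "class_c", "class_b"]
print(majority_vote(votes))  # class_b
```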

Number of Trajectory Ablations

Ablation results. We ran 10 experiments and plot the mean and standard deviation.

BibTeX

@misc{ghezloo2025pathfindermultimodalmultiagentmedical,
      title={PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology}, 
      author={Fatemeh Ghezloo and Mehmet Saygin Seyfioglu and Rustin Soraki and Wisdom O. Ikezogwo and Beibin Li and Tejoram Vivekanandan and Joann G. Elmore and Ranjay Krishna and Linda Shapiro},
      year={2025},
      eprint={2502.08916},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.08916}, 
}