"The most important thing for a clinician is not how fancy the stethoscope is, but how good is the substance between the ear pieces."
Dr. Proctor Harvey, renowned cardiologist at Georgetown School of Medicine
One of the most frequently asked questions I get from attendees of meetings is: What is your favorite large language model? Having extensive experience with ChatGPT models in a clinical setting, I'll first delve into their capabilities, then explore other prominent models and their specific applications in healthcare.
Introduction to ChatGPT models and their architecture
ChatGPT models are built on the Generative Pre-trained Transformer (GPT) architecture, a transformer-based design that uses natural language processing (NLP) for contextually aware communication. These general-purpose Large Language Models (LLMs) are adept at tasks such as question answering, text summarization, translation, and coding support, and they can be integrated into workflows via Application Programming Interfaces (APIs).
A historical overview of the generative pre-trained transformer (GPT) series
The evolution of GPT began in 2018 with GPT-1, demonstrating foundational generative text capabilities. GPT-2, released a year later, brought improved coherence and fluency. 2020 saw the advent of GPT-3, with its impressive 175 billion parameters, achieving a higher level of text generation, translation, summarization, and coding.
GPT-3.5, released in 2022, ushered in an unprecedented level of adoption and enthusiasm due to its enhanced speed and public accessibility. In 2023, GPT-4 and its counterpart, GPT-4 Turbo (often called "GPT-4.5"), emerged as powerful models featuring strong reasoning capabilities, with the latter optimized for cost efficiency. GPT-4 Turbo also offers a significantly longer context window (up to 128K tokens), enabling it to process larger documents. Finally, GPT-4o (omni), released in May 2024, stands out as OpenAI's first natively multimodal GPT model, offering impressive accuracy, reasoning, and speed across text, image, and voice inputs.
GPT Model | Release Date | Knowledge Cutoff | Cost | Mode | Context Window | Speed | Accuracy | Reasoning | Best Uses | Comment |
---|---|---|---|---|---|---|---|---|---|---|
GPT-3.5 | November 2022 | September 2021 | Free | Text only | 4-16K tokens | Fastest | Moderate | Good | Basic Q&A, Summarization, Simple tasks, Entry-level chatbots | Higher likelihood of hallucinations |
GPT-4 | March 2023 | September 2021 | $20/month via ChatGPT Plus | Text and image input | 8-32K tokens | Moderate | High | Strong | Complex analysis, Deep reasoning, Complex writing, Programming support | Slowest of the GPT models and will sunset April 2025 |
GPT-4 Turbo (aka “GPT-4.5”) | November 2023 | September 2023 | $20/month via ChatGPT Plus | Text and image input | Up to 128K tokens | Fast (Faster than GPT-4) | High | Strong | Long document analysis, Long document summarization, Advanced coding, Enterprise tasks | Excellent balance of speed and accuracy |
GPT-4o (omni) | May 2024 | Real-time | $20/month via ChatGPT Plus | Full multimodal (text, image, voice) | Up to 128K tokens | Fastest high-performance model | Highest | Superior | Human-like conversations, Real-time voice and vision tasks, Multimodal tasks, Broad-spectrum assistant functions | Best overall reasoning with lowest hallucination rate |
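The context-window column in the table above determines how much text a model can take in at once. As a rough illustration only, the sketch below checks which models could hold a given document. It assumes about 4 characters per token for English text (a common rule of thumb, not an exact count), treats "K" as 1,000, and uses the upper bound of each range in the table; the function names are hypothetical.

```python
# Context-window sizes in tokens: upper bounds from the table above, with K = 1,000.
CONTEXT_WINDOW_TOKENS = {
    "GPT-3.5": 16_000,
    "GPT-4": 32_000,
    "GPT-4 Turbo": 128_000,
    "GPT-4o": 128_000,
}

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer would give an exact count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def models_that_fit(text: str, reserve_for_reply: int = 1_000) -> list[str]:
    """Return the models whose context window can hold the text plus a reply."""
    needed = estimate_tokens(text) + reserve_for_reply
    return [name for name, limit in CONTEXT_WINDOW_TOKENS.items() if limit >= needed]


if __name__ == "__main__":
    long_document = "word " * 30_000  # placeholder text, 150,000 characters
    print(estimate_tokens(long_document))
    print(models_that_fit(long_document))
```

The `reserve_for_reply` margin is there because the model's answer also counts against the context window, so a document that exactly fills the window leaves no room for a response.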
GPT-4o and the "o" reasoning models
Alongside GPT-4o, OpenAI offers a separate "o" series of reasoning models with specialized functionalities. The o3 model is OpenAI's most powerful reasoning model, designed for complex problem-solving and capable of integrating visual inputs. The o4-mini model balances performance and efficiency with strong reasoning, faster response times, and lower computational costs. The o4-mini-high model spends more reasoning effort to further enhance accuracy relative to o4-mini.
Guide to ChatGPT model selection and subscription plans
The choice of ChatGPT model should be tailored to the clinical task at hand:
- GPT-3.5: Efficient and fast for simple tasks like basic patient interactions and administrative duties.
- GPT-4: Excellent for complex clinical reasoning, including detailed medical analyses and nuanced clinical education.
- GPT-4 Turbo (or GPT-4.5): Best for fast-paced clinical scenarios that need rapid responses.
- GPT-4o (omni) or its variants: Preferred when multimodal clinical data (text, image, voice) are involved and comprehensive, precise medical support is the goal.
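For teams that want to encode this guidance in a tool, the list above can be sketched as a simple lookup. This is only an illustration of the mapping described here: the `TaskType` categories, the `recommend_model` function, and the rule that any non-text input routes to GPT-4o are assumptions, and the strings are display names rather than API model identifiers.

```python
from enum import Enum


class TaskType(Enum):
    SIMPLE = "simple"                # basic patient interactions, administrative duties
    COMPLEX_REASONING = "complex"    # detailed medical analyses, nuanced clinical education
    FAST_PACED = "fast"              # scenarios that need rapid responses
    MULTIMODAL = "multimodal"        # text combined with image or voice data


# Mirrors the list above. Display names only, not API model identifiers.
RECOMMENDED_MODEL = {
    TaskType.SIMPLE: "GPT-3.5",
    TaskType.COMPLEX_REASONING: "GPT-4",
    TaskType.FAST_PACED: "GPT-4 Turbo",
    TaskType.MULTIMODAL: "GPT-4o",
}


def recommend_model(task: TaskType, has_non_text_input: bool = False) -> str:
    """Pick a model for a task; non-text input always routes to GPT-4o (an assumption)."""
    if has_non_text_input:
        return RECOMMENDED_MODEL[TaskType.MULTIMODAL]
    return RECOMMENDED_MODEL[task]


if __name__ == "__main__":
    print(recommend_model(TaskType.COMPLEX_REASONING))
    print(recommend_model(TaskType.SIMPLE, has_non_text_input=True))
```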
ChatGPT offers various subscription plans: a Free Plan with limited access, a ChatGPT Plus package ($20/month) for priority access and faster responses, and a ChatGPT Pro membership ($200/month) providing unlimited access to advanced models, ideal for researchers and high-volume users. Team and Enterprise packages are also available.
Clinical Applications: Real-World Case Studies
In clinical settings, GPT-4o and GPT-4.5 often perform comparably for differential diagnosis and management. Outputs can vary even with identical prompts, so it can help to run a prompt more than once or to compare models.
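One way to act on that variability is to ask the same question several times and tally the answers. The sketch below is a minimal illustration of that idea, not a description of how the cases here were run. The `ask` argument is a hypothetical stand-in for whatever function sends a prompt to a model and returns its top answer as text, and the replies in the demo are canned placeholders.

```python
from collections import Counter
from typing import Callable


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period so near-duplicates match."""
    return " ".join(answer.lower().split()).rstrip(".")


def sample_and_tally(
    ask: Callable[[str], str], prompt: str, attempts: int = 5
) -> list[tuple[str, int]]:
    """Send the same prompt several times and count how often each answer appears."""
    answers = [normalize(ask(prompt)) for _ in range(attempts)]
    return Counter(answers).most_common()


if __name__ == "__main__":
    # Canned placeholder replies standing in for repeated model calls.
    canned = iter(["Diagnosis A", "diagnosis a.", "Diagnosis B", "Diagnosis  A"])
    tally = sample_and_tally(lambda _prompt: next(canned), "What is the diagnosis?", attempts=4)
    for answer, count in tally:
        print(f"{count} of 4: {answer}")
```

A tally like this does not establish which answer is correct. It only shows how stable the model's output is, which is useful context for the clinician reviewing it.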
Case 1: Rare Disease Diagnosis. All GPT models correctly diagnosed Alström syndrome in a 9-year-old male with dilated cardiomyopathy, visual impairment, and obesity within minutes. The "o" models explained their rationale more thoroughly.
Case 2: Clinical Management. For an 18-year-old pregnant woman with dilated cardiomyopathy, all GPT models correctly recommended stopping Entresto (sacubitril/valsartan), avoiding ACE inhibitors, and using beta-blocker therapy. Response times were under 30 seconds, with no major differences observed across GPT-4o, GPT-4.5, and the three "o" models.
Beyond OpenAI: Exploring Other LLMs in Healthcare
"While OpenEvidence has strong utility as a targeted point-of-care clinical care resource, it may not be as useful as a comprehensive information tool. Due to the short length and concise focus of its responses, it does not readily provide expanded medical knowledge that is relevant to the topic, which may lead to early closure for the novice learner or tired clinician. Therefore, the clinician-educator and learner should be aware of unconscious gaps in knowledge and work together to strengthen their curiosity skills and how to ask high-yield clinical questions." - Velyn Wu, MD. Review of OpenEvidence in Family Medicine
Beyond OpenAI's portfolio, several other LLMs are gaining prominence in healthcare, each with unique strengths:
- Claude 3.7 (Sonnet): Developed by Anthropic, this hybrid reasoning model (released February 2025) excels at creating personalized treatment plans by processing diverse medical data, streamlining note summarization, and managing patient communication. It can suggest treatments and predict their effectiveness.
- Med-PaLM 2/MedLM: Google's specialized LLM for healthcare (Med-PaLM 2 in 2023, commercialized as MedLM in early 2024) is trained on extensive medical datasets. MedLM is available via Vertex AI in two sizes: a larger model for complex tasks and a medium model that scales across tasks. Use cases include medical research, clinical decision support, and personalized medicine. Med-PaLM M is a multimodal variant that interprets clinical language, medical imaging, and genomics.
- Gemini 2.5 Pro: Google DeepMind's "thinking" model (released March 2025) is designed for complex problems and reasons through intermediate steps before responding. With deep multimodal reasoning and a long context window, it is well suited to building complete clinician-patient interaction pipelines, from transcription to patient summaries, and can serve as a capable medical assistant.
- OpenEvidence: Launched in early 2024 by clinicians and technologists, this specialized healthcare AI focuses on real-time clinical information retrieval across more than 35 million journal articles. It is a retrieval-augmented generation system built on a curated evidence database, and it has scored over 90% on the USMLE. While excellent for targeted answers, it is not a substitute for clinical expertise, and its answers must be weighed against the varying methodological quality of the published literature. OpenEvidence has partnered with Elsevier’s ClinicalKey AI.
- Glass AI 4.0: Developed by Glass Health (initial launch late 2023), this tool specifically assists clinicians in differential diagnosis, clinical reasoning, and structured case management. Its AI architecture incorporates knowledge graphs and structured clinical pathways. The newest version (Glass 4.0) features continuous chat, advanced reasoning, expanded medical literature coverage, and increased response speed, facilitating AI-powered clinical guidance, streamlined diagnostic decisions, and guideline-driven clinical plans.
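OpenEvidence is described above as a retrieval-augmented generation system. The general pattern is to retrieve the most relevant sources first and then ask the model to answer from those sources only. The sketch below illustrates that pattern in miniature and is not OpenEvidence's actual implementation: the word-overlap scoring is a deliberately simple stand-in for a real search index, and the corpus entries, document ids, and function names are all hypothetical.

```python
import re

# Hypothetical placeholder corpus: ids and titles are illustrative only.
SAMPLE_CORPUS = {
    "doc-1": "Dilated cardiomyopathy management during pregnancy",
    "doc-2": "Childhood syndromes with visual impairment and obesity",
    "doc-3": "Hospital parking and visitor information",
}


def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    relevant = [item for item in scored if item[0] > 0]  # drop documents with no overlap
    ranked = sorted(relevant, key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


def build_prompt(question: str, corpus: dict[str, str], top_k: int = 2) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    sources = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(question, corpus, top_k))
    return (
        "Answer using only the sources below and cite them by id.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How is dilated cardiomyopathy managed in pregnancy?", SAMPLE_CORPUS))
```

Grounding the prompt in retrieved sources is what lets such a system cite its evidence, but the answer is only as good as the sources retrieved, which is why the quality of the underlying literature still matters.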
Other notable LLMs in the biomedical domain include DeepSeek, GatorTron, Grok-3, Llama 4, Mistral 7B, and Perplexity, alongside AI tools like Microsoft Copilot and Poe. These will be explored in future discussions.
Just like Formula 1 cars, these models are highly sophisticated. However, it is the clinician, in synergy with the technology, who determines the winning uses of these models.
AIMed25: leading the conversation in AI in healthcare
Healthcare Large Language Models (LLMs) will be among the popular topics covered at AIMed25, the longest-running meeting focused on artificial intelligence in medicine and healthcare (inaugurated in 2013). Attended by over 1,000 clinicians, healthcare leaders, data scientists, students, and entrepreneurs globally, the 3-day meeting covers a broad range of topics including generative AI, agentic AI, LLMs, cybersecurity, and intelligent extended reality. Special tracks this year include AI in pediatrics and neonatology, AI in health professional education, and AI and mental health of clinicians and patients. Features include breakfast workshops, afternoon subspecialty breakout sessions, an abstract competition, and a special one-day American Board of AI in Medicine (ABAIM) course. AIMed25 will also feature a special Chief AI Officer agenda.
See you there!