Affiliate Disclosure

This article contains affiliate links, meaning Modern MedEd may earn a commission at no cost to you should you decide to make a purchase through such a link. This helps support our free content. Thank you!

AI as a Clinical Decision Support Tool in Psychiatry

By Jordan P. Roberts, PA-C
Updated: April 20, 2026
Infographic: AI in Psychiatry: Promise vs. Peril

  • 66% of clinicians use AI tools in daily practice.
  • Promise: instant "curbside consults," auto-CME credits, and ambient scribing (less burnout).
  • The reliability gap (standardized tests vs. real-world data): 90% USMLE pass rate vs. 24% real-world accuracy on complex cases. Performance drops on messy data. Source: Nature (2024).
  • Hidden bias (scaling prejudice): AI often suggests different treatments based on race alone. Example (2025 study): Black patients were more likely to be told to "reduce alcohol" for the same symptoms.
  • Liability warning: AI is "informational only." You hold 100% of the risk.

Want the full infographic in PDF format? Join our email list and I'll send it over right away!

AI Clinical Decision Tools in Psychiatry

While the general population's uptake of AI is high, the proportion of physicians and other clinicians using these tools is even greater: an estimated two in three, a 78% increase in just two years, according to the AMA. This article focuses on LLMs as clinical decision support tools, including how they are used, how they can augment your practice, and pitfalls to watch for.

Clinical decision support (CDS) systems are nothing new (see VisualDx and MDCalc), and they are very useful in day-to-day practice. However, the way LLMs can be used to find answers and get actionable recommendations at the point of care is nothing short of amazing.

On the surface, many of these tools appear grossly similar. OpenEvidence, ReachRx, DoxGPT (which incorporates Pathway, another CDS LLM), and other medical LLMs all offer the same visual ‘front door’ into the application with an “Ask Anything” chat form.

Vendors differentiate their products from each other by (a) curated evidence bases, (b) integration into clinician workflows, (c) mechanisms to improve provenance (citation, peer-review overlays), and (d) additional tools such as ambient scribes, drug interaction checkers, and other communication features built into the platform. Another perk is that many of these are free for verified clinicians to use.

These tools differ from standard LLMs because they are strengthened with retrieval-augmented generation (RAG), a process that supplements the model’s internal knowledge with live searches of curated medical literature. This reduces hallucinations and anchors responses to identifiable primary sources.
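
To make the retrieve-then-generate idea concrete, here is a minimal Python sketch. It is not how OpenEvidence, DoxGPT, or any other vendor actually implements RAG. Everything in it is an assumption for illustration: the three-passage corpus is placeholder text (not clinical content), the keyword-overlap scoring stands in for the more sophisticated search a real system would use, and the names Passage, retrieve, and build_prompt are hypothetical. The sketch stops at building the prompt; no model is called.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical citation label
    text: str    # placeholder filler, NOT real clinical content


# Hypothetical "curated literature" index (assumption for illustration only).
CORPUS = [
    Passage("Guideline A (placeholder)",
            "placeholder summary discussing lamotrigine and bone density"),
    Passage("Review B (placeholder)",
            "placeholder summary discussing naltrexone dosing mechanisms"),
    Passage("Trial C (placeholder)",
            "placeholder summary discussing bupropion and dextromethorphan metabolism"),
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").split())


def retrieve(question, corpus, k=2):
    """Return up to k passages that share at least one word with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(p.text)), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(question, passages):
    """Put the retrieved passages in front of the question as numbered sources."""
    context = "\n".join(
        f"[{i}] {p.source}: {p.text}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered sources below and cite them by number.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


question = "Does lamotrigine affect bone density?"
hits = retrieve(question, CORPUS)
prompt = build_prompt(question, hits)
print(prompt)  # this text is what would be sent to the language model
```

The design point is that the model is asked to answer from numbered, identifiable sources rather than from memory alone, which is what lets you trace a claim back to its citation.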

Despite their easy availability, these tools remain very much under development. Tech companies are on a hiring spree trying to get this technology right in healthcare. PS – If you're interested, we've posted some of these AI jobs for healthcare professionals on our job board.

Benefits of an AI CDS Tool

If you are reading this, you are probably the kind of person who would already be using one or more of these tools.

I'm certainly not judging – I use them all the time too. Quick answers, free unlimited use, CME credits, and extra features make LLMs pretty attractive as clinical decision support tools.

Quick Answers

Does adding bupropion to Auvelity (bupropion-dextromethorphan) further inhibit the metabolism of dextromethorphan? Does lamotrigine affect bone density? What is the proposed mechanism of low-dose naltrexone?

These are just a few questions I've found myself asking OpenEvidence (and other medical LLMs) in between patient visits in recent weeks. I got near-instant answers that took me just a few seconds to read and understand. I verified the answers later (they were correct, this time), when I had a bit more breathing room to look things up.

No doubt you have had similar experiences.

Free & Easy Category 1 CME Credits

If you are a regular reader, you’ll know this is one of my favorite perks. Platforms like OpenEvidence (free) and VisualDx (paid) award you CME credits every time you search for and read up on a query. The process is simple and painless, and awards you the credit you deserve for your ongoing learning.

Extra Features & Tools

DoxGPT is part of the Doximity web and mobile app, which also features Doximity Dialer, a video visit platform, digital fax, and an AI scribe. You are probably already on Doximity (80% of clinicians are, according to their numbers), so this is a huge advantage. Open the app on your phone and any of these tools are right at your fingertips.

OpenEvidence also has a HIPAA-compliant dialer and AI scribe built into its easy-to-use web app and its free Android and iOS apps. Keeping multiple tools in one place limits context switching, which contributes to mental fatigue. OpenEvidence is betting its content partnerships with top journals will increase trust (and therefore adoption). Doximity, for its part, recently acquired Pathway, an established medical LLM that strengthens its product. Either is a great medical app to add to your device.

A Review of the Literature

We all know about hallucinations, so the question for us is: can we trust a medically fine-tuned LLM to act as a true clinical decision support tool, or is it confidently spewing out nonsense?

Reliability & Validity

This is probably the most important factor for widespread adoption of LLMs in healthcare.

A 2025 study of healthcare workers found that 40% of respondents used LLMs like ChatGPT at least weekly and 70% rated their experiences as positive. However, many healthcare providers in the study expressed concerns about the accuracy and reliability of the models.

Studies of general medicine LLMs (OpenEvidence in particular) tend to find the platform accurate in medical scenarios. However, it did not substantially change clinical decision making or modify plans, though it did tend to reinforce them.

General-purpose LLMs often perform relatively poorly in real-world medical situations, despite their ability to pass the USMLE. A 2024 Nature Communications study evaluating GPT-3.5-turbo and GPT-4-turbo found accuracy rates of 8% and 24%, respectively, when the models were asked to provide clinical recommendations based on emergency department records.

A scoping review (an overview of a topic that is less targeted than a systematic review) published in 2025 identified several studies that attempted to evaluate LLM performance as a clinical decision support (CDS) tool in psychiatry.

I personally don’t think the LLMs did very well here. In one study using case-based vignettes (which lack the full context of a real patient), ChatGPT 3.5 achieved a “Grade A” rating in only 61 out of 100 cases. No clinical validation was performed on this analysis either.

Another study in this review evaluating GPT-4’s performance against community clinicians showed the model selected appropriate bipolar depression treatment in 50.8% of cases (which was slightly higher than that of community clinicians). This was another vignette-based prompt series comparing the outputs of LLMs under various conditions with expert opinion on the same vignettes.

I found no strong studies (and exactly zero RCTs) showing that any LLM improves patient outcomes, diagnostic accuracy, or treatment selection, or reduces adverse events, in psychiatric practice. Most studies evaluate LLM performance with standardized test sets, vignettes, and clinician surveys, so the strength of these findings remains up for debate. These are still the early days of AI, but none of this inspires particular confidence (or dread) that psychiatrists and therapists will be replaced by AI anytime soon.

Figure: AI for Antidepressant Selection (GPT-4). Percentage of cases: optimal medication identified, 75%; contraindicated or suboptimal medication also suggested, 50%. Source: Perlis RH. medRxiv. 2023. doi:10.1101/2023.04.14.23288595

Figure: AI vs. Humans: Bipolar Treatment. Appropriate treatment selected: unaugmented AI (base model), ~25%; augmented AI (+ guidelines), ~51%; community clinicians, ~45%. Warning: even augmented AI suggested contraindicated options in 1 in 10 cases. Source: Perlis RH et al. Neuropsychopharmacology 2024. doi:10.1038/s41386-024-01841-2

Bias & Stigma

Bias is not a new problem in psychiatry, but LLMs risk scaling existing biases extremely easily. Large language models are trained on enormous volumes of text, including published research, clinical guidelines, and of course, disturbingly unfiltered internet content (and have you seen the internet lately – yikes).

Even when “medically fine-tuned,” these models inherit the assumptions, omissions, and structural biases present in their training data. Multiple peer-reviewed studies demonstrate that AI tools can exhibit demographic bias in clinical contexts. In JAMA Network Open, a cross-sectional vignette study showed that AI chatbots provided different recommendations based on patient gender, race, and socioeconomic status, underscoring the risk that algorithmic outputs may reproduce or amplify inequities present in healthcare systems.

A qualitative comparison of four LLMs showed that the models often proposed inferior treatments when the vignette explicitly or implicitly indicated that the patient was Black.

Importantly, retrieval-augmented generation (RAG) does not eliminate bias. It may improve factual accuracy, but if the underlying literature is biased (and much of psychiatry’s literature historically is), the model will faithfully reproduce those biases with impressive efficiency.

Clinicians should also be aware of ‘automation bias.’ This is the human tendency to trust and rely upon outputs generated by automated systems (like AI tools) while undervaluing our own critical judgment. Over-reliance on these systems could lead to significant consequences in psychiatry.

Lack of Regulatory Standards & Potential Liability

At present, most clinician-facing medical LLMs exist in a regulatory gray zone.

Many vendors are careful to label their products as “informational” or “educational,” and “not medical devices,” even when they clearly function as clinical decision support systems. This allows them to avoid FDA oversight while still being marketed directly to clinicians as point-of-care tools.

From a regulatory standpoint, this places the burden of responsibility squarely on the user. There is nothing new about this; even CME programs do not assume any liability for their content.

Key issues here include:

  • Liability: If an AI-generated recommendation contributes to patient harm, responsibility almost certainly rests with the clinician, not the vendor. There is little legal precedent to suggest otherwise.
  • Transparency: Most platforms do not fully disclose their training data, fine-tuning processes, or update cadence, making independent validation difficult.
  • Version drift: Models can change behavior over time with updates, sometimes without clear documentation, raising concerns about consistency and auditability.
  • Institutional governance: Many clinicians are using these tools outside formal health system approval or oversight, particularly when accessing them via mobile apps or personal accounts.

From a medico-legal perspective, using an LLM as a reference is defensible; using it as a decision-maker is not. Until regulatory frameworks mature, clinicians should assume that AI recommendations offer zero liability protection and full personal risk.

Best Practices

1. Select tools fine-tuned with medical information

Not all LLMs are created equal. General-purpose chatbots are optimized for linguistic fluency, not clinical accuracy. If you are going to use AI as a second opinion, prioritize tools that are explicitly trained or fine-tuned on medical literature and clinical workflows. Purpose-built medical LLMs are not perfect, but they tend to hallucinate less often and anchor responses to established guidelines and literature.

That alone is reason enough to avoid using consumer-grade chatbots for clinical questions when better options exist at the same price point (free).

2. Treat outputs as suggestions, not directives

LLM outputs are effectively digital curbside consults. As we mentioned above, the buck stops with you for any decisions you make, regardless of where the suggestion came from.

3. Understand its limitations

LLMs are particularly flawed when they have incomplete or conflicting information, or when decisions are heavily influenced by the patient’s own values (risk tolerance, side effect non-negotiables, etc.). The way you frame your prompt also affects how the model responds.

4. Stay up to date

These tools are evolving rapidly, and their behavior may change over time as they are updated. They may become better at some tasks and worse at others. New platforms will emerge with their own benefits and weaknesses. One way to make sure you are getting updates is to join our email list, where you’ll see the latest industry trends and jobs as they relate to healthcare professionals.

Modern MedEd Takeaway

This is one of the most promising areas of AI in my opinion. Keep an eye out for exponential improvements in this technology over the coming months and years. Try out a few medical LLMs for yourself and see how you like them. But keep your clinical reasoning sharp and don’t outsource your mind to these tools.

Additional Citations

American Medical Association. (2023). 2 in 3 physicians are using health AI, up 78% from 2023. Retrieved from https://www.ama-assn.org/practice-management/digital-health/2-3-physicians-are-using-health-ai-78-2023

Bouguettaya, A., Stuart, E.M. & Aboujaoude, E. Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. npj Digit. Med. 8, 332 (2025). https://doi.org/10.1038/s41746-025-01746-4

Hua, Y., Na, H., Li, Z. et al. A scoping review of large language models for generative tasks in mental health care. npj Digit. Med. 8, 230 (2025). https://doi.org/10.1038/s41746-025-01611-4

Hurt RT, Stephenson CR, Gilman EA, et al. The Use of an Artificial Intelligence Platform OpenEvidence to Augment Clinical Decision-Making for Primary Care Physicians. J Prim Care Community Health. 2025;16:21501319251332215. doi:10.1177/21501319251332215

Kim J, Cai ZR, Chen ML, Simard JF, Linos E. Assessing Biases in Medical Decisions via Clinician and AI Chatbot Responses to Patient Vignettes. JAMA Netw Open. 2023;6(10):e2338050. doi:10.1001/jamanetworkopen.2023.38050

Perlis, R.H., Goldberg, J.F., Ostacher, M.J. et al. Clinical decision support for bipolar depression using large language models. Neuropsychopharmacol. 49, 1412–1416 (2024). https://doi.org/10.1038/s41386-024-01841-2

Shah, N., Pfeffer, M., Liang, P., et al. (2025). Holistic evaluation of large language models for medical applications. Stanford HAI. Retrieved from https://hai.stanford.edu/news/holistic-evaluation-of-large-language-models-for-medical-applications

Sumner J, Wang Y, Tan SY, Chew EHH, Wenjun Yip A. Perspectives and Experiences With Large Language Models in Health Care: Survey Study. J Med Internet Res. 2025;27:e67383. Published 2025 May 1. doi:10.2196/67383

Williams, C.Y.K., Miao, B.Y., Kornblith, A.E. et al. Evaluating the use of large language models to provide clinical recommendations in the Emergency Department. Nat Commun 15, 8236 (2024). https://doi.org/10.1038/s41467-024-52415-1
