Affiliate Disclosure

This article contains affiliate links, meaning Modern MedEd may earn a commission at no cost to you should you decide to make a purchase through such a link. This helps support our free content. Thank you!

AI as a Clinical Decision Support Tool in Psychiatry

By Jordan P. Roberts, PA-C

Updated: April 20, 2026

Want the full infographic in PDF format? Join our email list and I'll send it over right away!

Key Points
An estimated two in three clinicians now use AI tools, with medical-specific LLMs (like OpenEvidence and DoxGPT) utilizing Retrieval Augmented Generation (RAG) to anchor responses in cited literature.
Beyond clinical Q&A, modern platforms are increasing utility by integrating ambient scribes, drug interaction checkers, and CME credit awards and tracking directly into the workflow.
While user satisfaction is high, actual performance varies; general-purpose models (like GPT-4) have demonstrated low accuracy in complex real-world scenarios, and no RCTs yet prove these tools improve patient outcomes.
If the underlying training data or medical literature contains historical prejudices, models can scale these biases and may recommend inferior treatments for marginalized demographics.

AI Clinical Decision Tools in Psychiatry

While the general population's uptake of AI is high, the proportion of physicians and other clinicians using these tools is even greater: an estimated two in three, representing a 78% increase in just two years, according to the AMA. This article will focus on LLMs as clinical decision support tools, including the way they are used, how they can augment your practice, and pitfalls to watch for.

Clinical decision support (CDS) systems are nothing new (see VisualDx and MDCalc), and they are very useful in day-to-day practice. However, the way LLMs can be used to find answers and get actionable recommendations at the point of care is nothing short of amazing.

On the surface, many of these tools appear grossly similar. OpenEvidence, ReachRx, DoxGPT (which incorporates Pathway, another CDS LLM), and other medical LLMs all offer the same visual ‘front door’ into the application with an “Ask Anything” chat form.

Vendors differentiate their products from each other by (a) curated evidence bases, (b) integration into clinician workflows, (c) mechanisms to improve provenance (citation, peer-review overlays), and (d) additional tools such as ambient scribes, drug interaction checkers, and other communication features built into the platform. Another perk is that many of these are free for verified clinicians to use.

These tools differ from standard LLMs because they are strengthened with retrieval-augmented generation (RAG), a process that supplements the model’s internal knowledge with live searches of curated medical literature. This reduces hallucinations and anchors responses to identifiable primary sources.
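For the technically curious, here is a minimal sketch of the retrieve-then-generate idea in Python. It is a toy illustration and not how OpenEvidence, DoxGPT, or any other vendor actually works: the corpus, the source labels, and the `retrieve` and `build_prompt` helpers are all hypothetical, the "search" is simple word overlap rather than a real literature index, and the final call to a language model is left out entirely.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical citation label, not a real reference
    text: str


# Toy corpus with placeholder statements; none of this is clinical content.
CORPUS = [
    Passage("Guideline A (hypothetical)", "Drug X inhibits CYP2D6 and can raise levels of drug Y."),
    Passage("Review B (hypothetical)", "Drug Z has no known effect on bone density."),
    Passage("Trial C (hypothetical)", "Low-dose drug W has been studied for chronic pain."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages ranked by how many words they share with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(p.text)), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]  # drop passages with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Assemble the text that would be sent to the model, with numbered sources."""
    if not passages:
        return (
            f"Question: {question}\n"
            "No supporting passages were retrieved; say so rather than guess."
        )
    context = "\n".join(
        f"[{i}] {p.source}: {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages and cite them by number.\n"
        f"{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "Does drug X inhibit CYP2D6?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

The point of the sketch is the order of operations: the passages are fetched first and handed to the model alongside the question, which is what lets the answer point back to numbered sources. It also shows why the quality of the answer depends on the quality of what gets retrieved.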

Despite their easy availability, these tools remain very much under development. Tech companies are on a hiring spree trying to get this technology right in healthcare. PS – If you're interested, we've posted some of these AI jobs for healthcare professionals on our job board.

Benefits of an AI CDS Tool

If you are reading this, you are probably the kind of person who would already be using one or more of these tools.

I'm certainly not judging – I use them all the time too. Quick answers, free unlimited use, CME credits, and extra features certainly make LLMs pretty attractive as clinical decision support tools.

Quick Answers

Does adding bupropion to Auvelity (bupropion-dextromethorphan) further inhibit the metabolism of dextromethorphan? Does lamotrigine affect bone density? What is the proposed mechanism of low-dose naltrexone?

These are just a few questions I've found myself asking OpenEvidence (and other medical LLMs) in between patient visits in recent weeks. I got near-instant answers that took me just a few seconds to read and understand. I verified the answers later, when I had a bit more breathing room to look things up (they were correct, this time).

No doubt you have had similar experiences.

Free & Easy Category 1 CME Credits

If you are a regular reader, you’ll know this is one of my favorite perks. Platforms like OpenEvidence (free) and VisualDx (paid) award you CME credits every time you search for and read up on a query. The process is simple and painless, and awards you the credit you deserve for your ongoing learning.

Extra Features & Tools

DoxGPT is part of the Doximity web and mobile app, which also features Doximity Dialer, a video visit platform, digital fax, and an AI scribe. You are probably already on Doximity (80% of clinicians are, according to their numbers), so this is a huge advantage. Open the app on your phone and any of these tools you need are right at your fingertips.

OpenEvidence also has a HIPAA-compliant dialer and AI scribe built into its web and mobile app versions. This makes it easy to use multiple tools without the constant context switching that contributes to mental fatigue. It too has an easy-to-use web app and free Android and iOS apps, and it is betting that its content partnerships with top journals will increase trust (and therefore adoption). Doximity, for its part, recently acquired Pathway, an established medical LLM that strengthens its product. Either one is a great medical app to add to your device.

A Review of the Literature

We all know about hallucinations, so the question for us is: can we trust a medically fine-tuned LLM to act as a true clinical decision support tool, or is it confidently spewing out nonsense?

Reliability & Validity

This is probably the most important factor for widespread adoption of LLMs in healthcare.

A 2025 study of healthcare workers found that 40% of respondents used LLMs like ChatGPT at least weekly and 70% rated their experiences as positive. However, many healthcare providers in the study expressed concerns about the accuracy and reliability of the models.

Studies of general medicine LLMs (OpenEvidence in particular) tend to show that the platform is accurate in medical scenarios but does not substantially change clinical decision making or modify plans, though it does tend to reinforce existing plans.

General-purpose LLMs often perform relatively poorly in real-world medical situations, despite their ability to pass the USMLE. A 2024 Nature Communications study evaluating GPT-3.5-turbo and GPT-4-turbo found that their accuracy was on average 24% and 8% lower, respectively, than that of a resident physician when they were asked to provide clinical recommendations based on emergency department records.

A scoping review (an overview of a topic that is less targeted than a systematic review) published in 2025 identified several studies that attempted to evaluate LLM performance as a clinical decision support (CDS) tool in psychiatry.

I personally don’t think the LLMs did very well here – in one study using case-based vignettes (which lack the full context of a real patient), ChatGPT 3.5 achieved a “Grade A” rating in only 61 out of 100 cases. No clinical validation was performed on this analysis either.

Another study in this review evaluating GPT-4’s performance against community clinicians showed the model selected appropriate bipolar depression treatment in 50.8% of cases (which was slightly higher than that of community clinicians). This was another vignette-based prompt series comparing the outputs of LLMs under various conditions with expert opinion on the same vignettes.

I could not find any strong studies (and exactly zero RCTs) showing that LLMs improve patient outcomes, diagnostic accuracy, or treatment selection, or reduce adverse events, in psychiatric practice. Most studies evaluate LLM performance with standardized test sets, vignettes, and clinician surveys, so the strength of these findings remains up for debate. These are still the early days of AI, but for now none of this inspires particular confidence (or dread) that psychiatrists and therapists will be replaced by AI anytime soon.

Bias & Stigma

Bias is not a new problem in psychiatry, but LLMs risk scaling existing biases extremely easily. Large language models are trained on enormous volumes of text, including published research, clinical guidelines, and of course, disturbingly unfiltered internet content (and have you seen the internet lately – yikes).

Even when “medically fine-tuned,” these models inherit the assumptions, omissions, and structural biases present in their training data. Multiple peer-reviewed studies demonstrate that AI tools can exhibit demographic bias in clinical contexts. In JAMA Network Open, a cross-sectional vignette study showed that AI chatbots provided different recommendations based on patient gender, race, and socioeconomic status, underscoring the risk that algorithmic outputs may reproduce or amplify inequities present in healthcare systems.

A qualitative comparison of four LLMs showed that the models often proposed inferior treatments when the vignette explicitly or implicitly indicated that the patient was Black.

Importantly, retrieval-augmented generation (RAG) does not eliminate bias. It may improve factual accuracy, but if the underlying literature is biased (and much of psychiatry’s literature historically is), the model will faithfully reproduce those biases with impressive efficiency.

Clinicians should also be aware of ‘automation bias.’ This is the human tendency to trust and rely upon outputs generated by automated systems (like AI tools) while undervaluing our own critical judgment. Over-reliance on these systems could lead to significant consequences in psychiatry.

Lack of Regulatory Standards & Potential Liability

At present, most clinician-facing medical LLMs exist in a regulatory gray zone.

Many vendors are careful to label their products as “informational” or “educational,” and “not medical devices,” even when they clearly function as clinical decision support systems. This allows them to avoid FDA oversight while still being marketed directly to clinicians as point-of-care tools.

From a regulatory standpoint, this places the burden of responsibility squarely on the user. There is nothing new about this; even CME programs do not assume any liability for their content.

Key issues here include:

Liability: If an AI-generated recommendation contributes to patient harm, responsibility almost certainly rests with the clinician, not the vendor. There is little legal precedent to suggest otherwise.
Transparency: Most platforms do not fully disclose their training data, fine-tuning processes, or update cadence, making independent validation difficult.
Version drift: Models can change behavior over time with updates, sometimes without clear documentation, raising concerns about consistency and auditability.
Institutional governance: Many clinicians are using these tools outside formal health system approval or oversight, particularly when accessing them via mobile apps or personal accounts.

From a medico-legal perspective, using an LLM as a reference is defensible; using it as a decision-maker is not. Until regulatory frameworks mature, clinicians should assume that AI recommendations offer zero liability protection and full personal risk.

Best Practices

1. Select tools fine-tuned with medical information

Not all LLMs are created equal. General-purpose chatbots are optimized for linguistic fluency, not clinical accuracy. If you are going to use AI as a second opinion, prioritize tools that are explicitly trained or fine-tuned on medical literature and clinical workflows. Purpose-built medical LLMs are not perfect, but they tend to hallucinate less often and anchor responses to established guidelines and literature.

That alone is reason enough to avoid using consumer-grade chatbots for clinical questions when better options exist at the same price point (free).

2. Treat outputs as suggestions, not directives

LLM outputs are effectively digital curbside consults. As we mentioned above, the buck stops with you for any decisions you make, regardless of where the suggestion came from.

3. Understand its limitations

LLMs are particularly unreliable when they have incomplete or conflicting information, or when decisions are heavily influenced by the patient’s own values (risk tolerance, side effect non-negotiables, etc.). The way you frame your prompt also affects how the model will respond.

4. Stay up to date

These tools are evolving rapidly, and their behavior may change over time as they are updated. They may become better at some tasks and worse at others. New platforms will emerge with their own benefits and weaknesses. One way to make sure you are getting updates is to join our email list, where you’ll see the latest industry trends and jobs as they relate to healthcare professionals.

Modern MedEd Takeaway

This is one of the most promising areas of AI in my opinion. Keep an eye out for exponential improvements in this technology over the coming months and years. Try out a few medical LLMs for yourself and see how you like them. But keep your clinical reasoning sharp and don’t outsource your mind to these tools.

Additional Citations

American Medical Association. (2023). 2 in 3 physicians are using health AI, 78% in 2023. Retrieved from https://www.ama-assn.org/practice-management/digital-health/2-3-physicians-are-using-health-ai-78-2023

Bouguettaya, A., Stuart, E.M. & Aboujaoude, E. Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. npj Digit. Med. 8, 332 (2025). https://doi.org/10.1038/s41746-025-01746-4

Hua, Y., Na, H., Li, Z. et al. A scoping review of large language models for generative tasks in mental health care. npj Digit. Med. 8, 230 (2025). https://doi.org/10.1038/s41746-025-01611-4

Hurt RT, Stephenson CR, Gilman EA, et al. The Use of an Artificial Intelligence Platform OpenEvidence to Augment Clinical Decision-Making for Primary Care Physicians. J Prim Care Community Health. 2025;16:21501319251332215. doi:10.1177/21501319251332215

Kim J, Cai ZR, Chen ML, Simard JF, Linos E. Assessing Biases in Medical Decisions via Clinician and AI Chatbot Responses to Patient Vignettes. JAMA Netw Open. 2023;6(10):e2338050. doi:10.1001/jamanetworkopen.2023.38050

Perlis, R.H., Goldberg, J.F., Ostacher, M.J. et al. Clinical decision support for bipolar depression using large language models. Neuropsychopharmacol. 49, 1412–1416 (2024). https://doi.org/10.1038/s41386-024-01841-2

Shah, N., Pfeffer, M., Liang, P., et al. (2025). Holistic evaluation of large language models for medical applications. Stanford HAI. Retrieved from https://hai.stanford.edu/news/holistic-evaluation-of-large-language-models-for-medical-applications

Sumner J, Wang Y, Tan SY, Chew EHH, Wenjun Yip A. Perspectives and Experiences With Large Language Models in Health Care: Survey Study. J Med Internet Res. 2025;27:e67383. Published 2025 May 1. doi:10.2196/67383

Williams, C.Y.K., Miao, B.Y., Kornblith, A.E. et al. Evaluating the use of large language models to provide clinical recommendations in the Emergency Department. Nat Commun 15, 8236 (2024). https://doi.org/10.1038/s41467-024-52415-1
