Angling a Babel Fish: the Evolving Role of AI Language Models in Medical Interpretation

April 17, 2024

By Eric Hu

Peer Reviewed

“May I have the patient’s first name, last name, and medical record number?” The fellow, shooting a piercing glance at the interpreter through the wall-mounted phone, broadcast her annoyance with a subtle head shake and a near-imperceptible eye roll. The last call had inexplicably dropped, and she was now displeased at how long the visit was taking. Twenty minutes later, as our patient left the clinic, she voiced a sentiment shared by many of the providers I’ve worked with over the past year: “I wish my French were better… I hate calling the interpreter.” Thinking back on the time I waited an hour on the line for a Tagalog interpreter who never answered, I had no trouble sympathizing.

At a hospital like Bellevue, serving patients from every corner of the globe, navigating complex language barriers is vital to providing clear and accurate information, allowing patients to understand their diagnoses and make informed treatment decisions. The implications of language barriers are immense: in the US, patients with limited English proficiency are significantly more likely to experience adverse events involving physical harm during their treatment as a consequence of communication errors.1,2 Medical interpreters not only help bridge the language gap but also enrich the provider-patient relationship through cultural context; in so doing, trained professional interpreters have been shown to improve patient satisfaction and quality of care and to reduce the risk of adverse events.3 Moreover, the benefits of interpreters are reflected in patients’ clinical outcomes, with shorter hospital stays and a lower risk of readmission.4 Even beyond direct patient interaction, professional translation of medical documents, including lab results and discharge instructions, can improve patient adherence and safety outside the hospital.5

In 2023, providers have more options than ever for on-demand interpretation, including hiring dedicated in-house staff, contracting video and phone services, and recruiting multilingual hospital personnel to serve as ad-hoc interpreters.6 Despite this, choosing an interpreter remains an imperfect art, a delicate balance of competing priorities. While in-person interpreters consistently rank highly in satisfaction and efficacy,7 they become prohibitively expensive if used in every encounter.6 Video-based interpreters are a decent stand-in, although they require a stable high-speed internet connection and demand that providers remain aware of the visuospatial constraints inherent in their setup.8 Phone services provide a low-cost alternative and are the most widely used; however, challenges include interpreter difficulty communicating educational and psychosocial concepts without visual feedback,9 variability in call quality,10,11 communication issues (e.g., overlapping speech),10,11 and the limited availability of uncommon languages.12 Additionally, delays in interpreter access represent a significant issue.11 A 2021 study demonstrated that phone services delay transfers to the ICU by a median of 18 minutes compared to in-person interpreters and 35 minutes compared to English-speaking patients13 – vitally important minutes in both the figurative and literal sense. Lastly, hospital staff may be insufficiently trained in the formal, structured use of interpreter services and unaware of best practices; these provider knowledge gaps result in miscommunication and overestimation of patients’ understanding.14 Taken as a whole, these limitations raise the question: how can we improve the cost efficiency, ease of use, reliability, and on-demand accessibility of medical interpreters while maintaining their clinical benefit?

Artificial intelligence has thundered into the modern world like an unstoppable tempest, reshaping industries, supercharging innovation, and propelling humanity into an era of unparalleled technological marvels!

In line with that ChatGPT response to my query for a “short, dramatic, hyperbolic sentence,” the 2020s are shaping up to be the decade of AI. Driven largely by the unveiling of OpenAI’s clever chatbot, public interest in AI and natural language processing has skyrocketed in the past year. GPT-3.5/4, the beating heart of ChatGPT, sits alongside Google’s PaLM and Meta’s LLaMA as representatives of the large language model (LLM): AI systems developed through self-supervised training on billions of words of human-generated text scraped from the Internet. Through this process, these systems mathematically encode the complex contextual relationships between words in a multitude of languages, allowing them to predict and generate natural language.15-17 As a consequence, among the diverse skills on its résumé, GPT is capable of machine translation between the over 80 languages in its training dataset and will continue to improve as more users interact with it.18 Furthermore, the capability of LLMs to respond naturally to interactive prompting allows users to fine-tune translated outputs by providing clinical and cultural context; the proper use of such prompting to improve translation, especially for low-resource languages, is one of many areas of active research.19,20 Additionally, the role of AI-assisted translation extends beyond that of traditional interpreters: LLMs have demonstrated strengths in simplifying medical consent forms and eliminating jargon, thereby improving patient understanding even among English speakers.21
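To make the idea of context-rich prompting concrete, here is a minimal sketch of how a translation request might be composed for a chat-style LLM interface. The helper name, parameters, and message structure are illustrative assumptions (mirroring common chat APIs), not a published clinical tool; any output would still require review by a qualified human interpreter.

```python
# Illustrative sketch: composing a context-aware translation prompt for a
# chat-style LLM. The function name and parameters are hypothetical; the
# message format mirrors common chat-completion APIs.

def build_translation_prompt(text, target_language,
                             clinical_context=None,
                             reading_level="8th-grade"):
    """Return a chat-style message list asking an LLM to translate
    patient-facing text while preserving clinical meaning."""
    system = (
        f"You are assisting a medical interpreter. Translate the user's "
        f"text into {target_language} at a {reading_level} reading level, "
        f"preserving clinical meaning and avoiding jargon."
    )
    if clinical_context:
        # Clinical context lets the model resolve ambiguous terms.
        system += f" Clinical context: {clinical_context}."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

# Example: discharge instructions with clinical context supplied.
messages = build_translation_prompt(
    "Take one tablet by mouth twice daily with food.",
    target_language="Haitian Creole",
    clinical_context="discharge instructions after a heart failure admission",
)
```

The resulting message list would then be sent to whatever chat-completion endpoint the institution uses; the point is that clinical and cultural context travels with the request rather than being left for the model to guess.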

At the time of writing, leveraging generative AI in medical workflows is no longer a mere concept. Epic Systems has already begun working with Microsoft to integrate GPT-4 into its electronic medical record system, where it will provide automated note-writing suggestions and templates for improving patient communication.22 However, as AI technology progresses rapidly, it is important to remain aware of its limitations. Language models are perfectly capable of spitting out false information that sounds correct, a phenomenon known as hallucination; generated information must be thoroughly double-checked before being used in clinical decisions.23 Since LLMs are trained on publicly available data, misinformation can disseminate in AI-generated outputs, something critics have termed an “infodemic.”23,24 Additionally, any model trained on a dataset perpetuates the biases within it, and given that many of the major LLMs on the market today are proprietary, ethical debates abound regarding the responsibility of software companies to moderate content and their duty to make the details of such moderation more transparent.23 Lastly, care must be taken to ensure that all use complies with Health Insurance Portability and Accountability Act (HIPAA) regulations and that protected health information does not leak into training datasets.23

Given these limitations, it’s clear that for the foreseeable future, AI likely won’t and probably shouldn’t act independently of human interpreters. Nevertheless, remote interpretation services offer one frontier within which we can conceptualize the potential evolution of AI interpreters. Only a month ago, OpenAI rolled out an update to ChatGPT with speech/image recognition and speech generation technologies, demonstrating the rapid acceleration of generative AI’s multimodal capabilities.25 Today, it’s easy to imagine the silent integration of multimodal AI into the workflow of remote human interpreters, who could receive translation suggestions informed by the large knowledge base of an LLM without changing the experience of providers and patients on the other end of the call. As these AI programs continue to refine their speech recognition and translation abilities through self-training, they could eventually be promoted to primary interpreters, providing real-time translations over a tablet or smartphone while being monitored and adjusted remotely by human operators. While the operator could still step in when needed, the use of a digital avatar and synthetic voice would standardize the patient interface, addressing communication challenges posed by variability in call quality and fluency among humans. An on-screen text transcript would assist patients who are hard of hearing while allowing both provider and patient to identify speech-recognition errors. The ubiquitous nature of these automated in-house interpreters could reduce delays in interpreter access, lower costs, and increase the availability of low-resource languages, all while maintaining the vital role of human interpreters.

In his sci-fi epic The Hitchhiker’s Guide to the Galaxy, Douglas Adams conceived of the Babel fish, an enigmatic species uniquely symbiotic with all others which, when inserted into the ear, allows one to understand every language in the universe instantly. While LLM tools like ChatGPT promise to revolutionize medical communication and interpretation, they have a long way to go before they can live up to Adams’s piscine panacea. As industries rush to adopt these tools into their workflows, careful consideration of the ethical, legal, sociological, and technical challenges associated with unfettered AI development is increasingly necessary. Nevertheless, AI language models promise to play an important role in augmenting the complex, dynamic relationship between providers, interpreters, and patients.

Eric Hu is a Class of 2025 medical student at NYU Grossman School of Medicine

Peer reviewed by Michael Tanner, MD, Associate Editor, Clinical Correlations, Professor, Department of Medicine at NYU Grossman School of Medicine

Image courtesy of Wikimedia Commons: Artificial-Intelligence.jpg (source: Pixabay)


  1. Divi C, Koss RG, Schmaltz SP, Loeb JM. Language proficiency and adverse events in US hospitals: a pilot study. Int J Qual Health Care. 2007;19(2):60-67.
  2. Cohen AL, Rivara F, Marcuse EK, McPhillips H, Davis R. Are language barriers associated with serious medical events in hospitalized pediatric patients? Pediatrics. 2005;116(3):575-579.
  3. Flores G. The impact of medical interpreter services on the quality of health care: a systematic review. Med Care Res Rev. 2005;62(3):255-299.
  4. Lindholm M, Hargraves JL, Ferguson WJ, Reed G. Professional language interpretation and inpatient length of stay and readmission rates. J Gen Intern Med. 2012;27(10):1294-1299.
  5. Davis SH, Rosenberg J, Nguyen J, et al. Translating Discharge Instructions for Limited English-Proficient Families: Strategies and Barriers. Hosp Pediatr. 2019;9(10):779-787.
  6. Slade S, Sergent SR. Language Barrier. In: StatPearls [Internet]. StatPearls Publishing; 2023. Accessed October 6, 2023.
  7. Heath M, Hvass AMF, Wejse CM. Interpreter services and effect on healthcare–A systematic review of the impact of different types of interpreters on patient outcome. J Migr Health. 2023;7:100162.
  8. Klammer M, Pöchhacker F. Video remote interpreting in clinical communication: A multimodal analysis. Patient Educ Couns. 2021;104(12):2867-2876.
  9. Price EL, Pérez-Stable EJ, Nickleach D, López M, Karliner LS. Interpreter perspectives of in-person, telephonic, and videoconferencing medical interpretation in clinical encounters. Patient Educ Couns. 2012;87(2):226-232.
  10. Wang J. “It keeps me on my toes”: Interpreters’ perceptions of challenges in telephone interpreting and their coping strategies. Target Int J Transl Stud. 2018;30:439-473.
  11. Wang J. ‘Telephone interpreting should be used only as a last resort.’ Interpreters’ perceptions of the suitability, remuneration and quality of telephone interpreting. Perspectives (Montclair). 2017;26:100-116.
  12. Jaeger FN, Pellaud N, Laville B, et al. Barriers to and solutions for addressing insufficient professional interpreter use in primary healthcare. BMC Health Serv Res. 2019;19:753.
  13. Oca SR, Navas A, Leiman E, Buckland DM. Effect of language interpretation modality on throughput and mortality for critical care patients: A retrospective observational study. J Am Coll Emerg Physicians Open. 2021;2(4):e12477.
  14. Juckett G, Unger K. Appropriate use of medical interpreters. Am Fam Physician. 2014;90(7):476-480.
  15. Contextual learning is nearly all you need. Nat Biomed Eng. 2022;6:1319-1320.
  16. Prepare for truly useful large language models. Nat Biomed Eng. 2023;7:85-86.
  17. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting TSW. Large language models in medicine. Nat Med. 2023;29:1930-1940.
  18. Botpress Community. List of languages supported by ChatGPT. Botpress. Updated April 13, 2023. Accessed October 6, 2023.
  19. Peng K, Ding L, Zhong Q, et al. Towards Making the Most of ChatGPT for Machine Translation. arXiv. Preprint posted online March 24, 2023.
  20. Jiao W, Wang W, Huang JT, Wang X, Tu Z. Is ChatGPT a good translator? A preliminary study. arXiv. Preprint posted online March 19, 2023.
  21. Ali R, Connolly ID, Tang O, et al. Bridging the Literacy Gap for Surgical Consents: An AI-Human Expert Collaborative Approach. medRxiv. Preprint posted online May 10, 2023.
  22. Turner BEW. Epic, Microsoft bring GPT-4 to EHRs. Digital Health Business & Technology. April 17, 2023. Accessed October 6, 2023.
  23. Sallam M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare. 2023;11(6):887.
  24. De Angelis L, Baglivo F, Arzilli G, et al. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. 2023;11:1166120.
  25. Roose K. The New ChatGPT Can ‘See’ and ‘Talk.’ Here’s What It’s Like. New York Times. September 27, 2023. Accessed October 21, 2023.