| Item Type: | Article |
|---|---|
| Title: | Large language models for patient education prior to interventional radiology procedures: a comparative study |
| Creators Name: | Levita, Bogdan; Eminovic, Semil; Lüdemann, Willie Magnus; Schnapauff, Dirk; Schmidt, Robin; Haack, Anna-Maria; Dell'Orco, Andrea; Nawabi, Jawed; Penzkofer, Tobias |
| Abstract: | PURPOSE: This study evaluates four large language models' (LLMs) ability to answer common patient questions preceding transarterial periarticular embolization (TAPE), computed tomography (CT)-guided high-dose-rate (HDR) brachytherapy, and bleomycin electrosclerotherapy (BEST). The goal is to evaluate their potential to enhance clinical workflows and patient comprehension, while also assessing associated risks. MATERIALS AND METHODS: Thirty-five TAPE-related, 34 CT-HDR brachytherapy-related, and 36 BEST-related questions were presented to ChatGPT-4o, DeepSeek-V3, OpenBioLLM-8b, and BioMistral-7b. The LLM-generated responses were independently assessed by two board-certified radiologists. Accuracy was rated on a 5-point Likert scale. Statistics compared LLM performance across question categories for patient-education suitability. RESULTS: DeepSeek-V3 attained the highest mean scores for BEST [4.49 (± 0.77)] and CT-HDR [4.24 (± 0.81)] and demonstrated comparable performance to ChatGPT-4o for TAPE-related questions (DeepSeek-V3 [4.20 (± 0.77)] vs. ChatGPT-4o [4.17 (± 0.64)]; p = 1.000). In contrast, OpenBioLLM-8b [BEST 3.51 (± 1.15), CT-HDR 3.32 (± 1.13), TAPE 3.34 (± 1.16)] and BioMistral-7b [BEST 2.92 (± 1.35), CT-HDR 3.03 (± 1.06), TAPE 3.33 (± 1.28)] performed significantly worse than DeepSeek-V3 and ChatGPT-4o across all procedures. Preparation/Planning was the only category without statistically significant differences across all three procedures. CONCLUSION: DeepSeek-V3 and ChatGPT-4o excelled on TAPE, BEST, and CT-HDR brachytherapy questions, indicating potential to enhance patient education in interventional radiology, where complex but minimally invasive procedures are often explained in brief consultations. However, OpenBioLLM-8b and BioMistral-7b exhibited more frequent inaccuracies, suggesting that LLMs cannot yet replace comprehensive clinical consultations. Patient feedback and clinical workflow implementation should validate these findings. |
| Keywords: | Large Language Models, Interventional Radiology, Patient Education |
| Source: | CVIR Endovascular |
| ISSN: | 2520-8934 |
| Publisher: | Springer Nature |
| Volume: | 8 |
| Number: | 1 |
| Page Range: | 81 |
| Date: | 13 October 2025 |
| Official Publication: | https://doi.org/10.1186/s42155-025-00609-z |