Accuracy, Readability, and Understandability of the European Association of Urology Sexual and Reproductive Health Guidelines Bot

Author: 心理与性 (Psychology and Sexuality)

Published: 2026-04-09 17:42

Accuracy, readability, and understandability of European Association of Urology guidelines bot for Sexual and Reproductive Health Guidelines

The Journal of Sexual Medicine, Volume 23, Issue 4, April 2026

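[Summary] Background: The European Association of Urology (EAU) Guidelines recently launched an official bot intended to help urologists navigate the guidelines. To date, however, no external validation data are available for this tool. Aim: To assess the accuracy, completeness, and clarity of the answers given by the Sexual and Reproductive Health Guidelines Bot. Methods: A total of 228 questions were developed from the recommendations in the EAU Sexual and Reproductive Health Guidelines. Each question was entered into the EAU Guidelines Bot, and its response was then reviewed by two uro-andrology experts. Disagreements between the reviewers were resolved through discussion with a third expert. Results were further stratified by grade of recommendation. Outcomes: The rate of accurate, complete, and clear answers to guideline-related questions, rated on a 5-point Likert scale, and the impact of the grade of recommendation on answer quality. Results: Overall, 228 questions were developed. For accuracy, 224/228 (98.3%) answers were judged accurate (score 4-5), 2/228 (0.9%) fairly accurate (score 3), and 2/228 (0.9%) not accurate (score 1-2). For completeness, 223/228 (97.8%) answers were judged complete (score 4-5), 2/228 (0.9%) fairly complete (score 3), and 3/228 (1.3%) not complete. For clarity, 225/228 (98.7%) answers were judged clear (score 4-5), 2/228 (0.9%) fairly clear (score 3), and none (0/228) unclear. No significant differences were found between answers to questions on strong recommendations and those on weak recommendations. Clinical implications: The EAU Guidelines Bot may serve as a reliable clinical decision support tool that helps urologists quickly obtain evidence-based guidance on sexual and reproductive health management. Strengths and limitations: This study is the first external evaluation of the EAU Guidelines Bot. Our results indicate a significant improvement in reliability compared with general-purpose AI tools. However, our queries were straightforward and drawn directly from guideline recommendations, so the results may not apply to complex real-world clinical scenarios. Conclusions: The EAU Guidelines Bot is an accurate and reliable tool for navigating the Sexual and Reproductive Health Guidelines, but further validation is needed to assess its applicability in clinical practice.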

[Keywords] artificial intelligence, chatbot, EAU guidelines, sexual and reproductive health

[Abstract] Background: Recently, the European Association of Urology (EAU) Guidelines presented an official Bot to assist urologists during Guidelines navigation. However, to date no external validation is available. Aim: To assess accuracy, completeness, and clarity of the Guidelines Bot for Sexual and Reproductive Health. Methods: A total of 228 questions based on the EAU Sexual and Reproductive Health Guidelines recommendations were developed. Each question was entered into the EAU Guidelines Bot and the response was reviewed by two expert uro-andrologists. Discrepancies were resolved by discussion with a third expert. Results were further stratified per grade of recommendation. Outcomes: The rate of accurate, complete, and clear answers to guidelines-related questions, evaluated using a 5-point Likert scale, and the impact of the grade of recommendation on the quality of the answer. Results: Overall, 228 questions were developed. In terms of accuracy, 224/228 (98.3%) were defined as accurate (score 4-5), 2/228 (0.9%) presented a fair accuracy (score 3), while 2/228 (0.9%) were deemed not accurate (score 1-2). In terms of completeness, 223/228 (97.8%) were defined as complete (score 4-5), 2/228 (0.9%) presented a fair completeness (score 3), while 3/228 (1.3%) were deemed not complete. Finally, in terms of clarity, 225/228 (98.7%) were defined as clear (score 4-5), 2/228 (0.9%) presented a fair clarity (score 3), and 0/228 were not clear. When comparing strong and weak recommendations, no differences were recorded. Clinical Implications: The EAU Guidelines Bot may serve as a reliable clinical decision support tool for urologists seeking rapid, evidence-based guidance on sexual and reproductive health management. Strengths & Limitations: This is the first external evaluation of the EAU Guidelines Bot. Our results suggest a significant improvement in terms of reliability when compared to general AI tools. However, our queries were straightforward and developed directly from guideline recommendations, and results might not apply to complex real-world clinical scenarios. Conclusions: EAU Guidelines Bot represents an accurate and reliable tool for Sexual and Reproductive Health Guidelines navigation, but further validation is required to evaluate its applicability in clinical practice.
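
The proportions reported in the abstract can be rechecked with a few lines of Python. This is only an illustrative sketch, not the authors' analysis: the names `TOTAL`, `reported`, `categorize`, and `pct` are hypothetical, the counts are copied from the abstract as given, and the 1-2 cutoff for "not complete" is assumed by analogy with the accuracy category. With one-decimal rounding, 224/228 comes out at 98.2% rather than the reported 98.3%, and the three clarity counts sum to 227 rather than 228; the abstract does not explain either point.

```python
TOTAL = 228  # number of questions stated in the abstract

# Counts copied from the abstract: (score 4-5, score 3, score 1-2).
reported = {
    "accuracy": (224, 2, 2),
    "completeness": (223, 2, 3),
    "clarity": (225, 2, 0),
}


def categorize(score: int) -> str:
    """Map a 5-point Likert score to the three bands used in the abstract."""
    if not 1 <= score <= 5:
        raise ValueError("Likert score must be between 1 and 5")
    if score >= 4:
        return "good"
    if score == 3:
        return "fair"
    return "poor"


def pct(count: int, total: int = TOTAL) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for domain, counts in reported.items():
        shares = [pct(c) for c in counts]
        # Questions not covered by any of the three reported bands.
        unaccounted = TOTAL - sum(counts)
        print(domain, shares, "unaccounted:", unaccounted)
```

Running the script prints the recomputed percentages per domain together with the number of questions not covered by the three reported bands.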

[Keywords] AI, chatbot, EAU guidelines, sexual and reproductive health

Original article: Valerio Santarelli, Riccardo Lombardo, Matteo Romagnoli, et al. (2026). Accuracy, readability, and understandability of European Association of Urology guidelines bot for Sexual and Reproductive Health Guidelines. The Journal of Sexual Medicine, Volume 23, Issue 4, April 2026.

https://doi.org/10.1093/jsxmed/qdag041

(Translator and responsible editor: MARY)

(Readers who would like the English original can get in touch via WeChat: millerdeng95 or iacmsp)