Thesis information

Thesis title (Chinese):

 基于大语言模型的咨询平台开发及其在安宁疗护患者及照顾者中的可用性评价    

Author:

 施呈昊    

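Thesis language:

 chi    

Degree:

 Master's    

Degree type:

 Professional degree    

Institution:

 Peking Union Medical College    

Department:

 School of Nursing, Peking Union Medical College    

Major:

 Nursing - Nursing    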

Supervisor:

 邹海欧    

Thesis completion date:

 2025-05-27    

Thesis title (foreign language):

 Development of a Consultation Platform Based on Large Language Models and Its Usability Evaluation Among Hospice Care Patients and Caregivers    

Keywords (Chinese):

 大语言模型 人工智能 安宁疗护 网络平台 可用性评价    

Keywords (foreign language):

 Large Language Model (LLM); Artificial Intelligence (AI); Hospice Care; Web Platform; Usability Evaluation    

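Abstract (Chinese version, translated):

Background: With the aging of China's population and the rising incidence of chronic diseases, demand for hospice care keeps growing, and China's current hospice care resources cannot keep pace with it. Community hospice care, in which the community and the home are the main care settings, can effectively reduce the demand on hospice care resources and has therefore been promoted in many countries and regions. In community and home settings, however, patients and caregivers often face challenges in symptom control, psychological support, and access to information, and existing community hospice care services, constrained by a shortage of trained professionals and the inefficiency of traditional follow-up, cannot fully meet these needs. The development of large language models (LLMs) offers an opportunity to ease this difficulty, but the accuracy and traceability of their answers limit their use in healthcare. To address this problem, developers have devised many optimization methods (such as retrieval-augmented generation, fine-tuning, and prompting) to improve LLM performance, which makes practical application in healthcare a realistic prospect.

Objective: To use GraphRAG to build a knowledge graph from professional hospice care knowledge, optimize LLM question-answering performance in the hospice care domain, and evaluate the effect of the optimization; to develop an online consultation platform that integrates the optimized LLM and serves hospice care patients, caregivers, and healthcare staff; and to evaluate the platform's usability for consultations by hospice care patients and caregivers (especially on symptom control and comfort care), collecting user feedback to guide improvement of the platform.

Methods: The study was carried out in three phases. Phase I: a hospice care knowledge corpus was built through literature retrieval and expert screening, and GraphRAG was used to generate a knowledge graph and optimize the LLMs (Qwen-turbo, Deepseek-V2.5); the optimized models, the base models, and humans were compared on an examination (52 multiple-choice questions) used for trainees in the hospice-care-related departments of a tertiary grade-A general hospital in Beijing, and the best model was selected. Phase II: the "Hui Yi Ban" (慧医伴) platform was developed with Web technologies (Node.js, SQLite, etc.) under an iterative development model and refined through expert review and preliminary user testing (PSSUQ). Phase III: platform usability was evaluated with a mixed-methods design. Fourteen hospice care patients or caregivers were recruited for a 4-week remote usability test; the Edmonton Symptom Assessment System (ESAS) was used to assess symptom changes before and after the intervention, and the System Usability Scale (SUS) was used to assess subjective usability. Semi-structured interviews were held with 3 users, and the interview data were analyzed with Colaizzi's seven-step method.

Results: Phase I: in the evaluation of LLM optimization, the Deepseek-V2.5-GraphRAG model performed best (mean score 41.40±1.52), significantly better than its base model (36.40±2.19) and human experts (38.82±3.92), whereas Qwen-turbo-GraphRAG (34.20±1.30) performed worse than its base model (40.80±0.45), which suggests that the optimization effect is model-dependent. Phase II: the "Hui Yi Ban" platform, comprising user management, patient management, intelligent question answering, and a review mechanism, was successfully developed and delivered its planned functions. Phase III: in the usability evaluation, 14 participants completed the 4-week remote usability test and raised 43 questions in total; 12 questions passed review directly and 25 passed after modification, and there were 13 WeChat or phone contacts and 2 online or in-person consultations. The platform ran stably throughout the remote test, and the mean SUS score was 70.00, indicating that usability reached an acceptable level. Paired t-tests on ESAS scores showed a trend toward improvement in symptoms such as pain, anxiety, appetite, and quality of life, but the differences were not statistically significant (P>0.05), and McNemar's test likewise showed no significant change in symptom severity between pre- and post-test (P>0.05). Interview analysis yielded six themes: challenges in adapting to the technology (especially interface and operation problems for older users), recognition of information reliability (stemming from professionalism and trust in the medical team), the tension between medical safety and immediate response (delays caused by the review mechanism), the value of the human-machine collaboration service model (technology and human support complementing each other), relief of psychological burden (less worry about "bothering others"), and caregivers' care needs and support (a desire for systematic knowledge and emotional support).

Conclusion: GraphRAG can effectively improve the question-answering accuracy and professionalism of specific LLMs in the hospice care domain, but its effect depends on the choice of base model. The "Hui Yi Ban" online consultation platform built in this study has acceptable usability, and its "human-machine collaboration" service model (LLM-assisted generation with review and sign-off by healthcare staff) shows practical value in providing reliable, convenient home-based hospice care consultation and in relieving users' psychological burden; it can serve as an effective supplement to existing community hospice care services. The platform still needs continued improvement in ease of use for older users, shorter review response times, support for emergencies, and meeting caregivers' needs for systematic learning and emotional support. Future research should enlarge the sample, extend the intervention period, explore more efficient review mechanisms, and consider integration with healthcare information systems.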

Abstract (foreign language):

Background: As China's population ages and the incidence of chronic diseases rises, the demand for hospice care is growing. However, current hospice care resources are insufficient to meet this increasing need, making community and home-based care the predominant approach. In these settings, patients and caregivers often face challenges related to symptom control, psychological support, and information access. Existing community hospice care services are constrained by a shortage of professional personnel and the inefficiency of traditional follow-up methods, and thus fail to meet demand adequately. Advances in large language models (LLMs) present a timely opportunity, yet their use in healthcare is still limited by questions of accuracy and traceability. Various optimization techniques (such as retrieval-augmented generation, fine-tuning, and prompt engineering) are being explored to enhance LLMs, potentially enabling their practical implementation in the medical field.

Objective: To employ GraphRAG technology to construct a knowledge graph based on specialized hospice care knowledge, optimize the question-answering performance of LLMs in the hospice care domain, and evaluate the optimization effect; to develop an online consultation platform that integrates the optimized LLM and serves hospice care patients, caregivers, and healthcare professionals; and to evaluate the usability of the consultation platform for hospice care patients and caregivers (particularly concerning symptom control and comfort care), collecting user feedback to guide platform improvements.

Methods: This study was conducted in three phases. Phase I: a hospice care knowledge corpus was built through literature retrieval and expert screening, and GraphRAG technology was used to generate a knowledge graph and optimize the LLMs (Qwen-turbo, Deepseek-V2.5). The performance of the optimized models, the base models, and humans was compared using a test designed for trainees at the Peking Union Medical College Hospital Palliative Care Center (52 multiple-choice questions) to select the optimal model. Phase II: the "Hui Yi Ban" platform was developed using Web technologies (Node.js, SQLite, etc.) through an iterative development process, incorporating expert reviews and preliminary user testing (PSSUQ) for optimization. Phase III: a mixed-methods approach was used to evaluate platform usability. Fourteen hospice care patients or caregivers were recruited for a 4-week remote usability test, using the Edmonton Symptom Assessment System (ESAS) to assess pre- and post-intervention symptom changes and the System Usability Scale (SUS) to evaluate subjective usability. Semi-structured interviews were conducted with 3 users, and the interview data were analyzed using Colaizzi's seven-step method.
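
For readers unfamiliar with the SUS, the following is a minimal sketch of the standard scoring rule for the ten-item questionnaire: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function names `sus_score` and `mean_sus` are illustrative and not taken from the thesis, and no participant responses are reported in this record, so any input data would be hypothetical.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten item answers in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, r in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: response - 1.
            total += r - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: 5 - response.
            total += 5 - r
    # The 0-40 raw sum is rescaled to 0-100.
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-respondent SUS scores (hypothetical helper)."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

A respondent who answers 3 to every item scores 50.0 under this rule, and a group mean is simply the average of the individual scores.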

Results: Phase I: the LLM optimization evaluation revealed that the Deepseek-V2.5-GraphRAG model performed best (mean score 41.40±1.52), significantly outperforming its base model (36.40±2.19) and human experts (38.82±3.92). Conversely, Qwen-turbo-GraphRAG (34.20±1.30) performed worse than its base model (40.80±0.45), suggesting that the optimization effect is model-dependent. Phase II: the "Hui Yi Ban" platform, incorporating user management, patient management, intelligent Q&A, and a review mechanism, was successfully developed and achieved its intended functionalities. Phase III: in the usability evaluation, 14 participants completed the 4-week remote test and submitted 43 questions. Twelve questions were approved directly and 25 were approved after modification; in addition, there were 13 instances of WeChat or phone communication and 2 online or offline consultations. The platform remained stable during the test period. The mean SUS score was 70.00, indicating acceptable usability. Paired t-tests on ESAS scores showed a trend toward improvement in symptoms such as pain, anxiety, appetite, and quality of life, but the differences were not statistically significant (P > 0.05). McNemar's test likewise showed no significant change in symptom severity between pre- and post-test (P > 0.05). Interview analysis identified six major themes: challenges in technology adoption (especially interface and operational difficulties for older users), recognition of information reliability (attributed to professionalism and trust in the medical team), the conflict between medical safety and timely response (due to delays caused by the review mechanism), the value of the human-machine collaboration service model (complementarity of technology and human support), alleviation of psychological burden (reducing the concern of "bothering others"), and caregiver needs and support (desire for systematic knowledge and emotional support).
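
As an illustration of the McNemar comparison mentioned above, here is a minimal sketch of the exact (binomial) form of the test for paired binary outcomes, such as a symptom being classed as severe or not before and after the test period. The abstract does not say which variant of McNemar's test was used or how ESAS severity was categorized, so both the exact form and the function name `mcnemar_exact` are assumptions made for illustration only.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: number of pairs that changed from category A to category B
    c: number of pairs that changed from category B to category A
    Pairs that did not change category do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: there is no evidence of change.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely
    # to fall in either direction, so the count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)
```

With a sample of 14 participants the number of discordant pairs per symptom is necessarily small, which is one reason an exact test is a natural choice over the chi-square approximation in a setting like this.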

Conclusion: GraphRAG technology can effectively enhance the accuracy and professionalism of specific LLMs in the hospice care domain, but its effectiveness depends on the choice of the base model. The "Hui Yi Ban" online consultation platform developed in this study demonstrates acceptable usability. Its innovative "human-machine collaboration" model (LLM-assisted generation with healthcare professional oversight) shows value in providing reliable and convenient home-based hospice care consultation and in alleviating users' psychological burden, serving as a viable supplement to existing community hospice care services. However, the platform requires further optimization regarding ease of use for older adults, reduction of review response times, provision of support for emergency situations, and meeting caregivers' needs for systematic learning and emotional support. Future research should involve larger sample sizes, longer intervention durations, exploration of more efficient review mechanisms, and consideration of integration with healthcare information systems.
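
The "human-machine collaboration" model described above, in which an LLM drafts an answer and a healthcare professional approves it directly or after modification, can be pictured as a small review workflow. The sketch below is a hypothetical illustration only: the platform itself is reported to use Node.js and SQLite, and none of the names here (`ReviewStatus`, `Consultation`, `review`, `answer_for_user`) come from the thesis. Python is used purely for readability.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    PENDING = "pending"                          # LLM draft awaits clinician review
    APPROVED = "approved"                        # draft released unchanged
    APPROVED_WITH_EDITS = "approved_with_edits"  # draft released after modification


@dataclass
class Consultation:
    question: str
    draft_answer: str                            # text produced by the LLM
    status: ReviewStatus = ReviewStatus.PENDING
    released_answer: Optional[str] = None        # what the user is allowed to see


def review(item: Consultation, edited_answer: Optional[str] = None) -> Consultation:
    """Record a clinician's decision on a pending LLM draft."""
    if item.status is not ReviewStatus.PENDING:
        raise ValueError("consultation has already been reviewed")
    if edited_answer is None or edited_answer == item.draft_answer:
        item.status = ReviewStatus.APPROVED
        item.released_answer = item.draft_answer
    else:
        item.status = ReviewStatus.APPROVED_WITH_EDITS
        item.released_answer = edited_answer
    return item


def answer_for_user(item: Consultation) -> Optional[str]:
    """Return the released answer, or None while review is still pending."""
    return item.released_answer
```

The point of the design is that a draft never reaches the user before review, which is also the source of the response delay that interviewees described.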

Open-access date:

 2025-06-09    
