Psychology and Sexuality



Sycophantic AI decreases prosocial intentions and promotes dependence

 

Science, Volume 391, Issue 6792, 26 March 2026

 


 

[Structured Abstract]

Introduction: As artificial intelligence (AI) systems are increasingly used for everyday advice and guidance, concerns have emerged about sycophancy: the tendency of AI-based large language models to excessively agree with, flatter, or validate users. Although prior work has shown that sycophancy carries risks for groups who are already vulnerable to manipulation or delusion, sycophancy's effects on the general population's judgments and behaviors remain unknown. Here, we show that sycophancy is widespread in leading AI systems and has harmful effects on users' social judgments.

Rationale: High-profile incidents have linked sycophancy to psychological harms such as delusions, self-harm, and suicide. Beyond these cases, research in social and moral psychology suggests that unwarranted affirmation can produce subtler but still consequential effects: reinforcing maladaptive beliefs, reducing responsibility-taking, and discouraging behavioral repair after wrongdoing. We hypothesized that AI models excessively affirm users even when doing so is socially or morally inappropriate, and that such responses negatively influence users' beliefs and intentions. To test this, we conducted two complementary studies. First, we measured the prevalence of sycophancy across 11 leading AI models using three datasets spanning a variety of use contexts, including everyday advice queries, moral transgressions, and explicitly harmful scenarios. Second, we conducted three preregistered experiments with 2405 participants to understand how sycophancy influences users' judgments, behavioral intentions, and perceptions of AI. Participants interacted with AI systems in vignette-based settings and in a live-chat interaction in which they discussed a real past conflict from their lives. We also tested whether effects varied by response style or perceived response source (AI versus human).

Results: We find that sycophancy is both prevalent and harmful. Across 11 AI models, AI affirmed users' actions 49% more often than humans on average, including in cases involving deception, illegality, or other harms. On posts from r/AmITheAsshole, AI systems affirmed users in 51% of cases in which the human consensus did not (0%). In our human experiments, even a single interaction with sycophantic AI reduced participants' willingness to take responsibility and repair interpersonal conflicts, while increasing their conviction that they were right. Yet despite distorting judgment, sycophantic models were trusted and preferred. All of these effects persisted when controlling for individual traits such as demographics and prior familiarity with AI, perceived response source, and response style. This creates perverse incentives for sycophancy to persist: the very feature that causes harm also drives engagement.

Conclusion: AI sycophancy is not merely a stylistic issue or a niche risk but a prevalent behavior with broad downstream consequences. Although affirmation may feel supportive, sycophancy can undermine users' capacity for self-correction and responsible decision-making. Yet because it is preferred by users and drives engagement, there has been little incentive for sycophancy to diminish. Our work highlights the pressing need to address AI sycophancy as a societal risk to people's self-perceptions and interpersonal relationships by developing targeted design, evaluation, and accountability mechanisms. Our findings show that seemingly innocuous design and engineering choices can result in consequential harms; carefully studying and anticipating AI's impacts is therefore critical to protecting users' long-term well-being.
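To make the headline comparison concrete (AI affirming users' actions some percentage more often than humans), here is a minimal illustrative sketch of how such a relative-excess figure can be computed. The `affirmation_rate` helper and all labels are hypothetical assumptions for illustration, not the paper's actual annotation pipeline or data.

```python
# Illustrative sketch (assumed, not from the paper): compare how often
# AI responses vs. human responses affirm the user's action.

def affirmation_rate(labels):
    """Fraction of responses labeled 1 (affirms the user's action)."""
    return sum(labels) / len(labels)

# Made-up binary labels: 1 = response affirms the user, 0 = it does not.
ai_labels    = [1, 1, 0, 1, 1, 0, 1, 1]
human_labels = [1, 0, 0, 1, 0, 0, 1, 0]

ai_rate = affirmation_rate(ai_labels)        # 0.75
human_rate = affirmation_rate(human_labels)  # 0.375

# Relative excess: how much more often AI affirms than humans do.
relative_excess = (ai_rate - human_rate) / human_rate
print(f"AI affirms {relative_excess:.0%} more often than humans")
# prints "AI affirms 100% more often than humans"
```

With the fabricated labels above the excess is 100%; the paper reports 49% on its real datasets.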

 

Original paper: Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, et al. (2026). Sycophantic AI decreases prosocial intentions and promotes dependence. Science, Volume 391, Issue 6792, 26 March 2026.

https://doi.org/10.1126/science.aec8352

 

(Translation and responsible editor: MARY)

 

(Readers who need the English original may contact WeChat: millerdeng95 or iacmsp)


