OpenAI's ChatGPT-5 shows a 25% error rate in research

Kerem Gülen · September 25, 2025, 17:58 · 1 min read

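According to an article from Tom's Guide, research on OpenAI's ChatGPT-5 model shows that it produces incorrect answers in roughly 25% of cases. While this points to a persistent error rate, the model is significantly more accurate than its predecessor, GPT-4.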

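Specifically, compared with GPT-4, ChatGPT-5 makes about 45% fewer factual errors and produces six times fewer hallucinations, that is, entirely fabricated answers. Despite these gains, the research reports that the model still suffers from overconfidence and can present incorrect information with apparent certainty, a trait commonly described as hallucination.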

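The model's performance and accuracy vary by task. For example, it scored 94.6% on the 2025 AIME math test and achieved a 74.9% success rate on a set of real-world coding tasks. On the more challenging MMLU Pro benchmark, an academic test covering science, math, and history, ChatGPT-5 reached about 87% accuracy. It still makes mistakes, however, on general knowledge and complex reasoning questions.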

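The research attributes these errors to several underlying factors: the model's limited ability to fully understand nuanced questions, its reliance on training data that may be outdated or incomplete, and its fundamental design, which is based on probabilistic pattern prediction. This mechanism can sometimes produce responses that sound plausible but are factually inaccurate.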

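The article advises users to verify any critical information that comes from ChatGPT-5. Because the model is not infallible, this caution matters most for professional, academic, or health-related queries, even with the model's documented improvements in reliability.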

