ChatGPT Safety Update Improves Risk Detection
News Desk

ChatGPT is receiving new safety updates designed to improve how it recognizes risk and responds in sensitive conversations.

People use ChatGPT every day for a wide range of conversations, from simple questions to complex personal topics. However, some interactions may involve distress or emerging risk.

Therefore, ChatGPT has been updated to better recognize when risk may be developing over time. These improvements help the system identify subtle or evolving cues across conversations.

As a result, ChatGPT can respond more carefully when needed, including de-escalating situations, declining to provide harmful detail, or redirecting users toward safer options.

These updates are built on years of model training, evaluations, monitoring systems, and collaboration with mental health and safety experts.

Why context matters in conversations

In sensitive situations, context is critical. A message that seems harmless on its own may carry different meaning when viewed alongside earlier signals.

Therefore, ChatGPT is trained to consider conversation history to better understand intent. This helps it distinguish between safe interactions and rare high-risk cases.

In addition, the system is designed to identify patterns that may suggest harmful intent developing gradually over time.
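To make the idea concrete, here is a minimal sketch, not OpenAI's actual system, of why judging messages in context differs from judging them alone: per-message signals are aggregated over recent history so that gradual escalation can be flagged even when no single message crosses a line. The scoring function, term list, window size, and threshold are all hypothetical placeholders.

```python
# Hypothetical sketch: aggregate per-message risk signals across a
# conversation so that gradual escalation is visible, rather than
# judging each message in isolation.

def message_risk(text: str) -> float:
    """Hypothetical per-message risk score in [0, 1]."""
    risky_terms = {"hurt", "weapon", "end it"}  # placeholder term list
    return min(1.0, sum(term in text.lower() for term in risky_terms) / 2)

def escalating(history: list[str], window: int = 5, threshold: float = 0.5) -> bool:
    """Flag when average risk over the last `window` messages exceeds
    a threshold -- a pattern no single message would reveal."""
    recent = [message_risk(m) for m in history[-window:]]
    return bool(recent) and sum(recent) / len(recent) >= threshold
```

The point of the sketch is the aggregation step: individually mild messages can still sum to a concerning pattern, which is the kind of "developing over time" signal the update targets.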

Safety summaries for improved awareness

Some risks may appear across multiple conversations rather than within a single chat.

To address this, ChatGPT now uses safety summaries: short, factual notes about earlier safety-relevant context.

These summaries are created by a model trained for safety reasoning tasks. They are narrowly scoped and stored only for a limited time.

Importantly, they are not used for general personalization or long-term memory. Instead, they are only used when relevant to serious safety concerns.

Expert input from mental health professionals

These systems were developed with input from mental health experts, including psychiatrists and psychologists from the Global Physician Network.

They specialize in areas such as forensic psychology, suicide prevention, and self-harm response.

Their input helped define when safety summaries should be created, how long context should be considered, and what information is most relevant.

In addition, their guidance helped ensure responses remain appropriate in sensitive situations.

Measuring improvements in safety performance

Internal evaluations show significant improvements in safety outcomes.

In long single-conversation scenarios, safe-response performance improved by 50 percent in suicide and self-harm cases. It also improved by 16 percent in harm-to-others cases.

Furthermore, across multiple conversations and models, performance gains remained consistent.

On GPT-5.5 Instant, safe-response performance improved by 52 percent in harm-to-others cases and 39 percent in suicide and self-harm cases.

Across more than 4,000 evaluations, safety summaries scored 4.93 out of 5 for relevance and 4.34 out of 5 for factual accuracy.

At the same time, testing showed no meaningful drop in everyday conversation quality.

Looking ahead in AI safety

Going forward, ChatGPT will continue improving its ability to detect risk that develops gradually or across conversations.

Currently, these improvements focus on self-harm and harm-to-others scenarios. However, future work may extend to other high-risk areas such as biology and cyber safety.

Additional safeguards will continue to guide this development.

Overall, ChatGPT is evolving to better balance helpfulness and caution in sensitive contexts.