Anthropic will make its Claude model's safeguards for frontier LLM development visible to users. Previously, these safeguards silently limited the model's effectiveness on certain research requests, which drew significant criticism. The company now acknowledges this was the wrong tradeoff. When a safeguard is triggered, it will display a notification stating the reason and fall back to a previous model version.