Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability
Microsoft Research clarifies the findings of its recent research on LLM reliability in delegated tasks. The paper investigates how LLMs can corrupt documents while carrying out delegated workflows, and it aims to improve how these complex AI systems are evaluated.