Adam Daniel

Freelance AI Engineer

  • Quoting Anthropic: Opus 4.8 Safety “somewhat less robust”

    Agentic safety. Although it shows improvements in some areas (such as refusing malicious requests), we found Opus 4.8 to be somewhat less robust than Opus 4.7 in several agentic contexts (such as vulnerability to prompt injection attacks). However, the application of our safeguards closes the gap between the models in practice. […]

  • Introducing GHA-bench

    GHA-bench is a benchmark and set of evals that measure how well coding agents author and test GitHub Actions across different languages.