CODEX AGENTS: EARLY BUGS, COST SPIKES, AND A FILE DELETION SCARE
OpenAI Codex agents are showing reliability, safety, and billing snags in the wild, even as OpenAI describes internal chain-of-thought monitoring.
OpenAI says it monitors its internal coding agents for misalignment by analyzing their chain of thought in real deployments, aiming to catch risky behavior early (The AI Report). That’s encouraging on paper.
But user reports flag rough edges in Codex today: an agent deleted files outside its project directory on Windows (data loss report), sessions hang in a “working” state without producing tokens or reconnecting (hangs), the Windows terminal can’t find commands that are on PATH (PATH issue), credits drain quickly even when sessions are idle (credit drain), and weekly limits trigger unexpectedly (limit anomaly). Some users are also confused about model behavior and setup (usage confusion).
Agent safety and cost controls need to be real, not promises: file deletions and runaway sessions can wreck repos and budgets.
Reliability gaps on Windows and session management could stall adoption in enterprise environments.
- Run Codex in a disposable repo inside a container or VM; verify it cannot read or delete outside the workspace and that a kill switch halts all actions fast.
- Measure credit burn over long-running and idle sessions; set hard budget caps and timeouts, and confirm PATH/tooling resolution in your standard developer images.
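
The PATH and timeout checks above can be scripted before an agent ever touches a developer image. Below is a minimal sketch using only the Python standard library; the helper names `missing_tools` and `run_with_timeout` are hypothetical, and nothing here calls a Codex API. It only shows how a harness around any agent command might verify tool resolution and enforce a hard time limit.

```python
import shutil
import subprocess
import sys


def missing_tools(tools):
    """Return the tool names that cannot be resolved on PATH here."""
    return [name for name in tools if shutil.which(name) is None]


def run_with_timeout(argv, timeout_s):
    """Run a command and stop it if it exceeds timeout_s seconds.

    Returns (status, stdout), where status is "ok", "failed", or "timeout".
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        # Grandchild processes may survive; a container or VM boundary
        # is still needed for a real kill switch.
        return "timeout", ""
    return ("ok" if result.returncode == 0 else "failed"), result.stdout


if __name__ == "__main__":
    # Hypothetical tool list; replace with what your images should provide.
    print("missing:", missing_tools(["git", "python3", "node"]))
    # A stand-in for a long-running agent step, cut off after 1 second.
    print(run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], 1
    ))
```

Running the same script inside each standard image gives a quick signal on whether the PATH problem reproduces in your environment, independent of the agent itself.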
Legacy codebase integration strategies...
- 01. Keep Codex away from production repos and data; enforce read-only mounts and least-privilege tokens, and route agent changes through pull-request review before they merge.
- 02. Add billing monitors and per-session caps; require user approval for file operations outside the project path.
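
The approval rule for out-of-project file operations can be sketched as a small path guard. This is an illustrative example, not how Codex implements its own checks; `is_inside_workspace`, `authorize_file_op`, and the `approve` callback are hypothetical names. It resolves `..` segments and symlinks before comparing, so a path like `../outside.txt` is treated as outside the workspace.

```python
from pathlib import Path


def is_inside_workspace(workspace, target):
    """True if target resolves to a location at or under workspace."""
    root = Path(workspace).resolve()
    # Relative targets are interpreted relative to the workspace root;
    # absolute targets replace the root when joined.
    candidate = (root / target).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        return False
    return True


def authorize_file_op(workspace, target, approve):
    """Allow in-workspace operations; defer everything else to approve().

    approve is a callback (for example, a user prompt) that returns a
    truthy value to permit the operation.
    """
    if is_inside_workspace(workspace, target):
        return True
    return bool(approve(target))


if __name__ == "__main__":
    ws = Path.cwd() / "project"
    deny = lambda path: False  # stand-in for a real approval prompt
    print(authorize_file_op(ws, "src/main.py", deny))
    print(authorize_file_op(ws, "../outside.txt", deny))
```

A guard like this is a second line of defense only. The read-only mounts and least-privilege tokens above remain the stronger control, because they hold even if the agent bypasses the wrapper.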
Fresh architecture paradigms...
- 01. Design an agent runner with containerized sandboxes (workspace-only mounts, no host write access), network egress rules, and ephemeral credentials from day one.
- 02. Build cost and action telemetry into workflows: per-task budgets, timeouts, and audit logs for file and command operations.
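
Per-task budgets and audit logs can start very small. The sketch below assumes the runner can report a credit cost for each action, which the source does not confirm; `TaskBudget`, `BudgetExceeded`, and the JSON Lines log format are all hypothetical choices made for illustration.

```python
import json
import time


class BudgetExceeded(Exception):
    """Raised when a task goes over its credit or time budget."""


class TaskBudget:
    """Tracks credits and elapsed time for one task and logs every action."""

    def __init__(self, max_credits, max_seconds, audit_path,
                 clock=time.monotonic):
        self.max_credits = max_credits
        self.max_seconds = max_seconds
        self.audit_path = audit_path
        self.clock = clock  # injectable so tests can use a fake clock
        self.started = clock()
        self.spent = 0.0

    def record(self, kind, detail, credits=0.0):
        """Append one audit entry, then enforce the budget."""
        elapsed = self.clock() - self.started
        self.spent += credits
        entry = {
            "kind": kind,          # e.g. "command" or "file_write"
            "detail": detail,
            "credits": credits,
            "spent": self.spent,
            "elapsed_s": round(elapsed, 3),
        }
        # Write the log line first so the violating action is on record.
        with open(self.audit_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
        if self.spent > self.max_credits:
            raise BudgetExceeded(f"credits {self.spent} > {self.max_credits}")
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"elapsed {elapsed}s > {self.max_seconds}s")
```

The runner would call `record` around every file and command operation and treat `BudgetExceeded` as the signal to stop the task. Logging before raising keeps the action that broke the budget visible in the audit trail.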