OpenClaw "OAuth refresh failed (refresh_contention)": clear the stuck lock
Your openai-codex agent went slow and started answering "as Claude" even though it's
configured for Codex, and the logs repeat OAuth refresh failed (refresh_contention):
another process is already refreshing ... | file lock timeout for
/home/node/.openclaw/locks/oauth-refresh/…. This is a stale oauth-refresh
lock, not a bad token: a refresh grabbed the file lock at startup, never released
it, and now every refresh waits ~150s for a lock that will never free. The
fix: delete the stale lock file(s) and restart the instance — not
re-authenticate. Re-auth doesn't touch the lock, and restarting alone doesn't help because
the lock lives on the data volume and survives restarts.
FailoverError: OAuth refresh failed (refresh_contention): another process is
already refreshing ... | file lock timeout for
/home/node/.openclaw/locks/oauth-refresh/sha256-<hash>.lock
[model-fallback/decision] requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5
reason=auth next=anthropic/claude-sonnet-4-6
🧠 Model: anthropic/claude-sonnet-4-6 · 🔑 oauth (openai-codex:<name>)
# ...every turn ~150s slower; a restart doesn't fix it.
Why the lock never clears itself (pid 1)
OpenClaw serializes token refresh with a file lock so two lanes don't refresh the same
profile at once. The lock file
~/.openclaw/locks/oauth-refresh/sha256-<profilehash>.lock records its
holder as {"pid": 1, "createdAt": ...}, and the staleness check is simply
"is that pid still alive?" That check is sound on a bare host — but inside a
container the openclaw process is always pid 1, and pid 1 always exists. So a lock
abandoned by a crashed or killed refresh is never seen as stale: the check keeps
finding "pid 1 is alive," keeps honoring the dead lock, and every subsequent refresh blocks
on it until it times out with refresh_contention.
That's also why restarting the instance doesn't help. The lock sits on the instance's persistent data volume, not in memory, so it's still there after the pod comes back — and the fresh process re-reads the same stuck lock. Restarting only helps after the lock file is gone.
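To make the pid 1 trap concrete, here is a minimal sketch of a liveness-based staleness check. It is not OpenClaw's actual code: the function names are hypothetical, and it assumes Linux with /proc mounted and a lock body shaped like {"pid": 1, "createdAt": ...}.

```shell
#!/bin/sh
# Hypothetical sketch of a pid-liveness staleness check (not OpenClaw's code).
# Assumes Linux with /proc mounted and a lock body like {"pid": 1, "createdAt": ...}.

# Print the pid recorded in a lock file.
lock_pid() {
  sed -n 's/.*"pid"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed (exit 0) only when the recorded holder process no longer exists.
lock_is_stale() {
  pid=$(lock_pid "$1")
  [ -n "$pid" ] && [ ! -d "/proc/$pid" ]
}
```

Under these assumptions, a lock recording pid 1 can never be reported stale, because /proc/1 exists for as long as anything is running at all. A check that also compared createdAt against a maximum age would not have this blind spot.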
Why your Codex agent is answering as Claude
Once the lock is stuck, every lane that needs a token — main, each
cron job, each session:…:heartbeat — tries to refresh, waits out
the lock timeout (~150s), and then falls back:
reason=auth next=anthropic/claude-sonnet-4-6. The model-fallback chain picks
the next configured provider, so a primary of openai-codex/gpt-5.5 silently
runs on anthropic/claude-sonnet-4-6. That produces the two headline symptoms
at once: everything is slow (each turn eats a ~150s stall first) and the
model line shows Claude even though you configured Codex. Crucially, at
this stage there is no 401, no invalid_grant, no
revoked — only refresh_contention. The token is fine;
the lock is the problem.
Confirm it's a stale lock, not a dead token
Two failures look similar in the logs but need opposite fixes. Read the fallback
reason before you touch anything:
- reason=auth + file lock timeout / refresh_contention = a stale lock. The token is valid; a lock is deadlocked. Delete the lock and restart (below). That is the case this page covers.
- reason=auth_permanent + 401 "refresh token has already been used" = a dead token (refresh_token_reused). The single-use refresh token was genuinely burned and no lock delete will bring it back, so you re-authenticate. That's a different fix: OpenClaw Codex OAuth "refresh token already used" (401).
- Verify the lock is actually abandoned. There are two lock files, in different directories, both carrying {"pid": 1, ...}: the coordination semaphore ~/.openclaw/locks/oauth-refresh/sha256-<hash>.lock, and the write-lock next to the profile file, ~/.openclaw/agents/<agent>/agent/auth-profiles.json.lock (not under .openclaw/locks/). Confirm each still reads "pid": 1 before removing it. Don't try to read the profiles themselves: they're encrypted at rest (AES-GCM: ciphertext/iv/tag).
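The triage above can be sketched as a small filter over log text. This is an illustrative helper, not part of OpenClaw: the function name and the output labels are hypothetical, and the only strings it matches are the ones quoted on this page.

```shell
#!/bin/sh
# Hypothetical triage helper: reads log text on stdin and names the failure mode.
# The matched strings are the ones quoted on this page; the labels are illustrative.
classify_refresh_failure() {
  log=$(cat)
  case "$log" in
    # Checked first: once the token is burned, clearing the lock alone is not enough.
    *reason=auth_permanent*|*refresh_token_reused*|*"refresh token has already been used"*)
      echo "dead-token" ;;
    # Only contention and lock timeouts: the token is still fine.
    *refresh_contention*|*"file lock timeout"*)
      echo "stale-lock" ;;
    *)
      echo "unknown" ;;
  esac
}
```

One way to use it is to pipe recent logs through the function, for example `openclaw logs --since 6m | classify_refresh_failure`. The dead-token branch is tested first on purpose, so a log that contains both signatures is reported as the more serious case.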
A stale lock left in place long enough can turn into the dead-token case. The refresh
token is single-use and rotating; prolonged contention lets two lanes double-use it, and
the freshly minted token's write is blocked by the same deadlocked lock so it's never
persisted. Clear the lock early and you only lose the stalls. Leave it and you can come
back to reason=auth_permanent + 401 refresh_token_reused — at
which point clearing the lock is still correct, but you'll also have to re-auth.
Fix it (self-hosted)
Confirm each lock file is stale ("pid": 1), delete both, then restart the
instance so openclaw re-initializes with a clean lock directory:
# 1. confirm both locks are the stale pid:1 kind (each should print "pid": 1)
cat ~/.openclaw/locks/oauth-refresh/*.lock
find ~/.openclaw/agents -name 'auth-profiles.json.lock' -exec cat {} +
# 2. remove BOTH the coordination lock and the co-located auth-profiles write-lock
rm -f ~/.openclaw/locks/oauth-refresh/*.lock
find ~/.openclaw/agents -name 'auth-profiles.json.lock' -delete
# 3. restart the gateway so it re-inits with no held lock
openclaw gateway stop && openclaw gateway start
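If you would rather not delete by wildcard, the confirm-then-delete steps can be combined into a guard that lists only the lock files whose body records pid 1. This is a sketch under assumptions: the function name and the OPENCLAW_HOME variable are hypothetical (not OpenClaw settings), and the paths are the two locations named on this page.

```shell
#!/bin/sh
# Hypothetical guard for the rm/find steps: list only lock files recording pid 1.
# OPENCLAW_HOME is an illustrative variable, not an OpenClaw setting.
list_pid1_locks() {
  root=${1:-${OPENCLAW_HOME:-$HOME/.openclaw}}
  {
    # Coordination locks for the OAuth refresh.
    find "$root/locks/oauth-refresh" -type f -name '*.lock' 2>/dev/null
    # Write-locks next to each agent's profile file (session locks are left alone).
    find "$root/agents" -type f -name 'auth-profiles.json.lock' 2>/dev/null
  } |
  while IFS= read -r f; do
    # Match "pid": 1 exactly, so "pid": 12 and similar are skipped.
    if grep -Eq '"pid"[[:space:]]*:[[:space:]]*1([^0-9]|$)' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Run `list_pid1_locks` on its own first as a dry run. Once the list looks right, `list_pid1_locks | while IFS= read -r f; do rm -f -- "$f"; done` removes exactly those files, after which the gateway restart applies as before.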
Then verify recovery: the refresh_contention lines stop, turns are fast again,
and the model line shows openai-codex/… instead of the anthropic fallback:
openclaw logs --since 6m | grep -c refresh_contention # want 0
# and the gateway model line back on your configured primary:
# 🧠 Model: openai-codex/gpt-5.5 · 🔑 oauth (openai-codex:<name>)
If, after clearing, a refresh finally reaches the provider and comes back
401 refresh_token_reused (reason=auth_permanent), the token was
already burned — re-authenticate the profile:
openclaw models auth login --provider openai
Stop babysitting your OpenClaw box
Fix it once — or stop fixing it for good.
Apply the checklist above and keep self-hosting, or skip the maintenance entirely: run your OpenClaw on managed hosting from $6.90/mo, starting with a 7-day free trial. We handle the stale locks, gateway restarts, version upgrades, and uptime — and you can import your existing instance in a couple of minutes. Cancel anytime.
- Managed hosting handles stale .jsonl.lock files, gateway restarts, and version upgrades for you
- Import your existing OpenClaw setup in minutes and keep your channels and configuration
- The optional $199 setup is scoped: no custom development, enterprise/SRE support, or unsupported self-hosting repair
If you would rather compare options first, review OpenClaw cloud hosting or see the best OpenClaw hosting options before deciding.
Related, but not this
- "Your refresh token has already been used" (401) is the permanent dead-token case (reason=auth_permanent), where clearing the lock isn't enough and you re-authenticate: see OpenClaw Codex OAuth "refresh token already used" (401).
- "agent failed before reply: session file locked" is a different lock (the session/conversation file, not the oauth-refresh lock): see fix OpenClaw "session file locked" timeouts.
- "No API key found for provider openai" after an upgrade — the empty-store case (no profile imported at all), not a wedged lock: see OpenClaw "No API key found for provider openai" after an upgrade.
How managed hosting avoids this
The stale lock survives because it sits on a persistent volume that a plain restart can't
clear. On managed OpenClaw hosting from Lobsterland,
the runtime's coordination locks (.openclaw/locks) are mounted on ephemeral
storage that dies with the pod, so a wedged oauth-refresh lock can't outlive a restart. An
init step that runs before the openclaw process starts also sweeps any leftover
*.lock carrying the lock signature — including the
auth-profiles.json.lock that sits next to the profile data — so
refresh_contention from a pid:1 lock doesn't recur. You restart; keeping locks
from wedging across restarts is the platform's job.
Frequently asked questions
What does OpenClaw's "OAuth refresh failed (refresh_contention)" mean?
It means a previous token refresh grabbed the oauth-refresh file lock and never released it,
so every later refresh times out waiting for the lock. The log line is
FailoverError: OAuth refresh failed (refresh_contention): another process is already
refreshing ... | file lock timeout for /home/node/.openclaw/locks/oauth-refresh/sha256-<hash>.lock.
The refresh token itself is usually still valid at this stage — this is a lock deadlock, not
a rejected credential (there's no 401 or invalid_grant yet). The
fix is to clear the stale lock and restart, not to re-authenticate.
Why does OpenClaw run on Claude when I configured openai-codex?
Because each lane (main, cron, session heartbeats) waits about 150 seconds on
the deadlocked oauth-refresh lock, gives up, and model-fallback kicks in:
[model-fallback/decision] requested=openai-codex/... reason=auth
next=anthropic/claude-sonnet-4-6. So the model line shows anthropic even though your
primary is openai-codex. It's a symptom of the stuck refresh lock, not a configuration
change — clear the lock and the configured Codex primary comes back.
Why doesn't restarting the instance fix refresh_contention?
Because the lock file lives on the instance's persistent data volume, so it survives a pod restart. The staleness check is "is the holder pid alive?" — but inside a container the openclaw process is always pid 1, and pid 1 always exists, so an abandoned lock is never detected as stale. A plain restart re-reads the same stuck lock. You have to delete the lock file (or run on a host that keeps these locks ephemeral) before the restart can help.
Does deleting the lock fix it, or do I have to re-authenticate?
Check the fallback reason. reason=auth with a file lock timeout
means a stale lock — deleting the lock and restarting fixes it and the token is fine.
reason=auth_permanent with 401 refresh_token_reused means the
single-use refresh token was already burned (prolonged contention can double-use and strand
it), and you must re-authenticate the openai-codex profile. Clearing the lock is still the
right first step: it removes the 150-second stalls and surfaces which case you're in.
Where are OpenClaw's oauth-refresh lock files?
There are two, in different directories, and both carry {"pid": 1, "createdAt": ...}.
The coordination semaphore is
/home/node/.openclaw/locks/oauth-refresh/sha256-<profilehash>.lock. The
auth-profile write-lock sits next to the profile file at
/home/node/.openclaw/agents/<agent>/agent/auth-profiles.json.lock (not
under .openclaw/locks/). Clearing only the first can leave the second holding
the deadlock, so sweep both. Don't try to read the profiles themselves — they're encrypted
at rest (AES-GCM).