OpenClaw "OAuth refresh failed (refresh_contention)": clear the stuck lock

Your openai-codex agent went slow and started answering "as Claude" even though it's configured for Codex, and the logs repeat OAuth refresh failed (refresh_contention): another process is already refreshing ... | file lock timeout for /home/node/.openclaw/locks/oauth-refresh/…. This is a stale oauth-refresh lock, not a bad token: a refresh grabbed the file lock at startup, never released it, and now every refresh waits ~150s for a lock that will never free. The fix is to delete the stale lock file(s) and restart the instance, not to re-authenticate. Re-authenticating doesn't touch the lock, and restarting alone doesn't help because the lock lives on the data volume and survives restarts.

What you'll see

FailoverError: OAuth refresh failed (refresh_contention): another process is
already refreshing ... | file lock timeout for
/home/node/.openclaw/locks/oauth-refresh/sha256-<hash>.lock

[model-fallback/decision] requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5
  reason=auth next=anthropic/claude-sonnet-4-6

🧠 Model: anthropic/claude-sonnet-4-6 · 🔑 oauth (openai-codex:<name>)
# ...every turn ~150s slower; a restart doesn't fix it.

Why the lock never clears itself (pid 1)

OpenClaw serializes token refresh with a file lock so two lanes don't refresh the same profile at once. The lock file ~/.openclaw/locks/oauth-refresh/sha256-<profilehash>.lock records the pid of its holder, and the staleness check is simply "is that pid still alive?" That check is sound on a bare host. Inside a container, though, the openclaw process is always pid 1, so the lock reads {"pid": 1, "createdAt": ...}, and pid 1 always exists. A lock abandoned by a crashed or killed refresh is therefore never seen as stale: the check keeps finding "pid 1 is alive," keeps honoring the dead lock, and every subsequent refresh blocks on it until it times out with refresh_contention.
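A minimal sketch of the kind of pid-liveness check described above, to show why it can never flag a pid 1 lock inside a container. This is not OpenClaw's actual code: the function names pid_alive and lock_is_stale are hypothetical, and the sketch assumes a Linux host with /proc and a lock file that contains "pid": <number>.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an "is the holder pid still alive?" staleness check.
# Not OpenClaw's implementation; the names and the parsing are assumptions.

# Succeeds if a process with the given pid exists.
# kill -0 sends no signal; the /proc test covers pids we may not signal.
pid_alive() {
  kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

# Succeeds (exit 0) only if the pid recorded in the lock file is gone.
# Returns 2 if no pid can be read from the file.
lock_is_stale() {
  local pid
  pid=$(grep -Eo '"pid"[[:space:]]*:[[:space:]]*[0-9]+' "$1" | grep -Eo '[0-9]+$') || return 2
  ! pid_alive "$pid"
}
```

On a bare host, lock_is_stale succeeds once the holder has exited. In a container where the holder was pid 1, pid_alive 1 keeps succeeding, so the lock is never reported as stale no matter how long ago it was abandoned.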

That's also why restarting the instance doesn't help. The lock sits on the instance's persistent data volume, not in memory, so it's still there after the pod comes back — and the fresh process re-reads the same stuck lock. Restarting only helps after the lock file is gone.

Why your Codex agent is answering as Claude

Once the lock is stuck, every lane that needs a token — main, each cron job, each session:…:heartbeat — tries to refresh, waits out the lock timeout (~150s), and then falls back: reason=auth next=anthropic/claude-sonnet-4-6. The model-fallback chain picks the next configured provider, so a primary of openai-codex/gpt-5.5 silently runs on anthropic/claude-sonnet-4-6. That produces the two headline symptoms at once: everything is slow (each turn eats a ~150s stall first) and the model line shows Claude even though you configured Codex. Crucially, at this stage there is no 401, no invalid_grant, no revoked — only refresh_contention. The token is fine; the lock is the problem.

Confirm it's a stale lock, not a dead token

Two failures look similar in the logs but need opposite fixes. Read the fallback reason before you touch anything:

reason=auth + file lock timeout / refresh_contention = a stale lock. The token is valid; a lock is deadlocked. Delete the lock and restart (below). This is the case this page covers.
reason=auth_permanent + 401 "refresh token has already been used" = a dead token (refresh_token_reused). The single-use refresh token was genuinely burned and no lock delete will bring it back — you re-authenticate. That's a different fix: OpenClaw Codex OAuth "refresh token already used" (401).
Verify the lock is actually abandoned. There are two lock files, in different directories, both carrying {"pid": 1, ...}: the coordination semaphore ~/.openclaw/locks/oauth-refresh/sha256-<hash>.lock, and the write-lock next to the profile file, ~/.openclaw/agents/<agent>/agent/auth-profiles.json.lock (not under .openclaw/locks/). Confirm each still reads "pid": 1 before removing it. Don't try to read the profiles themselves — they're encrypted at rest (AES-GCM: ciphertext/iv/tag).
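The triage above can be sketched as a small log filter. This is an illustrative helper, not part of OpenClaw: the function name classify_refresh_failure and its output labels are hypothetical, and it only matches the log strings quoted in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an OAuth refresh failure from log text on stdin.
# The matched strings are the ones quoted in this article; the function name
# and the output labels are assumptions for illustration.
classify_refresh_failure() {
  local log
  log=$(cat)
  # Check the permanent case first: a burned token can follow a stale lock,
  # so older refresh_contention lines may still be present in the same log.
  if printf '%s\n' "$log" | grep -Eq 'reason=auth_permanent|refresh_token_reused'; then
    echo "dead-token"   # clear the locks, then re-authenticate
  elif printf '%s\n' "$log" | grep -Eq 'refresh_contention|file lock timeout'; then
    echo "stale-lock"   # delete both lock files and restart
  else
    echo "unknown"
  fi
}
```

Feed it recent log output, for example openclaw logs --since 6m | classify_refresh_failure. Note that reason=auth on its own does not match the reason=auth_permanent pattern, so the two cases stay distinct.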

The escalation to watch for

A stale lock left in place long enough can turn into the dead-token case. The refresh token is single-use and rotating; prolonged contention lets two lanes double-use it, and the freshly minted token's write is blocked by the same deadlocked lock so it's never persisted. Clear the lock early and you only lose the stalls. Leave it and you can come back to reason=auth_permanent + 401 refresh_token_reused — at which point clearing the lock is still correct, but you'll also have to re-auth.

Fix it (self-hosted)

Confirm each lock file is stale ("pid": 1), delete both, then restart the instance so openclaw re-initializes with a clean lock directory:

# 1. confirm both locks are the stale pid:1 kind (each should print "pid": 1)
cat ~/.openclaw/locks/oauth-refresh/*.lock
find ~/.openclaw/agents -name 'auth-profiles.json.lock' -exec cat {} +

# 2. remove BOTH the coordination lock and the co-located auth-profiles write-lock
rm -f ~/.openclaw/locks/oauth-refresh/*.lock
find ~/.openclaw/agents -name 'auth-profiles.json.lock' -delete

# 3. restart the gateway so it re-inits with no held lock
openclaw gateway stop && openclaw gateway start
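If you would rather not delete with a bare wildcard, the confirm-then-delete steps can be folded into one guarded sweep. This is a sketch under stated assumptions: sweep_pid1_locks is a hypothetical helper, it assumes the two lock locations named in this article, and it treats any lock file whose content records "pid": 1 as stale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete only lock files that record "pid": 1.
# Assumes the two lock locations named in this article. Pass the .openclaw
# directory as the first argument (defaults to ~/.openclaw).
sweep_pid1_locks() {
  local home="${1:-$HOME/.openclaw}" f
  {
    find "$home/locks/oauth-refresh" -type f -name '*.lock' 2>/dev/null
    find "$home/agents" -type f -name 'auth-profiles.json.lock' 2>/dev/null
  } | while IFS= read -r f; do
    # Match "pid": 1 exactly, not 10, 12, and so on.
    if grep -Eq '"pid"[[:space:]]*:[[:space:]]*1([^0-9]|$)' "$f"; then
      rm -f -- "$f" && echo "removed $f"
    else
      echo "kept $f (holder is not pid 1)"
    fi
  done
}
```

Run sweep_pid1_locks before restarting the gateway. Any lock it reports as kept has a different holder pid and deserves a closer look before you remove it by hand.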

Then verify recovery: the refresh_contention lines stop, turns are fast again, and the model line shows openai-codex/… instead of the anthropic fallback:

openclaw logs --since 6m | grep -c refresh_contention   # want 0
# and the gateway model line back on your configured primary:
# 🧠 Model: openai-codex/gpt-5.5 · 🔑 oauth (openai-codex:<name>)

If, after clearing, a refresh finally reaches the provider and comes back 401 refresh_token_reused (reason=auth_permanent), the token was already burned — re-authenticate the profile:

openclaw models auth login --provider openai-codex

Stop babysitting your OpenClaw box

Fix it once — or stop fixing it for good.

Apply the checklist above and keep self-hosting, or skip the maintenance entirely: run your OpenClaw on managed hosting from $6.90/mo, starting with a 7-day free trial. We handle the stale locks, gateway restarts, version upgrades, and uptime — and you can import your existing instance in a couple of minutes. Cancel anytime.

Managed hosting — from $6.90/mo Your own hosted OpenClaw instance with automatic restarts and version upgrades. Starts with a 7-day free trial — import your current setup, keep your channels, cancel anytime.

$199 managed setup — optional Prefer we do it for you? One workspace configured end-to-end: first-run config, one 30-minute onboarding/debug session, and a 7-day follow-up. Limited weekly slots.

Managed hosting handles stale .jsonl.lock files, gateway restarts, and version upgrades for you
Import your existing OpenClaw setup in minutes — keep your channels and configuration
The optional $199 setup is scoped: no custom development, enterprise/SRE support, or unsupported self-hosting repair

If you would rather compare options first, review OpenClaw cloud hosting or see the best OpenClaw hosting options before deciding.

OpenClaw import first screen in OpenClaw Setup dashboard (light theme) — 1) Paste import payload

OpenClaw import completed screen in OpenClaw Setup dashboard (light theme) — 2) Review and launch

Related, but not this

"Your refresh token has already been used" (401) — the permanent dead-token case (reason=auth_permanent), where clearing the lock isn't enough and you re-authenticate: see OpenClaw Codex OAuth "refresh token already used" (401).
"agent failed before reply: session file locked" — a different lock (the session/conversation file, not the oauth-refresh lock): see fix OpenClaw "session file locked" timeouts.
"No API key found for provider openai" after an upgrade — the empty-store case (no profile imported at all), not a wedged lock: see OpenClaw "No API key found for provider openai" after an upgrade.

How managed hosting avoids this

The stale lock survives because it sits on a persistent volume that a plain restart can't clear. On managed OpenClaw hosting from Lobsterland, the runtime's coordination locks (.openclaw/locks) are mounted on ephemeral storage that dies with the pod, so a wedged oauth-refresh lock can't outlive a restart. An init step that runs before the openclaw process starts also sweeps any leftover *.lock carrying the lock signature — including the auth-profiles.json.lock that sits next to the profile data — so refresh_contention from a pid:1 lock doesn't recur. You restart; keeping locks from wedging across restarts is the platform's job.

Import your current OpenClaw instance in 1 click

Frequently asked questions

What does OpenClaw's "OAuth refresh failed (refresh_contention)" mean?

It means a previous token refresh grabbed the oauth-refresh file lock and never released it, so every later refresh times out waiting for the lock. The log line is FailoverError: OAuth refresh failed (refresh_contention): another process is already refreshing ... | file lock timeout for /home/node/.openclaw/locks/oauth-refresh/sha256-<hash>.lock. The refresh token itself is usually still valid at this stage — this is a lock deadlock, not a rejected credential (there's no 401 or invalid_grant yet). The fix is to clear the stale lock and restart, not to re-authenticate.

Why does OpenClaw run on Claude when I configured openai-codex?

Because each lane (main, cron, session heartbeats) waits about 150 seconds on the deadlocked oauth-refresh lock, gives up, and model-fallback kicks in: [model-fallback/decision] requested=openai-codex/... reason=auth next=anthropic/claude-sonnet-4-6. So the model line shows anthropic even though your primary is openai-codex. It's a symptom of the stuck refresh lock, not a configuration change — clear the lock and the configured Codex primary comes back.

Why doesn't restarting the instance fix refresh_contention?

Because the lock file lives on the instance's persistent data volume, so it survives a pod restart. The staleness check is "is the holder pid alive?" — but inside a container the openclaw process is always pid 1, and pid 1 always exists, so an abandoned lock is never detected as stale. A plain restart re-reads the same stuck lock. You have to delete the lock file (or run on a host that keeps these locks ephemeral) before the restart can help.

Does deleting the lock fix it, or do I have to re-authenticate?

Check the fallback reason. reason=auth with a file lock timeout means a stale lock — deleting the lock and restarting fixes it and the token is fine. reason=auth_permanent with 401 refresh_token_reused means the single-use refresh token was already burned (prolonged contention can double-use and strand it), and you must re-authenticate the openai-codex profile. Clearing the lock is still the right first step: it removes the 150-second stalls and surfaces which case you're in.

Where are OpenClaw's oauth-refresh lock files?

There are two, in different directories, and both carry {"pid": 1, "createdAt": ...}. The coordination semaphore is /home/node/.openclaw/locks/oauth-refresh/sha256-<profilehash>.lock. The auth-profile write-lock sits next to the profile file at /home/node/.openclaw/agents/<agent>/agent/auth-profiles.json.lock (not under .openclaw/locks/). Clearing only the first can leave the second holding the deadlock, so sweep both. Don't try to read the profiles themselves — they're encrypted at rest (AES-GCM).