OpenClaw Configuration

Configure OpenClaw with staik to use all models directly via Telegram, Discord, or the terminal.

Configuration

Add the following to your openclaw.json:

openclaw.json
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemma4:31b",
            "name": "Gemma 4 31B",
            "contextWindow": 262144,
            "contextTokens": 64000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.6:35b-a3b",
            "name": "Qwen 3.6 35B A3B",
            "contextWindow": 262144,
            "contextTokens": 64000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 28000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/gemma4:31b",
        "fallbacks": ["staik/qwen3.6:35b-a3b", "staik/qwen3.5:9b"]
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 20000,
        "reserveTokens": 24000
      }
    }
  }
}

Available models

| Model | Best for | Context window | Vision |
|---|---|---|---|
| gemma4:31b | Review, accuracy, language | 262,144 | Yes |
| qwen3.6:35b-a3b | Coding, complex tasks | 262,144 | Yes |
| qwen3.5:9b | Fast responses, simpler tasks | 32,768 | No |

Switch models with /models in OpenClaw. The fallbacks list enables automatic failover when the primary model is unavailable.
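The failover order can be pictured as a simple try-in-sequence loop. A minimal sketch of the semantics (illustrative only, not OpenClaw's actual code; `fake_call` and the error type are made up for the example):

```python
def pick_model(call, models):
    """Try each model id in order; return the first successful response."""
    last_error = None
    for model in models:
        try:
            return call(model)
        except RuntimeError as err:  # e.g. the model is unavailable
            last_error = err
    raise last_error  # every model failed

order = ["staik/gemma4:31b", "staik/qwen3.6:35b-a3b", "staik/qwen3.5:9b"]

def fake_call(model):  # hypothetical transport: the primary is down
    if model == "staik/gemma4:31b":
        raise RuntimeError("503 model unavailable")
    return f"ok from {model}"

print(pick_model(fake_call, order))  # → ok from staik/qwen3.6:35b-a3b
```

The request silently lands on the first fallback; you only see an error if every model in the list fails.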

Fine-tune context handling

These parameters control how OpenClaw manages conversation history and model responses.

| Parameter | Where | Description |
|---|---|---|
| contextTokens | per model | Max tokens of conversation history per request. Lower it if the model times out. |
| maxTokens | per model | Max tokens in the model's response. Lower it if responses are cut off or take too long. |
| keepRecentTokens | compaction | Tokens of recent messages that are never compressed. Raise it for more context in short conversations. |
| reserveTokens | compaction | Tokens reserved for the model's response during compaction. Raise it if responses get truncated. |
| contextWindow | per model | The model's total context window (262,144 for gemma4:31b and qwen3.6:35b-a3b, 32,768 for qwen3.5:9b). Do not change. |
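These values can be sanity-checked mechanically. A rough sketch, assuming history plus response must fit inside the context window (an inferred relationship based on the parameter descriptions, not a rule OpenClaw itself enforces):

```python
def check_model(cfg):
    """Rough sanity checks on per-model token settings.
    The relationships below are assumptions, not OpenClaw rules."""
    problems = []
    if cfg["contextTokens"] + cfg["maxTokens"] > cfg["contextWindow"]:
        problems.append("history + response do not fit in the context window")
    if cfg["maxTokens"] > cfg["contextTokens"]:
        problems.append("response budget larger than history budget (unusual)")
    return problems

# The gemma4:31b settings from the config above pass cleanly:
gemma = {"contextWindow": 262144, "contextTokens": 64000, "maxTokens": 8192}
print(check_model(gemma))  # → []

# Raising contextTokens to 32000 on the 32K-window 9b model would overflow:
too_big = {"contextWindow": 32768, "contextTokens": 32000, "maxTokens": 4096}
print(check_model(too_big))
```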

Tips

  • If the model stops responding mid-conversation — lower contextTokens.
  • If responses get truncated — raise reserveTokens.
  • If the model forgets what you just discussed — raise keepRecentTokens.

Config per plan

Every request counts both prompt tokens (conversation history) and completion tokens (model response). The key to staying within budget is contextTokens — it controls how much history is sent with each request.
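Since both prompt and completion tokens count, you can estimate how many conversation rounds a plan allows. A quick back-of-the-envelope with the Early Adopter numbers from this section:

```python
daily_budget = 100_000   # Early Adopter plan
context_tokens = 12_000  # prompt (history) sent per request
max_tokens = 2_048       # worst-case completion per request

# Optimistic estimate counting history only (the "~8 rounds" figure):
print(daily_budget // context_tokens)                 # → 8
# Conservative estimate that also counts a full-length response:
print(daily_budget // (context_tokens + max_tokens))  # → 7
```

The real number lands between the two, since most responses stop well short of maxTokens.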

Early Adopter: 100,000 tokens/day

With 100K tokens per day you need to be frugal. Use the fast 9b model and keep context short.

openclaw.json (Early Adopter)
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 12000,
            "maxTokens": 2048
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/qwen3.5:9b"
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 6000,
        "reserveTokens": 8000,
        "maxHistoryShare": 0.3
      }
    }
  }
}

Budget: ~12K tokens/request → ~8 conversation rounds per day

  • qwen3.5:9b is 3x faster and uses fewer tokens
  • contextTokens: 12K (not 64K) — 5x fewer prompt tokens per request
  • maxTokens: 2048 — sufficient for most responses
  • maxHistoryShare: 0.3 — compresses early to save tokens

Hobby: 250,000 tokens/day

More room for longer conversations. You can use larger models and more context.

openclaw.json (Hobby)
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.6:35b-a3b",
            "name": "Qwen 3.6 35B",
            "contextWindow": 262144,
            "contextTokens": 24000,
            "maxTokens": 4096
          },
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 16000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/qwen3.5:9b",
        "fallbacks": ["staik/qwen3.6:35b-a3b"]
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 10000,
        "reserveTokens": 16000,
        "maxHistoryShare": 0.4
      }
    }
  }
}

Budget: ~16K tokens/request → ~15 conversation rounds per day

  • 9b as primary, 35b as fallback for complex tasks
  • contextTokens: 24K on 35b gives good context without blowing the budget

Agent: 500,000 tokens/hour (rolling)

The Agent plan uses a rolling window instead of a daily limit — perfect for intense sessions. Full context and all models.

openclaw.json (Agent)
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "models": [
          {
            "id": "gemma4:31b",
            "name": "Gemma 4 31B",
            "contextWindow": 262144,
            "contextTokens": 48000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.6:35b-a3b",
            "name": "Qwen 3.6 35B",
            "contextWindow": 262144,
            "contextTokens": 48000,
            "maxTokens": 8192
          },
          {
            "id": "qwen3.5:9b",
            "name": "Qwen 3.5 9B",
            "contextWindow": 32768,
            "contextTokens": 28000,
            "maxTokens": 4096
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "staik/gemma4:31b",
        "fallbacks": ["staik/qwen3.6:35b-a3b", "staik/qwen3.5:9b"]
      },
      "compaction": {
        "mode": "safeguard",
        "keepRecentTokens": 20000,
        "reserveTokens": 24000,
        "maxHistoryShare": 0.6
      }
    }
  }
}

Budget: ~50K tokens/request → ~10 requests per hour (capacity is continuously reclaimed)

  • Gemma 4 as primary — best for review and language
  • All three models available with automatic fallback
  • 48K context — sufficient for large code projects
  • Rolling window: tokens from an hour ago no longer count
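The rolling window can be pictured as a queue of timestamped spends: requests age out after an hour instead of resetting at midnight. A hypothetical sketch of the accounting (not staik's actual implementation):

```python
from collections import deque
import time

class RollingTokenWindow:
    """Illustration of a 1-hour rolling token limit, like the
    Agent plan's 500K/h budget. Assumed behavior, for intuition only."""
    def __init__(self, limit, window_s=3600):
        self.limit, self.window_s = limit, window_s
        self.events = deque()  # (timestamp, tokens) pairs, oldest first

    def _prune(self, now):
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()  # tokens older than an hour stop counting

    def try_spend(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False  # would exceed the rolling budget → 429
        self.events.append((now, tokens))
        return True

w = RollingTokenWindow(500_000)
print(w.try_spend(480_000, now=0))    # → True
print(w.try_spend(50_000, now=60))    # → False: window still holds 480K
print(w.try_spend(50_000, now=3700))  # → True: the 480K has aged out
```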

Quick reference

| Plan | contextTokens | maxTokens | Model | Capacity |
|---|---|---|---|---|
| Early Adopter | 12,000 | 2,048 | qwen3.5:9b | ~8 / day |
| Hobby Mini | 8,000 | 2,048 | qwen3.5:9b | ~5 / day |
| Hobby | 24,000 | 4,096 | qwen3.5:9b + 35b | ~15 / day |
| Agent Mini | 24,000 | 4,096 | qwen3.6:35b | ~4 / h |
| Agent | 48,000 | 8,192 | gemma4 + 35b + 9b | ~10 / h |
| Agent Pro | 64,000 | 8,192 | gemma4 + 35b + 9b | ~20 / h |

Sub-agents (ACP)

When your agent works on larger projects (e.g. an HTML/CSS/JS one-pager with multiple files), OpenClaw normally splits the work into sub-agents via ACP (Agent Coordination Protocol). Each sub-agent gets an isolated context, which dramatically lowers token usage.

Why it matters

Without ACP, the main agent has to hold the entire project state in its context between each file generation. With 5 files and 30k tokens of context per request, you burn 150k+ tokens in a single round. With ACP, each sub-agent only gets its file-specific context, which can lower token usage by 70-80%.
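The savings estimate works out as follows (a rough illustration; 75% is just the midpoint of the quoted 70-80% range):

```python
files = 5
context_per_request = 30_000  # tokens of project state per request

without_acp = files * context_per_request  # main agent re-sends everything
with_acp = int(without_acp * (1 - 0.75))   # midpoint of the 70-80% savings

print(without_acp)  # → 150000
print(with_acp)     # → 37500
```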

Configuration

Add the following to your openclaw.json:

openclaw.json (ACP)
{
  "agents": {
    "list": [{
      "id": "<your_agent_id>",
      "runtime": {
        "type": "acp",
        "acp": {
          "agent": "codex",
          "backend": "acpx",
          "mode": "persistent"
        }
      }
    }]
  },
  "session": {
    "threadBindings": {
      "enabled": true,
      "spawnSubagentSessions": true
    }
  }
}

Verify ACP is working

Run the following directly in your chat (e.g. Telegram) to spawn a sub-agent manually:

/acp spawn <agent_id>

To see exactly what OpenClaw is doing in real time (which sub-agents are spawned, where errors occur), enable verbose logging directly in your chat:

/verbose

Common issues

"Cannot start sub-agent (gateway connection closed)"

This is OpenClaw's internal gateway, not the staik API gateway. Check that:

  • the acpx backend is installed
  • gateway.mode: "local" is set
  • the OpenClaw gateway is running on the port set in gateway.port
  • no other processes are blocking the port (e.g. previous instances that didn't shut down)

When ACP fails, the main agent falls back to doing everything itself, which is why the entire project context gets sent in every request — and you burn tokens fast.

Troubleshooting

OpenClaw produces no response

If OpenClaw does not produce any response at all, try turning off streaming in your provider configuration. Add "streaming": "off" to your provider block:

openclaw.json (fix)
{
  "models": {
    "providers": {
      "staik": {
        "baseUrl": "https://api.staik.se/v1",
        "apiKey": "sk-st-your-key",
        "api": "openai-completions",
        "streaming": "off",
        "models": [...]
      }
    }
  }
}

Why does this help?

Some channels and configurations can have issues with streaming responses from OpenAI-compatible APIs. Turning off streaming makes OpenClaw send a regular request and wait for the full response before displaying it — bypassing any streaming-related issues.
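At the HTTP level, turning streaming off amounts to sending "stream": false in the request body to the OpenAI-compatible endpoint. A sketch of the payload (the exact mapping from OpenClaw's "streaming": "off" to this wire field is an assumption, based on the OpenAI-style API the provider block declares):

```python
import json

def build_request_body(model, user_message):
    """Payload for a non-streaming chat completion against an
    OpenAI-compatible API. Illustrative only."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # full response in one body, no SSE chunks
    })

body = build_request_body("qwen3.5:9b", "hello")
print('"stream": false' in body)  # → True
```

With streaming on, the server instead answers with a long-lived series of SSE chunks, which is the part some channels mishandle.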

Agent asks "Should I run?" between steps

For large multi-step tasks, the agent may break work into batches and ask for confirmation between them. This comes from OpenClaw's own agent prompt or the model's default behavior — not the staik gateway. OpenClaw lacks a built-in flag to disable this, but you can override it with a system prompt.

Option 1: per Telegram group

openclaw.json (fix)
{
  "channels": {
    "telegram": {
      "groups": {
        "<your_chat_id>": {
          "systemPrompt": "You are an autonomous agent. Execute ALL steps of a task in sequence without asking between them. Stop only on (1) an explicit error requiring a decision, or (2) when the entire task is done. Never use phrases like \"Should I run?\" or \"Do you want me to continue?\" — just execute the next step."
        }
      }
    }
  }
}

Option 2: workspace files (applies globally)

OpenClaw reads agent persona from markdown files in the workspace folder (e.g. ~/.openclaw/workspace/). Create or edit IDENTITY.md or SOUL.md with the same instruction — it then applies globally, not just per channel.

Other causes

  • Sub-agent could not start: If OpenClaw reports that the gateway connection is closed, its internal sub-agent system has failed (that's OpenClaw's own gateway, not the staik API gateway). The main agent then serializes work and often breaks it into batches.
  • Task too large for one response: If the task requires more output than maxTokens allows, the agent must split it. Raise maxTokens to 8192 if budget allows.

429 Too Many Requests mid-session

You've hit your token limit (daily for Hobby/Early Adopter, hourly rolling for Agent). Large contexts (~30k tokens/request) can exhaust the Agent plan's 500k/h budget in about 16 requests. Solutions:

  • Lower contextTokens from 64000 → 16000–32000
  • Use qwen3.5:9b as primary instead of 35b
  • Upgrade to a higher-limit plan — see pricing

Use the X-RateLimit-Used-Tokens response header to monitor usage in real time.
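For example, a tiny helper that turns that header into remaining budget (the header name is from these docs; treating its value as a plain integer of tokens used is an assumption):

```python
def tokens_remaining(headers, budget):
    """Remaining tokens in the current window, given response headers.
    Assumes X-RateLimit-Used-Tokens carries an integer count."""
    used = int(headers.get("X-RateLimit-Used-Tokens", 0))
    return budget - used

# Example with a header captured from an Agent-plan (500K/h) response:
sample = {"X-RateLimit-Used-Tokens": "431500"}
print(tokens_remaining(sample, 500_000))  # → 68500
```

Logging this after each request tells you well in advance when a 429 is coming.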

Next steps