A Death-Exemption Pass: OpenClaw + keepalived
Using keepalived to keep OpenClaw from stopping unexpectedly
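Background
The problem started with a lobster suicide. When I asked OpenClaw to update some configuration, it ran openclaw gateway stop, which shut down the OpenClaw service. I was left staring at the screen, still waiting like a fool. It did not even say goodbye…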
openclaw gateway stop
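Solution
Use keepalived to keep the OpenClaw service running: keepalived runs a check script periodically, and the script starts the service again when it finds it stopped. It is also a good excuse to learn how keepalived works.
Check the IP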
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.16.0.9 netmask 255.255.0.0 broadcast 172.16.255.255
inet6 fe80::f816:3eff:fedc:b5a8 prefixlen 64 scopeid 0x20<link>
ether fa:16:3e:dc:b5:a8 txqueuelen 1000 (Ethernet)
RX packets 67391273 bytes 27877705315 (27.8 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 110078632 bytes 24769427866 (24.7 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 2288521 bytes 373008683 (373.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2288521 bytes 373008683 (373.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
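On systems where ifconfig is not installed, the interface name and address needed for the keepalived configuration can also be read from iproute2. The following is a minimal sketch, assuming the one-line format printed by ip -o -4 addr show; the helper name list_ipv4 is my own and is not part of any tool.

```bash
#!/bin/bash
# Print "<interface> <address/prefix>" for each IPv4 address, reading
# `ip -o -4 addr show` output on stdin. In that one-line format the
# interface name is field 2 and the CIDR address is field 4.
list_ipv4() {
    awk '$3 == "inet" { print $2, $4 }'
}

# Only query the live system when iproute2 is installed.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | list_ipv4
fi
```

On the server above this would be expected to list eth0 with 172.16.0.9/16 and lo with 127.0.0.1/8.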
Health check script
The check_openclaw.sh script checks whether the OpenClaw service is running and tries to start it if it is not.
#!/bin/bash
# The gateway runs as a root user service, so systemctl --user needs root's runtime dir.
export XDG_RUNTIME_DIR="/run/user/0"
if systemctl --user is-active openclaw-gateway.service > /dev/null 2>&1; then
    # The unit is active: confirm the gateway also answers on its health endpoint.
    if curl -s -f -o /dev/null http://127.0.0.1:18789/health 2>/dev/null; then
        #logger -t openclaw-health "Health check PASSED"
        exit 0
    else
        logger -t openclaw-health "Health check FAILED - service not responding"
        exit 1
    fi
else
    logger -t openclaw-health "Health check FAILED - service is not active"
    # Try to start the service
    logger -t openclaw-health "Attempting to start service"
    systemctl --user start openclaw-gateway.service
    sleep 3
    # Check whether the start succeeded
    if systemctl --user is-active openclaw-gateway.service > /dev/null 2>&1; then
        logger -t openclaw-health "Service started successfully"
        exit 0
    else
        logger -t openclaw-health "Failed to start service"
        exit 1
    fi
fi
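The script waits a fixed sleep 3 after starting the service and then only checks that the unit is active, not that /health answers. A possible refinement is to poll until the gateway responds. The sketch below is a generic retry helper, not part of the original script; the name wait_until and the attempt counts are my own assumptions.

```bash
#!/bin/bash
# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical use inside check_openclaw.sh, in place of the fixed `sleep 3`:
#   systemctl --user start openclaw-gateway.service
#   wait_until 3 1 curl -s -f -o /dev/null --max-time 1 http://127.0.0.1:18789/health
```

The worst-case run time of the whole check has to stay below the timeout set for the script in keepalived.conf, so the attempt count and delay would need to be chosen with that limit in mind, or the timeout raised.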
keepalived configuration
The keepalived configuration lives in /etc/keepalived/keepalived.conf. There is only one server, so state is MASTER.
global_defs {
    router_id OPENCLAW_MONITOR
    script_user root
    enable_script_security
}
vrrp_script chk_openclaw {
    script "/usr/local/bin/check_openclaw.sh"
    interval 10 # run the check every 10 seconds
    timeout 5 # script execution times out after 5 seconds
    weight -20 # lower the priority by 20 when the check fails
    fall 2 # 2 consecutive failures mark the check as failed
    rise 1 # 1 success marks it as recovered
}
vrrp_instance OPENCLAW_MONITOR {
    state MASTER # single node, so start as MASTER
    interface eth0 # use eth0
    virtual_router_id 51
    priority 100 # priority
    advert_int 2 # advertisement (heartbeat) interval of 2 seconds
    # virtual IP configuration
    virtual_ipaddress {
        172.16.0.100/16 dev eth0 # bind the VIP to eth0
    }
    track_script {
        chk_openclaw
    }
}
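With script_user root and enable_script_security set, keepalived refuses to run a script if any part of its path is writable by a non-root user. The sketch below is a rough pre-flight check of my own, not a keepalived tool: it approximates that rule by flagging path components that are not owned by root or that are group- or world-writable. It assumes GNU stat and readlink, and the function names are hypothetical.

```bash
#!/bin/bash
# Return 0 if the file is writable by group or others (mode bits 022),
# 1 if it is not, 2 if stat fails.
is_loosely_writable() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2
    (( 8#$mode & 8#022 ))
}

# Walk from the script up to /, reporting every path component that a
# non-root user could modify. Returns 0 if nothing was flagged.
audit_script_path() {
    local p bad=0
    p=$(readlink -f "$1") || return 2
    while :; do
        if [ "$(stat -c '%U' "$p")" != "root" ] || is_loosely_writable "$p"; then
            echo "unsafe: $p"
            bad=1
        fi
        [ "$p" = "/" ] && break
        p=$(dirname "$p")
    done
    return "$bad"
}

# Example: audit_script_path /usr/local/bin/check_openclaw.sh
```

If anything is reported, tightening ownership and mode on that component (for example root ownership and mode 755) before restarting keepalived is the usual remedy.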
Start the keepalived service
# Restart keepalived
systemctl restart keepalived
# Enable keepalived at boot
systemctl enable keepalived
# Check keepalived status
systemctl status keepalived
Check the VIP (virtual IP)
ip addr show eth0 | grep 172.16.0.100
inet 172.16.0.100/16 scope global secondary eth0
Failure drill
# Stop the OpenClaw service
openclaw gateway stop
Follow the log with journalctl -t openclaw-health -f:
Apr 02 16:06:24 lavm-0sdc09108n openclaw-health[1567604]: Attempting to start service
Apr 02 16:06:27 lavm-0sdc09108n openclaw-health[1567623]: Service started successfully
Apr 02 16:06:34 lavm-0sdc09108n openclaw-health[1567638]: Health check FAILED - service not responding
Verify with nc -zv localhost 18789 (the -v flag is what prints the result):
Connection to localhost (127.0.0.1) 18789 port [tcp/*] succeeded!
You can also run openclaw gateway status to check the service state:
🦞 OpenClaw 2026.3.24 (cff6dc9)
I can grep it, git blame it, and gently roast it—pick your coping mechanism.
│
◇
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-02.log
Command: /root/.nvm/versions/node/v22.22.1/bin/node /root/.pnpm-global/5/.pnpm/openclaw@2026.3.24_@napi-rs+canvas@0.1.97/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789
Service config looks out of date or non-standard.
Service config issue: Gateway service PATH includes version managers or package managers; recommend a minimal PATH. (/root/.nvm/versions/node/v22.22.1/bin)
Service config issue: Gateway service uses Node from a version manager; it can break after upgrades. (/root/.nvm/versions/node/v22.22.1/bin/node)
Service config issue: System Node 22 LTS (22.14+) or Node 24 not found; install it before migrating away from version managers.
Recommendation: run "openclaw doctor" (or "openclaw doctor --repair").
Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json
Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.
Runtime: running (pid 1567606, state active, sub running, last exit 0, reason 0)
RPC probe: ok
Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
openclaw-gateway.service
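There is also a simpler way: set Restart=always in /etc/systemd/system/openclaw-gateway.service, and systemd will restart the service whenever its process exits. Note that systemd does not restart a unit that was stopped explicitly through systemctl stop, so this covers crashes and unexpected exits rather than a deliberate stop.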
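[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
#User=$(whoami)
User=root
EnvironmentFile=/opt/openclaw/.env
# systemd does not expand shell substitutions such as $(whoami) or $(which openclaw),
# so literal values are required. With User=root, ~/.openclaw is /root/.openclaw.
WorkingDirectory=/root/.openclaw
# Replace $(which openclaw) with the absolute path printed by `which openclaw`.
ExecStart=$(which openclaw) gateway --force
Restart=always
RestartSec=2
[Install]
WantedBy=multi-user.target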
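On this machine the gateway is actually installed as a user unit at ~/.config/systemd/user/openclaw-gateway.service, as the status output shows. Instead of writing a second system-wide unit, the same restart policy could be added to the existing unit through a drop-in. This is a minimal sketch under that assumption; the helper name write_restart_dropin and the file name restart.conf are my own choices.

```bash
#!/bin/bash
# Write a systemd drop-in that adds a restart policy to a unit.
# $1 is the drop-in directory, e.g.
#   ~/.config/systemd/user/openclaw-gateway.service.d
write_restart_dropin() {
    local dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=2
EOF
}

# Hypothetical use for the user unit:
#   write_restart_dropin ~/.config/systemd/user/openclaw-gateway.service.d
#   systemctl --user daemon-reload
#   systemctl --user restart openclaw-gateway.service
```

A drop-in leaves the unit file generated by OpenClaw untouched, so it is less likely to be lost if that file is regenerated.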