
Zero-downtime auto-updates in a Rust binary

By SecuryBlack

OxiPulse ships updates silently. No package manager, no SSH session, no restart window to schedule. Once a new release is tagged on GitHub, every running agent picks it up within 24 hours (and within 5 minutes of the next restart). Here's exactly how it works.

The constraints

A monitoring agent has an awkward update problem. It must:

  1. Download the new binary without interrupting metric collection
  2. Replace itself atomically, since a partially written binary would not execute
  3. Hand off cleanly so the service manager restarts it with the new version
  4. Never brick a remote server if a download fails

How OxiPulse solves it

The updater runs as a background Tokio task that wakes up 5 minutes after startup and then every 24 hours.

startup
  └─ 5 min → check GitHub Releases API
              ├─ no new version → sleep 24 h → repeat
              └─ new version found
                  ├─ download binary for current platform/arch
                  ├─ verify SHA256
                  ├─ atomic rename (replace in place)
                  └─ std::process::exit(0)
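The loop in the diagram can be sketched with the standard library alone. The block covers three pieces: the 5-minute/24-hour schedule, the version comparison that picks between the two branches, and the choice of release asset for the current platform. The function names and the `oxipulse-<os>-<arch>` asset naming are hypothetical, not OxiPulse's actual code. A production updater would more likely lean on the semver crate (which also orders pre-releases) and on self_update for the download itself.

```rust
use std::time::Duration;

/// Delay before the first update check after startup.
const INITIAL_DELAY: Duration = Duration::from_secs(5 * 60);
/// Delay between subsequent checks.
const CHECK_INTERVAL: Duration = Duration::from_secs(24 * 60 * 60);

/// Sleeps the updater task performs: 5 minutes once, then 24 hours forever.
fn check_delays() -> impl Iterator<Item = Duration> {
    std::iter::once(INITIAL_DELAY).chain(std::iter::repeat(CHECK_INTERVAL))
}

/// Parse "v1.4.2" or "1.4.2" into (major, minor, patch).
/// Pre-release tags such as "v1.5.0-rc.1" are rejected to keep the sketch small.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let s = s.strip_prefix('v').unwrap_or(s);
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// "New version found" branch: the latest tag is strictly newer than the running build.
/// Anything unparseable counts as "no new version", so the agent keeps running.
fn is_newer(current: &str, latest_tag: &str) -> bool {
    match (parse_version(current), parse_version(latest_tag)) {
        (Some(cur), Some(latest)) => latest > cur, // tuples compare field by field
        _ => false,
    }
}

/// Hypothetical asset naming scheme for "download binary for current platform/arch".
fn asset_name() -> String {
    let ext = if cfg!(windows) { ".exe" } else { "" };
    format!("oxipulse-{}-{}{}", std::env::consts::OS, std::env::consts::ARCH, ext)
}
```

Inside the Tokio task, each delay would be awaited with `tokio::time::sleep(delay).await` before the check runs. Sleeping between checks, rather than using `tokio::time::interval` (whose first tick completes immediately), keeps the initial 5-minute delay explicit. Comparing numeric tuples instead of strings matters: as strings, "0.10.0" would sort before "0.4.0".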

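The key step is the atomic rename. On Linux, rename(2) is guaranteed atomic when source and destination are on the same filesystem, so the new binary has to be staged on the same filesystem as the one it replaces (a rename across filesystems fails with EXDEV). A process already running the old binary is unaffected: it keeps executing the old inode until it exits. On Windows, the self_update crate uses MoveFileExW with MOVEFILE_REPLACE_EXISTING. Either way, there is no moment where the binary on disk is partially written.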
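A minimal sketch of the same idea using only the standard library, aimed at Unix-like systems: write the new bytes to a temporary file in the target's own directory (so the rename never crosses a filesystem), flush them to disk, mark the file executable, then rename it over the old binary. `atomic_replace` and the `.name.new` temp-file naming are illustrative assumptions, not the self_update crate's internals. On Windows a running executable cannot simply be overwritten, which is part of the platform-specific work self_update takes on.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `target` with `bytes` so readers see either the old or the new file, never a mix.
fn atomic_replace(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let invalid = || io::Error::new(io::ErrorKind::InvalidInput, "target must be a file path");
    let dir = target.parent().ok_or_else(invalid)?;
    let name = target.file_name().ok_or_else(invalid)?;
    // Stage next to the target: rename(2) is only atomic within one filesystem.
    let tmp = dir.join(format!(".{}.new", name.to_string_lossy()));

    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // data reaches the disk before the file becomes visible
    }

    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        fs::set_permissions(&tmp, fs::Permissions::from_mode(0o755))?;
    }

    // The atomic step. A process already running the old binary keeps its open inode.
    if let Err(e) = fs::rename(&tmp, target) {
        let _ = fs::remove_file(&tmp); // do not leave a stray temp file behind
        return Err(e);
    }
    Ok(())
}
```

For full durability across a power loss, the containing directory would also be fsynced after the rename; that step is left out of the sketch.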

Delegating the restart

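OxiPulse does not restart itself. After replacing the binary it calls std::process::exit(0), a clean exit with code 0, and leaves the restart to the OS service manager. That only works if the service manager is configured to restart the agent even after a successful exit:

  • systemd: Restart=always (or Restart=on-success) in the unit file. Restart=on-failure is not enough: exit code 0 counts as a clean exit, so systemd would leave the service stopped.
  • Windows Service Manager: the service's recovery actions must include a restart. Because the process ends without reporting SERVICE_STOPPED, the Service Control Manager treats the exit as a failure and applies those actions.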
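Why exit code 0 interacts badly with Restart=on-failure can be captured in a few lines. This is a simplified model of systemd's documented Restart= behaviour for a process that exits normally; it deliberately ignores signals, timeouts, the watchdog and SuccessExitStatus=, and the enum and function names are made up for illustration.

```rust
/// A subset of systemd's Restart= values (names mirror the unit-file settings).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RestartPolicy {
    No,
    Always,
    OnSuccess,
    OnFailure,
    OnAbnormal,
}

/// Would systemd restart a service whose main process exited normally with `exit_code`?
/// Simplified: only exit codes are modeled, and only 0 counts as clean
/// (SuccessExitStatus= could mark additional codes as clean).
fn restarts_after_exit(policy: RestartPolicy, exit_code: i32) -> bool {
    let clean = exit_code == 0;
    match policy {
        RestartPolicy::No => false,
        RestartPolicy::Always => true,
        RestartPolicy::OnSuccess => clean,
        RestartPolicy::OnFailure => !clean,
        // on-abnormal reacts to signals, timeouts and the watchdog, not to exit codes.
        RestartPolicy::OnAbnormal => false,
    }
}
```

In the unit file the fix is a single line in the [Service] section, `Restart=always`, optionally paired with `RestartSec=` to add a short pause before the new binary starts.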

The new binary starts, reads its version from CARGO_PKG_VERSION (embedded at compile time), and reports it to the ingestor as a resource attribute on every OTLP export.
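A sketch of that version plumbing, assuming the OpenTelemetry semantic-convention keys `service.name` and `service.version`; the text does not name the attribute, and `resource_attributes` is a hypothetical helper standing in for the exporter's resource configuration. Cargo sets CARGO_PKG_VERSION for every crate it builds. The sketch reads it with `option_env!` so it also compiles outside Cargo, whereas `env!("CARGO_PKG_VERSION")` is the usual form inside a Cargo project.

```rust
/// Version baked into the binary at compile time ("unknown" if built without Cargo).
const AGENT_VERSION: &str = match option_env!("CARGO_PKG_VERSION") {
    Some(v) => v,
    None => "unknown",
};

/// Hypothetical helper: key/value pairs attached as the OTLP resource on every export.
/// The key names follow OpenTelemetry semantic conventions; OxiPulse's exact keys are assumed.
fn resource_attributes() -> Vec<(&'static str, String)> {
    vec![
        ("service.name", "oxipulse".to_string()),
        ("service.version", AGENT_VERSION.to_string()),
    ]
}
```

Because the constant is fixed at build time, the version the ingestor sees always matches the binary that is actually running, with no version file on disk that could drift out of sync after an update.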

What happens if the download fails?

The update task catches every error (including a panic inside the blocking task) and logs it. The existing binary keeps running, and on the next 24-hour cycle the agent tries again. It never exits unless a complete, verified binary is in place.

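// warn!/error! are the logging macros (tracing or log); check_and_update is the blocking
// function that checks, downloads, verifies and swaps the binary, returning Ok(true)
// only once the new binary is in place.
match tokio::task::spawn_blocking(check_and_update).await {
    Ok(Ok(true)) => std::process::exit(0), // verified binary in place: hand off to the service manager
    Ok(Ok(false)) => {}                    // already up to date: wait for the next cycle
    Ok(Err(e)) => warn!("update check failed: {}", e), // keep running the current binary
    Err(e) => error!("update task panicked: {}", e),   // keep running the current binary
}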

No package manager required

This matters more than it sounds. Many servers run locked-down environments where apt, yum or winget are restricted or behind an approval process. OxiPulse's binary is a single static file. There is nothing to install, no dependency graph to satisfy, and no repository to trust beyond GitHub Releases.

The SHA256 verification step runs before the rename, so a binary that does not match what was published (because of a corrupted download or network interference) is rejected and the old binary stays in place.
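A std-only sketch of the comparison half of that step, assuming the release publishes a checksum manifest in `sha256sum` format (`<hex digest>  <file name>` per line). The text does not say how the expected hash is distributed, and `expected_digest` and `digest_matches` are hypothetical names. Computing the digest of the downloaded bytes would typically use a crate such as sha2 (`Sha256::digest(&bytes)`), left out here to keep the block self-contained.

```rust
/// Look up the expected SHA-256 for `asset` in a sha256sum-style manifest.
/// Returns the digest lowercased, or None if the asset is missing or the line is malformed.
fn expected_digest(manifest: &str, asset: &str) -> Option<String> {
    manifest.lines().find_map(|line| {
        let mut fields = line.split_whitespace();
        let digest = fields.next()?;
        let name = fields.next()?.trim_start_matches('*'); // '*' marks binary mode
        let well_formed = digest.len() == 64 && digest.chars().all(|c| c.is_ascii_hexdigit());
        (name == asset && well_formed).then(|| digest.to_ascii_lowercase())
    })
}

/// Compare a computed 32-byte SHA-256 digest against the expected hex string.
fn digest_matches(computed: &[u8; 32], expected_hex: &str) -> bool {
    let hex: String = computed.iter().map(|b| format!("{:02x}", b)).collect();
    hex == expected_hex.to_ascii_lowercase()
}
```

On a mismatch the updater would return an error instead of calling the rename, which is exactly the "keep running" path handled by the match above.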