Skip to main content

Recovery mode

OpenBao can be started using the -recovery flag to bring it up in Recovery Mode. The main purpose of recovery mode is to allow direct access to storage in case OpenBao isn't starting up due to some newly discovered bug. This probably won't be helpful without an OpenBao expert on hand to advise.

Differences between recovery mode and regular OpenBao operation:

  • none of the usual subsystems run, e.g. expiration, clustering, RPCs from other nodes
  • instead of a regular unseal request, unseal a node by generating a recovery token
  • all requests are to sys/raw and are authenticated using the recovery token

Recovery process

The usual way recovery mode is used is:

  • seal or stop all nodes in the cluster
  • if using Integrated Storage, run bao status on each node to find the highest-index ones (this will require they be running and sealed, as if unsealed a new leader might be elected and writes could happen, confusing the issue)
  • restart the target node in recovery mode
  • generate a recovery token on that node
  • use the recovery token to perform sys/raw requests to repair the node
  • if using Integrated Storage, reform the raft cluster as described below

Integrated storage for HA only (ha_storage)

If Integrated Storage is used in hybrid mode (i.e. for ha_storage), recovery mode will not allow for changes to the Raft data but instead allow for modification of the underlying physical data that is associated with OpenBao's storage backend. This means that the notes regarding Integrated Storage in this doc do not apply.

Integrated storage

With Integrated Storage, not all nodes are equal. It's possible that some nodes are further behind - i.e. haven't applied as many Raft logs. It is important when choosing a node to use for recovery that it has the highest AppliedIndex found in the cluster.

Each node's AppliedIndex value can be obtained by running bao status against the node sealed nodes of the cluster after bringing it down.

Recovery tokens

Recovery tokens are issued in much the same way as root tokens are generated, only using a different endpoint, and the OpenBao node must be sealed first. Unlike root tokens, the recovery token is not persisted, so if OpenBao is restarted into recovery mode a new one must be generated.

Only a single recovery token can be generated. If lost, restart OpenBao and generate a new one.

Raw requests

Requests can be issued to sys/raw in just the same way as in regular OpenBao server mode. The only difference is that in recovery mode, X-Vault-Token must contain a recovery token instead of a service or batch token.

Reform the raft cluster

Recovery mode OpenBao automatically resizes the cluster to size 1. This is necessary because the Raft protocol won't allow changes to be made without a quorum, and in recovery mode we wish to make changes using a single node.

This means that after having used recovery mode, part of the procedure for returning to active service must include re-forming the raft cluster. There are two ways to do so: either delete the OpenBao data directory on the other nodes and re-join them to the recovered node, or use the Manual Recovery Using peers.json approach to get all nodes to agree on what nodes are part of the cluster.