Delay recovery key generation for auto-unseal mechanisms and make rotation authenticated
Summary
We propose adding authenticated root and recovery key rotation endpoints, allowing delayed recovery key generation (by setting initial shares to 0), and adding a CLI helper to manually perform offline recovery key generation on an auto-unseal cluster if keys were lost. This will allow better side-effect-free self-initialization and solves the issue with the unauthenticated recovery key rotation APIs.
Problem Statement
OpenBao's initialization with an auto-unseal mechanism is side-effecting in a way that requires long-term storage of recovery keys, which are used to create root tokens during initialization (to create any initial audit, auth, and secret engines) and to operate in recovery mode.
When running in certain environments, like Kubernetes or provisioning via OpenTofu, having such a side-effecting change makes writing automation difficult. For instance, in the context of GitLab deployments, the GitLab helm chart does not assume RBAC access to create secrets and may optionally be installed with pre-created secrets provisioned manually by an operator. We wish to support this type of ahead-of-time static definition of required secrets; creating a random, long-term, persistent secret (such as recovery key shards) during the installation process is at odds with this.
While a separate RFC discusses approaches for declarative self-initialization and another focuses on static unseal, this RFC focuses on changes to the unseal process to improve operator experience.
This effort is made more important by the recent vulnerability in rekeying recovery and root keys, where an unauthenticated attacker could effectively disable rekey without notification to a server operator. Creating authenticated varieties of these endpoints thus serves a dual purpose: to adequately defend against this category of attack and to allow us to default to zero recovery keys returned on initialization, instead using a privileged token to create them.
Lastly, this document proposes explicit use of "rotate + <descriptive noun>" as the authoritative description of key rotation processes. The term "rekey" should not be used to describe a distinct operation, as it is a synonym for "rotate". This should help to address issues with clarity and ambiguity.
User-facing Description
This change lets the sys/init API endpoint take the 0 value for number of shares when using an auto-unseal mechanism. When specified, the recovery keys will not be generated at initialization time and thus will not be returned. The system will function as if it has no recovery keys. This will not be applicable for Shamir's-based unsealing; when used with parallel unseal, it will apply to any specified auto-unseal mechanism. Recovery mode will not function until such keys are created.
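As a rough illustration of the request described above, the following sketch builds a sys/init body that asks for zero recovery shares. The field names recovery_shares and recovery_threshold are the existing sys/init parameters for auto-unseal; requiring the threshold to be 0 when shares is 0 is an assumption of this sketch, not something this RFC specifies.

```python
import json


def build_init_payload(recovery_shares: int, recovery_threshold: int) -> bytes:
    """Build a JSON body for sys/init on an auto-unseal cluster."""
    if recovery_shares < 0 or recovery_threshold < 0:
        raise ValueError("shares and threshold must be non-negative")
    if recovery_shares == 0 and recovery_threshold != 0:
        # Assumption: with zero shares there is nothing to apply a threshold to.
        raise ValueError("threshold must be 0 when shares is 0")
    if recovery_threshold > recovery_shares:
        raise ValueError("threshold cannot exceed shares")
    payload = {
        "recovery_shares": recovery_shares,
        "recovery_threshold": recovery_threshold,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def expects_recovery_keys(recovery_shares: int) -> bool:
    """With zero shares, the init response carries no recovery keys."""
    return recovery_shares > 0
```

A caller that passes 0 for both values should not expect recovery keys in the response and should plan to create them later through the authenticated rotation endpoints.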
After this change, a new set of endpoints under sys/rotate will allow rotating the keyring, root key, and recovery keys. The existing sys/rotate and sys/rotate/config will still function but be aliased under the clearer paths sys/rotate/keyring and sys/rotate/keyring/config.
Two new sets of endpoints are added:
sys/rotate/root/*, for rotating the root key; equivalent to the existing sys/rekey/* endpoints but fully authenticated, and
sys/rotate/recovery/*, for rotating the recovery key; equivalent to the existing sys/rekey-recovery-key/* endpoints but fully authenticated.
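The path relationships described above can be summarized as a small lookup. This is only an illustrative client-side helper (the function name new_path is hypothetical), and it assumes each legacy sub-path such as init or update carries over one-for-one under the new prefix.

```python
# Legacy unauthenticated prefixes and their proposed authenticated equivalents.
LEGACY_PREFIXES = {
    "sys/rekey-recovery-key": "sys/rotate/recovery",
    "sys/rekey": "sys/rotate/root",
}

# Existing keyring rotation paths and their clearer aliases (exact matches only).
KEYRING_ALIASES = {
    "sys/rotate": "sys/rotate/keyring",
    "sys/rotate/config": "sys/rotate/keyring/config",
}


def new_path(path: str) -> str:
    """Map an existing API path to the path proposed in this RFC."""
    path = path.strip("/")
    if path in KEYRING_ALIASES:
        return KEYRING_ALIASES[path]
    for old, new in LEGACY_PREFIXES.items():
        # Match on a path-segment boundary so sys/rekey does not
        # accidentally capture sys/rekey-recovery-key.
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path  # Already a new-style path, or unrelated.
```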
Note that sys/rotate/root/* endpoints also rotate Shamir shares when using the manual Shamir's unseal method; both the root key and Shamir shares are rotated at the same time.
When no key shares exist, only sys/rotate/recovery/* can be used to create them; in this case, a call to sys/rotate/recovery/init will return working keys immediately.
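A minimal sketch of what an authenticated call to sys/rotate/recovery/init might look like from a client follows. It assumes the body keeps the secret_shares and secret_threshold parameters of the existing sys/rekey-recovery-key/init endpoint and that the token is sent in the X-Vault-Token header; the address and token values are placeholders, and the response format is not shown because it is not specified here.

```python
import json
import urllib.request


def recovery_init_request(
    addr: str, token: str, shares: int, threshold: int
) -> urllib.request.Request:
    """Prepare (but do not send) a request to create recovery key shares."""
    if not token:
        # Unlike the legacy endpoint, the proposed one is authenticated.
        raise ValueError("a token is required")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Assumption: parameter names match the legacy rekey-recovery-key API.
    body = json.dumps(
        {"secret_shares": shares, "secret_threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{addr.rstrip('/')}/v1/sys/rotate/recovery/init",
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )
```

The prepared request can be sent with urllib.request.urlopen(req). Per the description above, when no key shares exist yet this single call would be expected to return working keys without a separate share-submission step.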
The bare sys/rotate/root will be a sudo-protected endpoint to directly perform a root key rotation without requiring existing key shares be provided. Only the root key is rotated under this endpoint; if the unseal mechanism is Shamir's, it will not rotate the Shamir shards. In the future, this could have a config endpoint like the existing sys/rotate/keyring/config for doing automatic (temporal) rotation of the root key.
Finally, a new CLI command, bao operator recovery generate, will be added to support offline (re-)generation of recovery key shards, given a server's configuration and access to its storage and seal configuration.
Technical Description
From historical evidence, Vault previously intended to allow recovery keys to be optional. However, a subsequent commit removed this, paving the way for "dualseal" (likely an early precursor to what is planned in our parallel unseal RFC).
In particular, we will introduce a new method, SealConfig.ValidateRecovery(), which does not have the shares and threshold limits of the main seal configuration. When such an empty configuration is in use, we will deny the sys/rekey-recovery-key endpoint if it is enabled.
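The intent of the relaxed validation can be sketched as follows. This is an illustrative Python model rather than the Go implementation: the class and field names mirror the prose, the exact limits enforced by the main seal configuration are assumptions, and legacy_recovery_rekey_allowed is a hypothetical helper name. Each validator returns an error message or None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SealConfig:
    secret_shares: int
    secret_threshold: int

    def _check_threshold(self) -> Optional[str]:
        if self.secret_threshold < 1:
            return "threshold must be at least one"
        if self.secret_threshold > self.secret_shares:
            return "threshold cannot be larger than shares"
        return None

    def validate(self) -> Optional[str]:
        """Main seal configuration: at least one share is required (assumed)."""
        if self.secret_shares < 1:
            return "shares must be at least one"
        return self._check_threshold()

    def validate_recovery(self) -> Optional[str]:
        """Recovery configuration: an empty (0/0) configuration is accepted."""
        if self.secret_shares == 0 and self.secret_threshold == 0:
            return None
        if self.secret_shares < 1:
            return "shares must be zero (with zero threshold) or at least one"
        return self._check_threshold()


def legacy_recovery_rekey_allowed(cfg: SealConfig) -> bool:
    """Deny the legacy sys/rekey-recovery-key flow when no shares exist."""
    return cfg.secret_shares > 0
```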
The other benefit of moving these handlers out of http/ and into vault/ will be that we can use the framework processing and get both authentication and authorization on the endpoint.
POST sys/rotate/root
This endpoint rotates the root key.
It takes no parameters and returns no data.
This differs from sys/rotate/root/init in that it does not require key shares be provided just to rotate the root key, independent of the Shamir's or recovery key shares. Because this is still a privileged action, and in line with the keyring equivalent at sys/rotate/keyring or sys/rotate, this endpoint requires sudo permissions and is added to the PathSpecial for the SystemBackend.
Note that, for Shamir's seals, the sys/rotate/root/init endpoint performs both rotations (the root key and the Shamir shares), conflating concerns, but the sys/rotate/root and sys/rotate/root/init endpoints behave the same for auto-unseal mechanisms (minus the difference in when the rotation occurs).
This sets us up to add a configuration similar to sys/rotate/keyring/config here as sys/rotate/root/config to perform automatic root key rotations, though this isn't strictly part of this proposal. That would take two parameters, enabled and interval only, as the root key sees relatively few encryption uses (only when the barrier keyring itself is rotated).
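To make the shape of such a future configuration concrete, here is a small sketch of the two parameters and a due-check built on them. Since sys/rotate/root/config is not part of this proposal, everything here is hypothetical: the class name, the function name, and the decision to treat a non-positive interval as "never rotate".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RootRotationConfig:
    enabled: bool        # whether automatic rotation is active
    interval: timedelta  # minimum time between automatic rotations


def rotation_due(
    cfg: RootRotationConfig, last_rotation: datetime, now: datetime
) -> bool:
    """Return True when an automatic root key rotation should be triggered."""
    if not cfg.enabled or cfg.interval <= timedelta(0):
        return False
    return now - last_rotation >= cfg.interval
```

A purely time-based check like this matches the reasoning above: because the root key is used so rarely, there is little reason to track a usage counter alongside the interval.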
Rationale and Alternatives
This preserves the security implications; however, it protects the rotation itself by requiring the operator be authenticated to the OpenBao instance. When operators of the instance are not users within the service, this does prohibit them from using the new endpoints. However, as long as the added protections of disable_unauthed_rekey_endpoints=true are enforced by default on external listeners, such operators may configure an additional local-only listener and use it to perform unauthenticated rotations.
Another alternative was not adding the bao operator recovery generate utility and instead rolling it into the API: if a token has sudo permissions, allow it to bypass existing share creation and directly perform rotation. As this changes the threat model for rotating recovery keys, it was not pursued. However, it was added to the root key rotation endpoint.
Lastly, there remains an alternative to leave these endpoints alone but protect the rekey some other way. An example of this can be seen in the Vault 1.20 fix for HCSEC-2025-11: requiring a nonce on delete. However, this doesn't solve the issue that the initialization endpoint remains unauthenticated.
Downsides
This RFC improves the security posture of rotating in an auto-unseal and manual Shamir's environment, but in a way that requires breaking compatibility with previous releases. On the whole, this improved security posture is beneficial. This has been announced in the v2.3.1 security bulletin, but that may be too short notice for some users. Given that key rotation should be relatively rare and likely not automated, it is expected that users will be able to adjust accordingly.
Security Implications
By requiring privileged tokens be used to create and rotate recovery key shares, we seek to improve the security posture of OpenBao. In addition, we prevent creation of long-lived highly privileged access tokens at the start of OpenBao's lifecycle, allowing this to be delayed until an operator can safely store them.
User/Developer Experience
Because the format of these APIs is preserved, the user experience should be mostly equivalent, apart from the deprecation of the existing endpoints. The new endpoints should otherwise be fully compatible, and an operator should be able to migrate to them, assuming authentication is available within their rotation environment.
Unresolved Questions
None.
Related Issues
Proof of Concept
n/a