Raft telemetry provides information on
OpenBao Integrated Storage.
Default metrics
vault.raft.apply
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of transactions in the configured interval |
The vault.raft.apply metric is generally a good indicator of the write load
on your raft internal storage.
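As a rough illustration of reading this metric as a write-load signal, the sketch below converts per-interval vault.raft.apply counts into a per-second rate and flags an interval that sits well above the recent average. It assumes you have already collected one count per telemetry interval. The function names, the 10-second interval, and the 3x spike factor are hypothetical defaults, not OpenBao settings.

```python
from statistics import mean


def apply_rates(interval_counts, interval_seconds=10.0):
    """Convert per-interval vault.raft.apply counts into writes per second.

    interval_seconds is a hypothetical default; substitute your own
    telemetry interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return [count / interval_seconds for count in interval_counts]


def is_write_spike(interval_counts, interval_seconds=10.0, factor=3.0):
    """Return True if the latest rate exceeds `factor` times the mean of earlier rates."""
    rates = apply_rates(interval_counts, interval_seconds)
    if len(rates) < 2:
        return False  # not enough history to compare against
    baseline = mean(rates[:-1])
    return baseline > 0 and rates[-1] > factor * baseline


print(apply_rates([50, 100]))            # [5.0, 10.0]
print(is_write_spike([10, 10, 10, 40]))  # True
```

A relative baseline is used here instead of a fixed threshold because normal write load differs widely between clusters.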
vault.raft.barrier
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of times the node started the barrier |
A node starts the barrier by issuing a blocking call when it wants to ensure
that all pending operations that need to be applied to the finite state machine
are properly queued.
vault.raft.candidate.electSelf
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required for a node to send a vote request to a peer |
vault.raft.commitNumLogs
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of logs processed for application to the finite state machine in a single batch |
vault.raft.commitTime
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to commit a new entry to the raft log on the leader node |
vault.raft.compactLogs
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to trim unnecessary logs |
vault.raft.fsm.apply
| Metric type | Value | Description |
|---|---|---|
| summary | number | Number of logs committed by the finite state machine since the last interval |
vault.raft.fsm.applyBatch
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required by the finite state machine to apply the most recent batch of logs |
vault.raft.fsm.applyBatchNum
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of logs applied in the most recent batch |
vault.raft.fsm.enqueue
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to queue up a batch of logs for the finite state machine to apply |
vault.raft.fsm.restore
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required by the finite state machine to complete a restore operation from a snapshot |
vault.raft.fsm.snapshot
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required by the finite state machine to record state information for the current snapshot |
vault.raft.fsm.store_config
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to store the most recent raft configuration |
vault.raft.get
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to retrieve an entry from underlying storage |
vault.raft.list
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to retrieve a list of keys from underlying storage |
vault.raft.peers
| Metric type | Value | Description |
|---|---|---|
| gauge | number | The number of peers in the raft cluster configuration |
vault.raft.restore
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of times that the node performed a restore operation |
In the context of raft storage, a restore operation refers to the process where
raft consumes an external snapshot to restore its state.
vault.raft.restoreUserSnapshot
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to restore the finite state machine from a user snapshot |
vault.raft.rpc.appendEntries
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to process a remote appendEntries call from a node |
vault.raft.rpc.appendEntries.processLogs
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to completely process the outstanding logs for the given node |
vault.raft.rpc.appendEntries.storeLogs
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to record any outstanding logs since the last request to append entries for the given node |
vault.raft.rpc.installSnapshot
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to process an installSnapshot RPC call |
Only nodes currently in the follower state report
vault.raft.rpc.installSnapshot metrics.
vault.raft.rpc.processHeartbeat
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to process a heartbeat request |
vault.raft.rpc.requestVote
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a requestVote call |
vault.raft.snapshot.create
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to capture a new snapshot |
vault.raft.snapshot.persist
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to record snapshot meta information to disk while taking snapshots |
vault.raft.snapshot.takeSnapshot
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Total time required to create and persist the current snapshot |
In most cases, vault.raft.snapshot.takeSnapshot is approximately equal to
vault.raft.snapshot.create + vault.raft.snapshot.persist.
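The approximate relationship between the three snapshot timers can be checked with a small sketch like the one below. It assumes you have one sample of each timer from the same snapshot. The function names and the 10% relative tolerance are hypothetical choices made for illustration.

```python
def snapshot_overhead_ms(take_ms, create_ms, persist_ms):
    """Time in takeSnapshot not covered by create + persist.

    The result can be slightly negative because the timers are sampled
    independently.
    """
    return take_ms - (create_ms + persist_ms)


def roughly_consistent(take_ms, create_ms, persist_ms, tolerance=0.10):
    """Check takeSnapshot ~= create + persist within a relative tolerance (hypothetical 10%)."""
    expected = create_ms + persist_ms
    if expected == 0:
        return take_ms == 0
    return abs(take_ms - expected) <= tolerance * expected


print(snapshot_overhead_ms(120.0, 80.0, 35.0))  # 5.0
print(roughly_consistent(120.0, 80.0, 35.0))    # True
```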
vault.raft.state.candidate
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of times the raft server initiated an election |
vault.raft.state.follower
| Metric type | Value | Description |
|---|---|---|
| summary | number | Number of times in the configured interval that the raft server became a follower |
Nodes transition to follower state under the following conditions:
- when the node joins the cluster
- when a leader is elected, but the node was not elected leader
vault.raft.state.leader
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of times the raft server became a leader |
vault.raft.transition.heartbeat_timeout
| Metric type | Value | Description |
|---|---|---|
| summary | number | Number of times that the node transitioned to candidate state after not receiving a heartbeat message from the last known leader |
vault.raft.transition.leader_lease_timeout
| Metric type | Value | Description |
|---|---|---|
| counter | number | The number of times the leader could not contact a quorum of nodes and therefore stepped down |
vault.raft.verify_leader
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of times in the configured interval that the node confirmed it is still the leader |
Autopilot metrics
<Note heading="Metrics only apply to the active node">
Autopilot only runs on the active node, so autopilot metrics are only
captured for the current active node.
</Note>
vault.autopilot.failure_tolerance
| Metric type | Value | Description |
|---|---|---|
| gauge | nodes | The number of healthy nodes in excess of quorum |
The failure tolerance indicates how many currently healthy nodes can fail without losing quorum.
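The arithmetic behind "healthy nodes in excess of quorum" can be sketched as follows, assuming quorum is a strict majority of the voting nodes. This is not Autopilot's implementation. The function names are hypothetical, and clamping the result at 0 when quorum is already lost is a choice made for this sketch. For example, a five-node cluster with all five voters healthy needs three for quorum, so it can tolerate two failures.

```python
def quorum_size(voters):
    """Strict majority of voting nodes."""
    if voters < 1:
        raise ValueError("voters must be at least 1")
    return voters // 2 + 1


def failure_tolerance(voters, healthy_voters):
    """Healthy voters in excess of quorum, clamped at 0 (an assumption of this sketch)."""
    return max(0, healthy_voters - quorum_size(voters))


print(failure_tolerance(5, 5))  # 2
print(failure_tolerance(3, 3))  # 1
```

Because quorum grows by one only for every two voters added, a four-voter cluster tolerates the same single failure as a three-voter cluster.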
vault.autopilot.healthy
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether all nodes are healthy |
- A value of 1 on the gauge means that Autopilot deems all nodes healthy.
- A value of 0 on the gauge means that Autopilot deems at least one node
unhealthy.
vault.autopilot.node.healthy
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the node identified by the node_id label is healthy |
- A value of 1 on the gauge means that Autopilot deems the node indicated by
node_id healthy.
- A value of 0 on the gauge means that Autopilot cannot communicate with the
node indicated by node_id, or deems the node unhealthy.
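The per-node gauge can be interpreted with a sketch like the one below. It assumes you have already collected the latest vault.autopilot.node.healthy value for each node into a mapping from node_id to gauge value. The function names are hypothetical, and all_nodes_healthy only approximates the meaning of vault.autopilot.healthy from the per-node values.

```python
def unhealthy_nodes(node_health):
    """Return the node_id values whose vault.autopilot.node.healthy gauge is not 1."""
    return sorted(node_id for node_id, value in node_health.items() if value != 1)


def all_nodes_healthy(node_health):
    """Approximate vault.autopilot.healthy: True only if every reported node is healthy.

    An empty mapping returns False because no health data was collected.
    """
    return bool(node_health) and not unhealthy_nodes(node_health)


sample = {"node1": 1, "node2": 0, "node3": 1}
print(unhealthy_nodes(sample))    # ['node2']
print(all_nodes_healthy(sample))  # False
```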
Leadership change metrics
Leadership change metrics indicate the overall performance of the integrated
storage on raft servers and the network connection between raft nodes.
vault.raft.leader.dispatchLog
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required for the leader node to write a log entry to disk |
vault.raft.leader.dispatchNumLogs
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of logs committed to disk in the most recent batch |
vault.raft.leader.lastContact
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time since the leader was last able to contact the follower nodes when checking its leader lease |
Raft replication metrics
vault.raft.replication.appendEntries.log
| Metric type | Value | Description |
|---|---|---|
| summary | number | Number of logs replicated to a node to establish parity with leader logs |
vault.raft.replication.appendEntries.rpc
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to replicate leader node log entries to all follower nodes with appendEntries |
vault.raft.replication.heartbeat
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to invoke appendEntries on a peer so the peer does not time out |
vault.raft.replication.installSnapshot
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required for the leader to send a snapshot to a follower node with an installSnapshot RPC call |
Only the leader node reports
vault.raft.replication.installSnapshot metrics.
Storage metrics
vault.raft_storage.bolt.cursor.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of cursors created in the Bolt database |
vault.raft_storage.bolt.freelist.allocated_bytes
| Metric type | Value | Description |
|---|---|---|
| gauge | bytes | Total space allocated for the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.free_pages
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of free pages in the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.pending_pages
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of pending pages in the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.used_bytes
| Metric type | Value | Description |
|---|---|---|
| gauge | bytes | Total space used by the freelist for the Bolt database |
vault.raft_storage.bolt.node.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of node allocations for the Bolt database |
vault.raft_storage.bolt.node.dereferences
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Total number of node dereferences by the Bolt database |
vault.raft_storage.bolt.page.bytes_allocated
| Metric type | Value | Description |
|---|---|---|
| gauge | bytes | Total space allocated to the Bolt database |
vault.raft_storage.bolt.page.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of page allocations in the Bolt database |
vault.raft_storage.bolt.rebalance.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of node rebalances performed by the Bolt database |
vault.raft_storage.bolt.rebalance.time
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required by the Bolt database to rebalance nodes |
vault.raft_storage.bolt.spill.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of nodes spilled by the Bolt database |
vault.raft_storage.bolt.spill.time
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Total time spent spilling by the Bolt database |
vault.raft_storage.bolt.split.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of nodes split by the Bolt database |
vault.raft_storage.bolt.transaction.currently_open_read_transactions
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of currently open read transactions in the Bolt database |
vault.raft_storage.bolt.transaction.started_read_transactions
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of read transactions started by the Bolt database |
vault.raft_storage.bolt.write.count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of writes performed by the Bolt database |
vault.raft_storage.bolt.write.time
| Metric type | Value | Description |
|---|---|---|
| counter | ms | Total cumulative time the Bolt database has spent writing to disk |
vault.raft_storage.follower.applied_index_delta
| Metric type | Value | Description |
|---|---|---|
| gauge | number | The difference between the index applied by the leader and the index applied by the follower as reported by echoes |
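The delta itself is a simple subtraction. The sketch below also flags followers that have fallen too far behind. It assumes you already know the leader's applied index and each follower's applied index, for example from the values reported by echoes. The function names and the 1000-entry threshold are hypothetical, chosen only for illustration.

```python
def applied_index_delta(leader_applied_index, follower_applied_index):
    """Leader's applied index minus the follower's; larger values mean more lag."""
    return leader_applied_index - follower_applied_index


def lagging_followers(leader_applied_index, follower_applied, max_delta=1000):
    """Return follower IDs whose delta exceeds max_delta (a hypothetical threshold)."""
    return sorted(
        node_id
        for node_id, applied in follower_applied.items()
        if applied_index_delta(leader_applied_index, applied) > max_delta
    )


print(applied_index_delta(5000, 4990))                            # 10
print(lagging_followers(5000, {"node2": 4990, "node3": 3000}))  # ['node3']
```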
vault.raft_storage.follower.last_heartbeat_ms
| Metric type | Value | Description |
|---|---|---|
| gauge | ms | Time since the follower last received a heartbeat request |
vault.raft_storage.stats.applied_index
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Highest index of the raft log last applied to the finite state machine or added to the fsm_pending queue |
vault.raft_storage.stats.commit_index
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Index of the last raft log committed to disk on the node |
vault.raft_storage.stats.fsm_pending
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of raft logs queued by the node for the finite state machine to apply |
vault.raft-storage.delete
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to insert a log entry to the delete path |
vault.raft-storage.entry_size
| Metric type | Value | Description |
|---|---|---|
| summary | bytes | The total size of a raft entry during log application |
vault.raft-storage.get
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to retrieve a value for the given path from the finite state machine |
vault.raft-storage.list
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to list all entries under the prefix from the finite state machine |
vault.raft-storage.put
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to insert a log entry to the persist path |
vault.raft-storage.transaction
| Metric type | Value | Description |
|---|---|---|
| timer | ms | Time required to insert operations into a single log |