Raft telemetry

Raft telemetry provides information on OpenBao integrated storage.

Default metrics

vault.raft.apply

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of transactions in the configured interval |

The vault.raft.apply metric is generally a good indicator of the write load on your raft internal storage.
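As a rough illustration of reading this counter as write load, the sketch below converts per-interval counts into an average applies-per-second rate. The `IntervalSample` shape and the `applies_per_second` helper are hypothetical names for this example and are not part of OpenBao; how you collect the samples depends on your telemetry sink.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IntervalSample:
    """One reporting interval of vault.raft.apply (hypothetical shape)."""
    interval_seconds: float  # length of the configured interval
    count: int               # transactions counted in that interval


def applies_per_second(samples: Iterable[IntervalSample]) -> float:
    """Average raft apply rate across the given intervals."""
    samples = list(samples)
    total_seconds = sum(s.interval_seconds for s in samples)
    if total_seconds <= 0:
        raise ValueError("need at least one interval with a positive length")
    return sum(s.count for s in samples) / total_seconds


if __name__ == "__main__":
    # Two 10-second intervals: 200 applies over 20 seconds in total.
    window = [IntervalSample(10.0, 50), IntervalSample(10.0, 150)]
    print(applies_per_second(window))
```

A sustained rise in this rate suggests a heavier write load on integrated storage; what counts as "high" depends on your hardware and workload.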

vault.raft.barrier

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of times the node started the barrier |

A node starts the barrier by issuing a blocking call when it wants to ensure that all pending operations that need to be applied to the finite state machine are properly queued.

vault.raft.candidate.electSelf

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required for a node to send a vote request to a peer |

vault.raft.commitNumLogs

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of logs processed for application to the finite state machine in a single batch |

vault.raft.commitTime

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to commit a new entry to the raft log on the leader node |

vault.raft.compactLogs

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to trim unnecessary logs |

vault.raft.fsm.apply

| Metric type | Value | Description |
| --- | --- | --- |
| summary | number | Number of logs committed by the finite state machine since the last interval |

vault.raft.fsm.applyBatch

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required by the finite state machine to apply the most recent batch of logs |

vault.raft.fsm.applyBatchNum

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of logs applied in the most recent batch |

vault.raft.fsm.enqueue

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to queue up a batch of logs for the finite state machine to apply |

vault.raft.fsm.restore

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required by the finite state machine to complete a restore operation from a snapshot |

vault.raft.fsm.snapshot

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required by the finite state machine to record state information for the current snapshot |

vault.raft.fsm.store_config

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to store the most recent raft configuration |

vault.raft.get

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to retrieve an entry from underlying storage |

vault.raft.list

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to retrieve a list of keys from underlying storage |

vault.raft.peers

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | The number of peers in the raft cluster configuration |

vault.raft.restore

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of times that the node performed a restore operation |

In the context of raft storage, a restore operation refers to the process where raft consumes an external snapshot to restore its state.

vault.raft.restoreUserSnapshot

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to restore the finite state machine from a user snapshot |

vault.raft.rpc.appendEntries

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to process a remote appendEntries call from a node |

vault.raft.rpc.appendEntries.processLogs

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to completely process the outstanding logs for the given node |

vault.raft.rpc.appendEntries.storeLogs

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to record any outstanding logs since the last request to append entries for the given node |

vault.raft.rpc.installSnapshot

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to process an installSnapshot RPC call |

Only nodes currently in the follower state report vault.raft.rpc.installSnapshot metrics.

vault.raft.rpc.processHeartbeat

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to process a heartbeat request |

vault.raft.rpc.requestVote

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required to complete a requestVote call |

vault.raft.snapshot.create

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to capture a new snapshot |

vault.raft.snapshot.persist

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to record snapshot meta information to disk while taking snapshots |

vault.raft.snapshot.takeSnapshot

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Total time required to create and persist the current snapshot |

In most cases, vault.raft.snapshot.takeSnapshot is approximately equal to vault.raft.snapshot.create + vault.raft.snapshot.persist.
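The relationship above can be checked with a small sketch like the following. The helper names and the 10% relative tolerance are assumptions chosen for illustration; the source only states that the values are approximately equal in most cases.

```python
import math


def snapshot_overhead_ms(take_ms: float, create_ms: float, persist_ms: float) -> float:
    """Time in takeSnapshot not accounted for by create + persist."""
    return take_ms - (create_ms + persist_ms)


def snapshot_times_consistent(
    take_ms: float,
    create_ms: float,
    persist_ms: float,
    rel_tol: float = 0.10,  # assumed tolerance, tune for your environment
) -> bool:
    """True when takeSnapshot is roughly create + persist."""
    return math.isclose(take_ms, create_ms + persist_ms, rel_tol=rel_tol)


if __name__ == "__main__":
    # Hypothetical timings in milliseconds.
    print(snapshot_overhead_ms(1050.0, 800.0, 240.0))
    print(snapshot_times_consistent(1050.0, 800.0, 240.0))
```

A large unexplained overhead may be worth investigating, since it points to time spent outside the create and persist phases.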

vault.raft.state.candidate

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of times the raft server initiated an election |

vault.raft.state.follower

| Metric type | Value | Description |
| --- | --- | --- |
| summary | number | Number of times in the configured interval that the raft server became a follower |

Nodes transition to follower state under the following conditions:

  • when the node joins the cluster
  • when a leader is elected, but the node was not elected leader

vault.raft.state.leader

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of times the raft server became a leader |

vault.raft.transition.heartbeat_timeout

| Metric type | Value | Description |
| --- | --- | --- |
| summary | number | Number of times that the node transitioned to candidate state after not receiving a heartbeat message from the last known leader |

vault.raft.transition.leader_lease_timeout

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | The number of times the leader could not contact a quorum of nodes and therefore stepped down |

vault.raft.verify_leader

| Metric type | Value | Description |
| --- | --- | --- |
| counter | number | Number of times in the configured interval that the node confirmed it is still the leader |

Autopilot metrics

Note: Metrics only apply to the active node. Autopilot only runs on the active node, so autopilot metrics are only captured for the current active node.

vault.autopilot.failure_tolerance

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | nodes | The number of healthy nodes in excess of quorum |

The failure tolerance indicates how many currently healthy nodes can fail without losing quorum.
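To make the arithmetic concrete, here is a minimal sketch of "healthy nodes in excess of quorum". It assumes quorum is a simple majority of voting nodes, which is the standard raft definition, and it clamps the result at zero when the cluster is already below quorum. Both function names are hypothetical, and Autopilot's actual calculation may account for details this sketch ignores.

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of a raft cluster with the given number of voters."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int, healthy: int) -> int:
    """Healthy nodes in excess of quorum, never below zero (assumed clamp)."""
    if not 0 <= healthy <= voters:
        raise ValueError("healthy must be between 0 and voters")
    return max(healthy - quorum_size(voters), 0)


if __name__ == "__main__":
    # Five voters, all healthy: quorum is 3, so 2 more nodes can fail.
    print(failure_tolerance(voters=5, healthy=5))
    # Five voters, three healthy: exactly at quorum, no failures tolerated.
    print(failure_tolerance(voters=5, healthy=3))
```

A value of 0 means the next node failure would cost the cluster its quorum, which makes this gauge a natural candidate for alerting.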

vault.autopilot.healthy

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | boolean | Indicates whether all nodes are healthy |

  • A value of 1 on the gauge means that Autopilot deems all nodes healthy.
  • A value of 0 on the gauge means that Autopilot deems at least 1 node unhealthy.

vault.autopilot.node.healthy

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | boolean | Indicates whether the active node is healthy |

  • A value of 1 on the gauge means that Autopilot deems the node indicated by node_id healthy.
  • A value of 0 on the gauge means that Autopilot cannot communicate with the node indicated by node_id, or deems the node unhealthy.
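The two health gauges can be read together, as in the sketch below. It takes a mapping of node_id to the per-node gauge value and derives a cluster-wide value using the rule stated above (1 only when every node is healthy). The mapping shape and function names are hypothetical; this illustrates how to interpret the gauges and is not how Autopilot computes them internally.

```python
from typing import List, Mapping


def unhealthy_nodes(node_gauges: Mapping[str, int]) -> List[str]:
    """Node IDs whose vault.autopilot.node.healthy gauge is not 1."""
    return sorted(node_id for node_id, value in node_gauges.items() if value != 1)


def all_nodes_healthy(node_gauges: Mapping[str, int]) -> int:
    """1 when every node reports healthy, 0 when at least one does not."""
    return 0 if unhealthy_nodes(node_gauges) else 1


if __name__ == "__main__":
    # Hypothetical node IDs and gauge readings.
    gauges = {"node-a": 1, "node-b": 0, "node-c": 1}
    print(all_nodes_healthy(gauges))
    print(unhealthy_nodes(gauges))
```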

Leadership change metrics

Leadership change metrics indicate the overall performance of the integrated storage on raft servers and the network connection between raft nodes.

vault.raft.leader.dispatchLog

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required for the leader node to write a log entry to disk |

vault.raft.leader.dispatchNumLogs

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of logs committed to disk in the most recent batch |

vault.raft.leader.lastContact

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time since the leader was last able to contact the follower nodes when checking its leader lease |

Raft replication metrics

vault.raft.replication.appendEntries.log

| Metric type | Value | Description |
| --- | --- | --- |
| summary | number | Number of logs replicated to a node to establish parity with leader logs |

vault.raft.replication.appendEntries.rpc

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to replicate leader node log entries to all follower nodes with appendEntries |

vault.raft.replication.heartbeat

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to invoke appendEntries on a peer so the peer does not time out |

vault.raft.replication.installSnapshot

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to process an installSnapshot RPC call |

Only nodes currently in the follower state report vault.raft.replication.installSnapshot metrics.

Storage metrics

vault.raft_storage.bolt.cursor.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of cursors created in the Bolt database |

vault.raft_storage.bolt.freelist.allocated_bytes

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | bytes | Total space allocated for the freelist for the Bolt database |

vault.raft_storage.bolt.freelist.free_pages

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of free pages in the freelist for the Bolt database |

vault.raft_storage.bolt.freelist.pending_pages

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of pending pages in the freelist for the Bolt database |

vault.raft_storage.bolt.freelist.used_bytes

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | bytes | Total space used by the freelist for the Bolt database |

vault.raft_storage.bolt.node.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of node allocations for the Bolt database |

vault.raft_storage.bolt.node.dereferences

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Total number of node dereferences by the Bolt database |

vault.raft_storage.bolt.page.bytes_allocated

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | bytes | Total space allocated to the Bolt database |

vault.raft_storage.bolt.page.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of page allocations in the Bolt database |

vault.raft_storage.bolt.rebalance.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of node rebalances performed by the Bolt database |

vault.raft_storage.bolt.rebalance.time

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Time required by the Bolt database to rebalance nodes |

vault.raft_storage.bolt.spill.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of nodes spilled by the Bolt database |

vault.raft_storage.bolt.spill.time

| Metric type | Value | Description |
| --- | --- | --- |
| summary | ms | Total time spent spilling by the Bolt database |

vault.raft_storage.bolt.split.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of nodes split by the Bolt database |

vault.raft_storage.bolt.transaction.currently_open_read_transactions

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of in-process read transactions for the Bolt DB |

vault.raft_storage.bolt.transaction.started_read_transactions

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of read transactions started by the Bolt DB |

vault.raft_storage.bolt.write.count

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of writes performed by the Bolt database |

vault.raft_storage.bolt.write.time

| Metric type | Value | Description |
| --- | --- | --- |
| counter | ms | Total cumulative time the Bolt database has spent writing to disk |

vault.raft_storage.follower.applied_index_delta

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | The difference between the index applied by the leader and the index applied by the follower as reported by echoes |
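As an illustration of the delta described above, the following sketch subtracts each follower's applied index from the leader's and flags followers that exceed a threshold. The function names, the mapping shape, and the threshold value are assumptions for this example; OpenBao reports the delta directly, so the subtraction here only demonstrates what the gauge represents.

```python
from typing import Dict, Mapping


def applied_index_delta(leader_applied: int, follower_applied: int) -> int:
    """Leader applied index minus follower applied index."""
    return leader_applied - follower_applied


def lagging_followers(
    leader_applied: int,
    follower_applied: Mapping[str, int],
    max_delta: int,  # assumed alerting threshold, not an OpenBao setting
) -> Dict[str, int]:
    """Followers whose delta exceeds max_delta, mapped to their delta."""
    deltas = {
        node_id: applied_index_delta(leader_applied, index)
        for node_id, index in follower_applied.items()
    }
    return {node_id: d for node_id, d in deltas.items() if d > max_delta}


if __name__ == "__main__":
    # Hypothetical indexes: node-b is far behind the leader.
    followers = {"node-a": 998, "node-b": 400}
    print(lagging_followers(1000, followers, max_delta=100))
```

A delta that keeps growing suggests the follower cannot keep up with the leader's write rate.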

vault.raft_storage.follower.last_heartbeat_ms

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | ms | Time since the follower last received a heartbeat request |

vault.raft_storage.stats.applied_index

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Highest index of raft log last applied to the finite state machine or added to fsm_pending queue |

vault.raft_storage.stats.commit_index

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Index of the last raft log committed to disk on the node |

vault.raft_storage.stats.fsm_pending

| Metric type | Value | Description |
| --- | --- | --- |
| gauge | number | Number of raft logs queued by the node for the finite state machine to apply |

vault.raft-storage.delete

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to insert a log entry that deletes a path |

vault.raft-storage.entry_size

| Metric type | Value | Description |
| --- | --- | --- |
| summary | bytes | The total size of a raft entry during log application |

vault.raft-storage.get

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to retrieve a value for the given path from the finite state machine |

vault.raft-storage.list

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to list all entries under the prefix from the finite state machine |

vault.raft-storage.put

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to insert a log entry that persists a path |

vault.raft-storage.transaction

| Metric type | Value | Description |
| --- | --- | --- |
| timer | ms | Time required to insert operations into a single log |