All OpenBao telemetry metrics
For completeness, we provide a full list of available metrics below in
alphabetical order by name.
Full metric list
database.Close
| Metric type | Value | Description |
|---|
| summary | ms | Time required to close a database secrets engine (across all database secrets engines) |
database.Close.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered across all database secrets engines while closing database connections |
database.CreateUser
| Metric type | Value | Description |
|---|
| summary | ms | Time required to create a user across all database secrets engines |
database.CreateUser.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered across all database secrets engines while creating users |
database.Initialize
| Metric type | Value | Description |
|---|
| summary | ms | Time required to initialize a database secrets engine (across all database secrets engines) |
database.Initialize.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered across all database secrets engines while initializing the database |
database.{NAME}.Close
| Metric type | Value | Description |
|---|
| summary | ms | Time required to close the database secrets engine {NAME} |
database.{NAME}.Close.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered for the named database secrets engine while closing database connections |
database.{NAME}.CreateUser
| Metric type | Value | Description |
|---|
| summary | ms | Time required to create a user for the named database secrets engine |
database.{NAME}.CreateUser.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered for the named database secrets engine while creating users |
database.{NAME}.Initialize
| Metric type | Value | Description |
|---|
| summary | ms | Time required to initialize the named database secrets engine |
database.{NAME}.Initialize.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered for the named database secrets engine while initializing the database |
database.{NAME}.RenewUser
| Metric type | Value | Description |
|---|
| summary | ms | Time required to renew a user for the named database secrets engine |
database.{NAME}.RenewUser.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered for the named database secrets engine while renewing users |
database.{NAME}.RevokeUser
| Metric type | Value | Description |
|---|
| summary | ms | Time required to revoke a user for the named database secrets engine |
database.{NAME}.RevokeUser.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered for the named database secrets engine while revoking users |
database.RenewUser
| Metric type | Value | Description |
|---|
| summary | ms | Time required to renew a user across all database secrets engines |
database.RenewUser.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered across all database secrets engines while renewing users |
database.RevokeUser
| Metric type | Value | Description |
|---|
| summary | ms | Time required to revoke a user across all database secrets engines |
database.RevokeUser.error
| Metric type | Value | Description |
|---|
| counter | number | Number of errors encountered across all database secrets engines while revoking users |
secrets.pki.tidy.cert_store_current_entry
| Metric type | Value | Description |
|---|
| gauge | number | Index of the certificate store entry currently being verified by the tidy operation |
secrets.pki.tidy.cert_store_deleted_count
| Metric type | Value | Description |
|---|
| counter | number | Number of entries deleted from the certificate store |
secrets.pki.tidy.cert_store_total_entries_remaining
| Metric type | Value | Description |
|---|
| gauge | number | Number of entries in the certificate store checked, but not removed, during the tidy operation |
secrets.pki.tidy.cert_store_total_entries
| Metric type | Value | Description |
|---|
| gauge | number | Number of entries in the certificate store to verify during the tidy operation |
secrets.pki.tidy.duration
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete the PKI tidy operation |
secrets.pki.tidy.failure
| Metric type | Value | Description |
|---|
| counter | number | Number of times the PKI tidy operation failed to finish due to errors |
secrets.pki.tidy.revoked_cert_current_entry
| Metric type | Value | Description |
|---|
| gauge | number | Index of the revoked certificate store entry currently being verified by the tidy operation |
secrets.pki.tidy.revoked_cert_deleted_count
| Metric type | Value | Description |
|---|
| counter | number | Number of entries deleted from the certificate store for revoked certificates |
secrets.pki.tidy.revoked_cert_total_entries_fixed_issuers
| Metric type | Value | Description |
|---|
| gauge | number | Number of entries in the certificate store found to have incorrect issuer information that were fixed during the tidy operation |
secrets.pki.tidy.revoked_cert_total_entries_incorrect_issuers
| Metric type | Value | Description |
|---|
| gauge | number | Total number of entries in the certificate store found to have incorrect issuer information |
secrets.pki.tidy.revoked_cert_total_entries_remaining
| Metric type | Value | Description |
|---|
| gauge | number | Number of revoked certificates in the certificate store checked, but not removed, during the tidy operation |
secrets.pki.tidy.revoked_cert_total_entries
| Metric type | Value | Description |
|---|
| gauge | number | Number of revoked certificate entries in the certificate store to be verified during the tidy operation |
secrets.pki.tidy.start_time_epoch
| Metric type | Value | Description |
|---|
| gauge | seconds | Epoch time (seconds since 1970-01-01) when the PKI tidy operation began |
The start time metric reports a value of 0 if the PKI tidy operation is not
currently active.
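Because a value of 0 doubles as the "not running" signal, a consumer of this gauge typically needs to special-case it before treating the value as a timestamp. The following is a minimal Python sketch, assuming the gauge value has already been scraped from your metrics backend; the function name `tidy_started_at` is hypothetical and not part of OpenBao.

```python
from datetime import datetime, timezone
from typing import Optional


def tidy_started_at(start_time_epoch: float) -> Optional[datetime]:
    """Convert the gauge value to a UTC datetime, or None when no tidy is active."""
    # The gauge reports 0 while no PKI tidy operation is running.
    if start_time_epoch == 0:
        return None
    # Otherwise the value is seconds since 1970-01-01 (Unix epoch).
    return datetime.fromtimestamp(start_time_epoch, tz=timezone.utc)
```

Returning `None` instead of the 1970 epoch keeps dashboards from showing a misleading start date while the tidy operation is idle.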
secrets.pki.tidy.success
| Metric type | Value | Description |
|---|
| counter | number | Number of times the PKI tidy operation completed successfully |
vault.audit.{DEVICE}.log_request_failure
| Metric type | Value | Description |
|---|
| counter | number | Number of audit log request failures |
vault.audit.{DEVICE}.log_request
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete all audit log requests across the device |
vault.audit.{DEVICE}.log_response_failure
| Metric type | Value | Description |
|---|
| counter | number | Number of audit log response failures |
vault.audit.{DEVICE}.log_response
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete all audit log responses across the device |
vault.audit.log_request_failure
| Metric type | Value | Description |
|---|
| counter | number | Number of audit log request failures across all devices |
The number of request failures is a crucial metric.
A non-zero value for vault.audit.log_request_failure indicates that all your
configured audit devices failed to log a request (or response). If OpenBao cannot
properly audit a request, or the response to a request, the original request
will fail.
Refer to the OpenBao logs and any device-specific metrics to troubleshoot the
failing audit log device.
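One way to narrow down the failing device is to scan the device-specific failure counters for non-zero values. The sketch below assumes the metrics have already been collected into a plain name-to-value mapping; the helper `failing_audit_devices` and the device names used in the usage note are hypothetical.

```python
import re
from typing import Dict, List

# Matches device-specific failure counters of the form
# vault.audit.{DEVICE}.log_request_failure (or log_response_failure).
# The aggregate metric has no {DEVICE} segment, so it does not match.
_DEVICE_FAILURE = re.compile(
    r"^vault\.audit\.(.+)\.log_(?:request|response)_failure$"
)


def failing_audit_devices(samples: Dict[str, float]) -> List[str]:
    """Return the sorted device names whose failure counters are non-zero."""
    devices = set()
    for name, value in samples.items():
        match = _DEVICE_FAILURE.match(name)
        if match and value > 0:
            devices.add(match.group(1))
    return sorted(devices)
```

For example, a mapping in which only `vault.audit.file.log_request_failure` is non-zero yields `["file"]`, which points at the device whose logs to inspect first.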
vault.audit.log_request
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete all audit log requests across all audit log devices |
vault.audit.log_response_failure
| Metric type | Value | Description |
|---|
| counter | number | Number of audit log response failures across all devices |
The number of response failures is a crucial metric.
A non-zero value for vault.audit.log_response_failure indicates that one of
the configured audit log devices failed to respond to OpenBao. If OpenBao cannot
properly audit a request, or the response to a request, the original request
will fail.
Refer to the device-specific metrics and logs to troubleshoot the failing audit
log device.
vault.audit.log_response
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete audit log responses across all audit log devices |
vault.autopilot.failure_tolerance
| Metric type | Value | Description |
|---|
| gauge | nodes | The number of healthy nodes in excess of quorum |
The failure tolerance indicates how many currently healthy nodes can fail without losing quorum.
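As a rough illustration of the arithmetic, the sketch below computes a failure tolerance from node counts. It assumes a simple-majority quorum over voting nodes and floors the result at zero; both are assumptions for illustration, not a description of how Autopilot derives the gauge, and the function name is hypothetical.

```python
def failure_tolerance(total_voters: int, healthy_voters: int) -> int:
    """Healthy nodes in excess of a simple-majority quorum, floored at 0."""
    # Assumption: quorum is a strict majority of the voting nodes.
    quorum = total_voters // 2 + 1
    # Assumption: a cluster already below quorum reports 0, not a negative value.
    return max(healthy_voters - quorum, 0)
```

Under these assumptions, a five-node cluster with all nodes healthy tolerates two failures, while the same cluster with only three healthy nodes tolerates none.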
vault.autopilot.healthy
| Metric type | Value | Description |
|---|
| gauge | boolean | Indicates whether all nodes are healthy |
- A value of 1 on the gauge means that Autopilot deems all nodes healthy.
- A value of 0 on the gauge means that Autopilot deems at least 1 node unhealthy.
vault.autopilot.node.healthy
| Metric type | Value | Description |
|---|
| gauge | boolean | Indicates whether the node identified by node_id is healthy |
- A value of 1 on the gauge means that Autopilot deems the node indicated by node_id healthy.
- A value of 0 on the gauge means that Autopilot cannot communicate with the node indicated by node_id, or deems the node unhealthy.
vault.barrier.delete
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a DELETE operation at the barrier |
vault.barrier.get
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a GET operation at the barrier |
vault.barrier.list
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a LIST operation at the barrier |
vault.barrier.put
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a PUT operation at the barrier |
vault.cache.delete
| Metric type | Value | Description |
|---|
| counter | number | Number of deletes from the LRU cache |
vault.cache.hit
| Metric type | Value | Description |
|---|
| counter | number | Number of hits against the LRU cache that avoided a read from configured storage |
vault.cache.miss
| Metric type | Value | Description |
|---|
| counter | number | Number of misses against the LRU cache that required a read from configured storage |
vault.cache.write
| Metric type | Value | Description |
|---|
| counter | number | Number of writes to the LRU cache |
vault.core.active
| Metric type | Value | Description |
|---|
| gauge | boolean | Indicates whether the OpenBao node is active |
- A value of 1 indicates that the node is active.
- A value of 0 indicates that the node is in standby.
vault.core.check_token
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a token check |
vault.core.fetch_acl_and_token
| Metric type | Value | Description |
|---|
| summary | ms | Time required to fetch ACL and token entries |
vault.core.handle_login_request
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a login request |
vault.core.handle_request
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a non-login request |
vault.core.in_flight_requests
| Metric type | Value | Description |
|---|
| gauge | requests | Number of requests currently in progress |
vault.core.leadership_lost
| Metric type | Value | Description |
|---|
| summary | ms | Total time that a high-availability cluster node last maintained leadership |
Leadership time updates occur whenever leadership changes. Frequent updates to
vault.core.leadership_lost with low leadership times indicate flapping as
leader status rotates between nodes.
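A simple heuristic for spotting flapping is to count how many recent leadership terms were short. The sketch below assumes you have collected the recent vault.core.leadership_lost samples (in milliseconds) for a time window of your choosing; the function name and both thresholds are hypothetical and should be tuned to your cluster.

```python
from typing import Sequence


def looks_like_flapping(
    leadership_ms: Sequence[float],
    short_term_ms: float = 60_000,  # hypothetical: terms under 1 minute count as short
    min_short_terms: int = 3,       # hypothetical: this many short terms triggers the flag
) -> bool:
    """Return True when enough recent leadership terms were short-lived."""
    short_terms = sum(1 for duration in leadership_ms if duration < short_term_ms)
    return short_terms >= min_short_terms
```

A single short term is often harmless (for example, a planned step-down), which is why the sketch requires several short terms before flagging.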
vault.core.leadership_setup_failed
| Metric type | Value | Description |
|---|
| summary | ms | Time taken by the most recent leadership setup failure |
Setup failure time is an important health metric for your high-availability
OpenBao installation. We strongly recommend that you closely monitor
vault.core.leadership_setup_failed and set alerts that keep you informed of
the overall cluster leadership status.
vault.core.locked_users
| Metric type | Value | Description |
|---|
| gauge | users | The number of users currently locked out of OpenBao |
The number of locked users refreshes every 15 minutes.
vault.core.mount_table.num_entries
| Metric type | Value | Description |
|---|
| gauge | objects | Number of mounts in the given mount table |
Mountpoint count metrics include labels to indicate whether the relevant table
is an authentication table or a logical table and whether the table is
replicated or local.
vault.core.mount_table.size
| Metric type | Value | Description |
|---|
| gauge | bytes | The current size of the relevant mount table. |
Table size metrics include labels to indicate whether the relevant table is an
authentication table or a logical table and whether the table is replicated or
local.
vault.core.post_unseal
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete post-unseal operations |
vault.core.pre_seal
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete pre-seal operations |
vault.core.seal-internal
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete internal OpenBao seal operations |
vault.core.seal-with-request
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete seal operations that were triggered by explicit request |
vault.core.step_down
| Metric type | Value | Description |
|---|
| summary | ms | Time required to step down cluster leadership |
vault.core.unseal
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete unseal operations |
vault.core.unsealed
| Metric type | Value | Description |
|---|
| gauge | boolean | Indicates whether OpenBao is currently unsealed |
- A value of 1 indicates OpenBao is currently unsealed and clients can read secrets.
- A value of 0 indicates OpenBao is currently sealed and clients cannot read secrets.
vault.expire.fetch-lease-times-by-token
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to retrieve lease times by token |
vault.expire.fetch-lease-times
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to retrieve lease times |
vault.expire.job_manager.queue_length
| Metric type | Value | Description |
|---|
| summary | leases | The total number of pending revocation jobs by queue_id |
The queue ID in the queue_id label indicates the mount accessor associated
with the expiring lease (for example, a secrets engine or authentication method).
vault.expire.job_manager.total_jobs
| Metric type | Value | Description |
|---|
| summary | leases | The total number of pending revocation jobs |
vault.expire.lease_expiration
| Metric type | Value | Description |
|---|
| counter | number | The number of lease expirations to date |
vault.expire.lease_expiration.error
| Metric type | Value | Description |
|---|
| counter | number | The total number of lease expiration errors |
vault.expire.lease_expiration.time_in_queue
| Metric type | Value | Description |
|---|
| summary | ms | Time taken for a lease to get to the front of the revoke queue |
vault.expire.leases.by_expiration
| Metric type | Value | Description |
|---|
| gauge | leases | The number of leases set to expire, grouped by the configured interval |
The relevant time intervals are defined in the telemetry stanza for your
OpenBao server configuration with the following parameters:
- lease_metrics_epsilon: 1 hour (default)
- num_lease_metrics_buckets: 168 hours (default)
- add_lease_metrics_namespace_labels: false (default)
OpenBao reports the number of leases due to expire every lease_metrics_epsilon
interval in the time period current_time + num_lease_metrics_buckets.
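To make the bucketing concrete, the sketch below maps a lease expiry time to a bucket index using the defaults listed above. It illustrates the idea only and is not OpenBao's implementation: the function name is hypothetical, and treating leases that are already expired or that fall beyond the last bucket as outside the reporting window is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiration_bucket(
    now: datetime,
    expiry: datetime,
    epsilon: timedelta = timedelta(hours=1),  # lease_metrics_epsilon default
    num_buckets: int = 168,                   # num_lease_metrics_buckets default
) -> Optional[int]:
    """Return the zero-based bucket index for a lease, or None if out of range."""
    if expiry < now:
        return None  # assumption: already-expired leases are not bucketed
    # Dividing two timedeltas with // yields a whole number of intervals.
    index = (expiry - now) // epsilon
    # Assumption: leases past the final bucket fall outside the window.
    return index if index < num_buckets else None
```

With the defaults, a lease expiring 90 minutes from now lands in bucket 1 (the second one-hour interval), and a lease expiring 168 hours or more from now is outside the reported window.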
vault.expire.num_irrevocable_leases
| Metric type | Value | Description |
|---|
| gauge | leases | The number of leases that cannot be automatically revoked |
vault.expire.num_leases
| Metric type | Value | Description |
|---|
| gauge | leases | The total number of leases eligible for eventual expiry |
vault.expire.register-auth
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to register leases associated with new service tokens |
vault.expire.register
| Metric type | Value | Description |
|---|
| summary | ms | Time taken for register operations |
vault.expire.renew-token
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to renew a token |
vault.expire.renew
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to renew a lease |
vault.expire.revoke-by-token
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to revoke all secrets issued with a given token |
vault.expire.revoke-force
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to forcibly revoke a token |
vault.expire.revoke-prefix
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to revoke all tokens on a prefix |
vault.expire.revoke
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to revoke a token |
vault.ha.rpc.client.echo
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to send an echo request from a standby to the active node (also emitted by perf standbys) |
vault.ha.rpc.client.echo.errors
| Metric type | Value | Description |
|---|
| counter | number | Number of standby echo request failures (also emitted by perf standbys) |
vault.ha.rpc.client.forward
| Metric type | Value | Description |
|---|
| summary | ms | Time taken to forward a request from a standby to the active node |
vault.ha.rpc.client.forward.errors
| Metric type | Value | Description |
|---|
| counter | number | Number of standby request forwarding failures |
vault.identity.entity.alias.count
| Metric type | Value | Description |
|---|
| gauge | aliases | The number of identity entity aliases (per authN mount) currently stored in OpenBao |
OpenBao updates the alias count every usage_gauge_period interval.
vault.identity.entity.count
| Metric type | Value | Description |
|---|
| gauge | entities | The number of identity entities (per namespace) currently stored in OpenBao |
vault.identity.entity.creation
| Metric type | Value | Description |
|---|
| counter | number | The number of identity entities created per namespace |
vault.identity.num_entities
| Metric type | Value | Description |
|---|
| gauge | entities | The total number of identity entities currently stored in OpenBao |
vault.identity.upsert_entity_txn
| Metric type | Value | Description |
|---|
| summary | ms | Time required to upsert an entity to the in-memory database and, on the active node, persist the data to storage |
vault.identity.upsert_group_txn
| Metric type | Value | Description |
|---|
| summary | ms | Time required to upsert group membership to the in-memory database and, on the active node, persist the data to storage |
vault.logshipper.buffer.length
| Metric type | Value | Description |
|---|
| gauge | buffer entries | Current length of the log shipper buffer |
vault.logshipper.buffer.max_length
| Metric type | Value | Description |
|---|
| gauge | buffer entries | Maximum length of the log shipper buffer seen to date |
vault.logshipper.buffer.max_size
| Metric type | Value | Description |
|---|
| gauge | bytes | Maximum allowable size of the log shipper buffer |
vault.logshipper.buffer.size
| Metric type | Value | Description |
|---|
| gauge | bytes | Current size of the log shipper buffer |
vault.logshipper.streamWALs.guard_found
| Metric type | Value | Description |
|---|
| counter | number | Number of times OpenBao began streaming WAL entries and found a starting index in the Merkle tree |
vault.logshipper.streamWALs.missing_guard
| Metric type | Value | Description |
|---|
| counter | number | Number of times OpenBao began streaming WAL entries without finding a starting index in the Merkle tree |
vault.logshipper.streamWALs.scanned_entries
| Metric type | Value | Description |
|---|
| summary | entries | Number of entries scanned in the buffer before OpenBao found the correct entry |
vault.merkle.flushDirty
| Metric type | Value | Description |
|---|
| summary | ms | The average time required to flush dirty pages to storage |
vault.merkle.flushDirty.num_pages
| Metric type | Value | Description |
|---|
| gauge | pages | Number of pages flushed |
vault.merkle.flushDirty.outstanding_pages
| Metric type | Value | Description |
|---|
| gauge | pages | Number of dirty pages waiting to be flushed |
vault.merkle.saveCheckpoint
| Metric type | Value | Description |
|---|
| summary | ms | The average time required to save a checkpoint |
vault.merkle.saveCheckpoint.num_dirty
| Metric type | Value | Description |
|---|
| gauge | pages | Number of dirty pages at checkpoint |
vault.metrics.collection
| Metric type | Value | Description |
|---|
| summary | ms | The average time required (per gauge type) to collect usage data |
vault.metrics.collection.error
| Metric type | Value | Description |
|---|
| counter | number | The total number of errors (per gauge type) that OpenBao encountered while collecting usage data |
vault.metrics.collection.interval
| Metric type | Value | Description |
|---|
| summary | time duration | The current value of usage_gauge_period |
vault.policy.delete_policy
| Metric type | Value | Description |
|---|
| summary | ms | Time required to delete a policy |
vault.policy.get_policy
| Metric type | Value | Description |
|---|
| summary | ms | Time required to read a policy |
vault.policy.list_policies
| Metric type | Value | Description |
|---|
| summary | ms | Time required to list all policies |
vault.policy.set_policy
| Metric type | Value | Description |
|---|
| summary | ms | Time required to set a policy |
vault.postgresql.delete
| Metric type | Value | Description |
|---|
| timer | ms | Time required to delete an entry from storage |
vault.postgresql.get
| Metric type | Value | Description |
|---|
| timer | ms | Time required to get an entry |
vault.postgresql.list
| Metric type | Value | Description |
|---|
| timer | ms | Time required to list all entries under the prefix |
vault.postgresql.list-page
| Metric type | Value | Description |
|---|
| timer | ms | Time required to list all entries under the prefix (subject to pagination limits) |
vault.postgresql.put
| Metric type | Value | Description |
|---|
| timer | ms | Time required to insert an entry |
vault.quota.lease_count.counter
| Metric type | Value | Description |
|---|
| gauge | leases | Total number of leases associated with the named quota rule |
The number of leases reported is specific to the quota rule listed in the name
label, not the number of leases in general. For example, if the named rule
allows for 50 leases max and there are currently 40 leases in the scope of that
quota rule, the value of vault.quota.lease_count.counter is 40 even if there
are 1000 other leases that are unscoped or in the scope of other quota rules.
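Because the counter is scoped to one quota rule, it is most useful when compared against that rule's vault.quota.lease_count.max value. The sketch below computes a utilization ratio from the two gauges; the function name is hypothetical, and returning `None` for a non-positive maximum is an assumption made to avoid dividing by zero.

```python
from typing import Optional


def lease_quota_utilization(counter: float, maximum: float) -> Optional[float]:
    """Return the fraction of the quota rule's lease allowance currently in use."""
    # Assumption: a non-positive maximum means there is no usable limit to compare to.
    if maximum <= 0:
        return None
    return counter / maximum
```

For the example above, 40 leases against a maximum of 50 gives a utilization of 0.8, regardless of how many leases exist outside the scope of the rule.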
vault.quota.lease_count.max
| Metric type | Value | Description |
|---|
| gauge | leases | Maximum number of leases allowed by the named quota rule |
vault.quota.lease_count.violation
| Metric type | Value | Description |
|---|
| counter | number | Number of requests rejected due to exceeding the named lease count quota |
vault.quota.rate_limit.violation
| Metric type | Value | Description |
|---|
| counter | number | Number of requests rejected due to exceeding the named rate limit quota rule |
vault.raft_storage.bolt.cursor.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of cursors created in the Bolt database |
vault.raft_storage.bolt.freelist.allocated_bytes
| Metric type | Value | Description |
|---|
| gauge | bytes | Total space allocated for the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.free_pages
| Metric type | Value | Description |
|---|
| gauge | number | Number of free pages in the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.pending_pages
| Metric type | Value | Description |
|---|
| gauge | number | Number of pending pages in the freelist for the Bolt database |
vault.raft_storage.bolt.freelist.used_bytes
| Metric type | Value | Description |
|---|
| gauge | bytes | Total space used by the freelist for the Bolt database |
vault.raft_storage.bolt.node.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of node allocations for the Bolt database |
vault.raft_storage.bolt.node.dereferences
| Metric type | Value | Description |
|---|
| gauge | number | Total number of node dereferences by the Bolt database |
vault.raft_storage.bolt.page.bytes_allocated
| Metric type | Value | Description |
|---|
| gauge | bytes | Total space allocated to the Bolt database |
vault.raft_storage.bolt.page.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of page allocations in the Bolt database |
vault.raft_storage.bolt.rebalance.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of node rebalances performed by the Bolt database |
vault.raft_storage.bolt.rebalance.time
| Metric type | Value | Description |
|---|
| summary | ms | Time required by the Bolt database to rebalance nodes |
vault.raft_storage.bolt.spill.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of nodes spilled by the Bolt database |
vault.raft_storage.bolt.spill.time
| Metric type | Value | Description |
|---|
| summary | ms | Total time spent spilling by the Bolt database |
vault.raft_storage.bolt.split.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of nodes split by the Bolt database |
vault.raft_storage.bolt.transaction.currently_open_read_transactions
| Metric type | Value | Description |
|---|
| gauge | number | Number of in-process read transactions for the Bolt database |
vault.raft_storage.bolt.transaction.started_read_transactions
| Metric type | Value | Description |
|---|
| gauge | number | Number of read transactions started by the Bolt database |
vault.raft_storage.bolt.write.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of writes performed by the Bolt database |
vault.raft_storage.bolt.write.time
| Metric type | Value | Description |
|---|
| counter | ms | Total cumulative time the Bolt database has spent writing to disk. |
vault.raft_storage.follower.applied_index_delta
| Metric type | Value | Description |
|---|
| gauge | number | The difference between the index applied by the leader and the index applied by the follower as reported by echoes |
vault.raft_storage.follower.last_heartbeat_ms
| Metric type | Value | Description |
|---|
| gauge | ms | Time since the follower last received a heartbeat request |
vault.raft_storage.stats.applied_index
| Metric type | Value | Description |
|---|
| gauge | number | Highest index of raft log last applied to the finite state machine or added to fsm_pending queue |
vault.raft_storage.stats.commit_index
| Metric type | Value | Description |
|---|
| gauge | number | Index of the last raft log committed to disk on the node |
vault.raft_storage.stats.fsm_pending
| Metric type | Value | Description |
|---|
| gauge | number | Number of raft logs queued by the node for the finite state machine to apply |
vault.raft-storage.delete
| Metric type | Value | Description |
|---|
| timer | ms | Time required to insert a log entry to delete the path |
vault.raft-storage.entry_size
| Metric type | Value | Description |
|---|
| summary | bytes | The total size of a raft entry during log application |
vault.raft-storage.get
| Metric type | Value | Description |
|---|
| timer | ms | Time required to retrieve a value for the given path from the finite state machine |
vault.raft-storage.list
| Metric type | Value | Description |
|---|
| timer | ms | Time required to list all entries under the prefix from the finite state machine |
vault.raft-storage.put
| Metric type | Value | Description |
|---|
| timer | ms | Time required to insert a log entry to persist the path |
vault.raft-storage.transaction
| Metric type | Value | Description |
|---|
| timer | ms | Time required to insert operations into a single log |
vault.raft.apply
| Metric type | Value | Description |
|---|
| counter | number | Number of transactions in the configured interval |
The vault.raft.apply metric is generally a good indicator of the write load
on your raft internal storage.
vault.raft.barrier
| Metric type | Value | Description |
|---|
| counter | number | Number of times the node started the barrier |
A node starts the barrier by issuing a blocking call when it wants to ensure
that all pending operations that need to be applied to the finite state machine
are properly queued.
vault.raft.candidate.electSelf
| Metric type | Value | Description |
|---|
| summary | ms | Time required for a node to send a vote request to a peer |
vault.raft.commitNumLogs
| Metric type | Value | Description |
|---|
| gauge | number | Number of logs processed for application to the finite state machine in a single batch |
vault.raft.commitTime
| Metric type | Value | Description |
|---|
| summary | ms | Time required to commit a new entry to the raft log on the leader node |
vault.raft.compactLogs
| Metric type | Value | Description |
|---|
| summary | ms | Time required to trim unnecessary logs |
vault.raft.fsm.apply
| Metric type | Value | Description |
|---|
| summary | number | Number of logs committed by the finite state machine since the last interval |
vault.raft.fsm.applyBatch
| Metric type | Value | Description |
|---|
| summary | ms | Time required by the finite state machine to apply the most recent batch of logs |
vault.raft.fsm.applyBatchNum
| Metric type | Value | Description |
|---|
| counter | number | Number of logs applied in the most recent batch |
vault.raft.fsm.enqueue
| Metric type | Value | Description |
|---|
| summary | ms | Time required to queue up a batch of logs for the finite state machine to apply |
vault.raft.fsm.restore
| Metric type | Value | Description |
|---|
| summary | ms | Time required by the finite state machine to complete a restore operation from a snapshot |
vault.raft.fsm.snapshot
| Metric type | Value | Description |
|---|
| summary | ms | Time required by the finite state machine to record state information for the current snapshot |
vault.raft.fsm.store_config
| Metric type | Value | Description |
|---|
| summary | ms | Time required to store the most recent raft configuration |
vault.raft.get
| Metric type | Value | Description |
|---|
| summary | ms | Time required to retrieve an entry from underlying storage |
vault.raft.leader.dispatchLog
| Metric type | Value | Description |
|---|
| timer | ms | Time required for the leader node to write a log entry to disk |
vault.raft.leader.dispatchNumLogs
| Metric type | Value | Description |
|---|
| gauge | number | Number of logs committed to disk in the most recent batch |
vault.raft.leader.lastContact
| Metric type | Value | Description |
|---|
| summary | ms | Time since the leader was last able to contact the follower nodes when checking its leader lease |
vault.raft.list
| Metric type | Value | Description |
|---|
| summary | ms | Time required to retrieve a list of keys from underlying storage |
vault.raft.peers
| Metric type | Value | Description |
|---|
| gauge | number | The number of peers in the raft cluster configuration |
vault.raft.replication.appendEntries.log
| Metric type | Value | Description |
|---|
| summary | number | Number of logs replicated to a node to establish parity with leader logs |
vault.raft.replication.appendEntries.rpc
| Metric type | Value | Description |
|---|
| timer | ms | Time required to replicate leader node log entries to all follower nodes with appendEntries |
vault.raft.replication.heartbeat
| Metric type | Value | Description |
|---|
| timer | ms | Time required to invoke appendEntries on a peer so the peer does not time out |
vault.raft.replication.installSnapshot
| Metric type | Value | Description |
|---|
| timer | ms | Time required to process an installSnapshot RPC call |
Only nodes currently in the follower state report
vault.raft.replication.installSnapshot metrics.
vault.raft.restore
| Metric type | Value | Description |
|---|
| counter | number | Number of times that the node performed a restore operation |
In the context of raft storage, a restore operation refers to the process where
raft consumes an external snapshot to restore its state.
vault.raft.restoreUserSnapshot
| Metric type | Value | Description |
|---|
| timer | ms | Time required to restore the finite state machine from a user snapshot |
vault.raft.rpc.appendEntries
| Metric type | Value | Description |
|---|
| timer | ms | Time required to process a remote appendEntries call from a node |
vault.raft.rpc.appendEntries.processLogs
| Metric type | Value | Description |
|---|
| timer | ms | Time required to completely process the outstanding logs for the given node |
vault.raft.rpc.appendEntries.storeLogs
| Metric type | Value | Description |
|---|
| timer | ms | Time required to record any outstanding logs since the last request to append entries for the given node |
vault.raft.rpc.installSnapshot
| Metric type | Value | Description |
|---|
| timer | ms | Time required to process an installSnapshot RPC call |
Only nodes currently in the follower state report
vault.raft.rpc.installSnapshot metrics.
vault.raft.rpc.processHeartbeat
| Metric type | Value | Description |
|---|
| timer | ms | Time required to process a heartbeat request |
vault.raft.rpc.requestVote
| Metric type | Value | Description |
|---|
| summary | ms | Time required to complete a requestVote call |
vault.raft.snapshot.create
| Metric type | Value | Description |
|---|
| timer | ms | Time required to capture a new snapshot |
vault.raft.snapshot.persist
| Metric type | Value | Description |
|---|
| timer | ms | Time required to record snapshot meta information to disk while taking snapshots |
vault.raft.snapshot.takeSnapshot
| Metric type | Value | Description |
|---|
| timer | ms | Total time required to create and persist the current snapshot |
In most cases, vault.raft.snapshot.takeSnapshot is approximately equal to
vault.raft.snapshot.create + vault.raft.snapshot.persist.
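The relationship above can be checked against observed timings. The following is a minimal sketch, assuming you have already collected one millisecond value for each of the three timers over the same interval; the function names and the 10% tolerance are illustrative choices, not part of OpenBao.

```python
def snapshot_overhead_ms(take_ms: float, create_ms: float, persist_ms: float) -> float:
    """Return the part of takeSnapshot that create + persist do not explain."""
    return take_ms - (create_ms + persist_ms)


def is_consistent(take_ms: float, create_ms: float, persist_ms: float,
                  tolerance: float = 0.10) -> bool:
    """True when the unexplained time is within `tolerance` of takeSnapshot.

    The tolerance is an assumed value; tune it to your environment.
    """
    if take_ms <= 0:
        return False
    overhead = snapshot_overhead_ms(take_ms, create_ms, persist_ms)
    return abs(overhead) <= tolerance * take_ms
```

A large, persistent gap between the total and the sum of the two parts may be worth investigating, since the text above only describes the values as approximately equal in most cases.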
vault.raft.state.candidate
| Metric type | Value | Description |
|---|
| counter | number | Number of times the raft server initiated an election |
vault.raft.state.follower
| Metric type | Value | Description |
|---|
| summary | number | Number of times in the configured interval that the raft server became a follower |
Nodes transition to follower state under the following conditions:
- when the node joins the cluster
- when a leader is elected, but the node was not elected leader
vault.raft.state.leader
| Metric type | Value | Description |
|---|
| counter | number | Number of times the raft server became a leader |
vault.raft.transition.heartbeat_timeout
| Metric type | Value | Description |
|---|
| summary | number | Number of times that the node transitioned to candidate state after not receiving a heartbeat message from the last known leader |
vault.raft.transition.leader_lease_timeout
| Metric type | Value | Description |
|---|
| counter | number | The number of times the leader could not contact a quorum of nodes and therefore stepped down |
vault.raft.verify_leader
| Metric type | Value | Description |
|---|
| counter | number | Number of times in the configured interval that the node confirmed it is still the leader |
vault.rollback.attempt.{MOUNTPOINT}
| Metric type | Value | Description |
|---|
| summary | ms | Time required to perform a rollback operation on the given mount point |
vault.rollback.inflight
| Metric type | Value | Description |
|---|
| gauge | number | Number of rollback operations inflight |
vault.rollback.queued
| Metric type | Value | Description |
|---|
| gauge | number | The number of rollback operations waiting to be started |
vault.rollback.waiting
| Metric type | Value | Description |
|---|
| summary | ms | Time between queueing a rollback operation and the operation starting |
vault.route.create.{MOUNTPOINT}
| Metric type | Value | Description |
|---|
| summary | ms | Time required to send a create request to the backend and for the backend to complete the operation for the given mount point |
vault.route.delete.{MOUNTPOINT}
| Metric type | Value | Description |
|---|
| summary | ms | Time required to send a delete request to the backend and for the backend to complete the operation for the given mount point |
vault.route.list.{MOUNTPOINT}
| Metric type | Value | Description |
|---|
| summary | ms | Time required to send a list request to the backend and for the backend to complete the operation for the given mount point |
vault.route.read.{MOUNTPOINT}
| Metric type | Value | Description |
|---|
| summary | ms | Time required to send a read request to the backend and for the backend to complete the operation for the given mount point |
vault.route.rollback.{MOUNTPOINT}
| Metric type | Value | Description |
|---|
| summary | ms | Time required to send a rollback request to the backend and for the backend to complete the operation for the given mount point |
OpenBao automatically schedules and performs mount point rollback operations to
clean up partial errors.
vault.runtime.alloc_bytes
| Metric type | Value | Description |
|---|
| gauge | bytes | Space currently allocated to OpenBao processes |
The number of allocated bytes may peak from time to time, but should
always return to a steady state value in a healthy OpenBao installation.
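One way to act on this guidance is to check whether the most recent samples have settled back near a known steady-state value after a peak. The sketch below assumes a list of vault.runtime.alloc_bytes samples ordered oldest to newest; the function name, the 20% tolerance, and the three-sample tail are hypothetical defaults.

```python
def returned_to_steady_state(samples, baseline, tolerance=0.20, tail=3):
    """True when the last `tail` samples are all within `tolerance` of `baseline`.

    samples:   gauge readings, oldest first
    baseline:  the steady-state value you expect for this installation
    tolerance: allowed relative deviation from the baseline (assumed value)
    tail:      how many of the most recent samples must be back in range
    """
    if baseline <= 0 or len(samples) < tail:
        return False
    recent = samples[-tail:]
    return all(abs(value - baseline) <= tolerance * baseline for value in recent)
```

A series that peaks and then settles passes the check, while one that keeps climbing does not, which is the pattern the paragraph above treats as unhealthy.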
vault.runtime.free_count
| Metric type | Value | Description |
|---|
| gauge | number | Number of freed objects |
vault.runtime.gc_pause_ns
| Metric type | Value | Description |
|---|
| summary | ns | Time required to complete the last garbage collection run |
vault.runtime.heap_objects
| Metric type | Value | Description |
|---|
| gauge | number | Total number of objects on the heap in memory |
The vault.runtime.heap_objects metric is a good memory pressure indicator. We
recommend monitoring vault.runtime.heap_objects to establish an accurate
baseline and thresholds for alerting on the health of your OpenBao installation.
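The baseline-and-threshold recommendation can be sketched as follows. This example assumes you have a history of vault.runtime.heap_objects samples from a period of normal operation, and it uses a simple mean plus `k` standard deviations rule; both the rule and the default `k=3.0` are illustrative choices rather than OpenBao guidance.

```python
from statistics import mean, pstdev


def baseline_threshold(samples, k=3.0):
    """Return an alert threshold of mean + k standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a baseline")
    return mean(samples) + k * pstdev(samples)


def exceeds_baseline(value, samples, k=3.0):
    """True when a new reading is above the threshold derived from `samples`."""
    return value > baseline_threshold(samples, k)
```

For example, with baseline samples of `[100, 102, 98, 100]`, a reading of 110 exceeds the threshold and a reading of 103 does not. The same approach applies to any gauge for which you want a baseline.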
vault.runtime.malloc_count
| Metric type | Value | Description |
|---|
| gauge | number | Total number of allocated heap objects in memory |
vault.runtime.num_goroutines
| Metric type | Value | Description |
|---|
| gauge | number | Total number of goroutines running in memory |
The vault.runtime.num_goroutines metric is a good system load indicator. We
recommend monitoring vault.runtime.num_goroutines to establish an accurate
baseline and thresholds for alerting on the health of your OpenBao installation.
vault.runtime.sys_bytes
| Metric type | Value | Description |
|---|
| gauge | bytes | Total number of bytes allocated to OpenBao |
The total number of allocated system bytes includes space currently used by the
heap plus space that has been reclaimed by, but not returned to, the operating
system.
vault.runtime.total_gc_pause_ns
| Metric type | Value | Description |
|---|
| gauge | ns | The total garbage collector pause time since OpenBao was last started |
vault.runtime.total_gc_runs
| Metric type | Value | Description |
|---|
| gauge | number | The total number of garbage collection runs since OpenBao was last started |
vault.secret.kv.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of entries in each key-value secrets engine |
OpenBao organizes the key-value pair count by cluster, namespace, and mount point.
vault.secret.lease.creation
| Metric type | Value | Description |
|---|
| counter | number | Number of leases created by secrets engines |
OpenBao organizes the lease count by cluster, namespace, secret engine, mount
point, and time to live (TTL).
vault.token.count
| Metric type | Value | Description |
|---|
| gauge | number | Number of un-expired and un-revoked tokens available for use in the token store |
OpenBao updates the token count every 10 minutes and organizes the result by
cluster and namespace.
vault.token.count.by_auth
| Metric type | Value | Description |
|---|
| gauge | number | Total number of service tokens created by a particular auth method |
OpenBao organizes the token count by cluster, namespace, and authentication
method.
vault.token.count.by_policy
| Metric type | Value | Description |
|---|
| gauge | number | Total number of service tokens with a particular policy attached |
OpenBao organizes the token count by cluster, namespace, and policy. Tokens with
more than one policy attached appear in the gauge for each associated policy.
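Because a token with several policies is counted once under each of them, the per-policy gauges can add up to more than the number of tokens. The sketch below illustrates that counting behavior only; it is not OpenBao's implementation, and the policy names in the usage note are hypothetical.

```python
from collections import Counter


def count_by_policy(tokens):
    """Count tokens per policy.

    tokens: iterable of policy-name lists, one list per token.
    A token contributes once to every distinct policy attached to it.
    """
    counts = Counter()
    for policies in tokens:
        for policy in set(policies):
            counts[policy] += 1
    return counts
```

Given three tokens with policies `["default"]`, `["default", "admin"]`, and `["admin", "ops", "default"]`, the per-policy counts are 3 for `default`, 2 for `admin`, and 1 for `ops`. The counts sum to 6 even though only 3 tokens exist, so summing the gauges across policies does not yield a token total.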
vault.token.count.by_ttl
| Metric type | Value | Description |
|---|
| gauge | number | Total number of service tokens assigned a particular time to live (TTL) |
OpenBao organizes the token count by cluster, namespace, and the TTL
range assigned at creation.
vault.token.create_root
| Metric type | Value | Description |
|---|
| counter | number | Number of root tokens created |
The vault.token.create_root metric counts the total number of root tokens created
over time, not the number of root tokens currently in use. As a result, the
value of vault.token.create_root does not decrease when a root token is
revoked.
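Since the counter only grows, the number of root tokens created in a window is the difference between two readings. The sketch below assumes cumulative samples, such as those a Prometheus-style scrape would expose (other telemetry sinks may report per-interval values instead), and treats a drop in value as a counter reset after a restart. The function name is hypothetical.

```python
def counter_increase(previous, current):
    """Return how much a cumulative counter grew between two samples.

    If `current` is lower than `previous`, the counter is assumed to have
    been reset (for example by a process restart), so everything in
    `current` counts as new.
    """
    if current < previous:
        return current
    return current - previous
```

Any nonzero result means at least one root token was created during the interval, regardless of whether that token has since been revoked.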
vault.token.create
| Metric type | Value | Description |
|---|
| summary | ms | Time required to create a token in OpenBao |
vault.token.createAccessor
| Metric type | Value | Description |
|---|
| summary | ms | Time required to create a token accessor in OpenBao |
vault.token.creation
| Metric type | Value | Description |
|---|
| counter | number | Number of service or batch tokens created |
OpenBao organizes the creation count by cluster, namespace, authentication method,
mount point, time to live (TTL), and token type.
vault.token.lookup
| Metric type | Value | Description |
|---|
| summary | ms | Time required to look up a token in OpenBao |
vault.token.revoke-tree
| Metric type | Value | Description |
|---|
| summary | ms | Time required to fully revoke a token tree in OpenBao |
vault.token.revoke
| Metric type | Value | Description |
|---|
| summary | ms | Time required to revoke a token in OpenBao |
vault.token.store
| Metric type | Value | Description |
|---|
| summary | ms | Time required to store an updated token entry without writing to the secondary index |
vault.wal.deleteWALs
| Metric type | Value | Description |
|---|
| summary | ms | Time required to fully delete a write-ahead log |
vault.wal.flushReady
| Metric type | Value | Description |
|---|
| summary | ms | Time required to fully flush a write-ahead log that is ready for storage |
vault.wal.flushReady.queue_len
| Metric type | Value | Description |
|---|
| summary | number | Current size of the write queue in the WAL system |
vault.wal.gc.deleted
| Metric type | Value | Description |
|---|
| gauge | number | Number of write-ahead logs deleted during garbage collection |
vault.wal.gc.total
| Metric type | Value | Description |
|---|
| gauge | number | Total number of write-ahead logs currently on disk |
vault.wal.loadWAL
| Metric type | Value | Description |
|---|
| summary | ms | Time required to load a write-ahead log |
vault.wal.persistWALs
| Metric type | Value | Description |
|---|
| summary | ms | Time required to persist a write-ahead log |