Kafka Metrics
- Prometheus plugin: kafka_exporter, jmx_exporter
- Prometheus Built-in Chart: Grafana Kafka Overview
Note: A sign after a metric name indicates that the platform provides a built-in chart for that metric.
| Description | Metrics |
|---|---|
| Number of active controllers in the cluster | MegaEase: kafka-controller-activecontrollercount-value-metric<br>Datadog: kafka.replication.active_controller_count<br>Prometheus: kafka_controller_kafkacontroller_activecontrollercount<br>Telegraf: kafka_controller_activecontrollercount_value<br>MegaEase Dashboard: kafka-controller-activecontrollercount-value-max-metric |
| Number of partitions across all topics in the cluster | MegaEase: kafka-controller-globalpartitioncount-value-metric<br>Datadog: kafka.replication.partition_count<br>Prometheus: kafka_controller_kafkacontroller_globalpartitioncount<br>Telegraf: kafka_controller_globalpartitioncount_value<br>MegaEase Dashboard: kafka-controller-globalpartitioncount-value-max-metric |
| Total number of topics across all brokers in the cluster | MegaEase: kafka-controller-globaltopiccount-value-metric<br>Datadog: NotSupport<br>Prometheus: kafka_controller_kafkacontroller_globaltopiccount<br>Telegraf: kafka_controller_globaltopiccount_value<br>MegaEase Dashboard: kafka-controller-globaltopiccount-value-max-metric |
| Maximum of the leader election rate and latency | MegaEase: kafka-controller-leaderelectionrateandtimems-max-metric<br>Datadog: kafka.replication.leader_elections.rate<br>Prometheus: kafka_controller_controllerstats_leaderelectionrateandtimems<br>Telegraf: kafka_controller_leaderelectionrateandtimems_max<br>MegaEase Dashboard: kafka-controller-leaderelectionrateandtimems-max-max-metric |
| One-minute rate of unclean leader elections | MegaEase: kafka-controller-uncleanleaderelectionspersec-oneminuterate-metric<br>Datadog: kafka.replication.unclean_leader_elections.rate<br>Prometheus: kafka_controller_controllerstats_uncleanleaderelectionspersec<br>Telegraf: kafka_controller_uncleanleaderelectionspersec_oneminuterate<br>MegaEase Dashboard: kafka-controller-uncleanleaderelectionspersec-oneminuterate-max-metric |
| Controller requests queue size (per broker) | MegaEase: kafka-controller-eventqueuesize-value-metric<br>Datadog: NotSupport<br>Prometheus: kafka_controller_controllereventmanager_eventqueuesize<br>Telegraf: kafka_controller_eventqueuesize_value<br>MegaEase Dashboard: kafka-controller-eventqueuesize-value-max-metric |
| Number of partitions that don't have an active leader and are hence not writable or readable | MegaEase: kafka-controller-offlinepartitionscount-value-metric<br>Datadog: kafka.replication.offline_partitions_count<br>Prometheus: kafka_controller_kafkacontroller_offlinepartitionscount<br>Telegraf: kafka_controller_offlinepartitionscount_value<br>MegaEase Dashboard: kafka-controller-offlinepartitionscount-value-max-metric |
| Total number of controller requests to be sent out to brokers | MegaEase: kafka-controller-totalqueuesize-value-metric<br>Datadog: NotSupport<br>Prometheus: kafka_controller_controllerchannelmanager_totalqueuesize<br>Telegraf: kafka_controller_totalqueuesize_value<br>MegaEase Dashboard: kafka-controller-totalqueuesize-value-max-metric |
| The total number of collections that have occurred | MegaEase: kafka-java-garbage-collector-collectioncount-metric<br>Datadog: jmx.java.lang.collectioncount<br>Prometheus: java_lang_garbagecollector_collectioncount<br>Telegraf: kafka_java_garbage_collector_collectioncount<br>MegaEase Dashboard: kafka-java-garbage-collector-collectioncount-ratio-metric |
| The approximate accumulated collection elapsed time in milliseconds | MegaEase: kafka-java-garbage-collector-collectiontime-metric<br>Datadog: jmx.java.lang.collectiontime<br>Prometheus: java_lang_garbagecollector_collectiontime<br>Telegraf: kafka_java_garbage_collector_collectiontime<br>MegaEase Dashboard: kafka-java-garbage-collector-collectiontime-ratio-metric |
| Memory currently used by each memory pool, in bytes | MegaEase: kafka-java-memory-pool-usage-used-metric<br>Datadog: jmx.java.lang.usage.used<br>Prometheus: jvm_memory_pool_bytes_used<br>Telegraf: kafka_java_memory_pool_usage_used<br>MegaEase Dashboard: kafka-java-memory-pool-usage-used-max-by-tag-name-metric |
| The current number of live daemon threads | MegaEase: kafka-java-threading-daemonthreadcount-metric<br>Datadog: jmx.java.lang.daemon_thread_count<br>Prometheus: java_lang_threading_daemonthreadcount<br>Telegraf: kafka_java_threading_daemonthreadcount<br>MegaEase Dashboard: kafka-java-threading-daemonthreadcount-max-metric |
| The peak live thread count | MegaEase: kafka-java-threading-peakthreadcount-metric<br>Datadog: jmx.java.lang.daemon_thread_count<br>Prometheus: java_lang_threading_peakthreadcount<br>Telegraf: kafka_java_threading_peakthreadcount<br>MegaEase Dashboard: kafka-java-threading-peakthreadcount-max-metric |
| The current number of live threads | MegaEase: kafka-java-threading-threadcount-metric<br>Datadog: jmx.java.lang.thread_count<br>Prometheus: java_lang_threading_threadcount<br>Telegraf: kafka_java_threading_threadcount<br>MegaEase Dashboard: kafka-java-threading-threadcount-max-metric |
| Maximum memory, in bytes, available to the memory pool, as recorded in its peak usage | MegaEase: kafka-java-memory-pool-peakusage-max-metric<br>Datadog: jmx.java.lang.peak_usage.max<br>Prometheus: java_lang_memorypool_peakusage_max<br>Telegraf: kafka_java_memory_pool_peakusage_max<br>MegaEase Dashboard: kafka-java-memory-pool-peakusage-max-max-metric |
| Memory used, in bytes, at the memory pool's peak usage | MegaEase: kafka-java-memory-pool-peakusage-used-metric<br>Datadog: jmx.java.lang.peak_usage.used<br>Prometheus: java_lang_memorypool_peakusage_used<br>Telegraf: kafka_java_memory_pool_peakusage_used<br>MegaEase Dashboard: kafka-java-memory-pool-peakusage-used-max-metric, kafka-java-memory-pool-peakusage-used-avg-metric, kafka-java-memory-pool-peakusage-used-min-metric |
| Maximum memory, in bytes, available to the memory pool, as recorded in its current usage | MegaEase: kafka-java-memory-pool-usage-max-metric<br>Datadog: jmx.java.lang.usage.max<br>Prometheus: java_lang_memorypool_usage_max<br>Telegraf: kafka_java_memory_pool_usage_max<br>MegaEase Dashboard: kafka-java-memory-pool-usage-max-max-metric, kafka-java-memory-pool-usage-max-avg-metric, kafka-java-memory-pool-usage-max-min-metric |
| The total number of threads started | MegaEase: kafka-java-threading-totalstartedthreadcount-metric<br>Datadog: jmx.java.lang.total_started_thread_count<br>Prometheus: java_lang_threading_totalstartedthreadcount<br>Telegraf: kafka_java_threading_totalstartedthreadcount<br>MegaEase Dashboard: kafka-java-threading-totalstartedthreadcount-ratio-metric, kafka-java-threading-totalstartedthreadcount-max-metric |
| Number of leaders on this broker | MegaEase: kafka-replica-manager-leadercount-value-metric<br>Datadog: kafka.replication.leader_count<br>Prometheus: kafka_server_replicamanager_leadercount<br>Telegraf: kafka_replica_manager_leadercount_value<br>MegaEase Dashboard: kafka-replica-manager-leadercount-value-max-metric |
| Offline Replica counts | MegaEase: kafka-replica-manager-offlinereplicacount-value-metric<br>Datadog: NotSupport<br>Prometheus: kafka_server_replicamanager_offlinereplicacount<br>Telegraf: kafka_replica_manager_offlinereplicacount_value<br>MegaEase Dashboard: kafka-replica-manager-offlinereplicacount-value-max-metric |
| Number of partitions on this broker | MegaEase: kafka-replica-manager-partitioncount-value-metric<br>Datadog: kafka.replication.partition_count<br>Prometheus: kafka_server_replicamanager_partitioncount<br>Telegraf: kafka_replica_manager_partitioncount_value<br>MegaEase Dashboard: kafka-replica-manager-partitioncount-value-max-metric |
| Number of under-replicated partitions | MegaEase: kafka-replica-manager-underreplicatedpartitions-value-metric<br>Datadog: kafka.replication.under_replicated_partitions<br>Prometheus: kafka_server_replicamanager_underreplicatedpartitions<br>Telegraf: kafka_replica_manager_underreplicatedpartitions_value<br>MegaEase Dashboard: kafka-replica-manager-underreplicatedpartitions-value-max-metric |
| Number of partitions whose in-sync replicas count is less than minIsr | MegaEase: kafka-replica-manager-underminisrpartitioncount-value-metric<br>Datadog: kafka.replication.under_min_isr_partition_count<br>Prometheus: kafka_server_replicamanager_underminisrpartitioncount<br>Telegraf: kafka_replica_manager_underminisrpartitioncount_value<br>MegaEase Dashboard: kafka-replica-manager-underminisrpartitioncount-value-min-metric, kafka-replica-manager-underminisrpartitioncount-value-max-metric, kafka-replica-manager-underminisrpartitioncount-value-avg-metric |
| Byte in rate from clients per topic | MegaEase: kafka-topic-bytesinpersec-oneminuterate-metric<br>Datadog: kafka.topic.net.bytes_in.rate<br>Prometheus: rate(kafka_server_brokertopicmetrics_bytesin_total[1m])<br>Telegraf: kafka_topic_bytesinpersec_oneminuterate<br>MegaEase Dashboard: kafka-topic-bytesinpersec-oneminuterate-max-by-topic-metric |
| Byte out rate to clients per topic | MegaEase: kafka-topic-bytesoutpersec-oneminuterate-metric<br>Datadog: kafka.topic.net.bytes_out.rate<br>Prometheus: rate(kafka_server_brokertopicmetrics_bytesout_total[1m])<br>Telegraf: kafka_topic_bytesoutpersec_oneminuterate<br>MegaEase Dashboard: kafka-topic-bytesoutpersec-oneminuterate-max-by-topic-metric |
| Aggregate incoming message rate per topic | MegaEase: kafka-topic-messagesinpersec-oneminuterate-metric<br>Datadog: kafka.topic.messages_in.rate<br>Prometheus: rate(kafka_server_brokertopicmetrics_messagesin_total[1m])<br>Telegraf: kafka_topic_messagesinpersec_oneminuterate<br>MegaEase Dashboard: kafka-topic-messagesinpersec-oneminuterate-max-by-topic-metric |
| Byte in rate from clients | MegaEase: kafka-topics-bytesinpersec-oneminuterate-metric<br>Datadog: kafka.net.bytes_in.rate<br>Prometheus: rate(kafka_server_brokertopicmetrics_bytesin_total[1m])<br>Telegraf: kafka_topics_bytesinpersec_oneminuterate<br>MegaEase Dashboard: kafka-topics-bytesinpersec-oneminuterate-max-metric |
| Byte out rate to clients | MegaEase: kafka-topics-bytesoutpersec-oneminuterate-metric<br>Datadog: kafka.net.bytes_out.rate<br>Prometheus: rate(kafka_server_brokertopicmetrics_bytesout_total[1m])<br>Telegraf: kafka_topics_bytesoutpersec_oneminuterate<br>MegaEase Dashboard: kafka-topics-bytesoutpersec-oneminuterate-max-metric |
| Aggregate incoming message rate | MegaEase: kafka-topics-messagesinpersec-oneminuterate-metric<br>Datadog: kafka.messages_in.rate<br>Prometheus: rate(kafka_server_brokertopicmetrics_messagesin_total[1m])<br>Telegraf: kafka_topics_messagesinpersec_oneminuterate<br>MegaEase Dashboard: kafka-topics-messagesinpersec-oneminuterate-max-metric |
| Byte-in rate from other brokers | MegaEase: kafka-topics-replicationbytesinpersec-count-metric<br>Datadog: NotSupport<br>Prometheus: rate(kafka_server_brokertopicmetrics_replicationbytesin_total[1m])<br>Telegraf: kafka_topics_replicationbytesinpersec_count<br>MegaEase Dashboard: kafka-topics-replicationbytesinpersec-count-max-metric, kafka-topics-replicationbytesinpersec-count-avg-metric, kafka-topics-replicationbytesinpersec-count-min-metric |
| Byte-out rate to other brokers | MegaEase: kafka-topics-replicationbytesoutpersec-count-metric<br>Datadog: NotSupport<br>Prometheus: rate(kafka_server_brokertopicmetrics_replicationbytesout_total[1m])<br>Telegraf: kafka_topics_replicationbytesoutpersec_count<br>MegaEase Dashboard: kafka-topics-replicationbytesoutpersec-count-avg-metric, kafka-topics-replicationbytesoutpersec-count-min-metric, kafka-topics-replicationbytesoutpersec-count-max-metric |
| The highest CurrentLag value across all partitions | MegaEase: kafka-group-topic-lag-metric<br>Datadog: kafka.consumer_lag<br>Prometheus: kafka_consumergroup_lag<br>Telegraf: kafka_group_topic_lag<br>MegaEase Dashboard: kafka-group-topic-lag-max-by-group-metric |
| Total offset of all partitions for the group | MegaEase: kafka-group-topic-offset-sum-metric<br>Datadog: avg:kafka.consumer_offset{*}<br>Prometheus: kafka_consumergroup_current_offset_sum<br>Telegraf: kafka_group_topic_offset_sum<br>MegaEase Dashboard: kafka-group-topic-offset-sum-avg-by-group-metric, kafka-group-topic-offset-sum-min-by-group-metric, kafka-group-topic-offset-sum-max-by-group-metric |
| Total number of partitions that the group has committed offsets for | MegaEase: kafka-group-topic-partition-count-metric<br>Datadog: kafka.replication.partition_count<br>Prometheus: kafka_topic_partitions<br>Telegraf: kafka_group_topic_partition_count<br>MegaEase Dashboard: kafka-group-topic-partition-count-max-by-group-metric |
| The sum of all partition CurrentLag values for the group | MegaEase: kafka-group-topic-total-lag-metric<br>Datadog: kafka.consumer_lag<br>Prometheus: kafka_consumergroup_lag<br>Telegraf: kafka_group_topic_total_lag<br>MegaEase Dashboard: kafka-group-topic-total-lag-max-by-group-metric |
| Offset of a single partition in a single topic | MegaEase: kafka-topic-offset-offset-metric<br>Datadog: kafka.consumer_offset<br>Prometheus: kafka_topic_partition_current_offset<br>Telegraf: kafka_topic_offset_offset<br>MegaEase Dashboard: kafka-topic-offset-offset-ratio-metric |
| The CurrentLag value for a single partition | MegaEase: kafka-topic-partition-lag-metric<br>Datadog: kafka.consumer_lag<br>Prometheus: kafka_consumergroup_lag<br>Telegraf: kafka_topic_partition_lag<br>MegaEase Dashboard: kafka-topic-partition-lag-max-by-group-metric |
| The number of failed produce requests per second in a Kafka topic | MegaEase: kafka-topics-failedproducerequestspersec-count-metric<br>Datadog: jmx.kafka.server.FailedProduceRequestsPerSec.count<br>Prometheus: kafka_topics_failedproducerequestspersec_count<br>Telegraf: kafka_topics_failedproducerequestspersec_count<br>MegaEase Dashboard: kafka-topics-failedproducerequestspersec-count-ratio-metric |
| Rate of message format conversions for produce requests. A conversion happens when the message format version of an incoming produce request differs from the one the broker uses for the topic; this metric tracks how many such conversions occur per second. | MegaEase: kafka-topics-producemessageconversionspersec-count-metric<br>Datadog: jmx.kafka.server.ProduceMessageConversionsPerSec.count<br>Prometheus: kafka_topics_producemessageconversionspersec_count<br>Telegraf: kafka_topics_producemessageconversionspersec_count<br>MegaEase Dashboard: kafka-topics-producemessageconversionspersec-count-ratio-metric |
| The total number of produce requests per second in a Kafka topic | MegaEase: kafka-topics-totalproducerequestspersec-count-metric<br>Datadog: jmx.kafka.server.TotalProduceRequestsPerSec.count<br>Prometheus: kafka_topics_totalproducerequestspersec_count<br>Telegraf: kafka_topics_totalproducerequestspersec_count<br>MegaEase Dashboard: kafka-topics-totalproducerequestspersec-count-ratio-metric |
| The total amount of CPU time consumed by a process | MegaEase: kafka-process-cpu-seconds-total-metric<br>Datadog: jmx.java.lang.process_cpu_time<br>Prometheus: rate(process_cpu_seconds_total[1m])<br>Telegraf: kafka_process_cpu_seconds_total<br>MegaEase Dashboard: kafka-process-cpu-seconds-total-ratio-metric |
| The amount of memory used by a Java Virtual Machine (JVM) | MegaEase: NotSupport<br>Datadog: jmx.java.lang.memory_usage.used<br>Prometheus: jvm_memory_bytes_used |
| The approximate accumulated collection elapsed time in seconds | MegaEase: NotSupport<br>Datadog: NotSupport<br>Prometheus: jvm_gc_collection_seconds_sum |
| Rate of replicas joining the ISR pool | MegaEase: kafka-replica-manager-isrexpandspersec-count-metric<br>Datadog: max:kafka.replication.isr_expands.rate<br>Prometheus: NotSupport<br>Telegraf: kafka_replica_manager_isrexpandspersec_count<br>MegaEase Dashboard: kafka-replica-manager-isrexpandspersec-count-min-metric, kafka-replica-manager-isrexpandspersec-count-avg-metric, kafka-replica-manager-isrexpandspersec-count-max-metric |
| Rate of replicas leaving the ISR pool | MegaEase: kafka-replica-manager-isrshrinkspersec-count-metric<br>Datadog: max:kafka.replication.isr_shrinks.rate<br>Prometheus: NotSupport<br>Telegraf: kafka_replica_manager_isrshrinkspersec_count<br>MegaEase Dashboard: kafka-replica-manager-isrshrinkspersec-count-min-metric, kafka-replica-manager-isrshrinkspersec-count-avg-metric, kafka-replica-manager-isrshrinkspersec-count-max-metric |
| Total time in ms to serve a follower fetch request | MegaEase: NotSupport<br>Datadog: avg:kafka.request.fetch_follower.time.avg<br>Prometheus: NotSupport |
| Average time for a produce request | MegaEase: NotSupport<br>Datadog: avg:kafka.request.produce.time.avg<br>Prometheus: NotSupport |
| Total time in ms to serve a consumer fetch request | MegaEase: NotSupport<br>Datadog: avg:kafka.request.fetch_consumer.time.avg<br>Prometheus: NotSupport |
| Number of requests waiting in the producer purgatory | MegaEase: NotSupport<br>Datadog: sum:kafka.request.producer_request_purgatory.size<br>Prometheus: NotSupport |
| Number of requests waiting in the fetch purgatory | MegaEase: NotSupport<br>Datadog: sum:kafka.request.fetch_request_purgatory.size<br>Prometheus: NotSupport |
| Producer bytes out rate | MegaEase: NotSupport<br>Datadog: sum:kafka.producer.bytes_out<br>Prometheus: NotSupport |
| Number of producer requests per second | MegaEase: NotSupport<br>Datadog: sum:kafka.producer.request_rate<br>Prometheus: NotSupport |
| Number of producer responses per second | MegaEase: NotSupport<br>Datadog: sum:kafka.producer.response_rate<br>Prometheus: NotSupport |
| Producer average request latency | MegaEase: NotSupport<br>Datadog: sum:kafka.producer.request_latency_avg<br>Prometheus: NotSupport |
| Producer I/O wait time | MegaEase: NotSupport<br>Datadog: sum:kafka.producer.io_wait<br>Prometheus: NotSupport |
| Consumer bytes in rate | MegaEase: NotSupport<br>Datadog: sum:kafka.consumer.bytes_in<br>Prometheus: NotSupport |
| Rate of consumer message consumption | MegaEase: NotSupport<br>Datadog: sum:kafka.consumer.messages_in<br>Prometheus: NotSupport |
| The minimum rate at which the consumer sends fetch requests to a broker | MegaEase: NotSupport<br>Datadog: sum:kafka.consumer.fetch_rate<br>Prometheus: NotSupport |
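
Several Prometheus entries in the table wrap a cumulative counter in `rate(...[1m])` to turn it into a per-second value. The sketch below illustrates the idea in Python: it takes counter samples from a window, sums the increases (treating a drop as a counter reset), and divides by the elapsed time. The function name `per_second_rate` and the sample values are hypothetical and chosen only for illustration. Prometheus's own `rate()` also extrapolates to the window boundaries, so its results can differ slightly from this simplified version.

```python
from typing import Sequence, Tuple

# (timestamp in seconds, cumulative counter value); a hypothetical sample shape
Sample = Tuple[float, float]


def per_second_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase of a counter over the sampled window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example, a broker restart),
        # so the new value is counted as an increase from zero.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return increase / elapsed


if __name__ == "__main__":
    # Made-up samples of a bytes-in counter scraped every 30 seconds.
    window = [(0.0, 1_000.0), (30.0, 4_000.0), (60.0, 7_000.0)]
    print(per_second_rate(window))  # 100.0
```

Counters such as the `*_total` series only ever grow between restarts, which is why the table queries them through `rate()` instead of reading the raw value.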