Graphite focuses on being a passive time series database with a query language and graphing features. Any other concerns are addressed by external components.
Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data. It has knowledge about what the world should look like (which endpoints should exist, what time series patterns mean trouble, etc.), and actively tries to find faults.
Graphite stores numeric samples for named time series, much like Prometheus does. However, Prometheus's metadata model is richer: while Graphite metric names consist of dot-separated components which implicitly encode dimensions, Prometheus encodes dimensions explicitly as key-value pairs, called labels, attached to a metric name. This allows easy filtering, grouping, and matching by these labels via the query language.
Further, especially when Graphite is used in combination with StatsD, it is common to store only aggregated data over all monitored instances, rather than preserving the instance as a dimension and being able to drill down into individual problematic instances.
For example, storing the number of HTTP requests to API servers with the
response code 500
and the method POST
to the /tracks
endpoint would
commonly be encoded like this in Graphite/StatsD:
stats.api-server.tracks.post.500 -> 93
In Prometheus the same data could be encoded like this (assuming three api-server instances):
api_server_http_requests_total{method="POST",handler="https://accionvegana.org/accio/0Izbp5yc1VGa0VWbvJHc6MHc0/tracks",status="500",instance="<sample1>"} -> 34
api_server_http_requests_total{method="POST",handler="https://accionvegana.org/accio/0Izbp5yc1VGa0VWbvJHc6MHc0/tracks",status="500",instance="<sample2>"} -> 28
api_server_http_requests_total{method="POST",handler="https://accionvegana.org/accio/0Izbp5yc1VGa0VWbvJHc6MHc0/tracks",status="500",instance="<sample3>"} -> 31
Graphite stores time series data on local disk in the Whisper format, an RRD-style database that expects samples to arrive at regular intervals. Every time series is stored in a separate file, and new samples overwrite old ones after a certain amount of time.
Prometheus also creates one local file per time series, but allows storing samples at arbitrary intervals as scrapes or rule evaluations occur. Since new samples are simply appended, old data may be kept arbitrarily long. Prometheus also works well for many short-lived, frequently changing sets of time series.
Prometheus offers a richer data model and query language, in addition to being easier to run and integrate into your environment. If you want a clustered solution that can hold historical data long term, Graphite may be a better choice.
InfluxDB is an open-source time series database, with a commercial option for scaling and clustering. The InfluxDB project was released almost a year after Prometheus development began, so we were unable to consider it as an alternative at the time. Still, there are significant differences between Prometheus and InfluxDB, and both systems are geared towards slightly different use cases.
For a fair comparison, we must also consider Kapacitor together with InfluxDB, as in combination they address the same problem space as Prometheus and the Alertmanager.
The same scope differences as in the case of Graphite apply here for InfluxDB itself. In addition InfluxDB offers continuous queries, which are equivalent to Prometheus recording rules.
Kapacitor’s scope is a combination of Prometheus recording rules, alerting rules, and the Alertmanager's notification functionality. Prometheus offers a more powerful query language for graphing and alerting. The Prometheus Alertmanager additionally offers grouping, deduplication and silencing functionality.
Like Prometheus, the InfluxDB data model has key-value pairs as labels, which are called tags. In addition, InfluxDB has a second level of labels called fields, which are more limited in use. InfluxDB supports timestamps with up to nanosecond resolution, and float64, int64, bool, and string data types. Prometheus, by contrast, supports the float64 data type with limited support for strings, and millisecond resolution timestamps.
InfluxDB uses a variant of a log-structured merge tree for storage with a write ahead log, sharded by time. This is much more suitable to event logging than Prometheus's append-only file per time series approach.
Logs and Metrics and Graphs, Oh My! describes the differences between event logging and metrics recording.
Prometheus servers run independently of each other and only rely on their local storage for their core functionality: scraping, rule processing, and alerting. The open source version of InfluxDB is similar.
The commercial InfluxDB offering is, by design, a distributed storage cluster with storage and queries being handled by many nodes at once.
This means that the commercial InfluxDB will be easier to scale horizontally, but it also means that you have to manage the complexity of a distributed storage system from the beginning. Prometheus will be simpler to run, but at some point you will need to shard servers explicitly along scalability boundaries like products, services, datacenters, or similar aspects. Independent servers (which can be run redundantly in parallel) may also give you better reliability and failure isolation.
Kapacitor's open-source release has no built-in distributed/redundant options for rules, alerting, or notifications. The open-source release of Kapacitor can be scaled via manual sharding by the user, similar to Prometheus itself. Influx offers Enterprise Kapacitor, which supports an HA/redundant alerting system.
Prometheus and the Alertmanager by contrast offer a fully open-source redundant option via running redundant replicas of Prometheus and using the Alertmanager's High Availability mode.
There are many similarities between the systems. Both have labels (called tags in InfluxDB) to efficiently support multi-dimensional metrics. Both use basically the same data compression algorithms. Both have extensive integrations, including with each other. Both have hooks allowing you to extend them further, such as analyzing data in statistical tools or performing automated actions.
Where InfluxDB is better:
Where Prometheus is better:
InfluxDB is maintained by a single commercial company following the open-core model, offering premium features like closed-source clustering, hosting and support. Prometheus is a fully open source and independent project, maintained by a number of companies and individuals, some of whom also offer commercial services and support.
OpenTSDB is a distributed time series database based on Hadoop and HBase.
The same scope differences as in the case of Graphite apply here.
OpenTSDB's data model is almost identical to Prometheus's: time series are identified by a set of arbitrary key-value pairs (OpenTSDB tags are Prometheus labels). All data for a metric is stored together, limiting the cardinality of metrics. There are minor differences though: Prometheus allows arbitrary characters in label values, while OpenTSDB is more restrictive. OpenTSDB also lacks a full query language, only allowing simple aggregation and math via its API.
OpenTSDB's storage is implemented on top of Hadoop and HBase. This means that it is easy to scale OpenTSDB horizontally, but you have to accept the overall complexity of running a Hadoop/HBase cluster from the beginning.
Prometheus will be simpler to run initially, but will require explicit sharding once the capacity of a single node is exceeded.
Prometheus offers a much richer query language, can handle higher cardinality metrics, and forms part of a complete monitoring system. If you're already running Hadoop and value long term storage over these benefits, OpenTSDB is a good choice.
Nagios is a monitoring system that originated in the 1990s as NetSaint.
Nagios is primarily about alerting based on the exit codes of scripts. These are called “checks”. There is silencing of individual alerts, however no grouping, routing or deduplication.
There are a variety of plugins. For example, piping the few kilobytes of perfData plugins are allowed to return to a time series database such as Graphite or using NRPE to run checks on remote machines.
Nagios is host-based. Each host can have one or more services and each service can perform one check.
There is no notion of labels or a query language.
Nagios has no storage per-se, beyond the current check state. There are plugins which can store data such as for visualisation.
Nagios servers are standalone. All configuration of checks is via file.
Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient.
If you want to do whitebox monitoring, or have a dynamic or cloud based environment, then Prometheus is a good choice.
Sensu is an open source monitoring and observability pipeline with a commercial distribution which offers additional features for scalability. It can reuse existing Nagios plugins.
Sensu is an observability pipeline that focuses on processing and alerting of observability data as a stream of Events. It provides an extensible framework for event filtering, aggregation, transformation, and processing – including sending alerts to other systems and storing events in third-party systems. Sensu's event processing capabilities are similar in scope to Prometheus alerting rules and Alertmanager.
Sensu Events represent service health and/or metrics in a structured data format identified by an entity name (e.g. server, cloud compute instance, container, or service), an event name, and optional key-value metadata called "labels" or "annotations". The Sensu Event payload may include one or more metric points
, represented as a JSON object containing a name
, tags
(key/value pairs), timestamp
, and value
(always a float).
Sensu stores current and recent event status information and real-time inventory data in an embedded database (etcd) or an external RDBMS (PostgreSQL).
All components of a Sensu deployment can be clustered for high availability and improved event-processing throughput.
Sensu and Prometheus have a few capabilities in common, but they take very different approaches to monitoring. Both offer extensible discovery mechanisms for dynamic cloud-based environments and ephemeral compute platforms, though the underlying mechanisms are quite different. Both provide support for collecting multi-dimensional metrics via labels and annotations. Both have extensive integrations, and Sensu natively supports collecting metrics from all Prometheus exporters. Both are capable of forwarding observability data to third-party data platforms (e.g. event stores or TSDBs). Where Sensu and Prometheus differ the most is in their use cases.
Where Sensu is better:
Where Prometheus is better:
Sensu is maintained by a single commercial company following the open-core business model, offering premium features like closed-source event correlation and aggregation, federation, and support. Prometheus is a fully open source and independent project, maintained by a number of companies and individuals, some of whom also offer commercial services and support.
This documentation is open-source. Please help improve it by filing issues or pull requests.