An Icinga check plugin to check Prometheus.
Usage:
check_prometheus [flags]
check_prometheus [command]
Available Commands:
alert Checks the status of a Prometheus alert
health Checks the health or readiness status of the Prometheus server
query Checks the status of a Prometheus query
Flags:
-H, --hostname string Hostname of the Prometheus server (CHECK_PROMETHEUS_HOSTNAME) (default "localhost")
-p, --port int Port of the Prometheus server (default 9090)
-U, --url string URL/Path to append to the Prometheus Hostname (CHECK_PROMETHEUS_URL) (default "/")
-s, --secure Use a HTTPS connection
-i, --insecure Skip the verification of the server's TLS certificate
-b, --bearer string Specify the Bearer Token for server authentication (CHECK_PROMETHEUS_BEARER)
-u, --user string Specify the user name and password for server authentication <user:password> (CHECK_PROMETHEUS_BASICAUTH)
--ca-file string Specify the CA File for TLS authentication (CHECK_PROMETHEUS_CA_FILE)
--cert-file string Specify the Certificate File for TLS authentication (CHECK_PROMETHEUS_CERT_FILE)
--key-file string Specify the Key File for TLS authentication (CHECK_PROMETHEUS_KEY_FILE)
-t, --timeout int Timeout in seconds for the CheckPlugin (default 30)
--header strings Additional HTTP header to include in the request. Can be used multiple times.
Keys and values are separated by a colon (--header "X-Custom: example").
-h, --help help for check_prometheus
-v, --version version for check_prometheus

The check plugin respects the environment variables HTTP_PROXY, HTTPS_PROXY and NO_PROXY.
Various flags can also be set with environment variables; the help output shows the variable name next to each flag that supports this.
If Prometheus runs behind a reverse proxy, the --url parameter can be used:
# https://monitoring.example.com:443/subpath
$ check_prometheus health -H 'monitoring.example.com' --port 443 --secure --url /subpath
OK - Prometheus Server is Healthy. | statuscode=200

Checks the health or readiness status of the Prometheus server.
Health: Checks the health of an endpoint, which returns OK if the Prometheus server is healthy.
Ready: Checks the readiness of an endpoint, which returns OK if the Prometheus server is ready to serve traffic (i.e. respond to queries).
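As a rough illustration of the two modes, the sketch below builds the request URL from the connection flags and maps the HTTP status code to a plugin state. `/-/healthy` and `/-/ready` are the Prometheus management endpoints; the function names, the way the `--url` prefix is joined, and the treatment of any non-200 response as CRITICAL are assumptions made for this sketch, not a description of the plugin's internals.

```python
def endpoint_path(ready=False):
    """Management endpoint for the health (default) or readiness check."""
    return "/-/ready" if ready else "/-/healthy"


def build_url(hostname="localhost", port=9090, secure=False, url="/", ready=False):
    """Compose the request URL from the connection flags (hypothetical helper)."""
    scheme = "https" if secure else "http"
    prefix = url.rstrip("/")  # assumption: the --url prefix is prepended as-is
    return f"{scheme}://{hostname}:{port}{prefix}{endpoint_path(ready)}"


def state_for(status_code):
    """Map an HTTP status code to (state, exit code).

    Assumption: anything other than 200 is reported as CRITICAL.
    """
    return ("OK", 0) if status_code == 200 else ("CRITICAL", 2)


print(build_url())
print(build_url("monitoring.example.com", 443, secure=True, url="/subpath"))
print(state_for(200))
```

Keeping URL construction separate from the state mapping makes both parts easy to check without a running Prometheus server.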
Usage:
check_prometheus health [flags]
Examples:
$ check_prometheus health --hostname 'localhost' --port 9090 --insecure
OK - Prometheus Server is Healthy. | statuscode=200
Flags:
-r, --ready Checks the readiness of an endpoint
-I, --info Displays various build information properties about the Prometheus server
-h, --help help for health

$ check_prometheus health --hostname 'localhost' --port 9090 --insecure
OK - Prometheus Server is Healthy. | statuscode=200
$ check_prometheus health --ready
OK - Prometheus Server is Ready. | statuscode=200

Checks the status of a Prometheus query and evaluates the result of the query.
The warning and critical flags support thresholds in the common Nagios format (e.g. ~:10).
Note: For time range queries, e.g. 'go_memstats_alloc_bytes_total[10s]', only the latest value is evaluated; all other values are ignored.
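To make the threshold format concrete, here is a minimal sketch of how a Nagios-style range such as `10`, `10:`, `~:10`, `10:20` or `@10:20` can be evaluated. The function names are hypothetical and the code follows the general Nagios plugin guidelines; it is not taken from the plugin itself.

```python
import math


def parse_range(spec):
    """Parse a Nagios threshold range into (start, end, inside)."""
    inside = spec.startswith("@")  # '@' inverts: alert when the value is inside
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        low, high = "0", spec  # a bare number N means the range 0..N
    start = -math.inf if low == "~" else float(low or 0)
    end = math.inf if high == "" else float(high)
    return start, end, inside


def violates(spec, value):
    """Return True if the value should raise an alert for this range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


def evaluate(value, warning, critical):
    """Critical wins over warning; otherwise the value is OK."""
    if violates(critical, value):
        return "CRITICAL"
    if violates(warning, value):
        return "WARNING"
    return "OK"


print(evaluate(37, warning="27", critical="40"))
```

With `-w 27 -c 40`, a value of 37 lies outside `0..27` but inside `0..40`, so the sketch reports WARNING.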
Usage:
check_prometheus query [flags]
Examples:
$ check_prometheus query -q 'go_gc_duration_seconds_count' -c 5000 -w 2000
CRITICAL - 2 Metrics: 1 Critical - 0 Warning - 1 Ok
\_[OK] go_gc_duration_seconds_count{instance="localhost:9090", job="prometheus"} - value: 1599
\_[CRITICAL] go_gc_duration_seconds_count{instance="node-exporter:9100", job="node-exporter"} - value: 79610
| value_go_gc_duration_seconds_count_localhost:9090_prometheus=1599 value_go_gc_duration_seconds_count_node-exporter:9100_node-exporter=79610
Flags:
-q, --query string A Prometheus query which will be performed and the value result will be evaluated
-w, --warning string The warning threshold for a value (default "10")
-c, --critical string The critical threshold for a value (default "20")
-h, --help help for query

$ check_prometheus query -q 'go_goroutines{job="prometheus"}' -c 40 -w 27
WARNING - 1 Metrics: 0 Critical - 1 Warning - 0 Ok
\_[WARNING] go_goroutines{instance="localhost:9090", job="prometheus"} - value: 37
| value_go_goroutines_localhost:9090_prometheus=37

$ check_prometheus query -q 'go_goroutines' -c 40 -w 27
WARNING - 2 Metrics: 0 Critical - 1 Warning - 1 Ok
\_[WARNING] go_goroutines{instance="localhost:9090", job="prometheus"} - value: 37
\_[OK] go_goroutines{instance="node-exporter:9100", job="node-exporter"} - value: 7
| value_go_goroutines_localhost:9090_prometheus=37 value_go_goroutines_node-exporter:9100_node-exporter=7

Hint: Currently only the latest value will be evaluated, other values will be ignored.
$ check_prometheus query -q 'go_goroutines{job="prometheus"}[10s]' -c5 -w 10
CRITICAL - 1 Metrics: 1 Critical - 0 Warning - 0 Ok
\_[CRITICAL] go_goroutines{instance="localhost:9090", job="prometheus"} - value: 37
| value_go_goroutines_localhost:9090_prometheus=37
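The "latest value only" behaviour for time range queries can be pictured with the sketch below. It assumes the `matrix` result shape that the Prometheus HTTP API returns for range vectors, where each series carries a list of `[timestamp, "value"]` samples. The helper name and the sample data are hypothetical, and how the plugin selects the value internally is not described here.

```python
def latest_values(result):
    """Return the most recent sample value per series of a matrix result."""
    latest = {}
    for series in result:
        # Each sample is [unix_timestamp, "value as string"]; keep the newest.
        _, raw = max(series["values"], key=lambda sample: sample[0])
        key = tuple(sorted(series["metric"].items()))
        latest[key] = float(raw)
    return latest


# Hypothetical response data for 'go_goroutines{job="prometheus"}[10s]'
result = [
    {
        "metric": {"__name__": "go_goroutines", "instance": "localhost:9090", "job": "prometheus"},
        "values": [[1700000000, "35"], [1700000005, "36"], [1700000010, "37"]],
    }
]

for labels, value in latest_values(result).items():
    print(dict(labels), value)
```

Only the sample with the highest timestamp survives, so the thresholds are compared against a single value per series.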
$ check_prometheus query -q 'go_goroutines[10s]' -c 50 -w 40
OK - 2 Metrics OK | value_go_goroutines_localhost:9090_prometheus=37 value_go_goroutines_node-exporter:9100_node-exporter=7

Checks the status of a Prometheus alert and evaluates the status of the alert.
Usage:
check_prometheus alert [flags]
Examples:
$ check_prometheus alert --name "PrometheusAlertmanagerJobMissing"
CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive
\_[CRITICAL] [PrometheusAlertmanagerJobMissing] - Job: [alertmanager] is firing - value: 1.00
| firing=1 pending=0 inactive=0
$ check_prometheus alert --name "PrometheusAlertmanagerJobMissing" --name "PrometheusTargetMissing"
CRITICAL - 2 Alerts: 1 Firing - 0 Pending - 1 Inactive
\_[OK] [PrometheusTargetMissing] is inactive
\_[CRITICAL] [PrometheusAlertmanagerJobMissing] - Job: [alertmanager] is firing - value: 1.00
| total=2 firing=1 pending=0 inactive=1
Flags:
-S, --label-key-state string Use the given AlertRule label to override the exit state for firing alerts.
If this flag is set the plugin looks for warning/critical/ok in the provided label key
--exclude-alert stringArray Alerts to ignore. Can be used multiple times and supports regex.
--exclude-label stringArray The label of one or more specific alerts to exclude.
This parameter can be repeated e.g.: '--exclude-label prio=high --exclude-label another=example'
-g, --group strings The name of one or more specific groups to check for alerts.
This parameter can be repeated e.g.: '--group group1 --group group2'
If no group is given, all groups will be scanned for alerts
-h, --help help for alert
--include-label stringArray The label of one or more specific alerts to include.
This parameter can be repeated e.g.: '--include-label prio=high --include-label another=example'
Note that repeated --include-label are combined using a union.
-n, --name strings The name of one or more specific alerts to check.
This parameter can be repeated e.g.: '--name alert1 --name alert2'
If no name is given, all alerts will be evaluated
-T, --no-alerts-state string State to assign when no alerts are found (0, 1, 2, 3, OK, WARNING, CRITICAL, UNKNOWN). If not set this defaults to OK (default "OK")
-P, --problems Display only alerts whose status is not inactive/OK. Note that in combination with the --name flag this might result in no alerts being displayed

The --label-key-state flag can be used to override the exit code for firing alerts.
When the flag is set, the plugin looks for the given label key on the AlertRule and uses
the value of that label (warning/critical/ok) as the exit code.
An invalid value will result in an UNKNOWN exit code.
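The override can be sketched as a simple lookup from label value to exit code. The names below are hypothetical. Two points are assumptions because the text above does not specify them: the comparison is an exact lowercase match, and a firing alert without the label falls back to CRITICAL.

```python
EXIT_CODES = {"ok": 0, "warning": 1, "critical": 2}
UNKNOWN = 3
CRITICAL = 2


def firing_exit_code(labels, label_key=None):
    """Exit code for a firing alert, optionally overridden by a rule label."""
    if label_key is None or label_key not in labels:
        # Assumption: without an override label, a firing alert is CRITICAL.
        return CRITICAL
    # An unrecognised label value results in UNKNOWN.
    return EXIT_CODES.get(labels[label_key], UNKNOWN)


print(firing_exit_code({"icinga": "warning"}, label_key="icinga"))
print(firing_exit_code({"icinga": "page-me"}, label_key="icinga"))
```

In this sketch a rule labelled `icinga: warning` (a made-up label key) would lower a firing alert from CRITICAL to WARNING.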
$ check_prometheus alert
CRITICAL - 6 Alerts: 3 Firing - 0 Pending - 3 Inactive
\_[OK] [PrometheusTargetMissing] is inactive
\_[CRITICAL] [PrometheusAlertmanagerJobMissing] - Job: [alertmanager] is firing - value: 1.00
\_[OK] [HostOutOfMemory] - Job: [alertmanager]
\_[OK] [HostHighCpuLoad] - Job: [alertmanager]
\_[CRITICAL] [HighResultLatency] - Job: [prometheus] on Instance: [localhost:9090] is firing - value: 11.00
\_[CRITICAL] [HighResultLatency] - Job: [node-exporter] on Instance: [node-exporter:9100] is firing - value: 10.00
| total=6 firing=3 pending=0 inactive=3
$ check_prometheus alert --name "HostHighCpuLoad" --name "HighResultLatency"
CRITICAL - 3 Alerts: 2 Firing - 0 Pending - 1 Inactive
\_[OK] [HostHighCpuLoad] is inactive
\_[CRITICAL] [HighResultLatency] - Job: [prometheus] on Instance: [localhost:9090] is firing - value: 11.00
\_[CRITICAL] [HighResultLatency] - Job: [node-exporter] on Instance: [node-exporter:9100] is firing - value: 10.00
| total=3 firing=2 pending=0 inactive=1

$ check_prometheus alert --name "HostHighCpuLoad" --name "PrometheusTargetMissing"
OK - Alerts inactive | total=2 firing=0 pending=0 inactive=2

Copyright (c) 2022 NETWAYS GmbH
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see gnu.org/licenses.