PrometheusMiddleware /metrics endpoint only serves metrics from one worker process #590

@pegaslee

Taskiq version

0.11.20

Python version

Python 3.12

OS

Linux

What happened?

Description

PrometheusMiddleware correctly sets up PROMETHEUS_MULTIPROC_DIR and uses mmap-backed metrics for multi-process support. However, the HTTP server started in startup() calls prometheus_client.start_http_server() with the default REGISTRY, which does not include a MultiProcessCollector. This means the /metrics endpoint only serves metrics from the single child process that successfully bound the port — metrics recorded by sibling worker processes are silently missing.

Root Cause

In prometheus_middleware.py, the startup() method:

def startup(self) -> None:
    from prometheus_client import start_http_server
    if self.broker.is_worker_process:
        try:
            start_http_server(port=self.server_port, addr=self.server_addr)
        except OSError as exc:
            logger.debug("Cannot start prometheus server: %s", exc)

start_http_server() defaults to using REGISTRY (the global default registry). In multiprocess mode, the prometheus_client documentation states:

The application must initialize a new CollectorRegistry, and store the multi-process collector inside. It is a best practice to create this registry inside the context of a request to avoid metrics registering themselves to a collector used by a MultiProcessCollector. If a registry with metrics registered is used by a MultiProcessCollector duplicate metrics may be exported, one for multiprocess, and one for the process serving the request.
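
In practice, the pattern the documentation recommends looks roughly like this (adapted from the prometheus_client multiprocess documentation; the standalone WSGI handler is illustrative only, not part of taskiq):

# Per-request registry, as recommended by prometheus_client for multiprocess mode.
# Each scrape builds a fresh CollectorRegistry and wraps it in a
# MultiProcessCollector so the mmap files of all worker processes are read.
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    generate_latest,
    multiprocess,
)

def metrics_app(environ, start_response):
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    start_response("200 OK", [
        ("Content-Type", CONTENT_TYPE_LATEST),
        ("Content-Length", str(len(data))),
    ])
    return iter([data])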

Since start_http_server uses the default REGISTRY rather than a per-request CollectorRegistry with MultiProcessCollector, the /metrics endpoint only serves metrics visible to the single process that owns the HTTP server socket.
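
One possible direction for a fix (a sketch only, not an actual patch; it assumes the middleware keeps using start_http_server, which accepts a registry keyword argument) would be to hand the HTTP server a registry that wraps a MultiProcessCollector:

# Sketch of a possible change to PrometheusMiddleware.startup() (assumption, not
# the project's actual fix): serve a registry backed by MultiProcessCollector so
# every scrape aggregates the mmap files written by all worker processes.
def startup(self) -> None:
    from prometheus_client import CollectorRegistry, multiprocess, start_http_server

    if self.broker.is_worker_process:
        registry = CollectorRegistry()
        # Reads every *.db file in PROMETHEUS_MULTIPROC_DIR on each scrape.
        multiprocess.MultiProcessCollector(registry)
        try:
            start_http_server(
                port=self.server_port,
                addr=self.server_addr,
                registry=registry,
            )
        except OSError as exc:
            logger.debug("Cannot start prometheus server: %s", exc)

Because no metrics are registered directly on this registry, the duplicate-export caveat quoted above should not apply.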

Impact

With the default workers=2, taskiq spawns 2 child processes per pod. Only one successfully binds the metrics port (the other catches OSError). The process serving /metrics only reports its own Counters, Histograms, and Gauges — all metrics from the other worker process are invisible to Prometheus.

This affects both:

  • The middleware's built-in metrics (received_tasks, success_tasks, execution_time, etc.)
  • Any user-defined metrics registered in the same multiprocess directory (see the hypothetical example after this list)
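
For example (hypothetical user code; rows_processed and the task are made-up names, and broker refers to the instance from the reproduction below), a counter defined in task code is also written to the shared mmap directory once PROMETHEUS_MULTIPROC_DIR is set, so increments made on the non-serving worker are dropped from /metrics in the same way:

# Hypothetical user-defined metric (illustration only). Because the middleware
# sets PROMETHEUS_MULTIPROC_DIR, this Counter is mmap-backed like the built-in
# ones, and increments made in the sibling worker never appear at /metrics.
from prometheus_client import Counter

rows_processed = Counter(
    "rows_processed_total",
    "Rows processed by collector tasks",
    ["task_name"],
)

@broker.task
async def run_collector() -> None:
    rows_processed.labels(task_name="myapp.collector.tasks:run_collector").inc()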

Proof

On a running worker pod with workers=2 (PIDs 8 and 9), we compared the middleware's own received_tasks and success_tasks Counters as seen by MultiProcessCollector (which reads all mmap files) vs the actual /metrics endpoint (which uses the default REGISTRY).

Script run on the pod:

import os
os.environ['PROMETHEUS_MULTIPROC_DIR'] = '/tmp/taskiq_worker'
from prometheus_client import CollectorRegistry, multiprocess, generate_latest
import urllib.request

# MultiProcessCollector reads ALL mmap files
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)

print('=== MultiProcessCollector (reads all mmap files) ===')
for metric in registry.collect():
    for sample in metric.samples:
        if 'received_tasks' in sample.name or 'success_tasks' in sample.name:
            print(f'{sample.name}{sample.labels} = {sample.value}')

print()
print('=== curl localhost:9000 (what Prometheus actually scrapes) ===')
data = urllib.request.urlopen('http://localhost:9000').read().decode()
for line in data.splitlines():
    if ('received_tasks' in line or 'success_tasks' in line) and not line.startswith('#'):
        print(line)

Results — MultiProcessCollector sees 6 received tasks across 3 task types:

received_tasks_total{'task_name': 'myapp.collector.tasks:run_collector'} = 4.0
received_tasks_total{'task_name': 'myapp.report.tasks:run_report'} = 1.0
received_tasks_total{'task_name': 'myapp.tracking.tasks:sync_event_tracking'} = 1.0
success_tasks_total{'task_name': 'myapp.collector.tasks:run_collector'} = 4.0
success_tasks_total{'task_name': 'myapp.report.tasks:run_report'} = 1.0
success_tasks_total{'task_name': 'myapp.tracking.tasks:sync_event_tracking'} = 1.0

But curl localhost:9000 only sees 3 received tasks across 2 task types:

received_tasks_total{task_name="myapp.collector.tasks:run_collector"} 2.0
received_tasks_total{task_name="myapp.report.tasks:run_report"} 1.0
success_tasks_total{task_name="myapp.collector.tasks:run_collector"} 2.0
success_tasks_total{task_name="myapp.report.tasks:run_report"} 1.0

The sync_event_tracking task and 2 of the 4 run_collector invocations are completely missing from the scraped output — these ran on the sibling worker process (PID 9), not the process serving HTTP (PID 8). The middleware recorded them correctly to the mmap files, but start_http_server using the default REGISTRY never reads them.

Steps to Reproduce

  1. Start a taskiq worker with --workers 2 (the default) and PrometheusMiddleware
  2. Send tasks that get distributed across both child processes
  3. Scrape /metrics via curl localhost:9000
  4. Compare with reading mmap files directly via MultiProcessCollector:
import os
os.environ['PROMETHEUS_MULTIPROC_DIR'] = '/tmp/taskiq_worker'
from prometheus_client import CollectorRegistry, multiprocess
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
for metric in registry.collect():
    for sample in metric.samples:
        print(f'{sample.name}{sample.labels} = {sample.value}')

The MultiProcessCollector script shows metrics from both PIDs, but curl localhost:9000 only shows metrics from the process that owns the HTTP server.

Relevant log output

Broker initialization code

# broker.py — minimal reproduction
from taskiq import PrometheusMiddleware
from taskiq_redis import ListQueueBroker, RedisAsyncResultBackend

redis_url = "redis://localhost:6379/0"

broker = ListQueueBroker(redis_url).with_result_backend(
    RedisAsyncResultBackend(redis_url),
).with_middlewares(
    PrometheusMiddleware(
        server_addr="0.0.0.0",
        server_port=9000,
    ),
)
