Description
I am studying how to use Prometheus in multiprocess Python code (without Gunicorn).
So far I have come up with the following simple example with two child processes:
from multiprocessing import Process
import os
import shutil
import time

# This environment variable must be set BEFORE the first import
# from prometheus_client to ensure the library uses MultiProcessValue
# (metric value class with mmap support). Otherwise,
# MutexValue (non-mmap) will be used, and files won't be created.
os.environ["PROMETHEUS_MULTIPROC_DIR"] = './PMDir'

from prometheus_client import start_http_server, multiprocess, CollectorRegistry, Counter

# This second assignment is redundant and only marks the "too late" position.
# If the variable were set only here, AFTER the import, the library would
# already have chosen MutexValue as its value class, mmap files would not be
# created, and MultiProcessCollector would have nothing to work with.
# This matches the documentation's recommendation to set it
# before app startup.
os.environ["PROMETHEUS_MULTIPROC_DIR"] = './PMDir'

COUNTER1 = None
COUNTER2 = None
COUNTER3 = None
COUNTER4 = None


def init_counters(registry):
    """
    Initialize counters (or other metrics) and register them in the specified registry.
    According to MetricWrapperBase's code, if registry=None, metrics won't be registered in any registry.
    """
    global COUNTER1, COUNTER2, COUNTER3, COUNTER4
    COUNTER1 = Counter('counter1', 'Incremented by the first child process', registry=registry)
    COUNTER2 = Counter('counter2', 'Incremented by the second child process', registry=registry)
    COUNTER3 = Counter('counter3', 'Incremented by all processes', registry=registry)
    COUNTER4 = Counter('counter4', 'Incremented by main process', registry=registry)


# We are free not to create a registry object in child processes. Both f1 and f2 work as process targets.
# The mmap file handling is managed at the metric object level, not the collector level.
# Variation 1: creating a registry or not creating one both work as I expect.
def f1():
    """First child process body. Works without manual registry creation."""
    init_counters(None)
    while True:
        time.sleep(1)
        print("Child process 1", os.getpid())
        COUNTER1.inc()
        COUNTER3.inc()


def f2():
    """Second child process body. Works with manual registry creation."""
    registry = CollectorRegistry()
    init_counters(registry)
    while True:
        time.sleep(2)
        print("Child process 2", os.getpid())
        COUNTER2.inc()
        COUNTER3.inc()


if __name__ == '__main__':
    # Ensure the multiprocess directory exists and is empty
    prome_stats = os.environ["PROMETHEUS_MULTIPROC_DIR"]
    if os.path.exists(prome_stats):
        shutil.rmtree(prome_stats)
    os.mkdir(prome_stats)

    # Variation 2: When using MultiProcessCollector directly (see Variation 4), registry creation is optional
    # registry = CollectorRegistry()  # Create registry for HTTP server
    registry = None  # Works without a registry between mpc and the HTTP server

    # Create MultiProcessCollector object. It reads and aggregates mmap files from PROMETHEUS_MULTIPROC_DIR.
    # Registering it in our registry means that registry.collect() calls mpc.collect(), so the metrics from
    # mpc are aggregated by the registry.
    # MultiProcessCollector ONLY reads (mmap) files; saving values is handled by MultiProcessValue in the metric objects.
    mpc = multiprocess.MultiProcessCollector(registry)

    # Variation 3: If the main process has to report its own metrics, it can use either a separate CollectorRegistry or None.
    # init_counters(CollectorRegistry())  # Use a separate registry for main process metrics.
    init_counters(None)  # Works without registering in any registry
    # init_counters(registry)  # Metrics will be duplicated if we use the same registry that mpc is registered in.
    #                          # More precisely, in this situation the registry will export both mpc metrics and
    #                          # main process metrics, including overlaps.

    # Variation 4: The HTTP server can use either the registry or mpc
    # start_http_server(8000, registry=registry)  # Standard approach
    start_http_server(8000, registry=mpc)  # Works despite mpc not being a Collector subclass (but it implements collect())

    p1 = Process(target=f1, args=())
    p1.start()
    p2 = Process(target=f2, args=())
    p2.start()
    print("collect")
    try:
        while True:
            print('main process ', os.getpid())
            time.sleep(1)
            COUNTER3.inc()
            COUNTER4.inc()
    except KeyboardInterrupt:
        p1.terminate()
        p2.terminate()
        shutil.rmtree(prome_stats)

(I left my findings as comments because the documentation didn't answer my questions, and my assumptions may be wrong.)
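To show why I believe the import order matters, here is a minimal sketch of a choice that is made once, when a module is first loaded. It is a stand-in and not the library's code: the variable name DEMO_MULTIPROC_DIR, the function choose_value_class() and the returned labels are all hypothetical.

```python
import os

ENV_VAR = "DEMO_MULTIPROC_DIR"  # hypothetical variable name used only in this sketch


def choose_value_class() -> str:
    """Pick a backend label from the environment, as an import-time hook might."""
    return "mmap-backed" if ENV_VAR in os.environ else "in-memory"


# Start from a clean environment so the sketch is reproducible.
os.environ.pop(ENV_VAR, None)

# Simulates the module import: the choice is evaluated once and cached.
VALUE_CLASS = choose_value_class()

# Setting the variable afterwards does not change the cached choice.
os.environ[ENV_VAR] = "./PMDir"
late_lookup = choose_value_class()

print(VALUE_CLASS)   # the cached, import-time decision
print(late_lookup)   # what a fresh lookup would return now
```

If the library caches its value class in the same way, only a variable set before the first import can influence it, which is what I observe in my example.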
I noticed that MultiProcessCollector implements collect() but doesn't inherit from the Collector class.
The Custom collectors documentation suggests that custom collectors should implement this interface. However, MultiProcessCollector is defined as:
class MultiProcessCollector:
    """Collector for files for multi-process mode."""
    def __init__(self, registry, path=None):
        ...
This seems contradictory since:
- The class name contains "Collector"
- It implements the core collect() method
- It's designed to be registered in a CollectorRegistry
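To make the distinction concrete, here is a minimal sketch that uses stand-in classes only. Collector, SupportsCollect, DuckCollector and Registry below are hypothetical and are not the classes from prometheus_client; the sketch only contrasts an inheritance-based check with a method-based one.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Protocol, runtime_checkable


class Collector(ABC):
    """Hypothetical stand-in for a nominal collector base class."""

    @abstractmethod
    def collect(self) -> Iterable[str]:
        ...


@runtime_checkable
class SupportsCollect(Protocol):
    """Structural check: anything with a collect() method qualifies."""

    def collect(self) -> Iterable[str]:
        ...


class DuckCollector:
    """Stand-in for a class that implements collect() without inheriting."""

    def collect(self) -> Iterable[str]:
        yield "demo_metric 1.0"


class Registry:
    """Stand-in registry that only ever calls collect() on its members."""

    def __init__(self) -> None:
        self._collectors = []

    def register(self, collector) -> None:
        self._collectors.append(collector)

    def collect(self) -> Iterable[str]:
        for collector in self._collectors:
            yield from collector.collect()


duck = DuckCollector()
registry = Registry()
registry.register(duck)

is_nominal = isinstance(duck, Collector)            # inheritance-based check
is_structural = isinstance(duck, SupportsCollect)   # method-based check
samples = list(registry.collect())

print(is_nominal, is_structural, samples)
```

In this sketch the registry works because it relies only on collect() being present. Any code that relied on isinstance(obj, Collector) instead would reject the same object, which is the kind of mismatch I am asking about.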
Could you clarify:
- Is this intentional design? If yes, what's the rationale behind not inheriting from Collector?
- Are there any potential compatibility risks in not following the Collector interface contract?
- What are the potential risks of using a MultiProcessCollector object in start_http_server directly, without registry = CollectorRegistry() between them?
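The kind of risk I have in mind for the last question can be sketched with stand-ins. Everything below is hypothetical (FullRegistry, CollectOnly, render() and the restricted() method are made up for illustration), and I am only assuming that some code path in an exporter might call a method that a registry has and a bare collector does not.

```python
class FullRegistry:
    """Hypothetical registry exposing more than collect()."""

    def __init__(self, samples):
        self._samples = dict(samples)

    def collect(self):
        return list(self._samples.items())

    def restricted(self, names):
        """Return a registry limited to the given metric names."""
        kept = {n: v for n, v in self._samples.items() if n in names}
        return FullRegistry(kept)


class CollectOnly:
    """Hypothetical object that implements only collect()."""

    def collect(self):
        return [("counter3", 5.0)]


def render(source, names=None):
    """Hypothetical exporter: filters by name when asked, then collects."""
    if names is not None:
        source = source.restricted(names)  # registry-only method
    return source.collect()


# Works: the plain path needs nothing but collect().
plain = render(CollectOnly())

# Fails: the filtered path needs a method that CollectOnly does not have.
try:
    render(CollectOnly(), names=["counter3"])
    filtered_error = None
except AttributeError as exc:
    filtered_error = type(exc).__name__

print(plain, filtered_error)
```

If the real HTTP server has a path like the filtered one here, passing mpc directly would work for ordinary scrapes and break only in that case, so I would like to know whether such paths exist.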
I want to ensure I understand this correctly for proper integration. Thanks for your insights!