Monitoring Apache with SQL and Grafana
Briefly

"All our metrics and even our billing were based on the Apache logs. We had a system that ingested the logs into a PostgreSQL database, and we tried to create Grafana panels and alerts based on that info. At the same time, I wanted to reproduce this setup in Grafana, and found it was almost impossible."
"Another problem is that the usual tools for solving this, Loki or Prometheus, have big problems handling this type of overly arbitrary data (think of the referer or user_agent columns) or data whose value space is too big (client is an IPv4 address, with 4 billion possible values). They effectively suffer (in principle) from what is called a "cardinality bomb": since they build one time series database (TSDB) per combination of fields (which they call "labels"), storage use is high and aggregation operations across TSDBs are expensive."
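The contrast with the SQL approach can be illustrated with a small sketch: in a relational store, a high-cardinality column such as the client IP is just another value in a row, and aggregating over it is a single GROUP BY rather than a scan across one time series per distinct label combination. The table name and schema here are illustrative, not the post's actual ones:

```python
import sqlite3

# Illustrative schema: client IP is an ordinary column, no matter how
# many distinct values it takes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE access_log (client TEXT, status INTEGER)")
db.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [("10.0.0.1", 200), ("10.0.0.2", 200), ("10.0.0.1", 500)],
)

# Top clients by request count: one GROUP BY, regardless of how many
# distinct IPs exist in the data.
rows = db.execute(
    "SELECT client, COUNT(*) AS hits FROM access_log "
    "GROUP BY client ORDER BY hits DESC"
).fetchall()
# rows → [('10.0.0.1', 2), ('10.0.0.2', 1)]
```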
"Last night I sat down to reimplement the ingestion side. Instead of PostgreSQL I used SQLite, mostly because almost all of my services (with low traffic and mostly only me as user) already use it. To be fair, and one really can't expect anything else, the script is quite straightforward. It uses regexps to parse the logs, which for the moment is good enough."
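A minimal sketch of such a regexp-based ingester, assuming Apache's standard "combined" log format; the pattern, table schema, and conversions here are illustrative reconstructions, not the post's actual code:

```python
import re
import sqlite3

# Hypothetical pattern for the Apache "combined" log format.
LOG_RE = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /api/v1 HTTP/1.1" '
        '200 2326 "-" "curl/8.0"')

row = LOG_RE.match(line).groupdict()
# Convert numeric fields; size is "-" for bodyless responses.
row["status"] = int(row["status"])
row["size"] = 0 if row["size"] == "-" else int(row["size"])

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE access_log "
    "(client TEXT, user TEXT, timestamp TEXT, request TEXT, "
    " status INTEGER, size INTEGER, referer TEXT, user_agent TEXT)"
)
db.execute(
    "INSERT INTO access_log VALUES (:client, :user, :timestamp, :request, "
    ":status, :size, :referer, :user_agent)",
    row,
)
```

Named placeholders let the regexp's group dict feed `execute()` directly, which is one reason a regexp parser stays so short.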
"I'm "releasing" it as is, because I'm tired, but you'll find some surprises around parsing the request line (see request_re and its handling), some janky ways to convert from str to int or datetime, and an iteration trick to use dataclasses as execute() arguments. I omitted some comments and all the testing."
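The dataclass trick might look something like the following; this is a hypothetical reconstruction (the `LogEntry` class, its fields, and the `__iter__` approach are assumptions, since the script itself is not quoted here):

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class LogEntry:
    # Illustrative subset of fields; the real script parses many more.
    client: str
    status: int
    size: int

    def __iter__(self):
        # Yield field values in declaration order, so an instance can be
        # turned into the bind-parameter tuple for execute().
        return iter(astuple(self))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE log (client TEXT, status INTEGER, size INTEGER)")

entry = LogEntry("127.0.0.1", 200, 2326)
# tuple() works because of __iter__; sqlite3 wants a sequence or mapping.
db.execute("INSERT INTO log VALUES (?, ?, ?)", tuple(entry))
```

Defining `__iter__` this way keeps the insert statement in sync with the dataclass's field order without listing each attribute by hand.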
An internet-facing API used Apache as a router, with metrics and billing derived from Apache access logs. A log ingestion pipeline stored parsed log data in PostgreSQL, but Grafana panels and alerts were difficult to reproduce from that setup. Common observability tools like Loki or Prometheus struggle with highly arbitrary fields such as referer and user_agent, and with very large label spaces like client IP addresses, leading to cardinality issues and expensive aggregation. A reimplementation focused on the ingestion side, using SQLite for simplicity because most services already use it. The script parses logs with regex, includes some rough handling for request-line parsing, type conversions, and a dataclass iteration approach, and omits comments and testing.