Recently I merged a pull request into Prometheus that adds support for creating past data for new recording rules. This work was for Prometheus Issue 11, which was created back in January 2013! Only 8 years later, this issue is finally implemented.
This post explains how to use this new capability.
The term “backfill” means to fill in missing time series data for a specific time range.
Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series (ref: docs). When a recording rule is created, its data only exists from creation time on; no historical data is created for new rules. There are a few different use cases where having past rule data would be useful. For example,
- wanting to test the new recording rule right away instead of waiting for it to populate
- wanting to create an alert using the recording rule right away
- making a dashboard using the new recording rule data
Promtool is a command line tool that ships with Prometheus. As of version 2.27 (estimated release date May 12, 2021), promtool has a new subcommand that creates recording rule data for any period of time. This new subcommand is what makes it possible to fill in missing data for any recording rule.
Below is a walkthrough of how to use this new feature. If you are interested in more details about how it all works under the hood, read on after the examples.
Here are the high level steps to fill in past data for new recording rules:
- Have a running Prometheus server started with the flag --storage.tsdb.allow-overlapping-blocks
- Create a new recording rule
- Run the command promtool tsdb create-blocks-from rules to fill in past data for the new rule. Provide the Prometheus server address and port with the --url flag
- Validate the output of the command and make sure it contains the correct data. The output of promtool tsdb create-blocks-from rules is blocks of data. By default they are created in 2 hour chunks.
- Manually move the blocks over to the data directory of the running Prometheus server
- Wait briefly for the next compaction to hit, then query for the new rule and confirm past data is there. Warning: after compaction runs, the data cannot be removed or reverted.
There should be a Prometheus server running. For this example, I will run a local Prometheus instance that is configured to scrape itself and a node exporter running on my laptop. I will let this run for an hour to collect data for this example:
$ cat prometheus.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']

$ prometheus --storage.tsdb.allow-overlapping-blocks
Create a rule file with a few example rules:
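As a stand-in, the snippet below writes a rule file with three recording rules. The rule names and expressions are hypothetical placeholders chosen to match the two scrape jobs above; they are an assumption for illustration and not necessarily the rules behind the output shown later in this post.

```shell
# Write a hypothetical example-rule.yml. The rule names and expressions are
# illustrative assumptions, not taken from the original walkthrough.
cat > example-rule.yml <<'EOF'
groups:
  - name: example
    rules:
      - record: job:prometheus_http_requests:rate5m
        expr: sum by (job) (rate(prometheus_http_requests_total[5m]))
      - record: instance:node_cpu_seconds:rate5m
        expr: sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
      - record: instance:node_load1:avg
        expr: avg by (instance) (node_load1)
EOF
```

The file can be checked with promtool check rules example-rule.yml before it is used.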
Now that the recording rules exist, I will fill in data for the past hour. Right now the time series data starts just a few minutes ago at 13:04. I will run this command to backfill from an hour ago to 13:04:
# the time to start filling in from as unix timestamp
$ date --date="1 hour ago" +%s
1617563353

# end time, the first timestamp with no series data
# for the new recording rule
$ date --date="2021-04-04 13:04:06" +%s
1617566646

$ promtool tsdb create-blocks-from rules \
  --start 1617563353 --end 1617566646 \
  --url http://localhost:9090 \
  example-rule.yml

# by default the output is written to ./data directory
$ ls data/
01F2F87M7M94Z0B0WY18ER5VKA 01F2F87M9HKVVRGENWZMR5KYQ1 01F2F87MAR7YEX8NZ0WA77YNZ4 01F2F87MB6A582X8A4A9H36VHB 01F2F87MBW23QN3M8NWNF2Y4HQ 01F2F87MCMNBZTPMRRW9GW4XN1
The output is blocks of time series data for the recording rules. The content can be inspected with:
$ promtool tsdb list -r data/
There are 6 blocks total, 2 for each rule that was backfilled. By default blocks are created in 2 hour chunks, but here the first set of blocks is only 52 minutes long to maintain block alignment. The default evaluation interval is 60s, so there should be 1 sample per minute.
To see the raw data, run:
$ promtool tsdb dump data/
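To make the alignment concrete, here is a rough sketch of how a time range gets cut at 2 hour boundaries. It assumes the boundaries fall on multiples of 7200 seconds since the Unix epoch; the helper name block_ranges is hypothetical, and this is an illustration of the arithmetic rather than promtool's actual code.

```shell
# Sketch: split [start, end) into ranges cut at 2 hour boundaries.
# Assumption: boundaries are multiples of 7200 seconds since the Unix epoch.
block_ranges() {
  start=$1
  end=$2
  block=7200 # 2 hours in seconds
  t=$start
  while [ "$t" -lt "$end" ]; do
    # Next 2 hour boundary strictly after t
    boundary=$(( (t / block + 1) * block ))
    if [ "$boundary" -gt "$end" ]; then
      boundary=$end
    fi
    # Print range start, range end, and whole minutes covered
    echo "$t $boundary $(( (boundary - t) / 60 ))"
    t=$boundary
  done
}

# Values from the --start and --end flags used in this example
block_ranges 1617563353 1617566646
```

With these inputs the range crosses one 2 hour boundary (1617566400), so it splits into two ranges of roughly 50 and 4 minutes, which is why each rule ends up with two blocks. The durations reported by promtool tsdb list depend on the timestamps of the samples actually written, so they can differ by a few minutes from this estimate.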
The final step is to manually move these newly created blocks over to the data directory of the running Prometheus server, referred to here as $PROM_DATA_DIR:
$ mv data/* "$PROM_DATA_DIR"/
After the new blocks are moved into the Prometheus data directory, the Prometheus logs should show a compaction event relatively quickly:
level=info ts=2021-04-04T20:23:51.035Z caller=compact.go:686 component=tsdb msg="Found overlapping blocks during compaction" ulid=01F2F8WMZ71M4VVZA3JJWEH3DN
level=info ts=2021-04-04T20:23:51.043Z caller=compact.go:448 component=tsdb msg="compact blocks" count=6 mint=1617563241870 maxt=1617566601871 ulid=01F2F8WMZ71M4VVZA3JJWEH3DN sources="[01F2F87M7M94Z0B0WY18ER5VKA 01F2F87MAR7YEX8NZ0WA77YNZ4 01F2F87MBW23QN3M8NWNF2Y4HQ 01F2F87M9HKVVRGENWZMR5KYQ1 01F2F87MB6A582X8A4A9H36VHB 01F2F87MCMNBZTPMRRW9GW4XN1]" duration=28.268502ms
Once Prometheus runs compaction, the recording rules can be viewed in the Prometheus graph to confirm all the data is now there for the backfilled time range.
To further confirm that it worked, I will compare the recording rule data with the result of executing the rule's expression directly in the Prometheus graph.
References
- Issue to Persist Retroactive Rule Reevaluations
- Documentation to backfill recording rules.
- Prometheus documentation
- Recording rule documentation
Under the Hood
A brief description of what the code is doing to accomplish this:
- A Rule Manager is created to parse the recording rule files. This is the same code that Prometheus uses to process recording rules.
- Requests are made to the Prometheus API QueryRange endpoint using the Prometheus Go client library (ref). The QueryRange API evaluates the recording rule expression against existing time series data.
- The response returned from the API contains samples with the timestamps and values for the recording rule. This is used to create new series that are written to tsdb blocks.
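The QueryRange call in the second step corresponds to the query_range endpoint of the Prometheus HTTP API. As a rough sketch, the same kind of request can be made by hand with curl. Promtool itself uses the Go client, so this is only an approximation; the helper name query_range, the PROM_URL variable, and the 60 second default step are assumptions for illustration.

```shell
# Sketch: evaluate an expression over a past time window via the HTTP API.
# Hypothetical helper; PROM_URL defaults to the local server used in this post.
query_range() {
  # $1 = PromQL expression, $2 = start (unix seconds),
  # $3 = end (unix seconds), $4 = step in seconds (defaults to 60)
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$2" \
    --data-urlencode "end=$3" \
    --data-urlencode "step=${4:-60}"
}

# Example call (needs a reachable Prometheus server):
# query_range 'avg by (instance) (node_load1)' 1617563353 1617566646
```

The JSON response holds one series per label set, each with timestamp and value pairs, which is the data the backfiller turns into new series.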
See original PR for more details if interested.
While this feature does work, there are still many manual steps. It would be nice to have another command that can validate the output of the rule backfiller and also move the blocks over to the Prometheus server data directory.
I hope this new feature is helpful! If you end up using it and have any recommendations on how to improve it, please reach out to me and let me know.