Four Minute Paper: Facebook’s time series database, Gorilla

Four minute papers (inspired by fourminutebooks.com) aims to condense computing white papers down to a four minute summary.

Here is a four minute summary for Facebook’s time series database Gorilla white paper.

Premise for the paper

Whats unique about Gorilla

Use cases for Gorilla

  • Identify issues with new software releases.
  • Be highly available and continue to operate with network failures.
  • Be fault tolerant with multi-region architecture.

Attributes of Gorilla

  • It’s aggregate data is more important than individual data points.
  • Recent data points are of higher value (85% of queries read past 26 hrs of data).
  • ACID guarantees are not necessary, but a high percentage of writes must succeed.
  • Optimized for being available for reads and writes in the face of failures prioritized over availability of any older data.

Challenges

  • Total data quantity.
  • Real-time aggregation.
  • Reliability requirements.

Solutions

  • Compressed data to reduce size of cache 12x.
  • Multi-instance Gorilla architecture (reads go to closest instance) for fault tolerance.

Architecture

ref: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf

Details for these 4 parts of the architecture:

  • Compression algorithm
  • In-memory data structure
  • Persistent on-disk data structure
  • Fault tolerance

Monitoring data is a 3 tuple — string, 64 bit time stamp integer, double precision floating point value. A new time series compression algorithm is used to compress each by series down from 16 bytes to an average of 1.37 bytes, a 12x reduction in size. A single data point is a pair of 64 bit values, one is the timestamp and the other is the value. The timestamp is compressed with delta-of-delta compression. The values use XOR compression.

The in-memory data structure used is a time series map (TSmap) and it allows for fast scans with constant time lookup for individual series. Data is written to two hosts in different regions for fault tolerance. GlusterFS is used for persistent storage.

Conclusion

Live simply. Program stuff.