Four Minute Paper: Facebook’s time series database, Gorilla

Four minute papers (inspired by fourminutebooks.com) aims to condense computing white papers down to a four minute summary.

Here is a four minute summary for Facebook’s time series database Gorilla white paper.

Premise for the paper

Back in 2013 Facebook’s monitoring system, an HBase time series db (TSDB), was not scaling sufficiently. Reads from the system were becoming too slow. Since there wasn’t a TSDB solution on the market that addressed their need of storing massive amounts of data in real-time, Facebook developed Gorilla, an in-memory TSDB.

Whats unique about Gorilla

The attributes that set Gorilla apart from other TSDBs were that its an in-memory TSDB that functions as a write through cache for monitoring data and that it prioritizes reads/writes over availability of older data.

Use cases for Gorilla

  • Observability for hundreds of different systems, focusing on reliable real-time monitoring of production systems.
  • Identify issues with new software releases.
  • Be highly available and continue to operate with network failures.
  • Be fault tolerant with multi-region architecture.

Attributes of Gorilla

  • Gorilla is a write thru cache.
  • It’s aggregate data is more important than individual data points.
  • Recent data points are of higher value (85% of queries read past 26 hrs of data).
  • ACID guarantees are not necessary, but a high percentage of writes must succeed.
  • Optimized for being available for reads and writes in the face of failures prioritized over availability of any older data.

Challenges

  • High data insertion rate.
  • Total data quantity.
  • Real-time aggregation.
  • Reliability requirements.

Solutions

  • In-memory cache (past 26 hrs) of the on-disk persistent store.
  • Compressed data to reduce size of cache 12x.
  • Multi-instance Gorilla architecture (reads go to closest instance) for fault tolerance.

Architecture

Details for these 4 parts of the architecture:

  • Compression algorithm
  • In-memory data structure
  • Persistent on-disk data structure
  • Fault tolerance

Monitoring data is a 3 tuple — string, 64 bit time stamp integer, double precision floating point value. A new time series compression algorithm is used to compress each by series down from 16 bytes to an average of 1.37 bytes, a 12x reduction in size. A single data point is a pair of 64 bit values, one is the timestamp and the other is the value. The timestamp is compressed with delta-of-delta compression. The values use XOR compression.

The in-memory data structure used is a time series map (TSmap) and it allows for fast scans with constant time lookup for individual series. Data is written to two hosts in different regions for fault tolerance. GlusterFS is used for persistent storage.

Conclusion

Gorilla reduced Facebook’s production latency by 70x compared to their on-disk TSDB and has doubled in size with relatively low admin costs to do so all while maintaining high availability.

Live simply. Program stuff.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store