I was confused about this for half a month until I finally figured it out. So what is Anti-Caching? Why is it needed? And how is it different from the traditional buffer-pool system used in most databases today?
Dr. Stonebraker gave a talk, a small part of which covered “Anti-Caching”. It is meant to be a better way to use main memory in databases whose datasets are too large to fit entirely in memory.
One of the findings of the SIGMOD ’08 paper was that traditional buffer-pool-based systems carry a lot of overhead, as you can see from the figure below.
According to the SIGMOD 2008 paper “OLTP Through the Looking Glass, and What We Found There”, the traditional buffer-pool model implemented in common off-the-shelf and open-source databases carries overheads and penalties that break down roughly as shown in the figure. On average, only about 12% of the time went to real work, while a large share was spent on locking, latching, and recovery mechanisms such as logging.
People have used other techniques to scale out better, such as the well-known “MySQL + memcached” model. That is one way to make better use of the extra memory you have.
While this model works well in many cases, it has problems with consistency, updates, and writes: either the cache has to be refreshed at certain intervals, or the database has to notify the cache about changes through some protocol. This setup also becomes increasingly difficult to scale and maintain as the deployment grows.
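To make the consistency problem concrete, here is a minimal cache-aside sketch. It is only an illustration under my own assumptions: plain dicts stand in for MySQL and memcached, and the class name `CacheAside` and the TTL-based expiry are hypothetical choices, not how any particular deployment does it. Reads go to the cache first and fall back to the database; writes go to the database and then invalidate the cache entry.

```python
import time


class CacheAside:
    """Hypothetical cache-aside wrapper: `db` stands in for MySQL, `cache` for memcached."""

    def __init__(self, db, ttl_seconds=60.0):
        self.db = db          # authoritative store (a dict standing in for MySQL)
        self.cache = {}       # key -> (value, expires_at), standing in for memcached
        self.ttl = ttl_seconds

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # cache hit
        value = self.db.get(key)                 # miss or expired: go to the database
        if value is not None:
            self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value                     # write the database first...
        self.cache.pop(key, None)                # ...then invalidate, or readers see stale data


db = {"user:1": "alice"}
store = CacheAside(db)
print(store.get("user:1"))   # alice (loaded from the "database" and cached)
db["user:1"] = "bob"         # a write that bypasses the cache-aware code path
print(store.get("user:1"))   # alice: stale until the TTL expires
store.put("user:1", "carol")
print(store.get("user:1"))   # carol
```

The stale read in the middle is exactly the kind of problem described above: every code path that writes to the database has to know about the cache, or you live with staleness until the entry expires.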
A traditional buffer manager keeps “hot” disk blocks in main memory in disk format. In an “Anti-Cache” system, data is kept in main-memory format even when it is on disk, and this requires rethinking the concept of a buffer pool. Instead of fetching “hot” data into main memory, the DBMS pushes “cold” data out of the main-memory address space. In effect, this is an “anti-cache”. When a query needs the “cold” data, it can request that the data be swapped back in.
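Here is a toy sketch of the idea, written under my own assumptions rather than taken from H-Store. The class `AntiCacheTable` and its parameters (`max_in_memory`, `block_size`, `disk_dir`) are hypothetical. Memory is the primary home of every tuple; when memory exceeds a limit, the coldest tuples (least recently used) are written out together as one block, and an index remembers which block each evicted tuple went to. `pickle` stands in for "writing the in-memory format straight to disk", and the synchronous read-back on access is a simplification; the real system is far more involved.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class AntiCacheTable:
    """Toy anti-cache (hypothetical): memory holds the primary copy of every tuple,
    and cold tuples are evicted to disk in blocks and pulled back on access."""

    def __init__(self, max_in_memory, block_size, disk_dir):
        assert max_in_memory >= 1 and block_size >= 1
        self.mem = OrderedDict()   # key -> row, ordered coldest (first) to hottest (last)
        self.evicted = {}          # key -> block id for tuples that currently live on disk
        self.max_in_memory = max_in_memory
        self.block_size = block_size
        self.disk_dir = disk_dir
        self.next_block = 0

    def _block_path(self, block_id):
        return os.path.join(self.disk_dir, f"block_{block_id}.bin")

    def put(self, key, row):
        if key in self.evicted:                  # keep a single copy: pull the old block back first
            self._unevict(self.evicted[key])
        self.mem[key] = row
        self.mem.move_to_end(key)                # most recently touched = hottest
        self._maybe_evict()

    def get(self, key):
        if key in self.evicted:                  # cold tuple: swap its whole block back in
            self._unevict(self.evicted[key])
        if key not in self.mem:
            return None
        self.mem.move_to_end(key)
        row = self.mem[key]
        self._maybe_evict()                      # un-evicting may have pushed us over the limit
        return row

    def _maybe_evict(self):
        while len(self.mem) > self.max_in_memory:
            count = min(self.block_size, len(self.mem) - 1)   # never evict the hottest tuple
            block = {}
            for _ in range(count):
                k, row = self.mem.popitem(last=False)         # coldest first
                block[k] = row
            block_id = self.next_block
            self.next_block += 1
            with open(self._block_path(block_id), "wb") as f:
                pickle.dump(block, f)            # rows are written in their in-memory (object) form
            for k in block:
                self.evicted[k] = block_id

    def _unevict(self, block_id):
        path = self._block_path(block_id)
        with open(path, "rb") as f:
            block = pickle.load(f)
        os.remove(path)                          # single copy: these tuples now live only in memory
        for k, row in block.items():
            del self.evicted[k]
            self.mem[k] = row                    # re-inserted at the hot end of the LRU order


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t = AntiCacheTable(max_in_memory=4, block_size=2, disk_dir=d)
        for i in range(6):
            t.put(i, {"id": i})
        print(sorted(t.evicted))   # [0, 1]: the two coldest tuples went to disk
        print(t.get(0))            # {'id': 0}: block 0 is read back in
        print(sorted(t.evicted))   # [2, 3]: memory pressure pushed the next-coldest out
```

Note that a tuple is always in exactly one place: either in `mem` or in an on-disk block, never both. Evicting in whole blocks rather than tuple by tuple keeps disk writes large and sequential.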
Researchers at Brown University have been building an experimental database system called H-Store, which appears to be the only database that implements Anti-Caching. Their benchmarks show some impressive results: their Anti-Cache implementation beats both MySQL and MySQL + memcached by a factor of 15.
To sum it up, Anti-Caching differs from buffer-pool-based systems in the following ways:
- An Anti-Cache system uses memory as primary storage, while a buffer-pool system uses disk as primary storage.
- An Anti-Cache system keeps memory segments/pages/blocks on disk in main-memory format. A buffer-pool system keeps data blocks/pages/segments in memory buffers in their original disk format.
- An Anti-Cache system is designed to actively “evict” “cold” data. A buffer-pool system is designed to “keep” “hot” data in memory.
- An Anti-Cache system has only one copy of the data, which is either in memory or on disk. A buffer-pool system keeps two copies of every cached page, one in memory and one on disk, so the two must be synchronized from time to time.
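For contrast with the list above, here is an equally simplified buffer pool. Again this is my own sketch, not code from any real DBMS: the class `BufferPool` is hypothetical, a dict of `bytes` stands in for the data file, and replacement is plain LRU (real systems use their own policies and add pinning, latching, and write-ahead logging). The point is that disk holds the primary copy, memory holds a second copy in disk format, and dirty pages must be written back to keep the two in sync.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool (hypothetical): disk is the primary home of every page,
    and memory caches hot pages in their on-disk byte format."""

    def __init__(self, disk, capacity):
        assert capacity >= 1
        self.disk = disk               # page_id -> bytes (the primary copy)
        self.capacity = capacity
        self.frames = OrderedDict()    # page_id -> bytearray, LRU order (coldest first)
        self.dirty = set()             # pages whose in-memory copy differs from disk

    def read_page(self, page_id):
        if page_id not in self.frames:
            self._make_room()
            self.frames[page_id] = bytearray(self.disk[page_id])   # second copy, same format
        self.frames.move_to_end(page_id)                           # mark as recently used
        return self.frames[page_id]

    def write_page(self, page_id, data):
        frame = self.read_page(page_id)
        frame[:] = data
        self.dirty.add(page_id)        # memory and disk copies now disagree

    def _make_room(self):
        while len(self.frames) >= self.capacity:
            victim, frame = self.frames.popitem(last=False)        # least recently used
            if victim in self.dirty:
                self.disk[victim] = bytes(frame)                   # synchronize the two copies
                self.dirty.discard(victim)

    def flush(self):
        """Write every dirty page back to disk, e.g. at a checkpoint."""
        for page_id in self.dirty:
            self.disk[page_id] = bytes(self.frames[page_id])
        self.dirty.clear()
```

Compare this with the anti-cache sketch: here a page is read from disk and duplicated into memory on every miss, whereas the anti-cache only touches disk when it deliberately pushes cold data out or pulls a specific evicted block back.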
- S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker, “OLTP through the looking glass, and what we found there,” in SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2008, pp. 981-992.
- M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland, “The end of an architectural era: (It’s time for a complete rewrite),” in VLDB ’07: Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 1150-1160.