Judy array

Summary

In computer science, a Judy array is a data structure implementing a type of associative array with high performance and low memory usage.[1] Unlike most other key-value stores, Judy arrays use no hashing, leverage compression on their keys (which may be integers or strings), and can efficiently represent sparse data; that is, they may have large ranges of unassigned indices without greatly increasing memory usage or processing time. They are designed to remain efficient even on structures with sizes in the peta-element range, with performance scaling on the order of O(log n).[2] Roughly speaking, Judy arrays are highly optimized 256-ary radix trees.[3]

Judy trees are usually faster than AVL trees, B-trees, hash tables and skip lists because they are highly optimized to maximize usage of the CPU cache. In addition, they require no tree balancing and no hashing algorithm is used.[4]

History edit

The Judy array was invented by Douglas Baskins and named after his sister.[5]

Benefits edit

Memory allocation edit

Judy arrays are dynamic and can grow or shrink as elements are added to, or removed from, the array. The memory used by Judy arrays is nearly proportional to the number of elements in the Judy array.

Speed edit

Judy arrays are designed to minimize the number of expensive cache-line fills from RAM, and so the algorithm contains much complex logic to avoid cache misses as often as possible. Due to these cache optimizations, Judy arrays are fast, especially for very large datasets. On data sets that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.[6]

Drawbacks edit

Judy arrays are extremely complicated. The smallest implementations are thousands of lines of code.[5] In addition, Judy arrays are optimized for machines with 64 byte cache lines, making them essentially unportable without a significant rewrite.[6]

See also edit

References edit

  1. ^ Robert Gobeille and Douglas Baskins' patent
  2. ^ "Debian -- Details of package libjudy-dev in buster".
  3. ^ Alan Silverstein, "Judy IV Shop Manual", 2002
  4. ^ "A 10-Minute Description of How Judy Arrays Work and Why They Are So Fast".
  5. ^ a b "Home". judy.sourceforge.net.
  6. ^ a b "A performance comparison of Judy to hash tables".

External links edit

  • Main Judy arrays site
  • How Judy arrays work and why they are so fast
  • A complete technical description of Judy arrays
  • An independent performance comparison of Judy to Hash Tables
  • A compact implementation of Judy arrays in 1250 lines of C code