hll-rs
hll-rs is a high-performance, thread-safe implementation of the HyperLogLog
algorithm for cardinality estimation.
Why HyperLogLog?
If you need to count the number of unique elements (cardinality) in a data stream
of billions of items, a standard [std::collections::HashSet] would consume
gigabytes of memory.
HyperLogLog provides a probabilistic estimate with a fixed, tiny memory footprint.
For example, with a precision of P=14, you can estimate cardinalities into the
billions with an error rate of ~0.8% using only 16 KiB of memory.
Key Features
- Lock-Free Concurrency: Uses atomic registers ([
std::sync::atomic::AtomicU8]), allowing multiple threads to update the same estimator simultaneously without a [std::sync::Mutex]. - Zero-Cost Abstractions: Leverages Const Generics to ensure that register sizes and bit-shifting logic are optimized at compile time.
- Flexible Hashing: Compatible with any [
std::hash::BuildHasher], allowing you to swap between secure (SipHash) and fast (FxHash/AHash) algorithms. - Memory Efficient: Uses an [
std::sync::Arc] over a boxed slice to minimize heap metadata and allow cheap cloning.
Quick Start
use hll_rs::Hll;
// Create a new HyperLogLog with precision 14 (16,384 registers)
let hll = Hll::<14>::new();
// Add items from multiple threads if needed
hll.add("rust");
hll.add("programming");
hll.add("rust"); // Duplicate
println!("Estimated unique items: {}", hll.estimate_count());
Mathematical Bounds
Precision (P) |
Registers (m) |
Memory Usage | Standard Error |
|---|---|---|---|
| 10 | 1,024 | 1 KiB | ~3.25% |
| 14 | 16,384 | 16 KiB | ~0.81% |
| 16 | 65,536 | 64 KiB | ~0.40% |
| 18 | 262,144 | 256 KiB | ~0.20% |