Lossy Conservative Update (LCU) Sketch: Succinct Approximate Count Storage

Authors

  • Amit Goyal University of Maryland
  • Hal Daume University of Maryland

Abstract

In this paper, we propose a variant of the conservativeupdate Count-Min sketch to further reduce the overestimation error incurred. Inspired by ideas from lossy counting, we divide a stream of items into multiple windows, and decrement certain counts in the sketch at window boundaries. We refer to this approach as a lossy conservative update (LCU). The reduction in overestimation error of counts comes at the cost of introducing under-estimation error in counts. However, in our intrinsic evaluations, we show that the reduction in overestimation is much greater than the under-estimation error introduced by our method LCU. We apply our LCU framework to scale distributional similarity computations to web-scale corpora. We show that this technique is more efficient in terms of memory, and time, and more robust than conservative update with Count-Min (CU) sketch on this task.

Downloads

Published

2011-08-04

How to Cite

Goyal, A., & Daume, H. (2011). Lossy Conservative Update (LCU) Sketch: Succinct Approximate Count Storage. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 878-883. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/7976

Issue

Section

AAAI Technical Track: Natural Language Processing