Home → Magazine Archive → August 2011 (Vol. 54, No. 8) → Theory and Applications of b-Bit Minwise Hashing → Abstract

Theory and Applications of b-Bit Minwise Hashing

By Ping Li, Arnd Christian König

Communications of the ACM, Vol. 54 No. 8, Pages 101-109
10.1145/1978542.1978566

[article image]


Efficient (approximate) computation of set similarity in very large datasets is a common task with many applications in information retrieval and data management. One common approach for this task is minwise hashing.

The full text of this article is premium content

2 Comments

Joan Puigcerver i Perez

Isn't there an error in the first formula, when the article describes the "resemblance or Jaccard similarity, denoted by R"?

The numerator and denominator of the division are the same number (cardinalty of the union of S1 and S2), the result should be 1.

Regards.

Anonymous

Thanks, Joan

You are correct about the first formula; instead, please refer to equation (1), which has the correct definition of the Jaccard-Overlap.

We just checked and our final submission really did not have that error in the first formula. This error must have occurred later and we did not catch it during the review of the page proofs.

Hopefully the online version will be corrected.

Regards,
Ping and Christian

Displaying all 2 comments