Hardness of bichromatic closest pair with Jaccard similarity

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

Hardness of bichromatic closest pair with Jaccard similarity. / Pagh, Rasmus; Stausholm, Nina Mesing; Thorup, Mikkel.

27th Annual European Symposium on Algorithms, ESA 2019. ed. / Michael A. Bender; Ola Svensson; Grzegorz Herman. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2019. 74 (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 144).

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Pagh, R, Stausholm, NM & Thorup, M 2019, Hardness of bichromatic closest pair with Jaccard similarity. in MA Bender, O Svensson & G Herman (eds), 27th Annual European Symposium on Algorithms, ESA 2019., 74, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, Leibniz International Proceedings in Informatics, LIPIcs, vol. 144, 27th Annual European Symposium on Algorithms, ESA 2019, Munich/Garching, Germany, 09/09/2019. https://doi.org/10.4230/LIPIcs.ESA.2019.74

APA

Pagh, R., Stausholm, N. M., & Thorup, M. (2019). Hardness of bichromatic closest pair with Jaccard similarity. In M. A. Bender, O. Svensson, & G. Herman (Eds.), 27th Annual European Symposium on Algorithms, ESA 2019 [74] Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. Leibniz International Proceedings in Informatics, LIPIcs Vol. 144 https://doi.org/10.4230/LIPIcs.ESA.2019.74

Vancouver

Pagh R, Stausholm NM, Thorup M. Hardness of bichromatic closest pair with Jaccard similarity. In Bender MA, Svensson O, Herman G, editors, 27th Annual European Symposium on Algorithms, ESA 2019. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2019. 74. (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 144). https://doi.org/10.4230/LIPIcs.ESA.2019.74

Author

Pagh, Rasmus ; Stausholm, Nina Mesing ; Thorup, Mikkel. / Hardness of bichromatic closest pair with Jaccard similarity. 27th Annual European Symposium on Algorithms, ESA 2019. editor / Michael A. Bender ; Ola Svensson ; Grzegorz Herman. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2019. (Leibniz International Proceedings in Informatics, LIPIcs, Vol. 144).

Bibtex

@inproceedings{87c4076583a84222b363689c18c93cd2,
title = "Hardness of bichromatic closest pair with Jaccard similarity",
abstract = "Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A × B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a ∩ b|/|a ∪ b| for (a, b) ∈ A × B. We consider the approximate version of the problem where we are given thresholds j1 > j2 and wish to return a pair from A × B that has Jaccard similarity higher than j2 if there exists a pair in A × B with Jaccard similarity at least j1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC{\textquoteright}98), instantiated with the MinHash LSH function of Broder et al., solves this problem in {\~O}(n2−δ) time if j1 ≥ j21−δ. In particular, for δ = Ω(1), the approximation ratio j1/j2 = 1/j2δ increases polynomially in 1/j2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n2−Ω(1)) time for j1/j2 = 1/j2o(1). Specifically, assuming OVC, we prove that for any δ > 0 there exists an ε > 0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n2−δ) for any choice of thresholds j2 < j1 < 1 − δ, that satisfy j1 ≤ j21−ε",
keywords = "Bichromatic closest pair, Fine-grained complexity, Jaccard similarity, Set similarity search",
author = "Rasmus Pagh and Stausholm, {Nina Mesing} and Mikkel Thorup",
year = "2019",
doi = "10.4230/LIPIcs.ESA.2019.74",
language = "English",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Bender, {Michael A.} and Ola Svensson and Grzegorz Herman",
booktitle = "27th Annual European Symposium on Algorithms, ESA 2019",
note = "27th Annual European Symposium on Algorithms, ESA 2019 ; Conference date: 09-09-2019 Through 11-09-2019",

}

RIS

TY - GEN

T1 - Hardness of bichromatic closest pair with Jaccard similarity

AU - Pagh, Rasmus

AU - Stausholm, Nina Mesing

AU - Thorup, Mikkel

PY - 2019

Y1 - 2019

N2 - Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A × B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a ∩ b|/|a ∪ b| for (a, b) ∈ A × B. We consider the approximate version of the problem where we are given thresholds j1 > j2 and wish to return a pair from A × B that has Jaccard similarity higher than j2 if there exists a pair in A × B with Jaccard similarity at least j1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC’98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n2−δ) time if j1 ≥ j21−δ. In particular, for δ = Ω(1), the approximation ratio j1/j2 = 1/j2δ increases polynomially in 1/j2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n2−Ω(1)) time for j1/j2 = 1/j2o(1). Specifically, assuming OVC, we prove that for any δ > 0 there exists an ε > 0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n2−δ) for any choice of thresholds j2 < j1 < 1 − δ, that satisfy j1 ≤ j21−ε

AB - Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A × B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a ∩ b|/|a ∪ b| for (a, b) ∈ A × B. We consider the approximate version of the problem where we are given thresholds j1 > j2 and wish to return a pair from A × B that has Jaccard similarity higher than j2 if there exists a pair in A × B with Jaccard similarity at least j1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC’98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n2−δ) time if j1 ≥ j21−δ. In particular, for δ = Ω(1), the approximation ratio j1/j2 = 1/j2δ increases polynomially in 1/j2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n2−Ω(1)) time for j1/j2 = 1/j2o(1). Specifically, assuming OVC, we prove that for any δ > 0 there exists an ε > 0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n2−δ) for any choice of thresholds j2 < j1 < 1 − δ, that satisfy j1 ≤ j21−ε

KW - Bichromatic closest pair

KW - Fine-grained complexity

KW - Jaccard similarity

KW - Set similarity search

U2 - 10.4230/LIPIcs.ESA.2019.74

DO - 10.4230/LIPIcs.ESA.2019.74

M3 - Article in proceedings

AN - SCOPUS:85074870798

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 27th Annual European Symposium on Algorithms, ESA 2019

A2 - Bender, Michael A.

A2 - Svensson, Ola

A2 - Herman, Grzegorz

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

T2 - 27th Annual European Symposium on Algorithms, ESA 2019

Y2 - 9 September 2019 through 11 September 2019

ER -

ID: 238368920