Statistical-Computational Tradeoffs in Graph Clustering

Mon., April 28, 2014

3:00 - 4:00 PM

521 Cory Hall

3:00 - 4:00 PM

521 Cory Hall

Abstract:

The problem of graph clustering and community detection concerns with identifying densely connected groups of nodes in a graph. In this talk we show that one can use computationally more expensive algorithms to achieve better statistical performance. In particular, we consider the classical planted clustering model (aka stochastic blockmodel), and show that the parameter space can be partitioned into four regimes -- "simple", "easy", "hard", and "impossible" -- which correspond to progressively noisier problems such that: (1) a near-linear time algorithm succeeds in the "simple" regime, but provably fails in the "easy" and "hard" regimes; (2) a polynomial time algorithm succeeds in the "easy" regime, but provably fails in the "hard" regime; (3) a super-polynomial time algorithm succeeds in the "hard" regime, for which no polynomial time algorithm is known; (4) all algorithms fail in the "impossible" regime. Our results apply to the setting with an unbounded number of clusters.

Bio:

Yudong Chen is currently a postdoc in EECS at UC Berkeley with Martin Wainwright. He obtained his Ph.D. in ECE from the University of Texas at Austin in 2013. His research interests include high-dimensional and robust statistics, convex optimization, and applications in community detection.

The problem of graph clustering and community detection concerns with identifying densely connected groups of nodes in a graph. In this talk we show that one can use computationally more expensive algorithms to achieve better statistical performance. In particular, we consider the classical planted clustering model (aka stochastic blockmodel), and show that the parameter space can be partitioned into four regimes -- "simple", "easy", "hard", and "impossible" -- which correspond to progressively noisier problems such that: (1) a near-linear time algorithm succeeds in the "simple" regime, but provably fails in the "easy" and "hard" regimes; (2) a polynomial time algorithm succeeds in the "easy" regime, but provably fails in the "hard" regime; (3) a super-polynomial time algorithm succeeds in the "hard" regime, for which no polynomial time algorithm is known; (4) all algorithms fail in the "impossible" regime. Our results apply to the setting with an unbounded number of clusters.

Bio:

Yudong Chen is currently a postdoc in EECS at UC Berkeley with Martin Wainwright. He obtained his Ph.D. in ECE from the University of Texas at Austin in 2013. His research interests include high-dimensional and robust statistics, convex optimization, and applications in community detection.