domingo, 5 de diciembre de 2010

Statistical ranking and combinatorial Hodge theory

Statistical ranking and combinatorial Hodge theory: "

Abstract
We propose a technique that we call HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce
and internet applications. We are primarily interested in cardinal data based on scores or ratings though our methods also
give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on
an appropriate graph. Our statistical ranking method exploits the graph Helmholtzian, which is the graph theoretic analogue
of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator
or scalar Laplacian. We shall study the graph Helmholtzian using combinatorial Hodge theory, which provides a way to unravel
ranking information from edge flows. In particular, we show that every edge flow representing pairwise ranking can be resolved
into two orthogonal components, a gradient flow that represents the l
2-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained—if this
is large, then it indicates that the data does not have a good global ranking. This divergence-free flow can be further decomposed
orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provides information
on whether inconsistency in the ranking data arises locally or globally. When applied to statistical ranking problems, Hodge
decomposition sheds light on whether a given dataset may be globally ranked in a meaningful way or if the data is inherently
inconsistent and thus could not have any reasonable global ranking; in the latter case it provides information on the nature
of the inconsistencies. An obvious advantage over the NP-hardness of Kemeny optimization is that HodgeRank may be easily computed
via a linear least squares regression. We also discuss connections with well-known ordinal ranking techniques such as Kemeny
optimization and Borda count from social choice theory.


  • Content Type Journal Article
  • DOI 10.1007/s10107-010-0419-x
  • Authors

    • Xiaoye Jiang, Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
    • Lek-Heng Lim, Department of Mathematics, University of California, Berkeley, CA 94720, USA
    • Yuan Yao, School of Mathematical Sciences, LMAM and Key Lab of Machine Perception (MOE), Peking University, 100871 Beijing, People’s Republic of China
    • Yinyu Ye, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA


"

No hay comentarios:

Publicar un comentario

Nota: solo los miembros de este blog pueden publicar comentarios.