Lead CI: Peter Bartlett
Many large-scale decision problems arising in a variety of areas (including biological science, social network analysis, retail recommender systems, web information retrieval, the physical sciences, law enforcement, and telecommunications) involve data that can be viewed as arranged in a graph, with edges representing information about relationships. These data present several key challenges for big-data analysis because they are often heterogeneous, sparsely populated, and of variable quality. This project aims to develop design and analysis techniques for large-scale decision methods, with an emphasis on data in the form of large graphs. Key issues include combining information from many sources despite their heterogeneity and variable quality, and adapting the complexity of predictions to local properties of the data (for instance, in a metric defined by a graph). The research will involve probabilistic and game-theoretic formulations of decision problems (such as regression, classification, and ranking); the design of methods for graphical data based on optimizing criteria involving discrete smoothness functionals; the development of algorithms suited to large-scale and potentially distributed problems, through convex optimization and linear-algebraic methods; and analysis of the performance of these algorithms, in particular the development of optimal estimation rates for smoothness classes appropriate to the graphical setting and an understanding of the computational and statistical trade-offs at large scale. We anticipate that issues of model selection will emerge as a common thread in the key challenges of integrating information from multiple sources and adapting to local conditions.
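To make the optimization-based approach concrete, the following is a minimal sketch (not the project's actual method) of one idea the summary mentions: fitting a function on a graph by minimizing a criterion that combines a data-fit term with a discrete smoothness functional, here the graph-Laplacian quadratic form f'Lf. The function and variable names are illustrative assumptions.

```python
import numpy as np

def laplacian(adjacency):
    """Combinatorial graph Laplacian L = D - W."""
    degrees = adjacency.sum(axis=1)
    return np.diag(degrees) - adjacency

def smooth_fit(adjacency, labeled_idx, labels, reg=1.0):
    """Solve  min_f  sum_{i labeled} (f_i - y_i)^2  +  reg * f' L f.

    Setting the gradient to zero gives the linear system
    (M + reg * L) f = M y, where M is a 0/1 diagonal mask over
    labeled vertices; the system is positive definite whenever each
    connected component contains at least one labeled vertex.
    """
    n = adjacency.shape[0]
    L = laplacian(adjacency)
    mask = np.zeros(n)
    mask[labeled_idx] = 1.0
    y_ext = np.zeros(n)
    y_ext[labeled_idx] = labels
    return np.linalg.solve(np.diag(mask) + reg * L, mask * y_ext)

# Path graph 0-1-2-3-4 with labels only at the endpoints: the smoothness
# penalty propagates the labels, interpolating along the path.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
f = smooth_fit(A, labeled_idx=[0, 4], labels=[0.0, 1.0], reg=0.1)
```

The regularization weight `reg` plays the model-selection role highlighted in the summary: it trades fidelity to the observed labels against smoothness with respect to the graph metric.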