Postdoc position in Algorithms Research (closes February 14th 2018)

We are seeking to recruit a postdoctoral research fellow to work in the area of designing algorithms for analysing large data sets.

You will be expected to perform high quality research under the supervision of Professor Graham Cormode, as part of the ERC funded project ‘Small Summaries for Big Data’. This can encompass streaming algorithms, sketching and dimensionality reduction, distributed monitoring and mergeable summaries, verification of outsourced computation, or other related topics. The expectation is that you will produce breakthrough research results in the summarisation of large volumes of data, and publish these results in top rated venues.

You will possess a PhD or an equivalent qualification in Computer Science or a very closely-related discipline (or you will shortly be obtaining it). You should have a strong background in one or more of the following areas: randomized and approximation algorithms; communication complexity and lower bounds; streaming or sublinear algorithms.

The post is based in the Department of Computer Science at the University of Warwick, but collaborations with closely related research organisations such as the Centre for Discrete Mathematics and its Applications (DIMAP), the Warwick Institute for the Science of Cities (WISC), and the newly formed Alan Turing Institute (ATI) will be strongly encouraged. You will join a team of researchers led by Professor Cormode including other postdoctoral researchers and PhD students.

Candidates should provide with their application form a CV, a list of publications and a research statement.

Closing date: 14th February 2018

More information: https://atsv7.wcn.co.uk/search_engine/jobs.cgi?owner=5062452&ownertype=fair&jcode=1710356&vt_template=1457&adminview=1

Conference and Journal publications

A number of conference and journal papers have been accepted for publication over the winter break.  These include:

G. Cormode and J. Dark. Fast sketch-based recovery of correlation outliers. In International Conference on Database Theory, 2018.

Many data sources can be interpreted as time-series, and a key problem is to identify which pairs out of a large collection of signals are highly correlated. We expect that there will be few, large, interesting correlations, while most signal pairs do not have any strong correlation. We abstract this as the problem of identifying the highly correlated pairs in a collection of n mostly pairwise uncorrelated random variables, where observations of the variables arrive as a stream. Dimensionality reduction can remove dependence on the number of observations, but further techniques are required to tame the quadratic (in n) cost of a search through all possible pairs. We develop a new algorithm for rapidly finding large correlations based on sketch techniques with an added twist: we quickly generate sketches of random combinations of signals, and use these in concert with ideas from coding theory to decode the identity of correlated pairs. We prove correctness and compare performance and effectiveness with the best LSH (locality sensitive hashing) based approach.
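To illustrate the dimensionality reduction step mentioned in this abstract, here is a minimal sketch of a generic random-sign projection, not the algorithm from the paper. The function names and parameters (make_signs, sketch, estimate_inner_product, num_rows) are hypothetical and chosen only for illustration. Each signal is compressed to a fixed number of values, and inner products (and hence correlations of normalised signals) are estimated from the compressed form.

```python
import random

def make_signs(num_rows, length, seed=0):
    """Draw a num_rows x length matrix of random +1/-1 signs (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(length)] for _ in range(num_rows)]

def sketch(signal, signs):
    """Compress a signal to one value per row of the sign matrix."""
    return [sum(s * x for s, x in zip(row, signal)) for row in signs]

def estimate_inner_product(sketch_a, sketch_b):
    """Average the per-row products; in expectation this equals <a, b>."""
    return sum(a * b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)

signs = make_signs(num_rows=16, length=5)
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
print(estimate_inner_product(sketch(x, signs), sketch(y, signs)))
```

Because the sketch is linear, the sketch of a sum of signals is the sum of their sketches, which is the property that makes sketching random combinations of signals cheap. This sketch alone does not avoid the quadratic search over pairs that the paper addresses.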

G. Cormode and C. Hickey. Cheap checking for cloud computing: Statistical analysis via annotated data streams. In AISTATS, 2018.

As the popularity of outsourced computation increases, questions of accuracy and trust between the client and the cloud computing services become ever more relevant. Our work aims to provide fast and practical methods to verify analysis of large data sets, where the client’s computation and memory costs are kept to a minimum. Our verification protocols are based on defining “proofs” which are easy to create and check. These add only a small overhead to reporting the result of the computation itself. We build up a series of protocols for elementary statistical methods, to create more complex protocols for Ordinary Least Squares, Principal Component Analysis and Linear Discriminant Analysis. We show that these are very efficient in practice.
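As a flavour of what a cheap check on an outsourced result can look like, the sketch below uses Freivalds' classical randomised test for a claimed matrix product. This is a stand-in chosen for illustration, not one of the protocols from the paper, and the names freivalds_check and mat_vec are hypothetical. The client never recomputes the product; it only multiplies matrices by random vectors.

```python
import random

def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

def freivalds_check(a, b, claimed, trials=10, seed=None):
    """Return False if claimed is shown not to equal a @ b, True otherwise.

    Each trial costs three matrix-vector products instead of a full
    matrix-matrix product. A wrong claim survives a trial only with
    small probability over the random vector.
    """
    rng = random.Random(seed)
    num_cols = len(claimed[0])
    for _ in range(trials):
        r = [rng.randrange(1, 2**31) for _ in range(num_cols)]
        if mat_vec(a, mat_vec(b, r)) != mat_vec(claimed, r):
            return False
    return True

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(freivalds_check(a, b, [[19, 22], [43, 50]]))  # correct product is accepted
print(freivalds_check(a, b, [[19, 22], [43, 51]]))  # altered entry is rejected
```

The check uses exact integer arithmetic, so a correct result is always accepted; only incorrect results are subject to the randomised guarantee.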

G. Cormode, T. Kulkarni, and D. Srivastava. Constrained differential privacy for count data. In International Conference on Data Engineering (ICDE), 2018.

Concern about how to aggregate sensitive user data without compromising individual privacy is a major barrier to greater availability of data. The model of differential privacy has emerged as an accepted model to release sensitive information while giving a statistical guarantee for privacy. Many different algorithms are possible to address different target functions. We focus on the core problem of count queries, and seek to design mechanisms to release data associated with a group of n individuals. Prior work has focused on designing mechanisms by raw optimization of a loss function, without regard to the consequences on the results. This can lead to mechanisms with undesirable properties, such as never reporting some outputs (gaps), and overreporting others (spikes). We tame these pathological behaviors by introducing a set of desirable properties that mechanisms can obey. Any combination of these can be satisfied by solving a linear program (LP) which minimizes a cost function, with constraints enforcing the properties. We focus on a particular cost function, and provide explicit constructions that are optimal for certain combinations of properties, and show a closed form for their cost. In the end, there are only a handful of distinct optimal mechanisms to choose between: one is the well-known (truncated) geometric mechanism; the second a novel mechanism that we introduce here, and the remainder are found as the solution to particular LPs. These all avoid the bad behaviors we identify. We demonstrate in a set of experiments on real and synthetic data which is preferable in practice, for different combinations of data distributions, constraints, and privacy parameters.
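For readers unfamiliar with the truncated geometric mechanism named in this abstract, the following minimal sketch builds its output distribution as a matrix, using the standard textbook definition (two-sided geometric noise with parameter alpha = exp(-epsilon), clamped to the range 0..n). The function name truncated_geometric is hypothetical, and the novel mechanism and the LP-based mechanisms from the paper are not shown.

```python
import math

def truncated_geometric(n, epsilon):
    """Return matrix[i][j] = Pr[output j | true count i], for counts 0..n (n >= 1).

    Interior outputs carry two-sided geometric noise; the probability mass
    that would fall below 0 or above n is piled onto the endpoints.
    """
    alpha = math.exp(-epsilon)
    matrix = []
    for i in range(n + 1):
        row = []
        for j in range(n + 1):
            if j == 0:
                p = alpha ** i / (1 + alpha)
            elif j == n:
                p = alpha ** (n - i) / (1 + alpha)
            else:
                p = (1 - alpha) / (1 + alpha) * alpha ** abs(i - j)
            row.append(p)
        matrix.append(row)
    return matrix

for row in truncated_geometric(n=3, epsilon=1.0):
    print([round(p, 3) for p in row])
```

Each row is a probability distribution, and the probabilities of any output under neighbouring inputs i and i+1 differ by a factor of at most exp(epsilon), which is the differential privacy guarantee for count data.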

G. Cormode, A. Dasgupta, A. Goyal, and C. H. Lee. An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs. PLOS ONE, 2018.

Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall compared with “vanilla” LSH, even when using the same amount of space.
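The idea of multi-probe LSH can be illustrated with a small single-machine sketch, assuming random-hyperplane hashing for cosine similarity. This is not the Hadoop implementation or any of the four variants evaluated in the paper; the class HyperplaneLSH and its methods are hypothetical names. Instead of looking only in the query's own bucket, the lookup also probes buckets whose signature differs in one bit, which raises recall without storing extra tables.

```python
import random
from collections import defaultdict

class HyperplaneLSH:
    """Toy LSH index: one bit per random hyperplane, one bucket per signature."""

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        # Bit is 1 when the vector lies on the non-negative side of the hyperplane.
        return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0)
                     for plane in self.planes)

    def add(self, key, vec):
        self.buckets[self.signature(vec)].append(key)

    def probes(self, sig):
        # The exact bucket first, then every bucket at Hamming distance 1.
        yield sig
        for i in range(len(sig)):
            yield sig[:i] + (1 - sig[i],) + sig[i + 1:]

    def query(self, vec):
        found = []
        for probe in self.probes(self.signature(vec)):
            found.extend(self.buckets.get(probe, []))
        return found

index = HyperplaneLSH(dim=3, num_bits=8)
index.add("a", [1.0, 0.0, 0.0])
print(index.query([2.0, 0.0, 0.0]))  # same direction, so same signature as "a"
```

Candidates returned by query would normally be re-ranked by their true similarity to the query; that step is omitted here.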

The three conference presentations will take place over the coming months.

Rajesh Chitnis joins as postdoc

Rajesh Chitnis has joined the ERC project “Small Summaries for Big Data” as a postdoctoral research fellow, under the supervision of Professor Graham Cormode. Dr Chitnis has made foundational contributions to the areas of data summarization and fixed-parameter tractability, most notably his work on the complexity of graph problems in various models. Prior to joining Warwick, Rajesh completed his PhD at the University of Maryland, College Park, and spent time as a researcher at the Weizmann Institute. He will work on problems relating to the intersection of streaming, distributed computation, and kernelization.

Streaming algorithms for matching size estimation in sparse graphs at ESA 2017

The paper “Streaming algorithms for matching size estimation in sparse graphs” by G. Cormode, H. Jowhari, M. Monemizadeh, and S. Muthukrishnan has been selected for publication in ESA 2017, in Vienna in September.

The abstract is as follows:

Estimating the size of the maximum matching is a canonical problem in graph analysis, and one that has attracted extensive study over a range of different computational models. We present improved streaming algorithms for approximating the size of the maximum matching in sparse (bounded arboricity) graphs.

(Insert-Only Streams) We present a one-pass algorithm that takes $O(c \log n)$ space and approximates the size of the maximum matching in graphs with arboricity $c$ within a factor of $O(c)$. This improves significantly upon the state-of-the-art $\tilde{O}(c n^{2/3})$-space streaming algorithms, and is the first poly-logarithmic space algorithm for this problem.

(Dynamic Streams) Given a dynamic graph stream (i.e., inserts and deletes) of edges of an underlying $c$-bounded arboricity graph, we present a one-pass algorithm that uses space $\tilde{O}(c^{10/3} n^{2/3})$ and returns an $O(c)$-estimator for the size of the maximum matching on the condition that the number of edge deletions in the stream is bounded by $O(c n)$. For this class of inputs, our algorithm improves the state-of-the-art $\tilde{O}(c n^{4/5})$-space algorithms, where the $\tilde{O}(\cdot)$ notation hides logarithmic dependencies on $n$.

In contrast to prior work, our results take more advantage of the streaming access to the input and characterize the matching size based on the ordering of the edges in the stream in addition to the degree distributions and structural properties of the sparse graphs.
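For context, the simplest one-pass baseline for this problem is greedy maximal matching over an insert-only edge stream, sketched below under the assumption that edges arrive as (u, v) pairs. It is not the algorithm from the paper: it stores the set of matched vertices, so its space grows with the number of vertices rather than poly-logarithmically, and the name greedy_matching_size is hypothetical. A maximal matching is always at least half the size of a maximum matching, so the returned value is within a factor of 2 of the true answer.

```python
def greedy_matching_size(edge_stream):
    """One pass over the edges: take an edge whenever both endpoints are free."""
    matched = set()
    size = 0
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matched.add(u)
            matched.add(v)
            size += 1
    return size

# Path a-b-c-d: the answer depends on the order in which edges arrive.
print(greedy_matching_size([("a", "b"), ("b", "c"), ("c", "d")]))  # 2
print(greedy_matching_size([("b", "c"), ("a", "b"), ("c", "d")]))  # 1
```

The dependence on arrival order seen in this toy example is the kind of stream-ordering information that the abstract says the paper's results exploit.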

Adams Prize 2017

Professor Graham Cormode has been awarded the 2017 Adams Prize by the Cambridge Faculty of Mathematics. The award recognizes his work on “Statistical Analysis of Big Data”, and is awarded jointly with Professor Richard Samworth of Cambridge. Professor Cormode says,

My work, in common with Prof Samworth’s, is about finding mathematical representations of data that allow useful information to be extracted effectively and accurately. These techniques allow ever larger quantities of data to be handled on ordinary computers.

Professor Cormode’s work on “data sketches” has been used in companies such as Netflix, Yahoo, Twitter, Google, AT&T and Sprint. He is currently leading Warwick’s involvement in the Alan Turing Institute in London, and working on questions to do with verification of machine learning, and privacy.

The prize is worth £15,000 and will be split equally between the two recipients.

 

Postdoc position in Algorithms Research (closes April 26th 2017)

We are seeking to recruit a postdoctoral research fellow to work in the area of designing algorithms for analysing large data sets.

You will be expected to perform high quality research under the supervision of Professor Graham Cormode, as part of the ERC funded project ‘Small Summaries for Big Data’. This can encompass streaming algorithms, sketching and dimensionality reduction, distributed monitoring and mergeable summaries, verification of outsourced computation, or other related topics. The expectation is that you will produce breakthrough research results in the summarisation of large volumes of data, and publish these results in top rated venues.

You will possess a PhD or an equivalent qualification in Computer Science or a very closely-related discipline (or you will shortly be obtaining it). You should have a strong background in one or more of the following areas: randomized and approximation algorithms; communication complexity and lower bounds; streaming or sublinear algorithms.

The post is based in the Department of Computer Science at the University of Warwick, but collaborations with closely related research organisations such as the Centre for Discrete Mathematics and its Applications (DIMAP), the Warwick Institute for the Science of Cities (WISC), and the newly formed Alan Turing Institute (ATI) will be strongly encouraged. You will join a team of researchers led by Professor Cormode including other postdoctoral researchers and PhD students.

Candidates should provide with their application form a CV, a list of publications and a research statement.

Closing date: 26th April 2017 (this position is being readvertised)

More information: https://atsv7.wcn.co.uk/search_engine/jobs.cgi?owner=5062452&ownertype=fair&jcode=1640701&vt_template=1457&adminview=1

Postdoc position in algorithms (closing March 31st 2016)

We are seeking to recruit a postdoctoral research fellow to work in the area of designing algorithms for analysing large data sets.

You will be expected to perform high quality research under the supervision of Professor Graham Cormode, as part of the ERC funded project ‘Small Summaries for Big Data’. This can encompass streaming algorithms, sketching and dimensionality reduction, distributed monitoring and mergeable summaries, verification of outsourced computation, or other related topics. The expectation is that you will produce breakthrough research results in the summarisation of large volumes of data, and publish these results in top rated venues.

You will possess a PhD or an equivalent qualification in Computer Science or a very closely-related discipline (or you will shortly be obtaining it). You should have a strong background in one or more of the following areas: randomized and approximation algorithms; communication complexity and lower bounds; streaming or sublinear algorithms.

The post is based in the Department of Computer Science at the University of Warwick, but collaborations with closely related research organisations such as the Centre for Discrete Mathematics and its Applications (DIMAP), the Warwick Institute for the Science of Cities (WISC), and the newly formed Alan Turing Institute (ATI) will be strongly encouraged. You will join a team of researchers led by Professor Cormode including other postdoctoral researchers and PhD students.

Candidates should provide with their application form a CV, a list of publications and a research statement.

Closing date: 31st March 2016

More information: http://www.jobs.ac.uk/job/AUC577/research-fellow-77780-026/