New Postdoctoral Researchers

Two new researchers have joined the ERC project “Small Summaries for Big Data” as postdoctoral research fellows, under the supervision of Professor Graham Cormode.

Pavel Vesely works on problems related to scheduling and bin packing, and will study these questions in the context of streaming data. Prior to coming to Warwick, he was a PhD student at the Computer Science Institute of Charles University in Prague, where his adviser was Professor Jiří Sgall.

Michael Shekelyan works on data summarization, including histograms, sampling, and other randomized summaries.  Before coming to Warwick, Michael completed his PhD at the Faculty of Computer Science, Free University of Bozen-Bolzano, Italy.

Workshop on Algorithms for data summarization, March 2018

A workshop on algorithms for data summarization (streaming, sampling, sketching, property testing, sublinear algorithms and loosely related topics) will be held at the University of Warwick, UK, from 19th to 22nd March 2018, organized by Graham Cormode and Artur Czumaj. The workshop will consist of talks from experts, with plenty of opportunities for discussion and collaboration. The workshop is funded by the European Research Council and the UK EPSRC.

For more information, please contact the organizers.

Keynote at the Symposium on Experimental Algorithms

Engineering streaming algorithms, June 2017.
Invited talk at the Symposium on Experimental Algorithms.

Streaming algorithms must process a large quantity of small updates quickly to allow queries about the input to be answered from a small summary. Initial work on streaming algorithms laid out theoretical results, and subsequent efforts have involved engineering these for practical use. Informed by experiments, streaming algorithms have been widely implemented and used in practice. This talk will survey this line of work, and identify some lessons learned.

PhD positions in Data Summarization available

A funded studentship is available in the area of computer science to work on algorithms and data structures for summarization of massive data sets. Funding is provided through the prestigious ERC program, under project 647557, “Small Summaries for Big Data”.

Project Overview:
A fundamental challenge in processing the massive quantities of information generated by modern applications is extracting suitable representations of the data that can be stored, manipulated and interrogated on a single machine. A promising approach is the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include the Bloom filter, which compactly represents a set of items, and sketches, which allow vector norms and products to be estimated.
Such structures are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. Yet the potential of summaries is far from being fully realized. Professor Cormode is recruiting a team to work on important problems around creating Small Summaries for Big Data. The goal is to substantially advance the state of the art in data summarization, to the point where accurate and effective summaries are available for a wide array of problems, and can be used seamlessly in applications that process big data. PhD students can work on a variety of topics related to the project, including:
• The design and evaluation of new summaries for fundamental computations such as large matrix computations
• Summary techniques for complex structures such as massive graphs
• Summaries that allow the verification of outsourced computation over big data
• Application of summaries in the context of monitoring distributed, evolving streams of data
The expectation is that this will lead to novel results in the summarization of large volumes of data, which will be published in top-rated venues.
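To make the "computed in parallel and combined" property from the overview concrete, here is a minimal Python sketch of a Bloom filter whose instances merge by bitwise OR. It is an illustration only, not code from the project: the class name `BloomFilter`, the parameters `m` (bits) and `k` (hash functions), and the use of salted SHA-256 as the hash family are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: an m-bit array probed by k hash functions (illustrative)."""

    def __init__(self, m=1024, k=4):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, but never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Filters with identical parameters combine by bitwise OR of their bits.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged


# Two summaries built independently (for example, on different machines),
# then combined into a single summary of the union.
left, right = BloomFilter(), BloomFilter()
for word in ["alpha", "beta"]:
    left.add(word)
for word in ["gamma", "delta"]:
    right.add(word)
combined = left.merge(right)
```

Every item added to either input filter is reported as present in `combined`, which is the sense in which such summaries are mergeable.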
You will possess a degree in Computer Science, Mathematics or a very closely related discipline (or you will shortly be obtaining one). You should have good knowledge of one or more of the following areas: algorithm design and analysis; randomized and approximation algorithms; communication complexity and lower bounds; streaming or sublinear algorithms. The post is based in the Department of Computer Science at the University of Warwick, but collaborations with closely related research organizations such as the Centre for Discrete Mathematics and its Applications (DIMAP), the Warwick Institute for the Science of Cities (WISC), and the newly formed Alan Turing Institute (ATI) will be strongly encouraged.
For examples of relevant research and related topics, please consult Prof. Cormode’s web pages at http://www2.warwick.ac.uk/fac/sci/dcs/people/Graham_Cormode
Eligibility:
Candidates should hold a degree in Computer Science, Mathematics or a closely related discipline, or expect to complete one before the commencement of the studentship. The degree should show a high level of achievement (1st or 2.1 level).
Funding level:
Funding is available to support stipend and fees at the UK/EU level for 4 years (this does not cover fees for non-EU students, see http://www2.warwick.ac.uk/study/postgraduate/funding/fees/ for more information).
Application details:
Please send a CV giving details of your education and achievements to date, including details of performance in relevant university-level subjects (such as Algorithms, Data Structures, Complexity, Mathematical Analysis of Algorithms, Linear Algebra and so on). Please also include a covering note explaining how your background and interests make you relevant to the aims of the project.
Applications will be reviewed as they are received, with an initial deadline of 30th November 2015 and a final deadline of 31st March 2016.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 647557).

Sampling for Big Data tutorial at KDD

Nick Duffield (Texas A&M University) and Graham Cormode presented their tutorial on Sampling for Big Data at KDD 2014.  The abstract is as follows:

One response to the proliferation of large datasets has been to develop ingenious ways to throw resources at the problem, using massive fault-tolerant storage architectures and parallel and graphical computation models such as MapReduce, Pregel and Giraph. However, not all environments can support this scale of resources, and not all queries need an exact response. This motivates the use of sampling to generate summary datasets that support rapid queries, and prolong the useful life of the data in storage. To be effective, sampling must mediate the tensions between resource constraints, data characteristics, and the required query accuracy. The state of the art in sampling goes far beyond simple uniform selection of elements, to maximize the usefulness of the resulting sample. This tutorial reviews progress in sample design for large datasets, including streaming and graph-structured data. Applications to sampling network traffic and social networks are discussed.
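As a point of reference for the "simple uniform selection" that the abstract treats as the baseline, the following Python sketch shows classic reservoir sampling over a stream of unknown length. It is not taken from the tutorial materials; the function name `reservoir_sample` and its parameters are illustrative assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n, evicting a random slot.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


# One pass over the stream, storing only k items at any time.
sample = reservoir_sample(range(1_000_000), 10, random.Random(0))
```

The sample designs surveyed in the tutorial go beyond this, for instance by weighting items, but the one-pass, bounded-memory pattern is the same.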

Video and slides from the tutorial are now available.