Tutorials on Local Differential Privacy

A tutorial on local differential privacy, “Privacy at scale: Local differential privacy in practice”, prepared by G. Cormode, S. Jha, T. Kulkarni, N. Li, D. Srivastava, and T. Wang, will be presented at the ACM SIGMOD and SIGKDD conferences this summer. SIGMOD takes place in Houston, TX in June, and KDD in London, UK in August.

Local differential privacy (LDP), where users randomly perturb their inputs to provide plausible deniability of their data without the need for a trusted party, has been adopted recently by several major technology organizations, including Google, Apple and Microsoft. This tutorial aims to introduce the key technical underpinnings of these deployed systems, to survey current research that addresses related problems within the LDP model, and to identify relevant open problems and research directions for the community.
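As a rough illustration of what "randomly perturbing an input" can look like, the sketch below uses classic binary randomized response, one of the simplest LDP mechanisms. It is not taken from the tutorial or from any of the deployed systems named above; the function names `randomize_bit`, `estimate_fraction`, and `simulate` and the parameter choices are assumptions made for this example only.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Run on the user's side: keep the bit with probability p, otherwise flip it."""
    p = truth_probability(epsilon)
    return true_bit if rng.random() < p else not true_bit


def estimate_fraction(reports, epsilon: float) -> float:
    """Run on the collector's side: debias the noisy reports.

    If a fraction f of users hold True, the expected share of True reports
    is f * p + (1 - f) * (1 - p), which is inverted here to recover f.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(n_users: int, true_fraction: float, epsilon: float, seed: int = 0) -> float:
    """Hypothetical end-to-end run: each user perturbs locally, the collector aggregates."""
    rng = random.Random(seed)
    n_true = int(n_users * true_fraction)
    true_bits = [i < n_true for i in range(n_users)]
    reports = [randomize_bit(b, epsilon, rng) for b in true_bits]
    return estimate_fraction(reports, epsilon)
```

Calling, for example, `simulate(100_000, 0.3, 1.0)` should return a value close to 0.3. No single report reveals a user's true bit with certainty, yet the aggregate remains estimable, and the estimate gets noisier as `epsilon` shrinks.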

Draft slides are available, and video recordings should be available after the tutorial has been presented.

Sampling for Big Data tutorial at KDD

Nick Duffield (Texas A&M University) and Graham Cormode presented their tutorial on Sampling for Big Data at KDD 2014.  The abstract is as follows:

One response to the proliferation of large datasets has been to develop ingenious ways to throw resources at the problem, using massive fault-tolerant storage architectures, parallel and graphical computation models such as MapReduce, Pregel and Giraph. However, not all environments can support this scale of resources, and not all queries need an exact response. This motivates the use of sampling to generate summary datasets that support rapid queries, and prolong the useful life of the data in storage. To be effective, sampling must mediate the tensions between resource constraints, data characteristics, and the required query accuracy. The state of the art in sampling goes far beyond simple uniform selection of elements, to maximize the usefulness of the resulting sample. This tutorial reviews progress in sample design for large datasets, including streaming and graph-structured data. Applications to sampling network traffic and social networks are discussed.

Video and slides from the tutorial are now available.