This is a full-day workshop (1 hp) on Privacy-Aware Decisions, held on Tuesday, April 23, 2019. The workshop includes introductory tutorials and research talks/seminars, and culminates in an open discussion led by domain experts in privacy, security and geospatial analytics.
Signing up is necessary at https://simpleeventsignup.com/event/146419 to attend the workshop.
Who is this workshop for? It is for PhD students at Uppsala University and for researchers and developers from various sectors of the data industry who want to up/re-skill in privacy-aware and GDPR-compliant data science processes, including machine learning. Students from other universities in Sweden may also take it for possibly transferable credits. Attendance and completion of assignments can lead to an industrially-endorsed certificate in Privacy-Aware Decisions from The Department of Mathematics, Uppsala University. See Course Details of the Workshop below for more information.
Support: This workshop is partly supported by the Centre for Interdisciplinary Mathematics jointly with the Department of Mathematics and Combient Competence Centre for Data Engineering Sciences via Combient-MIX.
Schedule of the Workshop
- Time and Place:
- Time: 0830-1700 hours April 23, 2019.
- Place: Rooms 80121 and Häggsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, 752 37 Uppsala, Sweden. https://goo.gl/maps/kbCkddQitzE2.
- Morning Events: Located at 80121, Ångströmlaboratoriet, House 8, Floor 0.
- 1. 0830-1000 hours; Tutorial on Privacy-preserving Data Mining: An Introduction by Julián Salas Piñón of K-riptography and Information Security for Open Networks (KISON) Research Group at The Internet Interdisciplinary Institute (IN3), Open University of Catalunya, Barcelona, Spain.
- F. 1000-1030 hours; fika and discussions
- 2. 1030-1110 hours; GDPR-compliant Learning in Apache Spark by Christoffer Långström, Department of Mathematics, Uppsala University
- 3. 1110-1150 hours; Statistic Preserving Sanitizers for Markov Models of Co-trajectories by Joel Dahne, Department of Mathematics, Uppsala University
- talk slides (1.7 MB)
- Afternoon Events: Located at Häggsalen, 10132, Ångströmlaboratoriet, House 1, Floor 1.
- 4. 1200-1300 hours; CIM CosY Lunch Seminar (lunch sandwich with water/juice+tea/coffee):
- 5. 1310-1400 hours; Privacy-preserving federated machine learning by Andreas Hellander, Associate Professor, Department of Information Technology, Uppsala University and Head of Research Scaleout systems, and Salman Toor, Assistant Professor, Department of Information Technology, Uppsala University.
- talk slides (1.4 MB)
- 6. 1405-1455 hours; Space and place - the grammar of geography, with a focus on smart card data by John Östh and Marina Toger, Department of Social and Economic Geography, Uppsala University, and Maria Marinov, Industrial Telecom Researcher, Tel Aviv, Israel.
- F. 1500-1530 hours; fika and discussions
- 7. 1530-1555 hours; Freedom in the Digital World by Martin Monperrus, Division of Theoretical Computer Science, School of Electrical Engineering and Computer Science, KTH.
- 8. 1555-1700; Open Discussions with Invited Panel Members facilitated by Raazesh Sainudiin
- Invited Discussion Panelists include:
- Itzhak Benenson, Faculty of Exact Sciences, Tel Aviv University.
- Sonja Buchegger, Division of Theoretical Computer Science, School of Electrical Engineering and Computer Science, KTH, Stockholm.
- Panos Papadimitratos, Division of Communications System, School of Electrical Engineering and Computer Science, KTH, Stockholm.
- Julián Salas Piñón of K-riptography and Information Security for Open Networks (KISON) Research Group at The Internet Interdisciplinary Institute (IN3), Open University of Catalunya, Barcelona, Spain.
- John Östh, Department of Social and Economic Geography, Uppsala University, Uppsala, SE.
- Andreas Hellander, Associate Professor, Department of Information Technology, Uppsala University and Head of Research Scaleout systems.
Abstracts of Tutorials and Seminars
Tutorial on Privacy-preserving Data Mining: Data mining via structured querying and statistical machine learning or AI algorithms may have enormous benefits to science, business and society. At the same time, the large amount of data collected and analyzed may reveal a lot about our private lives, from our habits or personality traits to personal information that we may prefer to keep private. To motivate privacy enhancing techniques, we will present examples of attacks to privacy that were possible due to naïve anonymization of medical records, AOL internet search queries, Netflix movie ratings and NYC-taxis’ geo-location data. We will review some well known methods for privacy protection (such as k-anonymity or differential privacy) to prevent attribute and identity disclosure. Then, we will discuss the trade-offs between the risk of disclosure and the utility loss in different contexts such as databases, online social networks and geo-located data.
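As a rough illustration of the k-anonymity idea mentioned in this abstract, the sketch below measures the largest k for which a small table is k-anonymous with respect to a chosen set of quasi-identifiers. The function name `k_anonymity`, the toy records and the choice of quasi-identifiers are all hypothetical and are not taken from the tutorial.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous.

    Records sharing the same quasi-identifier values form one group;
    k is the size of the smallest such group (0 for an empty table).
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

# Hypothetical toy table: generalized age band and postcode prefix
# play the role of quasi-identifiers; diagnosis is the sensitive attribute.
records = [
    {"age": "30-39", "zip": "752**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "752**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "753**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "753**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "753**", "diagnosis": "flu"},
]

k = k_anonymity(records, ["age", "zip"])
print(k)  # 2
```

Here every combination of age band and postcode prefix is shared by at least two records, so the table is 2-anonymous for those quasi-identifiers. Treating more columns as quasi-identifiers can only lower k.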
GDPR-compliant Learning in Apache Spark: The increasing concern about privacy issues in data handling has led the EU to legislate that data holders provide proper protection for individuals in data sets using a combination of best practices and anonymization techniques. This talk will look at Apache Spark implementations of existing methods for scalable anonymisation using pseudonymisation and k-anonymity, detailing a GDPR-compliant framework for protecting an individual’s data. Further, the current applications and uses of differential privacy techniques will be discussed, where privacy is protected by obfuscating each individual’s relevance to a data set, in particular in the context of ML and Deep Learning applications. In parallel, we will discuss the statistical consequences of differentially private mechanisms, quantifying the information lost and the privacy protection gained from privatisation of the data and estimators.
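To make the trade-off between information lost and privacy gained more concrete, here is a minimal sketch of the Laplace mechanism, a standard differentially private mechanism, applied to a counting query. It is plain Python rather than Apache Spark, it is not the implementation presented in the talk, and the names `laplace_noise` and `private_count` as well as the data are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    Adding or removing one individual changes a count by at most 1
    (sensitivity 1), so the Laplace noise has scale 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: ages of seven individuals.
rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 64, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(noisy)  # the true count is 3; the released value is 3 plus noise
```

A smaller epsilon gives stronger privacy but a larger noise scale, so the released count is less accurate. The expected absolute error of this mechanism equals 1 / epsilon.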
Statistic Preserving Sanitizers for Markov Models of Co-trajectories: With today’s easy access to location-aware devices, such as mobile phones, huge amounts of location-based data are generated. Analysis of this data could contribute a lot of information, for example to city planning or driving directions. However, location data contains a lot of private information that we may not want to reveal. The goal is to have tools for analysis that also allow us to keep sensitive parts of the data private. In relation to this, we introduce the concept of sanitizers and statistic-preserving sanitizers. We give an example where the analysis is done using a Markov chain model for the data and a sanitizer called SwapMob.
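The idea of a statistic-preserving sanitizer can be illustrated with a toy example. The sketch below is inspired by the description above and is not the SwapMob algorithm as specified by its authors: it assumes that two trajectories are indexed by common time steps and swaps their remainders after the first step at which they are at the same place. The function names and the two trajectories are hypothetical.

```python
from collections import Counter

def transition_counts(trajectories):
    """Pooled counts of (from, to) moves over all trajectories.

    These counts are what is needed to estimate the transition
    probabilities of a first-order Markov chain.
    """
    counts = Counter()
    for path in trajectories:
        counts.update(zip(path, path[1:]))
    return counts

def swap_at_meeting(a, b):
    """Swap the remainders of two trajectories after their first meeting.

    A meeting is a time step at which both are at the same place.
    Trajectories that never meet are returned unchanged.
    """
    for t, (x, y) in enumerate(zip(a, b)):
        if x == y:
            return a[: t + 1] + b[t + 1 :], b[: t + 1] + a[t + 1 :]
    return a, b

# Hypothetical trajectories that coincide at "station" (time step 2).
alice = ["home", "cafe", "station", "office"]
bob = ["gym", "park", "station", "mall"]

alice_s, bob_s = swap_at_meeting(alice, bob)
before = transition_counts([alice, bob])
after = transition_counts([alice_s, bob_s])
print(alice_s)          # ['home', 'cafe', 'station', 'mall']
print(before == after)  # True
```

Because the swap happens at a shared location, every individual move still appears exactly once in the pooled data, so the pooled transition counts are unchanged while no sanitized trajectory belongs to a single person from start to end.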
Simple Complex Phenomenon of Urban Parking: Parking is a core phenomenon of the modern urban transportation system. For travelers, parking search time and parking price are the major factors that define the choice of transportation mode and, in the longer run, the decision on car ownership. For city transportation planners and managers, parking prices and constraints are the easiest tool for affecting travelers’ behavior and, eventually, urban traffic. I investigate several analytical and simulation models of urban parking dynamics and propose easy and practical algorithms for estimating parking search time and parking prices as dependent on standard high-resolution urban GIS data and, if necessary, simple field studies. Based on the model study, I propose parking policies that improve the state of urban transportation and discuss the ability of society to implement these policies.
Privacy-preserving federated machine learning: Federated privacy-preserving machine learning allows parties that are not able to share data with each other for privacy reasons, for competitive reasons, or for practical reasons (such as the size of data), to form machine learning alliances with the objective of training a joint global model without moving or disclosing any local private data. As such, federated learning algorithms hold great promise to become fundamental components of privacy-aware artificial intelligence. In this talk we will give an overview of approaches to federated learning, and discuss the research challenges involved in building a production-grade platform and runtime to support such algorithms.
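The abstract does not say which federated learning approaches the talk covers. As one widely used example, the sketch below shows the aggregation step of federated averaging: each party trains locally and sends only its model parameters and local sample count, and a coordinator combines them into a global model. The function name `federated_average` and the numbers are hypothetical.

```python
def federated_average(client_updates):
    """Combine client models into a global model.

    client_updates is a list of (parameters, n_samples) pairs, where
    parameters is a list of floats. Each client's parameters are weighted
    by its share of the total number of training samples. Raw data never
    leaves the clients; only these pairs are exchanged.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total for i in range(dim)]

# Hypothetical round with two clients holding 100 and 300 samples.
client_updates = [
    ([1.0, 2.0], 100),
    ([3.0, 0.0], 300),
]
global_weights = federated_average(client_updates)
print(global_weights)  # [2.5, 0.5]
```

In a full system this step would be repeated over many rounds, with the global parameters sent back to the clients for further local training. Sharing parameters alone does not by itself guarantee privacy, which is one reason such platforms raise research challenges.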
Space and place - the grammar of geography: A talk about geography, core features in spatial analysis, geographical scale, opportunities and ethics. We will talk about definitions of neighbourhoods, the implications of spatial aggregation, and MAUP (Modifiable Areal Unit Problem). The interaction between distance, cost, and time decay - fuzzy definition of neighbourhood. Neighbourhood and scale affect measurements such as segregation/integration. Economic sorting of landscapes - crash intro to urban economics and geographical theories. Intro to time geography - space-time and activity space. Spatial mismatch hypothesis - Kain’s negative feedback loop.
Freedom in the Digital World: The digital world is a new world, with as many opportunities for freedom as for alienation. In this talk, I will go over those challenges, with many concrete examples. In particular, I will emphasize the fundamental tension between the softness of software and the super rigid walls of the digital world. I’ll conclude with a conjecture: “there must be software design principles that encourage freedom in the digital world”.
Course Details of the Workshop
Candidate Thesis students or Master’s students at Uppsala University must contact their study counsellor to see if it is possible to take the course as part of their candidate thesis preparation or as a “Selected Topics Course” in their Department, respectively.
If you are working in industry and want to re-skill in privacy-aware data science processes then you can join the workshop (free of cost) and receive a certificate from the Department of Mathematics upon successful completion. You need to choose the appropriate option when signing up for the event.
If you are a PhD student at Uppsala University (or another Swedish university) and are interested in obtaining 1hp of PhD course credits (that may be transferable to students at other universities in Sweden) and/or an industrially-endorsed certificate of completion, then you need to choose the appropriate option when signing up for the event.
Physical presence in all the planned events of the day is mandatory for the course credits and/or certificate of completion.
What is expected in a 1hp Course at Uppsala University?
Learning concepts in depth and familiarizing oneself with new syntax requires completion of watching/reading assignments and completion of homework exercises. At Uppsala University, 1hp is about 25 hours of work and there is plenty of time, outside the 6-8 hours of face-to-face lab/lecture meetings, for deepening one’s understanding.
Workshop Coorganisers and Coordinator
This workshop is coorganised by:
- Marina Toger, Department of Social and Economic Geography, Uppsala University, and
- Tilo Wiklund, Department of Mathematics, Uppsala University,
and coordinated by Raazesh Sainudiin, Associate Professor of Mathematics with Specialisation in Data Science at the Department of Mathematics, Uppsala University in Uppsala and Consulting Principal Data Scientist at Combient AB in Stockholm (coordinator’s cv and contact), with the generous support of invited speakers and discussion panel members.
Dinner for Speakers and Discussion Panelists
We will be going for dinner at Aaltos, Italian Grill and Garden, located at Sysslomansgatan 14, 75311 Uppsala, starting at 1830 hours. It is about 30 minutes by foot along the river or about 20 minutes by bus.