Open Data @ BFH CAS DA 18.1.18

Study notes and slides for the Open Data lecture by Oleg Lavrovsky at the Berne University of Applied Sciences CAS in Data Analysis. The course of study is designed for professionals interested in data projects, building experience in the analysis of data using desktop tools.

The intent of this lecture is to present a practitioner perspective as well as some introductory background on open data, the open data movement, and several real-world projects - with details of the data involved, legal conditions and technical challenges.

"An overabundance of information is not a minor problem. Large volumes of raw data constitute a political fact. The growing volumes of data lead to a centralization of control. In communication, by contrast, the amount of information is reduced through people's interaction and their interpretations."

-- Richard Sennett, Wikipedia - Informationsüberflutung

1. Attention

... is in short supply in our information overloaded society.

Data is effectively put to use when there is the possibility of change in the information. Cycles of transforming data into useful information lead us to knowledge.

Question: how accessible and trustworthy are the filters to our knowledge?

The open data movement is concerned with sustainable and more universal access to data, leading to growth in each of these domains. We may even propose that the value of data analysis grows in correlation with the number of degrees of openness (i.e. openness to fellow experts, to colleagues, the wider organization, fellow citizens, entrusted algorithms) that are enabled by the transformation of data to knowledge.

Put another way: we are interested in this virtuous cycle common to information systems:

Attention -> [ Data -> Information ] -> Knowledge -> Attention

The cycle above is dramatically boosted when data can flow directly to the end user in both machine-usable and human-usable ways, creating feedback loops of information and knowledge. It still requires people to discover and pay attention to your message, but then creates new opportunities for shared knowledge with constituents, customers, etc.

Attention -> Open Data -> New Information -> Shared Knowledge

While a similar problem is being addressed in more technical ways in various domains of information security such as computational trust, most of the open data movement is focused on the rewiring of interpersonal and organisational borders through data sharing. A leading light in this area is the Open Knowledge network, represented by the association Opendata.ch in Switzerland.

"Where there is perfect certainty, there is no information: There is nothing to be said."

-- Jimmy Soni & Rob Goodman on Claude Shannon

"At the core of Bayesian statistics is the idea that prior beliefs should be updated as new data is acquired."

-- Above image and quote from Seeing Theory, Daniel Kunin & al., Brown University

2. Definitions

The concept of Open Data can be defined for our purposes as follows:

"A piece of content or data is open if anyone is free to use,
reuse, and redistribute it - subject only, at most, to the
requirement to attribute and share-alike."

-- The Open Definition

Open Data is legally supported by licenses and guidelines such as ...

-- Five stars of Linked Open Data, via Cafepress

Linked Open Data sounds complicated, but it is one of the key mechanisms enabling relevant, detailed information searches, such as those we have become used to seeing daily in Google results. See the Moz blog for a colorful explanation.

Publishing data as 5-star Linked RDF is not especially hard; it just requires some awareness of the semantic web and ontologies, as we briefly covered in class. At a basic level it can be used in any web page with markup such as microformats and schema.org, which can be mapped to RDF. A good introduction in German can be found in Linked (Open) Data - Von der Theorie zur Praxis (HTW Chur).
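To make the idea of schema.org markup more concrete, here is a minimal Python sketch that describes a dataset with schema.org terms and wraps it for embedding in a web page. It assumes JSON-LD as the serialization, which is only one of several ways to carry such markup and was not prescribed in class. The function names and all example values are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, license_url, keywords):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": list(keywords),
    }


def as_script_tag(jsonld):
    """Wrap the JSON-LD in the script element that a web page would embed."""
    body = json.dumps(jsonld, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, not a real dataset
example = dataset_jsonld(
    name="Example municipal budget",
    description="Illustrative dataset description for a web page.",
    url="https://example.org/datasets/budget",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["budget", "open data"],
)
print(as_script_tag(example))
```

Because JSON-LD is an RDF serialization, a description like this can be read by RDF tooling as well as by search engines that understand schema.org.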

The discussion of search engines in class raised the question of whether Open Data from web crawlers is available. We talked a bit about the Common Crawl project, related to Internet Archive and itself based on open source technologies. Check out some of the great example uses of such data.

3. In Switzerland

Data portals build upon the experience of numerous community projects and prior efforts to organize information online relevant to a diverse user base. The main function is to make important metadata - such as time of update, terms of use, ownership, typology, schema - available in one place, searchable and cross-referenceable.

--Screenshot of opendata.swiss

They also host an important dialogue, serving to illustrate the challenges of publishing and reusing complex data, such as geographic data ("geodata").

--Screenshot of map.geo.admin.ch

Starting with government departments who were early adopters of online publishing, like BFS and Swisstopo, the central Swiss Open Government Data portal, opendata.swiss, harvests datasets from numerous public organizations into one place and supports efforts in data publication.

--Screenshot of opendata.swiss

Portals help users to understand and adopt the terms of use, both to negotiate the various limitations and responsibilities placed on data reuse and to consider the possible conditions under which future datasets will be accessible.

--Screenshot of opendata.swiss

Note that these Terms (Nutzungsbedingungen), while similar in form to the Creative Commons levels, are not the same as licenses. Licenses are often applied to open data internationally, providing firmer legal grounding for further use and support. See Open Licenses Service for examples.

Data authorship, protection, and the general rights of data producers and users are currently undergoing intense development in Switzerland, and are targets of legal scrutiny and debate. Stay tuned!

We spent time in class going through a bunch of data sets from portals, loading them in web and desktop software, talking about the implications of the licensing constraints and file formats.

Screenshot of CSV importer in Libreoffice Calc

Screenshot of various open datasets we loaded into QGIS layers
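The file format questions that come up in a spreadsheet's CSV importer can also be handled in a script. The following Python sketch guesses the column separator from the header line and then reads the rows. It assumes a header row is present and that the separator is one of three common candidates; semicolons are frequent in exports from locales that use a decimal comma. The sample data and the function names are made up for illustration.

```python
import csv
import io


def guess_delimiter(header_line, candidates=(";", ",", "\t")):
    """Pick the candidate separator that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    first_line = text.splitlines()[0]
    delimiter = guess_delimiter(first_line)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Made-up sample values, not taken from any real dataset
sample = "gemeinde;jahr;einwohner\nA-Dorf;2016;1000\nB-Stadt;2016;2000\n"
rows = read_rows(sample)
for row in rows:
    # All values arrive as text; convert numbers explicitly
    print(row["gemeinde"], int(row["einwohner"]))
```

Counting separators in the header is a deliberately simple heuristic. Real files may also need attention to text encoding and quoting, which is what the import dialog asks about.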

4. Community building

The topic of open data moves a great variety of actors in public authorities, media, companies and the growing Swiss community of individual developers, designers and activists. The momentum is there, the political will is emerging, the exchange is taking place.

-- make.opendata.ch

The activity described above is partly an outcome of large international movements affecting business, academia and the civic sphere, and partly the hard work of local changemakers. The result is fertile ground for an 'ecosystem' of open data providers and builders.

Here are some examples of Swiss open data community projects:

Open Budgets

Food Data Packages

Transport Open Data

... and there are many more to discover and contribute to. In class, we looked at several of these open data showcases, and talked about how social impact and business value continue to be generated in this way.

Such communities endeavour to make data - already open data but in principle any data - even more usable and accessible to a wider public. One important vehicle is the Hackathon, a public event where data owners and users meet to work on brainstorming and prototyping possible new uses for data.

Video: What is the value of open data? - Interview with Oleg Lavrovsky in English by infoclio.ch

At such hackdays or hackathons we focus on the "Data" and the "Use" in the equation above, trying to solve the chicken-and-egg problem of there being no reason to make data available that nobody knows anything about. Visit hack.opendata.ch to learn about past and upcoming events.

In each case, understanding and using such projects, as well as creating new ones, requires special competencies. A fundamental one is the ability to think critically with abstract, factual knowledge. Data literacy means being an active user of data and being aware of possible "bugs" in the facts and opinions of others; ultimately, it is the ability to base one's own decisions on verifiable evidence.

There are several projects in Switzerland to improve educational material and create shared resources for data literacy. The OGD Handbook at handbook.opendata.swiss provides guidance for government and people who work with the public sector. A working group of Opendata.ch, schoolofdata.ch is part of a civil society initiative involved in research programs with a grassroots international organization.

-- From R survey responses, School of Data on GitHub

5. Hands-on

In this part of the introductory lecture, we collect ideas and discuss how to get to the data in a number of interesting scenarios. Like the law, open data is personal, and we will learn about some of the boundaries between private and public data, the mechanisms with which it is published, and the forms in which it leads to effective collaboration.

Open Data is also about more open ...

Data Packages are complementary to open data portals, in that they foster exchange of metadata within a wider community, encourage simple standards of universal access, and provide a mechanism for data validation, stricter attribution and better referencing of terms of use.
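As a rough illustration of what a Data Package carries, the Python sketch below builds a descriptor of the kind usually stored in a datapackage.json file, with attribution, a license reference and a table schema. The helper names, file path and all values are hypothetical. The check at the end covers only a few basics and is not a full validation against the specification.

```python
import json


def make_data_package(name, title, csv_path, fields,
                      license_name, license_url, source_title):
    """Build a minimal Data Package descriptor for a single CSV resource."""
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": license_name, "path": license_url}],
        "sources": [{"title": source_title}],
        "resources": [
            {
                "name": name + "-data",
                "path": csv_path,
                "format": "csv",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


def missing_parts(descriptor):
    """List basic gaps in a descriptor (simplified, not the full specification)."""
    problems = []
    if not descriptor.get("name"):
        problems.append("name")
    resources = descriptor.get("resources") or []
    if not resources:
        problems.append("resources")
    for index, resource in enumerate(resources):
        if not (resource.get("path") or resource.get("data")):
            problems.append(f"resources[{index}].path")
    return problems


# Hypothetical example values
package = make_data_package(
    name="example-population",
    title="Example population table",
    csv_path="data/population.csv",
    fields=[("gemeinde", "string"), ("jahr", "integer"), ("einwohner", "integer")],
    license_name="CC-BY-4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    source_title="Example statistics office",
)
print(json.dumps(package, indent=2))
print(missing_parts(package))
```

Keeping the license, source and schema next to the data file is what allows stricter attribution and validation without depending on any particular portal.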

6. Next

In Moodle, I have suggested an exercise to create a Data Package from any one of the datasets we pooled together in class. Along with the brief instructions, I shared links to a tool, template and example.

We will continue the module next week with a rundown of mechanisms for using and publishing open data in R and other analytical environments. For a sneak preview, try using the opendata.swiss CKAN API with this R script.
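For readers who prefer Python, here is a small sketch of the same idea. It is not the R script referred to above. It uses the standard CKAN Action API call package_search, and the base URL is an assumption that should be checked against the portal's documentation. The response handling is shown offline with a hand-written example in the shape that CKAN returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the portal's CKAN Action API; verify before use
CKAN_BASE = "https://opendata.swiss/api/3/action"


def package_search_url(query, rows=5, base=CKAN_BASE):
    """Build the URL for a CKAN package_search request."""
    return base + "/package_search?" + urlencode({"q": query, "rows": rows})


def dataset_names(response):
    """Extract dataset names from a decoded package_search response."""
    if not response.get("success"):
        return []
    return [item["name"] for item in response["result"]["results"]]


def search(query, rows=5):
    """Run the search against the portal (needs network access)."""
    with urlopen(package_search_url(query, rows)) as reply:
        return dataset_names(json.load(reply))


# Offline illustration with a hand-written response, not real portal output
fake_response = {
    "success": True,
    "result": {"count": 1, "results": [{"name": "example-dataset"}]},
}
print(package_search_url("velo", rows=3))
print(dataset_names(fake_response))
```

Calling search("velo") would perform the actual request; it is left out of the demonstration so that the example runs without a network connection.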


© Oleg Lavrovsky, January 2018

This work is licensed under a Creative Commons Attribution 4.0 International License.