
036 #frictionlessdata

First posted at https://frictionlessdata.io/articles/oleg-lavrovsky/

We are digital natives, dazzled by the boundless information and cultural resources of electronic networks, tuned in to a life on- and offline, dimly aware of all kinds of borders being rewritten. I was born in the Soviet Union and grew up in Canada, immersed in the wonders of creative code on Apple II and DOS-era personal computers. I did fun things in programming environments from BASIC to C++/C#/.NET (hey @ooswald!) to Perl (hey @virtualsue!) to Java (hey @timcolson!) to JavaScript (hey @jermolene!) to Python (hey @gasman!), all of which find some use in the freelance work I now do from my adopted home of Switzerland, a country of plurality.

Over the years, I have tried other languages like Clojure and Pascal, Groovy and Go, Erlang and Haskell, Scala and R, even ARM C/C++ and x86 assembly. Some have stuck in my dev chain, others have not. As far as possible, I hope to keep a beginner’s mind open to new paradigms, a solid craft of working on code and data with care, and the wisdom to avoid jumping on every tempting new thing on the horizon.

I first came across tendrils of Open Knowledge ten years ago while living in Oxford, home to a vibrant community of thinkers and civic reformers. After we started a hackspace, I got more involved in extracurricular open source activities, joined barcamps and hackathons, and started contributing to projects. I began to see so-called 'big IT' or 'enterprise software' challenges as being, on many levels, problems of incompatible or intractable data standards. It was also in the U.K. that I discovered civic tech and open data activism.

Helping to start a Swiss Open Knowledge chapter gave me the opportunity to take part in an ambitious and exciting techno-political movement, and to learn from some of the most deeply ethical and forward-thinking people in information technology. Running the School of Data working group and supporting many projects in the Swiss Opendata.ch association and the international network is no longer just a weekend activity: it is my master branch.

I first heard the term frictionless from a philosopher who warned of a world where IT removes friction to the point where we can live anywhere and do anything, at the cost of social alienation and, along with it, grave consequences for our well-being. There are parallels here to "closed datasets", which may well be padlocked for a reason: throwing them into the wind may deprive them of the nurturing care of their original owners. The open data community offers them a softer landing.

Some of the conversations that led to Frictionless Data took place at OKCon 2013 in Geneva, where I was busy mining the Law. Max Ogden mentioned related ideas there in his talk on the Dat Project. The idea later became a regular topic in the Open Knowledge Labs hangouts and elsewhere. My first impression was mixed: I liked the idea in principle, but found it hard to foresee what the standardization process could accomplish. It took me a couple of years to catch up, gain experience putting the Open Definition to use, and struggle with some of the fundamental issues myself before I wholly accepted the idea of an open data ecosystem.

Working with increasingly unwieldy data, an interest in data science, and the great vibe of a growing community all led me to test the waters with the Julia language. I quickly became a fan, and started looking for ways to include it in my workflow. Thanks to the collaboration enabled by the Frictionless Data Tool Fund, I will now be able to focus on this goal and start connecting the dots more quickly. More bridges need to be built to help open data users take advantage of Julia's computing environment, and Julia users could use sturdier access to open data.

There are two high-level use cases which I find particularly interesting when it comes to Frictionless Data. The first is strongly typed, easy-to-validate dataset schemas, which lead to a "light" version of semantic interoperability and help data analysts, developers, and even automated agents see at a glance how compatible datasets might be. Take a look at dataship, Open Power System Data and the other case studies at Frictionlessdata.io for examples. The second is the pipelines approach which, as a feature of Unix and other operating systems, is the basis for an incredibly powerful system-building tool, and is now laying the foundation of a rich and reliable world of shared data.
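To make the first use case a little more concrete, here is a minimal Python sketch of what "seeing at a glance how compatible datasets might be" can look like once both datasets carry a typed schema. It assumes two Table Schema descriptors written as plain dicts, each with a `fields` list of `name`/`type` entries as in the Frictionless Table Schema specification. The `compare_schemas` helper and the two example schemas are hypothetical illustrations, not part of any Frictionless library.

```python
# A minimal sketch: compare two Table Schema descriptors field by field.
# Assumption: schemas are plain dicts with a "fields" list of {"name", "type"}
# entries, following the Frictionless Table Schema specification; a field with
# no "type" is treated as "string", the specification's default.
# compare_schemas and the example schemas below are hypothetical illustrations.


def compare_schemas(left: dict, right: dict) -> dict:
    """Report shared fields, type conflicts, and fields unique to each side."""
    left_types = {f["name"]: f.get("type", "string") for f in left["fields"]}
    right_types = {f["name"]: f.get("type", "string") for f in right["fields"]}
    shared = sorted(left_types.keys() & right_types.keys())
    return {
        "matching": [n for n in shared if left_types[n] == right_types[n]],
        "type_conflicts": [n for n in shared if left_types[n] != right_types[n]],
        "only_left": sorted(left_types.keys() - right_types.keys()),
        "only_right": sorted(right_types.keys() - left_types.keys()),
    }


# Two hypothetical releases of the same dataset
stations_2016 = {
    "fields": [
        {"name": "station_id", "type": "integer"},
        {"name": "name", "type": "string"},
        {"name": "capacity_mw", "type": "number"},
    ]
}
stations_2017 = {
    "fields": [
        {"name": "station_id", "type": "string"},  # type changed between releases
        {"name": "name"},  # no type given, so it is treated as "string"
        {"name": "capacity_mw", "type": "number"},
        {"name": "commissioned", "type": "date"},
    ]
}

report = compare_schemas(stations_2016, stations_2017)
for key, names in report.items():
    print(f"{key}: {names}")
```

Here the report flags `station_id` as a type conflict and `commissioned` as a field present only in the newer release, without reading a single row of data. A real tool would also compare formats, constraints and primary keys, which this sketch deliberately ignores.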

At a more practical level, I have been using Data Packages to publish data for hackathons, School of Data workshops and other activities in my Open Knowledge chapter, and regularly explaining the concepts and training people to use Frictionless Data tools in the Open Data module I teach at the Bern University of Applied Sciences. I have built support for them into Dribdat, a tool we use for connecting the dots between people, code and data.

Over the years, I have made small contributions to OKI’s codebases on projects like CKAN. Contributing to the Frictionless Data project clears the way to the front lines of development: putting better tools in users’ hands, committing directly to the needs of the community, and setting a higher expectation of responsibility and quality. That said, I am a novice in Julia, so my initial ambition is modest: make a working set of tools and produce a stable v1.0 specification release; run tests, get reviewed, interact with the community, and iterate. This project will be a learning process, and my intention is to widen the goalposts as much as I can for others to follow.

The Julia language also needs to be better known, so I will start threads on the OKI forums, at the School of Data, and in technical and academic circles. I am likewise really looking forward to representing Frictionless Data in the diverse and wide-ranging Julia community, and to sharing whatever questions and needs arise in either direction. The specifications, libraries and tools will help to preserve key information on widely used datasets, foster a more in-depth technical discussion between everyone involved in data sharing, and open the door to more critical feedback loops between creators, publishers and users of open data.

I will be developing the datapackage-jl and tableschema-jl libraries on GitHub, and you can follow my dev log to see how this develops and read stories about putting Frictionless Data libraries to use. Please feel free to write me a note, send in your use case, respond to anything I'm working on or writing about, share a tricky dataset or any other kind of challenge - and let's chat!