036 #frictionlessdata

First posted at https://frictionlessdata.io/articles/oleg-lavrovsky/

We are digital natives, dazzled by the boundless information and cultural resources of electronic networks, tuned in to a life on- and offline, dimly aware of all kinds of borders being rewritten. I was born in the Soviet Union and grew up in Canada, immersed in the wonders of creative code on Apple II and DOS-era personal computers, doing fun things in programming environments from BASIC to C++/C#/.NET (hey @ooswald!) to Perl (hey @virtualsue!) to Java (hey @timcolson!) to JavaScript (hey @jermolene!) to Python (hey @gasman!), all of which find some use in the freelance work I now do based in my adoptive home of Switzerland - a country of plurality.

Over the years, I have tried other languages like Clojure and Pascal, Groovy and Go, Erlang and Haskell, Scala and R, even ARM C/C++ and x86 assembly. Some have stuck in my dev chain, others have not. As far as possible, I hope to keep a beginner’s mind open to new paradigms, a solid craft of working on code and data with care, and the wisdom to avoid jumping on every tempting new thing on the horizon.

I first came across tendrils of Open Knowledge ten years ago while living in Oxford, a vibrant community of thinkers and civic reformers. After we started a hackspace, I got more involved in extracurricular open source activities, joined barcamps and hackathons, and started contributing to projects. I came to see so-called 'big IT' or 'enterprise software' challenges as, on many levels, problems of incompatible or intractable data standards. It was in the U.K. that I also discovered civic tech and open data activism.

Helping to start a Swiss Open Knowledge chapter presented me with the opportunity to be involved in an ambitious and exciting techno-political movement, and to learn from some of the most deeply ethical and forward-thinking people in Information Technology. Supporting activities like the School of Data working group and various community projects in the Swiss Opendata.ch association has been an important anchor for me, a commitment to my adoptive country and the people around me. Being involved in Open Knowledge Labs, the Global Open Data Index and Frictionless Data, and helping to represent Switzerland in the international network, is today no longer just a weekend activity: it is my master branch.

I first heard the term frictionless from a philosopher who warned of a world where IT removes friction to the point where we live anywhere, and do anything, at the cost of social alienation - and, along with it, grave consequences for our well-being. There are parallels here to "closed datasets", which may well be padlocked for a reason. Throwing them into the wind may deprive them of the nurturing care of the original owners. The open data community offers them a softer landing.

Some of the formative conversations took place at OKCon 2013 in Geneva, where I was busy mining the Law. Max Ogden mentioned some very interesting ideas on a distributed approach to open data in his talk there on the Dat Project. It later became a regular topic in community hangouts and elsewhere. I liked the idea in principle, but found it difficult to foresee what the standardization process could accomplish. With experience in putting the Open Definition to use, and having taken the time to experience some of the fundamental issues myself, I came around to wholly accepting the idea of an open data ecosystem, and how it will play out on the basis of open networks and protocols.

Working with more unwieldy data, an interest in Data Science, and the great vibe of a growing community all led me to test the waters with the Julia language, an exciting new open source computational environment for Data Science - or just everyday coding. I quickly became a fan, and started looking for ways to include it in my workflow. Thanks to the collaboration enabled by the Frictionless Data Tool Fund, I will now be able to focus on this goal and start connecting the dots more quickly. More bridges need to be built to help open data users use Julia's computing environment, and Julia users could use sturdier access to open data.

There are two high-level use cases which I think are particularly interesting when it comes to Frictionless Data. The first is strongly typed and easy-to-validate dataset schemas, leading to a "light" version of semantic interoperability that helps data analysts, developers, even automated agents, to see at a glance how compatible datasets might be. Take a look at dataship, open power system data and other case studies at Frictionlessdata.io for examples. The other is the pipelines approach which, as a feature of Unix and other operating systems, is the basis for an incredibly powerful system-building tool, now laying the foundation of a rich and reliable world of shared data.
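
To make the "at a glance" idea a little more concrete, here is a minimal sketch in plain Python. It assumes two made-up schema descriptors in the general shape of Table Schema (a list of `fields`, each with a `name` and a `type`). The field names, and the `compare_schemas` helper itself, are hypothetical and not part of any Frictionless Data library.

```python
# Two hypothetical schema descriptors; the field names and types are
# invented for illustration only.
schema_a = {
    "fields": [
        {"name": "station", "type": "string"},
        {"name": "date", "type": "date"},
        {"name": "temperature", "type": "number"},
    ]
}
schema_b = {
    "fields": [
        {"name": "station", "type": "string"},
        {"name": "date", "type": "string"},
        {"name": "humidity", "type": "number"},
    ]
}


def field_types(schema):
    """Map each field name to its declared type."""
    return {field["name"]: field["type"] for field in schema["fields"]}


def compare_schemas(a, b):
    """Hypothetical helper: report how two schemas line up, field by field."""
    types_a, types_b = field_types(a), field_types(b)
    shared = sorted(set(types_a) & set(types_b))
    return {
        # Same name and same type: safe candidates for joining or stacking.
        "matching": [n for n in shared if types_a[n] == types_b[n]],
        # Same name, different type: needs a conversion step first.
        "conflicting": [n for n in shared if types_a[n] != types_b[n]],
        "only_in_a": sorted(set(types_a) - set(types_b)),
        "only_in_b": sorted(set(types_b) - set(types_a)),
    }


report = compare_schemas(schema_a, schema_b)
for category, names in report.items():
    print(f"{category}: {names}")
```

A real comparison would also need to look at formats, constraints and missing values, but even this rough report shows which columns two datasets share before any data is loaded.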

At a more practical level, I have been using Data Packages to publish data for hackathons, School of Data workshops and other activities in my Open Knowledge chapter, and regularly explaining the concepts and training people to use Frictionless Data tools in the Open Data module I teach at the Bern University of Applied Sciences. I have built support for them into Dribdat, a tool we use for connecting the dots between people, code and data.
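
For readers who have not seen one, a Data Package is essentially a folder of data files plus a `datapackage.json` descriptor. The sketch below builds such a descriptor with the Python standard library only. The package name, file path and field names are invented for illustration, and `basic_problems` is a hypothetical, much simplified check rather than the validation performed by the official tools.

```python
import json

# A hypothetical descriptor for a small hackathon dataset; every name,
# title and path here is made up for the sake of the example.
descriptor = {
    "name": "hackathon-air-quality",
    "title": "Air quality measurements (illustrative)",
    "resources": [
        {
            "name": "measurements",
            "path": "data/measurements.csv",
            "schema": {
                "fields": [
                    {"name": "station", "type": "string"},
                    {"name": "date", "type": "date"},
                    {"name": "pm10", "type": "number"},
                ]
            },
        }
    ],
}


def basic_problems(descriptor):
    """Hypothetical, simplified check: a package should list at least one
    resource, and each resource should have a name and point at its data."""
    problems = []
    resources = descriptor.get("resources", [])
    if not resources:
        problems.append("descriptor has no resources")
    for index, resource in enumerate(resources):
        if "name" not in resource:
            problems.append(f"resource {index} has no name")
        if "path" not in resource and "data" not in resource:
            problems.append(f"resource {index} has neither path nor data")
    return problems


# This string is what would be saved as datapackage.json next to the data.
datapackage_json = json.dumps(descriptor, indent=2)
print(datapackage_json)
print(basic_problems(descriptor) or "no basic problems found")
```

In workshops, starting from a descriptor like this tends to prompt useful questions about what each column means and what type it should be.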

Over the years, I have made small contributions to OKI’s codebases on projects like CKAN. Contributing to the Frictionless Data project clears the way to the frontlines of development: putting better tools in users’ hands, committing directly to the needs of the community, setting an elevated expectation of responsibility and quality. That said, I am a novice in Julia. But my initial ambition is modest: make a working set of tools, produce a stable v1.0 specification release. Run tests, get reviewed, interact with the community, and iterate. This project will be a learning process, and my intention is to widen the goalposts as much as I can for others to follow.

The Julia language also needs to be better known, so I will start threads on the OKI forums, at the School of Data, in technical and academic circles. I am likewise really looking forward to representing Frictionless Data in the diverse and wide-ranging Julia community, sharing whatever questions and needs arise both ways. The specifications, libraries and tools will help to preserve key information on widely used datasets, foster a more in-depth technical discussion between everyone involved in data sharing, and open the door to more critical feedback loops between creators, publishers and users of open data.

I will be developing the datapackage-jl and tableschema-jl libraries on GitHub, and sharing stories about putting Frictionless Data libraries to use on my blog. Please feel free to write me a note, send in your use case, respond to anything I'm working on or writing about, share a tricky dataset or any other kind of challenge - and let's chat!