This article gives an introduction to CKAN: what kind of projects we use it for and which plugins we use to implement the features our customers need.
CKAN's Main Goal and Key Features
CKAN is an open source data management system whose main goal is to provide a managed data catalog for Open Data. It is mainly used by public institutions and governments. At Liip we use CKAN mainly to help governments provide their data catalogs and publish data to the public in an accessible fashion. Part of our work is supporting data owners in getting their data published in the required data format. We do this by providing interfaces and usable standards that improve the user experience on the portal and make the data easier to access, read and process.
Out of the box, CKAN can be used to publish and manage different types of datasets. They can be grouped by organizations and topics. Each dataset can contain resources, which themselves consist of files in different formats or links to other data sources. The metadata standard can be configured to represent the standard you need, but CKAN already includes a simple and useful metadata standard to get you started. By default, the data is stored in a PostgreSQL database and indexed using Solr.
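To make the dataset structure described above more concrete, here is a minimal sketch of how a dataset with its resources might look as a Python dictionary. The field names (name, title, owner_org, groups, resources, url, format) follow CKAN's default schema as we understand it; the helper function, the organization, the topic and the URLs are made up for illustration.

```python
def make_dataset(name, title, organization, topics, resources):
    """Assemble a dataset dictionary in the shape of CKAN's default schema.

    This helper is hypothetical and only illustrates the structure:
    a dataset belongs to one organization, can be tagged with several
    topics (called "groups" in CKAN) and contains a list of resources.
    """
    return {
        "name": name,                      # URL-friendly identifier
        "title": title,                    # human-readable title
        "owner_org": organization,         # organization that owns the dataset
        "groups": [{"name": topic} for topic in topics],
        "resources": [
            {"name": res_name, "url": url, "format": fmt}
            for res_name, url, fmt in resources
        ],
    }


# Example values are invented for this sketch.
dataset = make_dataset(
    name="public-pools",
    title="Public pools",
    organization="example-city",
    topics=["leisure"],
    resources=[
        ("Pools as CSV", "https://data.example.org/pools.csv", "CSV"),
        ("Pools web service", "https://data.example.org/pools-api", "API"),
    ],
)
```

A resource can point to an uploaded file or, as in the second entry, simply link to an external data source.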
CKAN ships with an API that can be used to browse the metadata catalog and run advanced queries on the metadata. With authorization, the API can also be used to add, import and update data with straightforward requests.
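As a rough illustration of such requests, the following sketch builds a search request and an authorized create request against CKAN's Action API using only the Python standard library. The portal URL and the API key are placeholders, and the helper functions are our own naming; the action names package_search and package_create and the Authorization header are the parts that come from CKAN itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL (assumption); replace with a real CKAN portal.
CKAN_URL = "https://ckan.example.org"


def search_request(query, rows=10):
    """Build a GET request for the package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return Request(f"{CKAN_URL}/api/3/action/package_search?{params}")


def create_request(dataset, api_key):
    """Build an authorized POST request for the package_create action."""
    return Request(
        f"{CKAN_URL}/api/3/action/package_create",
        data=json.dumps(dataset).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_titles(response_body):
    """Return the dataset titles from a package_search JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return [pkg.get("title") for pkg in payload["result"]["results"]]
```

The requests are only constructed here, not sent. Passing one of them to urllib.request.urlopen and reading the response body would perform the actual call.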
The standard installation also includes a range of CLI commands that can be used to run different tasks. These can be very useful, e.g. to manage, automate or schedule backend jobs.
CKAN can be configured to show a preview of a number of different file types, such as tabular data (e.g. CSV, XLS), text data (e.g. TXT), images or PDFs. That way, interested citizens can get a quick overview of the data itself without having to download it first and use local software merely to get a better idea of what the data looks like.
While CKAN itself acts as a CMS for data, it really shines when you make use of its extensibility and configure and develop it to fit your business needs and requirements. There is already a long list of plugins developed for CKAN that cover a broad range of additional features or make it easier to adjust CKAN to your use cases and look and feel. A collection of most of the plugins can be found on CKAN-Extensions and on GitHub.
At Liip we also help maintain a couple of CKAN's plugins. The most important ones that we use in production for our customers are:
The ckanext-harvest plugin offers the possibility to export and import data. First of all, it enables you to exchange data between portals that both use CKAN.
Furthermore, we use this plugin to harvest data at regular intervals from different data sources. At opendata.swiss we use two different types of harvesters. Our DCAT harvester consumes XML/RDF endpoints in the DCAT-AP Switzerland format, which is enforced on the Swiss portal.
The Geocat harvester consumes data from geocat.ch. As the data from geocat is in the ISO-19139_che format (the Swiss version of ISO-19139), the harvester converts the data to the DCAT-AP Switzerland format and imports it.
Another feature of this plugin that we use is our DCAT-AP endpoint, which allows other portals to harvest our data and also serves as an example for organizations that want to build an export that we can harvest.
The plugin ckanext-datastore stores the actual tabular data (as opposed to 'just' the metadata) in a separate database. With it, we are able to offer an easy-to-use API on top of the standard CKAN API to query the data and process it further. It also provides basic functionality on the resource detail page to display the data in simple graphs.
The datastore is the most interesting plugin for data analysts who want to build apps based on the data or analyze the data on a deeper level. This is an API example of the Freibäder dataset on the portal of Statistik Stadt Zürich.
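A query against the datastore could be sketched as follows. The datastore_search action and its resource_id, filters and limit parameters are part of the DataStore API; the portal URL, the resource id and the column name used in the example are placeholders and not taken from the Zurich portal.

```python
import json
from urllib.parse import urlencode

# Placeholder URL (assumption); replace with a real CKAN portal.
CKAN_URL = "https://ckan.example.org"


def datastore_search_url(resource_id, filters=None, limit=5):
    """Build the URL for a datastore_search call on one resource."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # Filters are passed as a JSON object of column/value pairs.
        params["filters"] = json.dumps(filters)
    return f"{CKAN_URL}/api/3/action/datastore_search?{urlencode(params)}"


def extract_records(response_body):
    """Return the list of data rows from a datastore_search JSON response."""
    payload = json.loads(response_body)
    return payload["result"]["records"]


# Hypothetical resource id and column name, for illustration only.
url = datastore_search_url("my-resource-id", filters={"name": "Example pool"})
```

Opening the resulting URL in a browser or with urllib.request.urlopen returns the matching rows as JSON, which extract_records then reduces to a plain list of dictionaries.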
We use ckanext-showcase to provide a platform for data analysts by displaying what has been built with the data the portal offers. There you can find a good overview of how the data can be presented in meaningful ways as statistics, used as a source in narrated videos or even built into apps that make everyday life easier. For example, you can browse through the showcases on the portal of the City of Zurich.
The ckanext-xloader plugin is fairly new, and we were able to adopt it for the City of Zurich portal. It enables us to load data into the datastore automatically and asynchronously, so that the data is available there after it has been harvested.
CKAN core and a number of its major plugins are maintained by the CKAN core team. The developers are spread around the globe, and some of them work in companies that run their own open data portals. The community that contributes to CKAN and its plugins is always open to developers who would like to help with suggestions, report issues or provide pull requests on GitHub. It is a strong community that helps beginners, no matter their background. The ckan-dev mailing list provides help with developing CKAN and is also the platform for discussions and ideas about CKAN.
Roadmap and most recent Features
Since release 2.7, CKAN requires Redis for its new system of asynchronous background jobs. This helps CKAN to be more performant and reliable. Just a few weeks ago, the new release 2.8 came out. A lot of work on this release went into driving CKAN forward by updating to a newer version of Bootstrap and deprecating old features that were holding back CKAN's progress.
Another rather new feature is the datatables feature for tabular data. Its intention is to help data owners describe the actual data in more detail by documenting the values and how they were gathered or calculated.
Many interesting features lie ahead on the CKAN roadmap. One example is the development of the CKAN Data Explorer, which is a base component of CKAN. It makes it possible to bring together data from any dataset in the DataStore of a CKAN instance in order to analyze it.
It is important to us to support the Open Data movement, as we see value in making governmental data available to the public. CKAN helps us support this cause: we work with several organizations to publish their data, and we consult our customers while we develop and improve their portals together.
Personally, I am happy to be a part of the CKAN community, which has always been very helpful and supportive. The cause of helping different organizations make their data available to the people, together with the respectful CKAN community, makes it a lot of fun to contribute to both the code and the community.