Delivering Service Design with Scrum - 6 insights
Wed, 19 Sep 2018

Starting something new is always inspiring and exciting.

Getting the chance to start from scratch designing a new and effective service together with a team is what I like best in my job as Service Designer at Liip. Immersing myself in customers’ needs, developing great new ideas, making them tangible with prototypes and getting stimulating feedback: these parts are definitely the most inspiring and fun in service design projects.

But the delivery can be a really hard landing.

When working on service design projects, we break open existing silos. We align all the different parts involved in the service to create a better and more efficient service experience. For the delivery of the new service, that can also entail a high degree of complexity. In addition to the hard work of developing concrete solutions, we also have to deal with other challenges: for example, changing the habits and behavior of people, or clarifying organizational uncertainties. Further examples are the search for the right decision-makers and sponsors across the different parts of the company, and technical restrictions. After the thrill of the first creative phases, delivery can mean a really hard landing.

Combining service design with agile methods helps in facing the challenges of delivery.

Having worked in both Service Designer and Scrum Master roles in recent years, I have tried several ways of combining Service Design with Scrum. My goal is to combine the best of the two ways of working to make this hard landing a little softer. Here are 6 learnings that proved to be very helpful:

1. Use epics and user stories to split the service into more “digestible” pieces.

Everyone probably knows the feeling of not seeing the wood for the trees when you’re standing in front of a wall full of sketches and stickies with ideas. Then it’s very helpful to create a list of epics. In the Scrum world, an epic is “a large body of work that can be broken down into a number of smaller stories” (see Atlassian). In Service Design, epics can help divide the entire service into smaller pieces. This reduces complexity and allows you to deal with the specific, limited challenges of a single epic rather than the whole. Also, clarifying one epic gives good clues about where to start on this big mountain of work.

2. Use the service blueprint as the master to create the backlog.

In software projects we often use user story maps to create epics and user stories. In service design projects, the service blueprint is a very powerful alternative for user story mapping. Service blueprints help to map and define all aspects of the future service - from the targeted experience to the internal processes, systems, people and tools involved. This contains a lot of useful information for user stories, e.g.:

  • The actors involved, e.g. the different types of users (as personas), different staff members, systems, tools, etc.
  • The required functions, as each step of a service blueprint usually contains a number of functions that will be written in the different user stories.
  • The purpose of the function, as you can read from each part of the blueprint what is triggered by this step.

After a first version of the user story backlog is created, you can map the user stories back onto the service blueprint. Mapping all the written stories to the blueprint is also a great way to determine whether some user stories have been forgotten. This helps a lot to keep an overview of what to do and how it affects the service experience in the end.
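To illustrate the idea, here is a minimal sketch of how mapping stories back to the blueprint can surface forgotten or unmapped stories. All story texts and blueprint step names below are invented for illustration:

```kotlin
// Hypothetical sketch: each user story carries a reference to the blueprint
// step it belongs to; stories without one stand out during the mapping exercise.
data class UserStory(val id: String, val text: String, val blueprintStep: String?)

// Stories that are not mapped to any blueprint step are candidates for review:
// either the blueprint misses a step, or the story was forgotten on the map.
fun unmappedStories(backlog: List<UserStory>): List<UserStory> =
    backlog.filter { it.blueprintStep == null }

fun main() {
    val backlog = listOf(
        UserStory("S1", "As a customer, I can book an appointment online", "Booking"),
        UserStory("S2", "As a staff member, I see today's appointments", "Preparation"),
        UserStory("S3", "As a customer, I receive a reminder", null) // not mapped yet
    )
    println(unmappedStories(backlog).map { it.id }) // prints [S3]
}
```

In practice this mapping happens on the wall with stickies rather than in code, but the check is the same: every story should point at a blueprint step, and every step should have its stories.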

3. Do technical spikes in an early stage of the project in order to make your service more feasible.

If the service contains digital parts, it’s highly recommended to tackle the hard technical nuts in the project as soon as possible. Scrum provides us with so-called technical spikes - a great chance to dive deeper into different possibilities of solving technical issues of the new service. Strictly timeboxed, they allow developers to explore different technical solutions and suggest the one that fits best. Furthermore, the team can discuss the consequences and adapt the service in order to still create a great experience while finding a feasible way of delivering it.

4. Estimate the business value of the different aspects of the service.

In Scrum, we use business value poker to prioritize user stories. Business value is a relative comparison of the value of different user stories. It helps to prioritize the delivery and to show where the most time and money needs to be invested. This process is also very healthy (and tough!) for service ideas. Knowing how much value each part of the service brings to the whole service vision allows the team to focus on what really matters.

You can also do business value poker in combination with an adaptation of the six thinking hats method, e.g. one team member estimates the business value wearing the hat of the user, one the hat of the top manager interested in return on investment, and one the hat of the staff member interested in delivering a service experience that doesn’t mean additional work.
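One simple way to combine the hats is to average the relative points each hat assigns per story and rank by the result. This averaging scheme is an invented illustration, not a formal part of business value poker, and all story names and numbers are made up:

```kotlin
// Hypothetical sketch: each "hat" (user, manager, staff) assigns a relative
// business value per story; we average them to get one number to rank by.
fun averageBusinessValue(votes: Map<String, List<Int>>): Map<String, Double> =
    votes.mapValues { (_, hatVotes) -> hatVotes.average() }

fun main() {
    val votes = mapOf(
        "Online booking" to listOf(8, 5, 3),   // user, manager, staff
        "Reminder emails" to listOf(5, 2, 8),
        "Loyalty program" to listOf(2, 8, 1)
    )
    // Rank stories by their averaged business value, highest first.
    averageBusinessValue(votes)
        .entries.sortedByDescending { it.value }
        .forEach { println("${it.key}: ${it.value}") }
}
```

In a real session the discussion about why the hats disagree is usually more valuable than the resulting number itself.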

5. Deliver a “Minimum Viable Service” (MVS) before taking care of the rest.

Once we have the user story backlog rooted in the service blueprint and we know which stories bring the most value to our service vision, we start to deliver the service step by step. In agile software projects, the team starts by producing the Minimum Viable Product (MVP): delivering the smallest amount of features necessary to create a valuable, reduced product for users. For services, we do the same - creating a “Minimum Viable Service” (MVS). This allows the team to develop a first basic version of the service with a short time to market. Delivering results at an early stage of the project not only motivates the team but also allows continuous learning, adapting and evolving of the service.

6. Work in cross-functional, self-organised and fully empowered teams.

Scrum teams are self-organised and include all the skills needed, without a hierarchy-based system. In a service design setting, many different areas of a company are involved and it’s hard to specify decision-makers and responsibilities. But that’s the key. Including each and every stakeholder of a whole service in the project is never-ending and rarely productive. Therefore, dedicate a small and powerful team of the experts involved, give them the full authority to decide and to organise themselves, but also the responsibility to deliver value.

Scrum provides great ways to deliver complex service projects.

This blogpost highlights a few aspects of how we manage the challenges of delivering a complex service project by combining service design with Scrum - from the tools and artifacts to the mindset and the way teams work together.

Yet even when following all these aspects, delivering a complex service remains a hard piece of work. But it is definitely easier to handle with structured and well-working delivery methods to bring our ideas to life. Step by step - sprint by sprint.

A personal view on what Holacracy has brought us
Tue, 18 Sep 2018

I was invited to share our adoption of Holacracy in a recent online gathering of the Siemens Grow2Glow network. It gave me the opportunity to think about the impact the Holacracy organisational system has had on the agency, its culture and the way we work. I first focused on the negative impacts, the things we need to fix, but then realized I wasn’t being fair to reality: the benefits outgrow the challenges by far.

Thinking about it, and having had the opportunity to be in the company through several “eras”, I realized how much things have changed. I have no scientific evidence for what I am sharing here; these are things I experienced and observed, a personal view.

Good stuff happened before

Liip always had a strong focus on values and human-centeredness: ethics always had a say, collaborators also. And before the adoption of Holacracy, we benefited a lot from the introduction of the Scrum agile framework (2009) and later from cross-functional teams and guilds (~ Spotify model, 2012).

That was before the founders adopted Holacracy (2016), a period of time which I will refer to, in this post, as the "partners-era". Family-like dynamics were the norm, partners were the “elders” of the organisation, employees the “grown-up children”.

The years since our adoption of Holacracy will hereunder be coined the "holacracy-era".

Good stuff happening, 3 years after Holacracy adoption

It is ok to try things, even serious things

Entrepreneurship in the partners-era boiled down to this: the partners decided on the life and death of the offering. Adventurous workers had to pitch their ideas and get buy-in and approval from the partners.

In the holacracy-era, things are different: services are launched out of the initiative and authority of any employee, creating a momentum that attracts collaborators and clients. In the last two years we have indeed added several services to our portfolio through individual initiatives, and more are incubating.

It is ok to stop things, even serious things

If it has become ok to launch things, it’s also ok to stop things. In one of the circles I am involved in, we launched a new consulting service a few months ago, thinking it was a great complement to our existing offering. Although I still think it was, it turned out the market did not show up. We took the opportunity of someone leaving the company to consider dropping that new offering.

In the partners-era, I would have had to raise awareness of the partners to stop the thing – I wouldn’t have had the authority to do so – and the partners would have probably felt compelled to adopt a parental posture “you should have known better”, “be careful next time”, “what will the clients think?”, “how should I now communicate on that?”, …

Nowadays: I sensed the tension in my role in that circle, and took the initiative: first seeking advice from the ones impacted, clients included, and envisioning with them the possible scenarios. Then I made up my mind on a “closing” scenario. Yet I sensed the need to give the rest of the company the opportunity to react and potentially weigh in on the decision before I would finally enact it. After all, I wasn’t sure all impacted persons had been involved. I launched an integrative decision process on our internal messaging app that read like: “In my role X, I’m about to close service Y, read more in this doc. Today you can ask questions, tomorrow you can give feedback, Thursday you can raise objections.” It all went very well. Those who cared or were loosely impacted decided to participate, the others didn’t feel compelled to participate - great relief - and we closed this service smoothly.

Continuous, organic micro re-organisations

In the partners-era I participated and was impacted by several meetings during which we “reshaped teams”. In such meetings we would list all people in the office, and then try to shape a certain amount of “balanced” teams out of them.

In this Holacracy-era, we don’t “gather and redistribute the cards” anymore. Teams evolve slowly; roles and people come and go organically. An interesting example: a team of 20+ had a very rough year, their financial performance was bad for several months. The team could not agree on how to change things: should they focus? And if yes, on what market, technology or value proposition? Disagreement and the implicit need for consensus prevented the situation from evolving.

What unlocked the situation was that some people from that team literally raised their hands and pitched: “I wanna launch a mini team focused on tech X, I have the passion, we can make it happen, who’s in?” A few joined. Others saw the drive and copied: “I believe there’s a market for Y, who’s in?”. The big team slowly dissolved into smaller teams, that were more focused and with clearer motivation and purpose. The big team had re-purposed itself into smaller ones.

Organisational fitness

In the partners-era, my job in the company was necessarily important to me, because it was my only job within the company, like most employees in this world. This one-person-one-job relationship forces people to protect their job, in order to protect their belonging to the organisation, their salary, status, etc. And thus organisations keep adding jobs and almost never remove them, except in abrupt attempts: re-organisations.

In a holacratic organisation, employeeship and roles are decoupled: I now see people dissolving a role they fill, because the role is not needed anymore. I see others merging their accountabilities into other roles, including roles they don't hold themselves. The roles that exist are there because of a current need, not a past one.

I am not my roles and my roles are not me, getting rid of my role doesn’t directly threaten me.

Talents, experience and passion flow to roles

With Holacracy, we see much more opportunities for employees to act in roles that fit their experience, motivation and/or talents. Talents, experience or passion get noticed, roles get proposed.

More demand for personal development

In the partners-era, just a few of us were interested in improving our soft skills. The only incentive would have come from a partner, in a talk like: “Now that you are a Product Owner, you should improve on these soft skills.” In this holacracy-era, I believe the talk happens within each of us: “Now that I am self-organised, I sense the need to be a better leader/colleague/collaborator/...”. We do see growing demand and attendance for trainings on social and leadership skills.

Fewer rants about the leadership and the organisation

Those who process the annual Employee Meetings sense that there are far fewer rants, and that the message from the employees has moved from “We (meaning you, Boss) need to change this” to “I know it’s in my hands to make this happen, where do I start?”. We moved from expecting change from others to expecting change from ourselves.

A last word

I wanted this blogpost to focus on the benefits we are seeing at Liip from having adopted Holacracy. A post on the challenges, again a personal view on them, will follow.

Experimenting With Android Things
Fri, 07 Sep 2018

Context

At Liip we have the great opportunity to experiment with new technologies when working on innovation projects. Within the last two years we developed Houston: a VOIP communication system for buses. The system includes an Android app which takes VOIP calls and collects sensor information like the GPS position. When I heard about the release of Android Things, I thought it’d be interesting to investigate whether we could use it to make the app run on IoT devices like the Raspberry Pi. So I experimented with the platform for a few days, and I discovered an amazing operating system bringing all the benefits of Android to IoT devices.

How to start

The best thing about Android Things is that it is built for Android developers and comes with the standard development tools you are used to. You develop inside Android Studio, and being able to debug code using adb, like for a normal Android device, is great. Android Things extends the core Android framework with additional APIs provided by the Things Support Library (reference, platform differences). That means it is easy to use the Android API to develop a user interface, communicate with a server, etc. It is also possible to use Android libraries like Google Firebase (real-time database, push notifications, ...) or TensorFlow (machine learning).

To start developing for Android Things, you need a development board. Google supports different hardware, among them the well-known Raspberry Pi 3 Model B. In 2017, more boards were supported, such as the Intel Edison, but it seems they have stopped supporting them (at least officially, see the list of supported platforms).

If you don’t have a Raspberry Pi 3 already, you can buy a starter kit like this one or this one. With the first kit you get a Raspberry Pi 3 with a power supply, an SD card with the Android Things OS flashed and the Rainbow Hat. The hat is easy to plug onto the Raspberry Pi and contains a buffet of sensors (temperature, pressure, etc.), LEDs, an alphanumeric display and more. To explore the platform, I wanted to be able to do my own electronic schemes. Therefore I bought the Raspberry Pi itself and this kit that contains a breadboard (a thin plastic board used to hold electronic components) with cables, resistors, LEDs and a few sensors. Something you shouldn’t forget is to buy a cable which allows you to connect the breadboard to the GPIO connector on the Raspberry Pi (like this one).

If you want to use Android Things with a specific sensor or peripheral, be aware that it cannot be used without a driver. It is therefore better to look at the list of available drivers first (available here, here and here). It is also possible to write a driver yourself; you’ll find a tutorial here. Once you have a Raspberry Pi, it is easy to get Android Things running on it. You can follow this tutorial to flash the OS onto a micro SD card and connect the hardware.

Prototype 1: Weather station

The first prototype I built is a small weather station. The temperature in the office was high and my idea was to measure the temperature of the room. Furthermore, I wanted to light up an LED if the temperature reaches a given threshold. A next step would be to switch on a fan, for example.

the weather station

To build the electronic circuit on the development breadboard, you’ll need a bit of electronics knowledge. It feels like becoming an electronics engineer, which is great! Reading a tutorial is great as a memory refresher; this one was very helpful to me.

One important thing to remember is Ohm’s law, U = RI, to know which resistor you have to put between the Raspberry Pi and the LED. The resistance needs to limit the current to what the LED can handle. It was also not easy at the beginning to understand how the Peripheral I/O interface works (where the ground (-) and power (+) are, the functionality of the different pins, etc.). I printed the scheme here to have it next to me at all times.
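As a small worked example, the minimum series resistance follows directly from U = RI. The figures below are typical assumed values (3.3 V GPIO pin, roughly 2.0 V forward voltage for a red LED, 10 mA target current), not measurements from my setup:

```kotlin
// Sketch: computing the series resistor for an LED from Ohm's law U = RI.
// The voltage drop across the resistor is the supply voltage minus the
// LED's forward voltage; dividing by the target current gives the resistance.
fun seriesResistorOhms(supplyVolts: Double, ledForwardVolts: Double, currentAmps: Double): Double =
    (supplyVolts - ledForwardVolts) / currentAmps

fun main() {
    val r = seriesResistorOhms(3.3, 2.0, 0.010)
    // (3.3 V - 2.0 V) / 0.010 A = 130 ohms; in practice you round up to the
    // next standard value you have at hand, e.g. 150 ohms.
    println("Minimum resistance: %.0f ohms".format(r))
}
```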

General Purpose Input/Output (GPIO) provides a programmable interface to read the state of a binary input device or control the on/off state of a binary output device (such as an LED). The red LED was connected on pin BCM6 and the green LED on pin BCM5, for example. To get the temperature via the sensor and display it on the alphanumeric display, I used BCM2 and BCM3 (the inter-integrated circuit bus, I2C).

The steps to create the Android Things app are the following:

  1. Create an Android project with a blank Activity in Android Studio and add the Android Things library and drivers in build.gradle
provided '<version>'
implementation '<version>'
implementation '<version>'
  2. Add the following line in the AndroidManifest.xml under the application tag
<uses-library android:name="com.google.android.things"/>
  3. In onCreate of the MainActivity, register the peripherals
val service = PeripheralManagerService()
mSensorManager = getSystemService(Context.SENSOR_SERVICE) as SensorManager

// Temperature sensor
try {
    mEnvironmentalSensorDriver = Bmx280SensorDriver("I2C1")
    Log.d(TAG, "Initialized I2C BMP280")
} catch (e: IOException) {
    throw RuntimeException("Error initializing BMP280", e)
}

// Alphanumeric display
try {
    mDisplay = AlphanumericDisplay("I2C2")
    Log.d(TAG, "Initialized I2C Display")
} catch (e: IOException) {
    Log.e(TAG, "Error initializing display", e)
    Log.d(TAG, "Display disabled")
    mDisplay = null
}

// Green LED, configured as output
try {
    mGreenLedGpio = service.openGpio("BCM5")
    mGreenLedGpio.setDirection(Gpio.DIRECTION_OUT_INITIALLY_LOW)
} catch (e: IOException) {
    throw RuntimeException("Problem connecting to IO Port", e)
}

// Red LED, configured as output
try {
    mRedLedGpio = service.openGpio("BCM6")
    mRedLedGpio.setDirection(Gpio.DIRECTION_OUT_INITIALLY_LOW)
} catch (e: IOException) {
    throw RuntimeException("Problem connecting to IO Port", e)
}
  4. And finally, register for sensor events:
private val mTemperatureListener = object : SensorEventListener {
    override fun onSensorChanged(event: SensorEvent) {
        // Only handle every 100th event, otherwise we receive too many updates
        mTempCount++
        if (mTempCount % 100 != 0) {
            return
        }

        val temperature = Temperature(System.currentTimeMillis(), event.values[0])
        Log.i(TAG, "Temperature: " + temperature.getTemperature())
        updateDisplay(temperature.getTemperature())

        try {
            if (temperature.getTemperature() >= TEMPERATURE_THRESHOLD) {
                // Too warm: red LED on, green LED off
                mRedLedGpio?.setValue(true)
                mGreenLedGpio?.setValue(false)
            } else {
                mRedLedGpio?.setValue(false)
                mGreenLedGpio?.setValue(true)
            }
        } catch (e: IOException) {
            Log.e(TAG, "Error", e)
        }
    }

    override fun onAccuracyChanged(sensor: Sensor, accuracy: Int) {
        Log.d(TAG, "accuracy changed: $accuracy")
    }
}

private fun updateDisplay(value: Float) {
    if (mDisplay != null) {
        try {
            mDisplay?.display(value.toDouble())
        } catch (e: IOException) {
            Log.e(TAG, "Error setting display", e)
        }
    }
}
In the onSensorChanged callback for the temperature sensor, we filter events (otherwise we receive too many temperature changes) and show the value on the display with updateDisplay(). Once the value reaches a certain threshold (TEMPERATURE_THRESHOLD), we switch the red or the green LED with setValue(true)/setValue(false).

And that’s pretty much it. Thanks to the Android framework and the implemented drivers, it is very simple to communicate with the Raspberry Pi using simple interfaces and without the need for low-level programming.

Prototype 2: GPS tracker

In the next prototype, I wanted to test Android Things with a GPS antenna. If we come back to the Houston project, the idea would be to have an embedded device that is able to track the positions of the buses.

the Raspberry Pi with the Grove GPS

It was laborious to find out which GPS sensor to buy. There are many models, so I decided to use the Grove GPS with the GrovePi+. The GrovePi+ offers a plug-and-play module for the Raspberry Pi: you can connect Grove sensors without cables and breadboards.

It is possible to use the GrovePi+ with Android Things, but I am not sure if it will work with all Grove sensors. I guess it depends on the connection interface you have (I2C, UART, ...) and whether there is a driver available for it. In my case, I needed a UART connection and it was enough to connect the Grove GPS to the RPISER port of the GrovePi+.

To create the Android Things app, the steps are the following:

  1. Create an Android project with a blank Activity in Android Studio and add the Android Things library and drivers in build.gradle
provided '<version>'
implementation '<version>'
  2. Add the following line in the AndroidManifest.xml under the application tag
<uses-library android:name="com.google.android.things"/>
  3. In onCreate of the MainActivity, register the GPS sensor
val UART_BUS = "UART0"
val UART_BAUD = 9600
val ACCURACY = 2.5f // From the GPS datasheet

mLocationManager = getSystemService(Context.LOCATION_SERVICE) as LocationManager

// We need permission to get location updates
if (checkSelfPermission(Manifest.permission.ACCESS_FINE_LOCATION) != PackageManager.PERMISSION_GRANTED) {
    // A problem occurred auto-granting the permission
    Log.e(TAG, "No ACCESS_FINE_LOCATION permission")
    return
}

try {
    // Register the GPS driver so it publishes to the LocationManager
    mGpsDriver = NmeaGpsDriver(this, UART_BUS, UART_BAUD, ACCURACY)
    mGpsDriver?.register()
} catch (e: Exception) {
    Log.e(TAG, "Unable to open GPS UART", e)
}
  4. Then, register the listener to receive location updates, just like in a standard Android code base.

// Register for location updates
mLocationManager.requestLocationUpdates(LocationManager.GPS_PROVIDER, 2, 0f, mLocationListener)

val mLocationListener = object : LocationListener {
    override fun onLocationChanged(location: Location) {
        Log.v(TAG, "Location update: $location")
    }

    override fun onStatusChanged(provider: String, status: Int, extras: Bundle) {}
    override fun onProviderEnabled(provider: String) {}
    override fun onProviderDisabled(provider: String) {}
}
When you run the app, you will see the current locations displayed in logcat. We can imagine collecting the locations in a Firebase Firestore database or displaying them on a map. We have seen that it is very simple to build an Android Things prototype. As long as you have a bit of Android development experience and the correct driver and hardware setup, you’ll be okay.

Next steps

The goal of this innovation project was to experiment with the Android Things platform and discover if we could use it in our projects. It was great fun to build the electronic circuits and interact with the hardware via the Android framework. The setup is simple and the framework allows us to easily build a lot of amazing apps for IoT devices. You’ll find a lot of interesting use cases on the Web, such as controlling your home switches remotely, or even funnier examples like building a high-five machine. Google insists on their website that the platform is not meant just for quick prototyping, but also for real products. Android Things is definitely a platform that was worth a try. The next step is to start integrating some of the Android code bases we have developed for Houston or other projects. We have a lot of ideas already!

30 Participants, 2 Speeches and some beer - Highlights of the e-commerce meet-up
Wed, 05 Sep 2018

We got down to business with:
"Agile is simple, is it?" - Daniel Frey - Agile Mind Setter @ Liip

Agile is simple, is it?

Daniel Frey, PO, Scrum Master and agile coach at Liip, has been interested in agility from the early days on. His presentation was about agility as a useful tool for e-commerce companies. But why? Agility focuses on changing hearts and minds. He believes that your delivery system will emerge based on the new thinking and the cultural change of being agile rather than doing agile. With its focus on values and principles, agility is great for increasing the performance of any project, even though the performance doesn’t always match the hoped-for results. The velocity of business value delivery increases, and realising the higher business value early in a project is great. Scrum is not suitable for all kinds of projects, but for most of them agility is a great solution.

Qard - financing e-commerce made simple

Azzedine Chaibrassou, the Co-Founder of Qard, talked about his business model. The technology of Qard is based on Symfony, and as an open-minded meet-up group, everyone was really into news related to Symfony in the field of e-commerce. Small businesses are known for having working capital needs, which are recurrent in the case of retailers; that’s where Qard helps out. The business model is rather interesting, as Qard turns the whole financing topic into a data topic to help small and mid-sized enterprises.

Tone of Voice – 3 Steps to a Strong Voice
Fri, 31 Aug 2018

Step 1: Find Your Voice

Every car gets you from A to B. But which one do you pick? The brand’s image is what makes the difference. Urban speedster? Sporty powerhouse? Or elegant premium look? Mini, BMW or Mercedes? You choose the one that’s best for you, your lifestyle and your social environment.

Mini test drive: phrases such as ‘style hunting’, ‘style in the city’ and ‘style aficionados’ show that they are appealing to stylish city dwellers.
Source: MINI Test Drive, accessed May 2018

BMW test drive: the claim of ‘ultimate driving machine’ shows that this is about a powerful engine. ‘Unmatched driving pleasure’ conveys competition and strength. They are targeting people who live or want to live life in the fast lane.
Source: BMW Test Drive, accessed May 2018

Mercedes test drive: ‘done your way’, ‘that suits you best’. This is all about the customer. A premium experience is the be-all and end-all.
Source: Mercedes Test Drive, accessed May 2018

Brand Values Inspire a Tone of Voice

The basis of your voice is your corporate identity: if this is clearly defined, then your tone of voice is fairly easy to work out. If your corporate identity is the dry theory, then your linguistic world is its practical implementation. Here are a few examples, once again using the car industry.

Brand values inspiring
Linguistic implementation using creative, playful language
Stylistic methods such as puns, punchlines, half sentences, exclamations
Source BMW Group/MINI, accessed May 2018

Potential brand values daring
Linguistic implementation using competitive language
Expressions such as ‘be/become the best’, ‘show them all’, ‘the way to the top’
Source BMW Group, accessed May 2018

Brand values excellence
Linguistic implementation using an elaborate code that stands out from the masses
Source Mercedes Mission and Values, accessed May 2018

Step 2: Make your Voice Consistently Heard

Once you have found your voice, it has to resound in every valley of the marketing universe and cover all content needs users have.

BMW is a perfect example of how to implement your tone of voice. Everywhere you can feel a love of performance and the pursuit of competition – as here, in its report of an extreme expedition.
Source:, accessed May 2018

The ‘ultimate driving machine’ is the constant focus: here, for test training in the snow.
Source:, accessed May 2018

A love of performance played out over and over, in the choice of themes and of linguistic expression.
Source:, accessed May 2018

Read our blog post about how to make your voice heard on search engines.

Step 3: Stick with it – for Decades

Consumers only ‘consume’ for just a few minutes – the rest of the time they are people with busy lives. If they are not listening to your company’s message, they will not hear your voice. It is therefore important that your company gets its message out over and over in the right tone, as this is the only way to lodge it in people’s minds and ensure recognition. So how long for? How about a couple of decades?

BMW: Ultimate Driving Machine
45 years old
Created in 1973 by the agency Ammirati & Puris

Axe: The smell that drives women crazy
33 years old
Created in 1985 by the agency Lintas

Nike: Just do it
30 years old
Created in 1988 by the agency Wieden & Kennedy


Do you think the tone of voice the agency is suggesting is a bit too expensive? If you remember that BMW has been using the same tone for 45 years, it quickly pays off.

Speak, find out who you are – and then let the world join in. Again and again. In fact, again and again and again. And if you choose to develop or continue your tone of voice, we at Liip are happy to help.


  • Step 1: find your voice
    Fresh and bold or friendly and polite? However you sound, make it a conscious choice.

  • Step 2: make your voice consistently heard
    Every communication situation counts, as your image out there is a collection of individual experiences.

  • Step 3: stick with it – for decades
    It takes time to make a lasting impression. So you can safely hold on a bit longer than you think.

The experts behind this article

Thanks to Christoph and Jenny for content and copy assistance, to Jan and Jérémie for the images. This article would not have been possible without you!

Metadata-based search vs. Primary-data search
Wed, 29 Aug 2018

Let's consider two examples:

  • Anna wants to compare real estate prices in Bern and Zurich. She stumbles upon an open data portal of Switzerland. There is a dataset of the Statistical Office of the City of Zurich covering real estate prices; unfortunately, there is no such dataset for Bern. Anna will try to find these datasets using the search.
  • Thomas wants to have a look at the outcomes of the recent elections to the Council of States (“Ständeratswahlen”) in Zurich and wants to know how each of the prominent politicians performed.

We assume that a user of such a portal is interested in certain data but doesn’t know where to find it. They are neither familiar with the portal nor are they “power users” with special knowledge about the data or metadata.

To make things more exciting, we have written this blog post with two authors: Stefan Oderbolz and Thomas Ebermann. Stefan will try to convince you that a search based on metadata will help Anna find the right dataset to compare real estate prices. Thomas will try to convince you that a search on primary data will work better for Thomas and allow him to quickly find the right dataset.

While we rolled a die to decide who would write which part, we will each try as hard as we can to convince you - the reader - that our opinion is the best. Buckle up and enjoy the ride.

Why we need Metadata-based search

by Stefan Oderbolz

Metadata-based search means that the search engine powering the portal has indexed documents based on a specified metadata schema. Documents have metadata fields like title, description, keywords, or temporal and spatial coverage.

In its most basic form, a query for “Zurich real estate” will return all documents whose metadata values match the terms “Zurich”, “real” and “estate”.
“Zurich” will be covered by the spatial coverage field; “real” and “estate” (or “real estate”) are found either as keywords or as part of the title and description of the dataset.
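As a toy illustration of this matching logic, here is a minimal Python sketch with made-up datasets (not the portal's actual engine): a query term matches if it appears in any metadata field.

```python
# Toy metadata index: each dataset is described only by its metadata fields.
datasets = [
    {"title": "Real estate prices", "keywords": ["real estate"], "spatial": "Zurich"},
    {"title": "Air quality measurements", "keywords": ["environment"], "spatial": "Bern"},
]

def metadata_search(query, docs):
    """Return all docs whose combined metadata contains every query term."""
    terms = query.lower().split()
    def matches(doc):
        haystack = " ".join([doc["title"], doc["spatial"], *doc["keywords"]]).lower()
        return all(term in haystack for term in terms)
    return [doc for doc in docs if matches(doc)]

metadata_search("Zurich real estate", datasets)  # one hit: the Zurich dataset
metadata_search("Bern real estate", datasets)    # no results: the dataset does not exist
```

The second query returning an empty list is exactly the "zero results is a good thing" property discussed below: with good metadata, an empty result set is a reliable signal.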

Anna’s search will return 2 results:

  • The aforementioned dataset of the Statistical Office of the City of Zurich
  • A dataset of the Canton of Zurich that contains real estate data for the whole canton (incl. the City of Zurich)

No further results are shown. The analogous search “Bern real estate” doesn’t return any datasets.

This example shows the strength of the metadata-based search: if the metadata is of good quality, you get the correct datasets. And if the search does not return anything, you can be confident that the dataset does not exist (by the way: this would be a good time to start a “data request” for this data, so next time you search, you’ll find both datasets right away).

I think this is an important message: getting zero results is actually a good thing. You know it’s not there. This relies heavily on the assumption that the metadata is good and correctly indexed.

Photo from kiwi.concept, CC BY 2.0

A neatly organized catalogue helps the users to browse it, without even entering search terms. You can show categories for your datasets, to give information about what kind of things you can find on the portal.


Screenshot from

With keywords on each dataset you can do this on an even more fine-grained level. A user might discover similar datasets that share the same keyword (see screenshot below). Like this, a user can “move” seamlessly through the catalogue and discover the available datasets.

Screenshot from

Last but not least, the metadata-based approach forces data publishers to deliver high-quality metadata for each dataset. Otherwise nobody will find and use the datasets. This incentive is important to keep in mind. If we simply outsource the task to the computer, we lose all the valuable knowledge the publishers have of their data. For the closed world of a data portal, this is a source we cannot afford to lose.

The catch-all primary-data approach might be the right choice for something vast like the web, but not for the smallish, precise area of a data catalogue. But I’ll let Thomas take it from here and try to convince you otherwise.

Why we will need primary-data search

By Thomas Ebermann

Let's go back in time. Let’s go way back, even before Pokémon Go was popular; maybe even to a time when Pokémon didn’t exist. In that distant past, somewhere around 1996, we lived in a different world. A world without Trump and chaos. A world where things were somewhat ordered.

If you wanted a book, you went to your local library - and if, like me, you had an old library without a computer, you could go to an even then weird place called the index catalogue, where you could look for books by a certain author or browse books of a certain genre, for example fantasy literature. If you felt a bit adventurous, you could also wander through the library, only to find that all of the books were neatly sorted into categories and alphabetized by author name. It was a nice experience. But it was also a tedious one.

A well sorted library

The web was no different. We were living in a time when finding a website was really easy. You just had to do the same things that you did in your library. Go to an index catalogue - I mean Yahoo - select the right category, for example Recreation - Health and Sports, and there you would find all of the websites about soccer or American football. It was simple, it was effective, life was good.


But of course someone had to ruin it. It was us; we just liked making websites too much. More and more websites emerged, and soon there was no simple, fast way to categorize them all. It was like a library that, instead of receiving 10 new books a day, all of a sudden received a million a day. Nobody could categorize it all, nobody could read it all.


And then Google came to save us. It was drastic. They threw away all of the categories, folders and myriads of subfolders that librarians had worked so hard on. Instead they just gave us a search box. That was it. And the most ridiculous thing happened: people actually liked it. The results were relevant and fast. It was almost magic, like a librarian that knew it all. Like a librarian that had actually read all the books and knew what was inside them.


Moving forward 20 years, here we stand, and I am still reminded of my good old library. Looking at the nearly 7000 datasets on opendata.swiss makes me proud of how fast the number of datasets has risen. In 2016 it was nearly half that number; I still remember the website having a big two-something on the front page.

While we cannot simply assume that the number of open datasets will rise as quickly as the number of websites or internet users, I still expect that sooner rather than later we will have 100'000 open datasets worldwide. At that point it will definitely be a burden to find these datasets via a catalogue. We will rely on the big search box even more.

We will probably expect even more from that search box. Similarly to Google, which has become a librarian that has read all the websites, we will probably also want a librarian who has read all of our datasets.

So when I type “Limmatstrasse” into that box, I will somehow expect to find every dataset that has to do with Limmatstrasse, probably with the really popular ones at the top of my search results and the less popular ones further down.

While I might eventually want to facet my search, for example to only see datasets related to politics, I might as well enter “Limmatstrasse Kantonsratswahlen 2018 Mauch” or something into the box and find what I need when I am looking for a dataset containing the candidates and some sort of breakdown by region.

Voting results from

Being a lazy person, I might expect that a click on that search result will take me directly to the relevant rows of the dataset, just to verify that it’s the right thing.

Yet all of these things are not possible when relying only on metadata for my search. First of all, I will probably get lost in the catalogue when trying to go through 100'000 datasets. Second, I probably won’t find even one dataset containing Limmatstrasse, because nobody cared to enter myriads of different streets into the metadata. It's just not practical. The same goes for all the candidates involved. Nobody has the time or resources to annotate the dataset that thoroughly. Finally, it's simply impossible to point me at the right row in a dataset when all I have is some metadata.

No results for Limmatstrasse

So while everybody who submitted their dataset did a fairly good job at annotating it, it's simply not enough to fulfill my needs. I need a librarian who, similarly to Google back in the '90s, has a radically different approach. I need a librarian who has read all of the datasets in the library and can point me to the right one, even if my query is rather fuzzy. In other words, I need a search that has indexed all of the primary data.

Conclusion: best of both worlds

So there we are: you have seen the high-flying arguments from both worlds, while each of us has swept the negative aspects of his own solution under the table. So here they are:

  • Downsides of Metadata for search:
    • It’s relevant when you want to make sense of the primary data, but it will never be as rich as the primary data. It obviously does not contain some aspects that a user might be searching for.
    • There is a constant dissonance between what the users are searching for and how we tag things (e.g. “weather” vs. meteodata, or “Lokale Informationen” vs. Zürich)
  • Downsides of Primary Data for search:
    • Primary data might match a lot of relevant search terms, but it is simply not good for abstraction (e.g. I want all the data from all Swiss cities).
    • Creating ontologies from primary data is very difficult: automatically tagging datasets into categories like health or politics based on primary data is hard.
    • Using only primary data, we might also run into the problem of relevancy. When a user searches for a very generic keyword like Zürich, finds myriads of results containing the word Zürich, and yet cannot facet the search down to only political results, that is frustrating.

Precision and Recall

So of course, from our perspective, a perfect search will have to embrace both worlds. To formalize that a bit, let's think about recall and precision.

  • Recall: How many of the relevant datasets have been found? (If 10 are potentially relevant but the search returns only one, that's low recall.)
  • Precision: How many of the returned datasets are relevant? (If 10 datasets have been returned but only 1 is actually relevant, that's low precision.)
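The two measures can be computed directly. A small Python sketch with made-up dataset IDs:

```python
def precision_recall(returned, relevant):
    """Precision: share of returned datasets that are relevant.
    Recall: share of relevant datasets that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {f"d{i}" for i in range(1, 11)}   # 10 relevant datasets exist

# Search returns only one of them: perfect precision, low recall
precision_recall({"d1"}, relevant)           # → (1.0, 0.1)

# Search returns 10 datasets, only one is relevant: low precision, full recall
precision_recall(relevant, {"d1"})           # → (0.1, 1.0)
```

The first call mirrors a strict metadata search, the second a broad primary-data search.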

So in an ideal world we would want a search to have both, but the reality today looks more like this:

                     Precision   Recall
Metadata Search      High        Low
Primary-data Search  Low         High
Combined Approach    High        High

So while metadata search has a high precision, because you only get what you search for, it lacks in recall, often not finding all of the relevant datasets just because they have been tagged badly. On the other hand, a primary-data search gives you a high recall, e.g. returning all of the datasets that have the word “Zürich” somewhere in them, but has a low precision, because probably most of the search results are not really relevant for you.

There are two further points where the primary-data and metadata approaches differ. On the one hand, indexing primary data allows us to search for “Limmatstrasse Kantonsratswahlen 2018 Mauch”, giving us very fine-grained information retrieval. On the other hand, using just primary data to “browse” a catalogue is not useful. In contrast, using metadata and searching for “Politics” or “Votings” gives us a rather broad result set, and using those tags to browse into “Politik” and “Abstimmungen” might give us a much wider overview of the available datasets that goes beyond our little search.

                          Good for                        Poor for
Metadata Information      Browsing the catalogue          Highly detailed search queries
Primary-data Information  Highly detailed search queries  Browsing the catalogue

That's why we think that in the future we should embrace indexing the primary data of our datasets while combining it smartly with the metadata, to really get the best of both worlds. While this might not be easy, especially achieving both high precision and high recall, we think it is a challenge worth taking on. I am very sure that it will improve the overall user experience. After all, we want all these precious datasets to be found and used.

Migrate File Entities to Media Entities in Drupal 8 Tue, 21 Aug 2018 00:00:00 +0200 At Liip, we started several Drupal 8 projects a while ago, when the media module was not yet part of Drupal Core. Back then, we decided to use normal file / image fields for media uploads in our content types.

A few months later, our clients preferred a media library using Drupal Media in Core and an Entity Browser to search and find their media assets.

The question is: Is it possible to convert my old-fashioned image fields to new shiny media entities?
The answer is: Yes, there is a way out of your misery!

We created a module called: "Migrate File Entities to Media Entities".

The module allows you to migrate Drupal 8.0 file entities to Drupal 8.5 media entities using the migrate module.

The main features of the module:

  • It provides a drush command that automatically detects all file / image fields and creates the corresponding media reference field.
  • Before migrating the files to media entities, a binary hash of all images is calculated, and duplicate files are recognized. If the same file was uploaded multiple times on different nodes, only one media entity will be created.
  • Migration of translated file / image fields is supported. Having different images per language will create a translated media entity with the corresponding image.
  • Using the migrate module allows drush processing, rollbacks and change tracking.

How to migrate images to media entities

Prepare target media fields

  • Prepare the media fields based on the existing file fields using the following drush command:
    drush migrate:file-media-fields <entity_type> <bundle> <source_field_type> <target_media_bundle>


    drush migrate:file-media-fields node article image image

    For each file field, the corresponding media entity reference field, named {field_name}_media, will be created automatically.

Prepare duplicate file / image detection

In order to detect duplicate files / images, run the following drush command to calculate a binary hash for all files. The data will be saved to the table "migrate_file_to_media_mapping". You need to run this command before you can import media entities.

drush migrate:duplicate-file-detection

Create a custom migration per content type based on the migrate_file_to_media_example module

  • Create a custom module
  • Create your own migration templates based on the examples in migrate_file_to_media_example.

The module provides a custom migrate source plugin called "media_entity_generator".

id: article_images
label: Article Image to Media Migration
migration_group: media
source:
  plugin: media_entity_generator
  entity_type: node
  bundle: article
  langcode: 'en'
  # provide a list of all field names you want to migrate
  field_names:
    - field_image
    - field_image2
destination:
  plugin: entity:media

You need to create one migration per entity bundle and provide a list of all field names you want to migrate. The source plugin will query the database and find all files / images linked to these fields.

Step-by-step instructions on how to migrate your own files / images:

Step 1: Create media entities.

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_images.yml

This is the starting point. This migration creates a unique media entity for every file / image referenced by the fields listed in the field_names configuration of the source plugin.
In the example, we have two image fields: "field_image" and "field_image2".

The drush command to calculate the binary hash needs to be run before you can use the media_entity_generator source plugin.

Using Rokka on Step 1:

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_images_rokka.yml

This is an example migration showing how to move all images to the Rokka image content delivery network. You need to install the Drupal Rokka module.

Step 2: Create media entity translations.

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_images_de.yml

This migration adds a translation to existing media entities if a translated file / image field is found.

Step 3: Link media entities with media reference field on target bundle.

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_media.yml

This migration links the newly created media entities with the entity reference field on the target bundle.

Step 4: Check the migration status.

drush migrate:status

Step 5: Run the migration.

drush migrate:import <migration_name>
Face detection - An overview and comparison of different solutions Wed, 15 Aug 2018 00:00:00 +0200

Part 1: SaaS vendors

This article is the first part of a series. Make sure to subscribe to receive future updates!
TLDR: If you want to use the API's as fast as possible, directly check out my code on GitHub.

Did you ever have the need for face detection?
Maybe to improve image cropping, to ensure that a profile picture really contains a face, or simply to find images in your dataset containing people (well, faces in this case).
Which face detection SaaS vendor would be the best for your project? Let’s take a deeper look into the differences in success rates, pricing and speed.

In this blog post I'll be analyzing the face detection APIs of:

  • Amazon (Rekognition)
  • Google (Cloud Vision)
  • IBM (Watson Visual Recognition)
  • Microsoft (Face API)

How does face detection work anyway?

Before we dive into our analysis of the different solutions, let’s understand how face detection works today in the first place.

The Viola–Jones Face Detection

It’s the year 2001. Wikipedia is being launched by Jimmy Wales and Larry Sanger, the Netherlands becomes the first country in the world to legalize same-sex marriage, and the world witnesses one of the most tragic terror attacks ever.
At the same time, two bright minds, Paul Viola and Michael Jones, come together to start a revolution in computer vision.

Until 2001, face detection didn’t work very precisely nor very fast. That was, until the Viola-Jones Face Detection Framework was proposed, which not only had a high success rate in detecting faces but could also do it in real time.

While face and object recognition challenges had existed since the '90s, they boomed even more after the Viola–Jones paper was released.

Deep Convolutional Neural Networks

One such challenge is the ImageNet Large Scale Visual Recognition Challenge, which has existed since 2010. While in the first two years the top teams mostly worked with a combination of Fisher vectors and support vector machines, 2012 changed everything.

The team of the University of Toronto (consisting of Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton) used a deep convolutional neural network for object detection for the first time. They scored first place with an error rate of 15.4%, while the second-placed team had a 26.2% error rate!
A year later, in 2013, every team in the top 5 was using a deep convolutional neural network.

So, how does such a network work?
An easy-to-understand video was published by Google earlier this year:

What do Amazon, Google, IBM and Microsoft use today?

Since then, not much has changed. Today’s vendors still use deep convolutional neural networks, probably combined with other deep learning techniques.
Obviously, they don’t publish exactly how their visual recognition techniques work. The information I found was:

While they all sound very similar, there are some differences in the results.
Before we test them, let’s have a look at the pricing models first though!


Amazon, Google and Microsoft have a similar pricing model, meaning that with increasing usage the price per detection drops.
With IBM, however, you always pay the same price per API call once your free tier volume is exhausted.
Microsoft provides the best free tier, allowing you to process 30'000 images per month for free.
If you need more detections, though, you need to use their standard tier, where you pay from the first image on.

Price comparison

That being said, let’s calculate the costs for three different profile types.

  • Profile A: Small startup/business processing 1’000 images per month
  • Profile B: Digital vendor with lots of images, processing 100’000 images per month
  • Profile C: Data center processing 10’000’000 images per month
            Amazon         Google          IBM             Microsoft
Profile A   $1.00 USD      Free            Free            Free
Profile B   $100.00 USD    $148.50 USD     $396.00 USD     $100.00 USD
Profile C   $8’200.00 USD  $10’498.50 USD  $39’996.00 USD  $7’200.00 USD

Looking at the numbers, for small customers there’s not much of a difference in pricing. While Amazon charges you from the first image on, having 1’000 images processed still only costs one dollar. However, if you don’t want to pay anything, then Google, IBM or Microsoft will be your vendor to go with.

Note: Amazon offers a free tier with which you can process 5’000 images per month for the first 12 months for free! After this 12-month trial, however, you’ll have to start paying from the first image on.

Large API usage

If you really need to process millions of images, it's important to compare how each vendor scales.
Here's the minimum price you pay for the API usage after a certain number of images:

  • IBM constantly charges you $4.00 USD per 1’000 images (no scaling)
  • Google scales down to $0.60 USD (per 1’000 images) after the 5’000’000th image
  • Amazon scales down to $0.40 USD (per 1’000 images) after the 100’000’000th image
  • Microsoft scales down to $0.40 USD (per 1’000 images) after the 100’000’000th image
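IBM's flat model is the easiest to compute. The sketch below is my own Python illustration; the 1'000-image free tier is an assumption I backed out of the Profile B/C figures in the table above, not an official number:

```python
def ibm_monthly_cost(images, free_images=1000, price_per_1000=4.00):
    """Flat pricing: every image beyond the (assumed) free tier costs the same."""
    billable = max(0, images - free_images)
    return billable * price_per_1000 / 1000

ibm_monthly_cost(100_000)     # → 396.0   (matches Profile B)
ibm_monthly_cost(10_000_000)  # → 39996.0 (matches Profile C)
```

For the tiered vendors the same function would need one rate per volume bracket instead of a single price_per_1000.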

So, comparing prices, Microsoft (and Amazon) seem to be the winners.
But can they also score in success rate, speed and integration? Let’s find out!

Hands on! Let’s try out the different API’s

Enough theory and numbers, let’s dive into coding! You can find all code used here in my GitHub repository.

Setting up our image dataset

First things first. Before we scan images for faces, let’s set up our image dataset.
For this blog post I’ve downloaded 33 images from Pexels. Many thanks to the contributors/photographers of the images and also to Pexels!
The images have been committed to the GitHub repository, so you don't need to search for any images if you simply want to start playing with the APIs.

Writing a basic test framework

Framework might be the wrong word, as my custom code only consists of two classes. However, these two classes help me to easily analyze image (meta-)data and to have as little code as possible in the different implementations.

A very short description: The FaceDetectionClient class holds general information about where the images are stored, vendor details and all processed images (as FaceDetectionImage objects).
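For illustration, the two classes could be outlined roughly like this. This is a hypothetical Python sketch, not the author's code: the original implementation is PHP, and only the class names and responsibilities are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FaceDetectionImage:
    filename: str
    expected_faces: int        # human-counted reference number
    detected_faces: int = 0    # what the vendor API found
    processing_time_ms: int = 0

    @property
    def success_rate(self) -> float:
        return self.detected_faces / self.expected_faces if self.expected_faces else 0.0

@dataclass
class FaceDetectionClient:
    vendor: str
    image_dir: str
    images: List[FaceDetectionImage] = field(default_factory=list)

    def total_detected(self) -> int:
        return sum(img.detected_faces for img in self.images)

    def average_time_ms(self) -> float:
        return sum(i.processing_time_ms for i in self.images) / len(self.images)
```

Each vendor-specific implementation then only has to fill in detected_faces and processing_time_ms per image.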

Comparing the vendors’ SDKs

As I’m most familiar with PHP, I've decided to stick to PHP for this test. I still want to point out which SDKs each vendor provides (as of today):

Amazon: Android, JavaScript, iOS, Java, .NET, Node.js, PHP, Ruby, Python
Google: C#, Go, Java, Node.js, PHP, Python, Ruby, cURL examples
IBM: Node.js, Java, Python, cURL examples
Microsoft: C#, Go, Java, JavaScript, Node.js, PHP, Python, Ruby, cURL examples

Note: Microsoft doesn't actually provide any SDKs; they do offer code examples for the technologies listed above, though.

If you’ve read the lists carefully, you might have noticed that IBM not only offers the smallest number of SDKs but also no SDK for PHP.
However, that wasn’t a big issue for me, as they provide cURL examples which helped me to easily write 37 lines of code for a (very basic) IBM Visual Recognition client class.

Integrating the vendors’ APIs

Getting the SDKs is easy, even easier with Composer. However, I did notice some things that could be improved to make a developer’s life easier.


I started with the Amazon Rekognition API. Going through their documentation, I felt a bit lost at the beginning. Not only did I miss some basic examples (or wasn’t able to find them?), but I also had the feeling that I had to click around a lot until I found what I was looking for. In one case I even gave up and simply got the information by directly inspecting their SDK source code.
On the other hand, it could just be me. Let me know if Amazon Rekognition was easy (or difficult) for you to integrate!

Note: While Google and IBM return the bounding box coordinates in pixels, Amazon returns the coordinates as a ratio of the overall image width/height. I have no idea why that is, but it's not a big deal: you can write a helper function to get the coordinates from the ratio, just as I did.
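Such a helper is straightforward: Rekognition's BoundingBox values (Left, Top, Width, Height) are fractions of the image size, so converting them is one multiplication each. A Python sketch (my original helper is PHP):

```python
def ratio_box_to_pixels(box, image_width, image_height):
    """Convert an Amazon-style ratio bounding box to pixel coordinates."""
    return {
        "left": round(box["Left"] * image_width),
        "top": round(box["Top"] * image_height),
        "width": round(box["Width"] * image_width),
        "height": round(box["Height"] * image_height),
    }

# A face covering the center quarter of a 1000x800 image:
ratio_box_to_pixels({"Left": 0.25, "Top": 0.25, "Width": 0.5, "Height": 0.5}, 1000, 800)
# → {'left': 250, 'top': 200, 'width': 500, 'height': 400}
```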


Next came Google. In comparison with Amazon, they do provide examples, which helped me a lot! Or maybe I was just already in the “investigating different SDKs” mindset.
Whatever the case may be, integrating the SDK felt a lot simpler, and I also needed fewer clicks to retrieve the information I was looking for.


As stated before, IBM doesn’t (yet?) provide an SDK for PHP. However, with the provided cURL examples, I had a custom client set up in no time. There’s not much you can do wrong if a cURL example is provided to you!


Looking at Microsoft's code example for PHP (which uses Pear's HTTP_Request2 package), I ended up writing my own client for Microsoft's Face API.
I guess I'm simply a cURL person.

Inter-rater reliability

Before we compare the different face detection APIs, let's scan the images ourselves first! How many faces would a human be able to detect?
If you've already had a look at my dataset, you might have seen a few images containing tricky faces. What do I mean by "tricky"? Well, when you e.g. only see a small part of a face and/or the face is at an uncommon angle.

Time for a little experiment

I went over all images and wrote down how many faces I thought I had detected. I would use this number to calculate a vendor's success rate for an image and see if it was able to detect as many faces as I did.
However, setting the expected number of faces solely by myself seemed a bit too biased. I needed more opinions.
So I kindly asked three coworkers to go through my images and tell me how many faces they could detect.
The only task I gave them was: "Tell me how many faces, not heads, you're able to detect." I didn't define any rules; I wanted to give them every imaginable freedom for this task.

What is a face?

When I went through the images detecting faces, I counted every face of which at least roughly a quarter was visible. Interestingly, my coworkers came up with slightly different definitions of a face.

  • Coworker 1: I've also counted faces which I mostly wasn't able to see. But I did see the body, so my mind told me that there is a face
  • Coworker 2: If I was able to see the eyes, nose and mouth, I've counted it as a face
  • Coworker 3: I've only counted faces which I would be able to recognize in another image again

Example image #267885

My coworkers and I detected 10, 13, 16 and 16 faces respectively in this image. I've decided to continue with the average, thus 14.

It was very interesting to see how everyone came up with a different technique for face detection.
That being said, I used the average of my results and my coworkers' to set the expected number of faces for each image.

Comparing the results

Now that we have the dataset and the code set up, let's process all images with all competitors and compare the results.
My FaceDetectionClient class also comes with a handy CSV export which provides some analytical data.

This is the first impression I've received:

                              Amazon              Google              IBM                 Microsoft
Total faces detected          99 / 188 (52.66 %)  76 / 188 (40.43 %)  74 / 188 (39.36 %)  33 / 188 (17.55 %)
Total processing time (ms)    57007               43977               72004               40417
Average processing time (ms)  1727                1333                2182                1225

Very low success rates?

Amazon was able to detect 52.66 % of the defined faces, Google 40.43 %, IBM 39.36 % and Microsoft just 17.55 %.
Why the low success rates? Well, first off, I do have lots of tricky images in my dataset.
And secondly, we should not forget that we humans have a couple of million years of evolutionary context helping us understand what something is.
While many people believe that we've already mastered face detection in tech, there's still room for improvement!

The need for speed

While Amazon was able to detect the most faces, Google’s and Microsoft’s processing times were clearly faster than the others'. However, on average they all still need more than a second to process one image from our dataset.
Sending the image data from our computer/server to another server surely takes its toll on performance.

Note: We’ll find out in the next part of the series if (local) open source libraries could do the same job faster.

Groups of people with (relatively) small faces

After analyzing the images, Amazon seems to be quite good at detecting faces in groups of people, where each face is (relatively) small.

A small excerpt

Faces detected per image:

Image #   Amazon   Google   IBM    Microsoft
109919    15       10       8      8
34692     10       8        6      8
889545    10       4        none   none

Example image #889545 by Amazon

Amazon was able to detect 10 faces in this image, while Google only found 4, IBM 0 and Microsoft 0.

Different angles, incomplete faces

So, does this mean that IBM is simply worse than its competitors? Not at all. While Amazon might be good at detecting small faces in group photos, IBM has another strength:
Difficult images.

What do I mean by that? Well, images with faces where the head is at an uncommon angle or maybe not shown completely.
Here are three examples from our dataset for which IBM was the sole vendor to detect the face.

Example image #356147 by IBM

Image with a face only detected by IBM.

Example image #403448 by IBM

Image with a face only detected by IBM.

Example image #761963 by IBM

Image with a face only detected by IBM.

Bounding boxes

Yes, the resulting bounding boxes differ as well.
Amazon, IBM and Microsoft are very similar here and return the bounding boxes of a person’s face.
Google is slightly different and focuses not on someone’s face but on the complete head (which makes more sense to me?).

Example image #933964 by Google

Google returns bounding boxes covering most of the head, not just the face.

Example image #34692 by Microsoft

Microsoft (as well as IBM and Amazon) focus on the face instead of the head.

What is your opinion on this? Should an API return the bounding boxes to the person's face or to the person's head?

False positives

Even though our dataset is quite small (33 images), it contains two images on which face detection failed for some vendors.

Example image #167637 by Amazon

Find the face!

In this (nice) picture of a band, Amazon and Google both detected not the face of the frontman but his tattoo(!) instead. Microsoft didn't detect any face at all.
Only IBM succeeded and correctly detected the frontman’s face (and not his tattoo).
Well played, IBM!

Example image #948199 by Google

Two-Face, is that you?

In this image Google somehow detected two faces in the same region. Or the network sees something that is invisible to us. Which would be even scarier.

Wait, there is more!

You can find the complete dataset with 33 source images, 4x 33 processed images and the metadata CSV export on GitHub.
Not only that: if you clone the repository and enter your API keys, you can even process your own dataset!
Last but not least, if you know of any other face detection API, feel free to send me a pull request to include it in the repository!

Where do the different results come from?

As stated at the beginning of this blog post, none of the vendors completely reveal how they implemented face detection.
Let’s pretend for a second that they all use the same algorithms and network configuration - they could still end up with different results depending on the training data they used to train their neural networks.

Also, there might be some wrappers around the neural networks. Maybe IBM simply rotates the image three times and processes it four times in total to also catch uncommon face angles?
We may never find out.
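Such a rotation wrapper is pure speculation on my side, but as a sketch it could be as simple as this, with `detect_faces` and `rotate` standing in for any vendor's detection call and any image rotation routine:

```python
def detect_with_rotations(image, detect_faces, rotate, angles=(0, 90, 180, 270)):
    """Try detection at several rotations; return the first angle that
    yields faces, together with the detections. Both detect_faces and
    rotate are placeholders for whatever library or API you use."""
    for angle in angles:
        candidate = rotate(image, angle) if angle else image
        faces = detect_faces(candidate)
        if faces:
            return angle, faces
    return None, []
```

The found bounding boxes would then have to be rotated back into the original image's coordinate system, of course.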

A last note

Please keep in mind that I only focused on face detection. It’s not to be confused with face recognition (which can tell whether a certain face belongs to a certain person), and I didn’t dive deeper into the other features the APIs may provide.
Amazon, for example, tells you if someone is smiling, has a beard or has their eyes open or closed. Google can tell you the likelihood that someone is surprised or wearing headwear. IBM tries to provide an approximate age range for a person, including their likely gender. And Microsoft can tell you if a person is wearing any makeup.

The above points are only a few examples of what these vendors can offer. If you need more than just basic face detection, I highly recommend reading and testing their specs according to your purpose.
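To give one concrete example of such extra attributes, here is a sketch of how Amazon's face detection can be queried via boto3. The helper names on our side (detect_face_attributes, summarize_face) are made up for illustration, and running it requires configured AWS credentials:

```python
def detect_face_attributes(photo_path):
    """Sketch: ask Amazon Rekognition for extended face attributes.
    Requires boto3 and configured AWS credentials."""
    import boto3  # imported here so summarize_face works without AWS
    client = boto3.client('rekognition')
    with open(photo_path, 'rb') as f:
        response = client.detect_faces(Image={'Bytes': f.read()},
                                       Attributes=['ALL'])
    return [summarize_face(detail) for detail in response['FaceDetails']]

def summarize_face(detail):
    """Pick the attributes mentioned above out of one FaceDetail dict."""
    return {
        'smiling': detail['Smile']['Value'],
        'beard': detail['Beard']['Value'],
        'eyes_open': detail['EyesOpen']['Value'],
        'age_range': (detail['AgeRange']['Low'], detail['AgeRange']['High']),
    }
```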


So, which vendor is the best now? There is really no right answer to this. Every vendor has its strengths and weaknesses. But for “common” images, Amazon, Google and IBM should do a pretty good job.
Microsoft didn't really convince me, though. With 33 out of 188 faces detected, it had the lowest success rate of all four vendors.

Example image #1181562 by Google

For "common" images, Amazon, Google and IBM will be able to detect all faces.

Example image #1181562 by Microsoft

Microsoft, y u no detect faces?

What about OpenCV and other open source alternatives?

This question will be answered in the next part of this series. Feel free to subscribe to our data science RSS feed to receive related updates in the future. Thank you so much for reading!

Zoo Pokedex Part 2: Hands on with Keras and Resnet50 Tue, 07 Aug 2018 00:00:00 +0200 Short Recap from Part 1

In the last blog post, I briefly discussed the potential of using deep learning to build a zoo pokedex app that could motivate zoo goers to engage with the animals and the information about them. We also discussed the ImageNet competition and how deep learning has drastically changed the image recognition game. We went over the two main tricks that deep learning architectures use, namely convolutions and pooling, which allow such networks to perform extremely well. Last but not least, we realized that all you have to do these days is stand on the shoulders of giants, by using existing networks (e.g. ResNet50), to write applications with similar state-of-the-art precision. So in this blog post it’s finally time to put these giants to work for us.


The goal is to write an image detection app that will be able to distinguish the animals in our zoo. Now, for obvious reasons, I will keep our zoo really small; it contains only two types of animals:

  • Oryxes and
  • Llamas (why there is a second l in English is beyond my comprehension).

Why those animals? Well, they seem fluffy, but mostly because the original ImageNet competition does not contain them. So it represents a quite realistic scenario: a zoo with animals that need to be distinguished, but existing deep learning networks that have not been trained on them. I picked these two kinds of animals mostly at random, just to have something to show. (Actually, I checked that Zürich Zoo has them, so I can take our little app and test it in real life, but that's already part of the third blog post on this topic.)

Getting the data

Getting data is easier than ever in the age of the internet. In the 90s I would probably have had to go to some archive, or even worse, take my own camera and shoot lots and lots of pictures of these animals to use as training material. Today I can just ask Google to show me some. But wait - if you have actually tried using Google Image Search as a resource, you will realize that downloading images in huge amounts is a pain in the ass. The image API is highly limited in terms of what you can get for free, and writing scrapers that download such images is not really fun. That's why I went to the competition and used Microsoft's Cognitive Services to download images for each animal.

Downloading image data from Microsoft

Microsoft offers quite a convenient image search API via their Cognitive Services. You can sign up there to get a free tier for a couple of days, which should be enough to get you started. All you basically need is an API key, and then you can start downloading images to create your datasets.

# Code to download images via the Microsoft cognitive API
require 'httparty'
require 'fileutils'

# Bing image search endpoint of the cognitive services
API_ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
API_KEY = "##############"
SEARCH_TERM = "alpaka"
QUERY = "alpaka"
FOLDER = "datasets"
BATCH_SIZE = 50
MAX = 1000

# Make the dir
FileUtils::mkdir_p "#{FOLDER}/#{SEARCH_TERM}"

# Make the first request
headers = {'Ocp-Apim-Subscription-Key' => API_KEY}
query = {"q": QUERY, "offset": 0, "count": BATCH_SIZE}
puts("Searching for #{SEARCH_TERM}")
response = HTTParty.get(API_ENDPOINT, :query => query, :headers => headers)
total_matches = response["totalEstimatedMatches"]

# Page through the results and save each image to disk
i = 0
while response["nextOffset"] != nil && i < MAX
  response["value"].each do |image|
    i += 1
    content_url = image["contentUrl"]
    ext = content_url.scan(/jpg$|gif$|png$/)[0]
    next if ext == nil
    file_name = "#{FOLDER}/#{SEARCH_TERM}/#{i}.#{ext}"
    next if File.file?(file_name)
    begin
      puts("Offset #{response["nextOffset"]}. Downloading #{content_url}")
      r = HTTParty.get(content_url)
      File.open(file_name, 'wb') { |file| file.write(r.body) }
    rescue StandardError
      puts "Error fetching #{content_url}"
    end
  end
  query = {"q": SEARCH_TERM, "offset": i + BATCH_SIZE, "count": BATCH_SIZE}
  response = HTTParty.get(API_ENDPOINT, :query => query, :headers => headers)
end

The Ruby code above simply uses the API in batches, downloads llamas and oryxes into separate directories and names them accordingly. What you don’t see is that I went through these folders by hand and removed images that were not really the animal but, for example, a fluffy shoe that showed up in the search results. I also de-duplicated each folder. You can scan the images quickly on your Mac using the thumbnail preview, or use an image browser you are familiar with to do the job.
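The manual pass is unavoidable for off-topic images, but exact duplicates can be weeded out automatically by hashing the downloaded files. A small sketch (my own helper, not part of the download script; near-duplicates with different bytes still need the manual pass):

```python
import hashlib
import os

def dedupe_folder(folder):
    """Remove byte-identical duplicates from a download folder by
    hashing every file; returns the names of the removed files."""
    seen = {}
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        with open(path, 'rb') as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen:
            os.remove(path)       # same bytes already kept under another name
            removed.append(name)
        else:
            seen[digest] = name
    return removed
```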

Problem with not enough data

Ignoring probable copyright issues (am I allowed to train my neural network on copyrighted material?) and depending on what you want to achieve, you might run into the problem that it’s not really that easy to gather 500 or 5000 images of oryxes and llamas. Also, to make things a bit challenging, I tried to see if it was possible to train the neural networks using only 100 examples of each animal, while using roughly 50 examples to validate the accuracy of the networks.

Normally everyone would tell you that you definitely need more image material, because deep learning networks need a lot of data to become useful. But in our case we are going to use two dirty tricks to try to get away with our really small collection: data augmentation and the reuse of already pre-trained networks.

Image data generation

A really neat and handy trick that is prevalent everywhere now is to take the images that you already have and change them slightly in an artificial way: rotating them, changing the perspective, zooming in on them. What you end up with is that instead of having one image of a llama, you have 20 pictures of that animal, each slightly different from the original. This trick allows you to create more variation without actually having to download more material. It works quite well, but is definitely inferior to simply having more data.

We will be using Keras, a deep learning library on top of TensorFlow, which we have used before in other blog posts for sentiment detection. In the domain of image recognition, Keras can really show its strength: it already has built-in methods to do image data generation for us, without having to involve any third-party tools.

# Creating an image data generator
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)

As you can see above, we have created an image data generator that uses shearing, zooming and horizontal flipping to change our llama pictures. We don’t do a vertical flip, for example, because it’s rather unrealistic that you will hold your phone upside down. Depending on the type of images (e.g. aerial photography), different transformations might or might not make sense.

# Creating variations to show you some examples
from keras.preprocessing.image import load_img, img_to_array

img = load_img('data/train/alpaka/Alpacca1.jpg')
x = img_to_array(img)
x = x.reshape((1,) + x.shape)  # the generator expects a batch dimension
i = 0
for batch in train_datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='alpacca', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # otherwise the generator would loop indefinitely

Now if you want to use these generators in your model directly, you can use the convenient flow_from_directory method, where you can even define the target size, so you don’t have to scale down your training images with an external library.

# Flow from directory method
train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size=(sz, sz),
    batch_size=batch_size, class_mode='binary')

Using Resnet50

In order to finally step on the shoulders of giants, we can simply import the ResNet50 model that we talked about earlier. Here is a detailed description of each layer, and here is the matching paper that describes it in detail. While there are different alternatives you might also use, the ResNet50 model has a fairly high accuracy while not being too “big” in comparison to the computationally expensive VGG network architecture.

On a side note: the name “res” comes from residual. A residual can be understood as a subtraction of the features that were learned from the input at each layer. ResNet has a very neat trick that allows deeper networks to learn from residuals by “short-circuiting” the input past deeper layers: directly connecting the input of an n-th layer to some (n+x)-th layer. This short-circuiting has been proven to make training easier. It helps with the problem of degrading accuracy, where networks that are too deep become exponentially harder to train.
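The residual idea itself fits in one line. As a toy sketch (plain Python, not actual Keras layers): a block outputs its learned function plus its unchanged input, so a block whose layers have learned nothing degrades gracefully to the identity instead of corrupting the signal:

```python
def residual_block(x, learned):
    """The idea behind ResNet's short-circuits: a block outputs
    learned(x) + x, so its layers only have to fit the residual
    H(x) - x rather than the full mapping H(x)."""
    return learned(x) + x

# A block whose layers have learned nothing passes the input through:
print(residual_block(5.0, lambda v: 0.0))  # -> 5.0
```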

# Importing ResNet50 into Keras
from keras.applications.resnet50 import ResNet50
base_model = ResNet50(weights='imagenet')

As you can see above, importing the network is dead easy in Keras. It might take a while to download the network, though. Notice that we are downloading the weights too, not only the architecture.

Training existing models

The next part is the exciting one: now we finally get to train the existing network on our own data. The simple but ineffective approach would be to download, or just re-build, the architecture of the successful network and train it with our data. The problem with that approach is that we only have 100 images per class, which is not even remotely close to enough data to train those networks well enough to be useful.

Instead, we will try another technique (which I somewhat stole from the great Keras blog): we will freeze all weights of the downloaded network, add three final layers at the end, and then train only those.

Freezing the base model

Why is this useful, you might ask? By freezing all of the existing layers of the ResNet50 network, we only have to train the final layers. This makes sense, since the ImageNet task is about recognizing everyday objects in everyday photographs, and the network is already very good at recognising “basic” features such as legs, eyes, circles, heads, etc. All of this “smartness” is already encoded in the weights (see the last blog post). If we threw these weights away, we would lose these nice smart properties. Instead, we can glue another pooling layer and a dense layer onto the very end, followed by a sigmoid activation layer that's needed to distinguish between our two classes. That's, by the way, why it says include_top=False in the code: to not include the final 1000-class layer that was used for the ImageNet competition. If you want to read up on the different alternatives to ResNet50, you will find them here.

# Adding three layers on top of the network
from keras.layers import Dense, GlobalAveragePooling2D

base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)

Finally, we can now re-train the network with our own image material and hope for it to turn out quite useful. I’ve had some trouble finding an optimizer that produced proper results. Usually you will have to experiment with the learning rate to find a configuration whose accuracy improves during the training phase.

# Freezing all the original weights and compiling the network
from keras import optimizers
from keras.models import Model

optimizer = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers: layer.trainable = False
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=3, workers=4,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)

The training shouldn’t take long, even when you are using just a CPU instead of a GPU.

You’ll notice that we reached an accuracy of 71%, which isn’t too bad given that we have only 100 original images of each class.


One thing we might do now is unfreeze some of the very last layers in the network and re-train it, allowing those layers to change slightly. We do this in the hope that a bit more “wiggle room”, without touching most of the actual weights, might give us better results.

# Make the very last layers trainable
split_at = 140
for layer in model.layers[:split_at]: layer.trainable = False
for layer in model.layers[split_at:]: layer.trainable = True
model.compile(optimizer=optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0), loss='binary_crossentropy', metrics=['accuracy'])    
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=1, workers=3,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)

And indeed, it helped our model go from 71% accuracy to 82%! You might want to play around with the learning rates a bit, or maybe split at a different depth, in order to tweak results. But generally I think that just adding more images would be the easiest way to reach 90% accuracy.

Confusion matrix

In order to see how well our model is doing, we can also compute a confusion matrix, i.e. count the true positives, true negatives, false positives and false negatives.

# Calculating the confusion matrix
from sklearn.metrics import confusion_matrix

r = next(validation_generator)
probs = model.predict(r[0])
classes = []
for prob in probs:
    if prob < 0.5:
        classes.append(0)
    else:
        classes.append(1)
cm = confusion_matrix(r[1], classes)

As you can see above, I simply took the first batch from the validation generator (i.e. images for which we know whether they show an alpaca or an oryx) and then used the confusion matrix from scikit-learn. In the example below we see that 28 resp. 27 images of each class were labeled correctly, while an error was made in 4 resp. 5 images. I would say that’s quite a good result, given that we used so little data.

#example output of confusion matrix
array([[28,  5],
       [ 4, 27]])
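From this matrix, the overall accuracy is simply the diagonal divided by the total number of validated images:

```python
cm = [[28, 5],
      [4, 27]]

correct = cm[0][0] + cm[1][1]    # the diagonal: 28 + 27 = 55
total = sum(cm[0]) + sum(cm[1])  # all 64 validated images
print(round(correct / total, 3))  # -> 0.859
```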

Use the model to predict images

Last but not least we can of course finally use the model to predict if an animal in our little zoo is an oryx or an alpakka.

# Helper function to load and display images
import numpy as np
import matplotlib.pyplot as plt
from keras.preprocessing import image

def load_image(img_path, show=False):
    img = image.load_img(img_path, target_size=(224, 224))
    img_tensor = image.img_to_array(img)             # (height, width, channels)
    img_tensor = np.expand_dims(img_tensor, axis=0)  # (1, height, width, channels): the model expects a batch dimension

    if show:
        plt.imshow(img_tensor[0] / 255.)  # imshow expects values in the range [0, 1]
        plt.axis('off')
        plt.show()

    return img_tensor

# Load two sample images
oryx = load_image("data/valid/oryx/106.jpg", show=True)
alpaca = load_image("data/valid/alpaca/alpaca102.jpg", show=True)

As you can see in the output, our model successfully labeled the alpaca as an alpaca, since the value was less than 0.5, and the oryx as an oryx, since the value was greater than 0.5. Hooray!
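That thresholding step can be captured in a tiny helper. Note that the class encoding (alpaca as 0, oryx as 1, matching the alphabetical folder order used by flow_from_directory) is an assumption on my side:

```python
def label_for(probability, threshold=0.5):
    """Map the binary model's sigmoid output to a class name,
    assuming 'alpaca' was encoded as class 0 and 'oryx' as class 1."""
    return 'oryx' if probability >= threshold else 'alpaca'

# With the trained model from above, the call would look like:
#   label_for(model.predict(load_image("data/valid/oryx/106.jpg"))[0][0])
print(label_for(0.12), label_for(0.91))  # -> alpaca oryx
```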

Conclusion or What’s next?

I hope that this blog post was useful to you and showed you that you don’t really need much in order to get started with deep learning for image recognition. I know that our example zoo pokedex is really small at this point, but I don’t see a reason (apart from the lack of time and resources) why it should be a problem to scale from our 2 animals to 20 or 200.

On the technical side, now that we have a model that’s kind of useful, it would be great to find out how to use it on a smartphone, e.g. the iPhone, to finally have a pokedex that we can really try out in the wild. I will cover that in the third part of the series, showing you how to export existing models to Apple mobile phones using the CoreML technology. As always, I am looking forward to your comments and corrections, and I'll point you to the IPython notebook that you can download here.

Deploy your Nuxt.js app to Platform.sh Mon, 06 Aug 2018 00:00:00 +0200 Nuxt.js is Vue's version of Next.js for React. It's a "framework for creating Universal Vue.js Applications." Getting started with Nuxt.js is rather straightforward; the guides help a lot. Platform.sh is a cloud hosting provider we use a lot at Liip. Configuring Platform.sh to serve any kind of app is also pretty straightforward, as there are a lot of guides for all kinds of apps.

I started building a microsite. Nothing too fancy. As I was familiar with Vue, I wanted to give Nuxt.js a try and created this app as a single page application. So I created a skeleton of the app, including a header, a bit of navigation and an image or two, and was ready to deploy the first version of it somewhere, so stakeholders could actually have a look at it. I've already used Platform.sh for various other apps before, so I figured it would be fit for a Vue SPA.

Since this was my first Nuxt.js app, I tried to figure out how to deploy it to Platform.sh, but didn't find any resources. So I decided to share the steps and config needed to deploy it.

Vue rendered

Nuxt's documentation is pretty straightforward when it comes to deployment. There are essentially three commands that need to be run in order to get a fresh copy of a Nuxt app running:

npm install
npm run build
npm start

And these commands are exactly what's needed to deploy the app. Most importantly: there's no need for any special Nuxt config. The out-of-the-box config should be enough.

To configure an app for Platform.sh, three files are needed:

  • ./.platform/routes.yaml - Available routes for the app
  • ./.platform/services.yaml - Attached services, such as databases, search platforms, etc.
  • ./.platform.app.yaml - The main configuration file

First of all, the app must be configured. I'll call the app node, use nodejs:8.11 as its type and give it 128MB of disk space:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

Now the build process needs to be added. This is done by adding a build hook:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

# Build hook
hooks:
  build: |
    npm install
    npm run build

Afterwards, Platform.sh needs to know how to start the app and what kind of locations it needs to serve. The finished .platform.app.yaml now looks like the following:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

# Build hook
hooks:
  build: |
    npm install
    npm run build

# Web config
web:
  commands:
    start: npm start
  locations:
    "/":
      passthru: true

In the file .platform/routes.yaml, we also need to add a default route that passes everything it receives straight to the Nuxt process:

# .platform/routes.yaml

"https://{default}/":
    type: upstream
    upstream: "node:http"

The file .platform/services.yaml can be left empty.

That's it. Now we can go on and deploy the Nuxt app to Platform.sh:

git remote add platform [...]
git push -u platform

Static pages

Static pages work a bit differently. They are generated by Nuxt during the build process and served as static files by Platform.sh. A starting point for such a configuration can be found in the Platform.sh documentation.

A little bit of adjustment is needed, though.

The same starting config for name, type and disk size can be used:


# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

Now, instead of running npm run build in the build hook, we let Nuxt generate static files via npm run generate:


# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

hooks:
  build: |
    npm install
    npm run generate

... and let Platform.sh serve everything in the dist/ folder as a static page:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

hooks:
  build: |
    npm install
    npm run generate

web:
  commands:
    start: sleep infinity
  locations:
    "/":
      root: dist
      index:
        - index.html
      allow: true
      rules:
        \.(css|js|gif|jpe?g|png|svg|ico)$:
          allow: true

Note the start command: by letting the whole app sleep for an infinite amount of time, we're not actually executing anything; only the static files are served.

The file .platform/routes.yaml can stay the same:

# .platform/routes.yaml

"https://{default}/":
    type: upstream
    upstream: "node:http"

The file .platform/services.yaml also stays empty. The actual deployment then happens the same way:

git remote add platform [...]
git push -u platform

Takeaway thoughts Platform.sh and Nuxt.js really play well together. Combining the two was less complex than I originally thought, and the sheer speed of both makes development, for me, loads of fun. Almost as much fun as sharing knowledge and learning new things.