Liip Blog https://www.liip.ch/de/blog Kirby Tue, 21 Aug 2018 00:00:00 +0200 Latest articles from the Liip blog de Migrate File Entities to Media Entities in Drupal 8 https://www.liip.ch/de/blog/migrate-file-entities-to-media-entities-in-drupal-8 https://www.liip.ch/de/blog/migrate-file-entities-to-media-entities-in-drupal-8 Tue, 21 Aug 2018 00:00:00 +0200 At Liip, we started several Drupal 8 projects a while ago, when the Media module was not yet part of Drupal core. Back then, we decided to use normal file / image fields for media uploads in our content types.

A few months later, our clients prefer a full media library using Drupal Media in Core and Entity Browser to search and find their media assets.

The question is: Is it possible to convert my old-fashioned image fields to new shiny media entities?
The answer is: Yes, there is a way out of your misery!

We created a module called: "Migrate File Entities to Media Entities".

migrate-file-to-media-small

https://www.drupal.org/project/migrate_file_to_media
The module allows you to migrate Drupal 8.0 file entities to Drupal 8.5 media entities using the migrate module.

What are the main features?

  • It provides a drush command that automatically detects all file / image fields and creates the corresponding media reference fields.
  • Before migrating the files to media entities, a binary hash of all images is calculated, and duplicate files are recognized. If the same file was uploaded multiple times on different nodes, only one media entity will be created.
  • Migration of translated file/image fields is supported. Having different images per language will create a translated media entity with the corresponding image.
  • Using the migrate module allows drush processing, rollbacks and change tracking.

How to migrate images to media entities using this module?

Prepare target media fields

  • Prepare the media fields based on the existing file fields using the following drush command:
    drush migrate:file-media-fields <entity_type> <bundle> <source_field_type> <target_media_bundle>

    Example

    drush migrate:file-media-fields node article image image

    For all file fields, the corresponding media entity reference fields will be created automatically, suffixed with {field_name}_media.

Prepare duplicate file / image detection

In order to detect duplicate files / images, run the following drush command to calculate a binary hash for all files. The data will be saved to the table "migrate_file_to_media_mapping". You need to run this drush command before you can import media entities.

drush migrate:duplicate-file-detection
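Conceptually, the duplicate detection boils down to hashing each file's binary content and grouping files that share the same hash. The following Python snippet only illustrates that idea with made-up file paths; it is not the module's actual implementation:

import hashlib
from collections import defaultdict

def binary_hash(path):
    """Return a SHA-256 hash of the file's binary content."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical list of managed files; true duplicates end up under the same hash.
files = ['sites/default/files/team.jpg', 'sites/default/files/team_0.jpg']
duplicates = defaultdict(list)
for path in files:
    duplicates[binary_hash(path)].append(path)

# One media entity per unique hash, no matter how many nodes reference the file.
for digest, paths in duplicates.items():
    print(digest[:12], '->', paths)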

Create a custom migration per content type based on the migrate_file_to_media_example module

  • Create a custom module
  • Create your own migration templates based on the examples in migrate_file_to_media_example.

The module provides a custom migrate source plugin called "media_entity_generator".

id: article_images
label: Article Image to Media Migration
migration_group: media
source:
  plugin: media_entity_generator
  entity_type: node
  bundle: article
  langcode: 'en'

  # provide a list of all field names you want to migrate
  field_names:
  - field_image
  - field_image2

destination:
  plugin: entity:media

You need to create a migration per entity bundle and provide a list of all field names you want to migrate. The source plugin will query the database and find all files / images linked to these fields.

Step-by-step instructions on how to migrate your own files / images.

Step 1: Create media entities.

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_images.yml

This is the starting point. This migration creates a unique media entity from all files / images referenced by fields in the configuration field_names of the source plugin.
In the example, we have two image fields called: "field_image" and "field_image2".

Important:
The drush command to calculate the binary hash needs to be run before you can use the media_entity_generator source plugin.

Using rokka.io on Step 1:

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_images_rokka.yml

This is an example migration showing how to move all images to the rokka.io image content delivery network. You need to install the Drupal Rokka module.

Step 2: Create media entity translations.

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_images_de.yml

This migration adds a translation to existing media entities if a translated file / image field is found.

Step 3: Link media entities with media reference field on target bundle.

File migrate_file_to_media_example/config/install/migrate_plus.migration.article_media.yml

This migration links the newly created media entities with the entity reference fields on the target bundle.

Step 4: Check the migration status.

drush migrate:status

Step 5: Run the migration.

drush migrate:import <migration_name>
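Assuming the migration IDs match the example configuration files above (article_images, article_images_de and article_media), the imports are run in the order of the steps, for example:

drush migrate:import article_images
drush migrate:import article_images_de
drush migrate:import article_media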
]]>
Face detection - An overview and comparison of different solutions https://www.liip.ch/de/blog/face-detection-an-overview-and-comparison-of-different-solutions-part1 https://www.liip.ch/de/blog/face-detection-an-overview-and-comparison-of-different-solutions-part1 Wed, 15 Aug 2018 00:00:00 +0200

Part 1: SaaS vendors

This article is the first part of a series. Make sure to subscribe to receive future updates!
TLDR: If you want to use the APIs as fast as possible, directly check out my code on GitHub.

Have you ever had the need for face detection?
Maybe to improve image cropping, ensure that a profile picture really contains a face or maybe to simply find images from your dataset containing people (well, faces in this case).
Which face detection SaaS vendor would be the best for your project? Let’s have a deeper look into the differences in success rates, pricing and speed.

In this blog post I'll be analyzing the face detection APIs of Amazon, Google, IBM and Microsoft.

How does face detection work anyway?

Before we dive into our analysis of the different solutions, let’s understand how face detection works today in the first place.

The Viola–Jones Face Detection

It’s the year 2001. Wikipedia is being launched by Jimmy Wales and Larry Sanger, the Netherlands becomes the first country in the world to make same-sex marriage legal and the world witnesses one of the most tragic terror attacks ever.
At the same time two bright minds, Paul Viola and Michael Jones, come together to start a revolution in computer vision.

Until 2001, face detection was something that didn't work very precisely nor very fast. That was, until the Viola-Jones Face Detection Framework was proposed, which not only had a high success rate in detecting faces but could also do it in real time.

While face and object recognition challenges had existed since the '90s, they surely boomed even more after the Viola–Jones paper was released.

Deep Convolutional Neural Networks

One such challenge is the ImageNet Large Scale Visual Recognition Challenge, which has existed since 2010. While in the first two years the top teams were working mostly with a combination of Fisher Vectors and support vector machines, 2012 changed everything.

The team of the University of Toronto (consisting of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton) used a deep convolutional neural network for object detection for the first time. They scored first place with an error rate of 15.4%, while the second-placed team had a 26.2% error rate!
A year later, in 2013, every team in the top 5 was using a deep convolutional neural network.

So, how does such a network work?
An easy-to-understand video was published by Google earlier this year:

What do Amazon, Google, IBM and Microsoft use today?

Since then, not much has changed. Today's vendors still use deep convolutional neural networks, probably combined with other deep learning techniques though.
Obviously, they don't publish exactly how their visual recognition techniques work. The information I found was:

While they all sound very similar, there are some differences in the results.
Before we test them, let’s have a look at the pricing models first though!

Pricing

Amazon, Google and Microsoft have a similar pricing model, meaning that with increasing usage the price per detection drops.
With IBM however, you always pay the same price per API call after your free tier usage volume is exhausted.
Microsoft provides the best free tier, allowing you to process 30'000 images per month for free.
If you need more detections though, you need to use their standard tier, where you start paying from the first image on.

Price comparison

That being said, let’s calculate the costs for three different profile types.

  • Profile A: Small startup/business processing 1’000 images per month
  • Profile B: Digital vendor with lots of images, processing 100’000 images per month
  • Profile C: Data center processing 10’000’000 images per month
            Amazon           Google           IBM              Microsoft
Profile A   $1.00 USD        Free             Free             Free
Profile B   $100.00 USD      $148.50 USD      $396.00 USD      $100.00 USD
Profile C   $8'200.00 USD    $10'498.50 USD   $39'996.00 USD   $7'200.00 USD

Looking at the numbers, for small customers there's not much of a difference in pricing. While Amazon charges you starting from the first image, having 1'000 images processed still only costs one dollar. However, if you don't want to pay anything at all, then Google, IBM or Microsoft are the vendors to go with.

Note: Amazon offers a free tier on which you can process 5'000 images per month for free for the first 12 months! After this 12-month trial, however, you'll have to pay starting from the first image.

Large API usage

If you really need to process millions of images, it's important to compare how every vendor scales.
Here's a list of the minimum price you pay for API usage after a certain number of images; a small calculation sketch follows the list.

  • IBM constantly charges you $4.00 USD per 1’000 images (no scaling)
  • Google scales down to $0.60 USD (per 1’000 images) after the 5’000’000th image
  • Amazon scales down to $0.40 USD (per 1’000 images) after the 100’000’000th image
  • Microsoft scales down to $0.40 USD (per 1’000 images) after the 100’000’000th image
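To make the scaling concrete, here is a minimal Python sketch of how such tiered pricing can be computed. The tier boundaries are passed in as data, the example only uses IBM's flat rate from the list above, and free-tier allowances are ignored:

def monthly_cost(images, tiers):
    """tiers: list of (upper_bound, price_per_1000) in ascending order,
    with float('inf') as the last bound."""
    cost, previous_bound = 0.0, 0
    for bound, rate in tiers:
        if images <= previous_bound:
            break
        billable = min(images, bound) - previous_bound
        cost += billable / 1000 * rate
        previous_bound = bound
    return cost

# IBM charges a flat $4.00 USD per 1'000 images (no scaling):
print(monthly_cost(10_000_000, [(float('inf'), 4.00)]))  # 40000.0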

So, comparing prices, Microsoft (and Amazon) seem to be the winners.
But can they also score in success rate, speed and integration? Let’s find out!

Hands on! Let’s try out the different API’s

Enough theory and numbers, let’s dive into coding! You can find all code used here in my GitHub repository.

Setting up our image dataset

First things first. Before we scan images for faces, let’s set up our image dataset.
For this blog post I’ve downloaded 33 images from pexels.com, many thanks to the contributors/photographers of the images and also to Pexels!
The images have been committed to the GitHub repository, so you don't need to search for any images if you simply want to start playing with the API's.

Writing a basic test framework

Framework might be the wrong word, as my custom code only consists of two classes. However, these two classes help me easily analyze image (meta-)data and keep as little code as possible in the different implementations.

A very short description: The FaceDetectionClient class holds general information about where the images are stored, vendor details and all processed images (as FaceDetectionImage objects).
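The real classes are written in PHP and live in the GitHub repository. Purely to illustrate the idea, a rough Python sketch could look like this (the field and method names are made up and not the actual API):

from dataclasses import dataclass, field
from typing import List

@dataclass
class FaceDetectionImage:
    filename: str
    expected_faces: int          # average of the human annotations
    detected_faces: int = 0
    processing_time_ms: int = 0

@dataclass
class FaceDetectionClient:
    vendor: str                  # e.g. "Amazon", "Google", "IBM" or "Microsoft"
    image_dir: str
    images: List[FaceDetectionImage] = field(default_factory=list)

    def success_rate(self) -> float:
        expected = sum(i.expected_faces for i in self.images)
        detected = sum(i.detected_faces for i in self.images)
        return detected / expected if expected else 0.0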

Comparing the vendors' SDKs

As I'm most familiar with PHP, I've decided to stick to PHP for this test. I still want to point out which SDKs each vendor provides (as of today):

Amazon: Android, JavaScript, iOS, Java, .NET, Node.js, PHP, Ruby, Python
Google: C#, Go, Java, Node.js, PHP, Python, Ruby, cURL examples
IBM: Node.js, Java, Python, cURL examples
Microsoft: C#, Go, Java, JavaScript, Node.js, PHP, Python, Ruby, cURL examples

Note: Microsoft doesn't actually provide any SDKs; they do offer code examples for the technologies listed above though.

If you've read the lists carefully, you might have noticed that IBM not only offers the smallest number of SDKs but also no SDK for PHP.
However, that wasn't a big issue for me, as they provide cURL examples which helped me easily write 37 lines of code for a (very basic) IBM Visual Recognition client class.

Integrating the vendors API’s

Getting the SDKs is easy. Even easier with Composer. However, I did notice some things that could be improved to make a developer's life easier.

Amazon

I've started with the Amazon Rekognition API. Going through their documentation, I really felt a bit lost at the beginning. Not only did I miss some basic examples (or wasn't I able to find them?), but I also had the feeling that I had to click through quite a few pages until I found what I was looking for. In one case I even gave up and simply got the information by directly inspecting their SDK source code.
On the other hand, it could just be me? Let me know if Amazon Rekognition was easy (or difficult) for you to integrate!

Note: While Google and IBM return the bounding box coordinates directly, Amazon returns the coordinates as a ratio of the overall image width/height. I have no idea why that is, but it's not a big deal. You can write a helper function to get the coordinates from the ratio, just as I did.
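The conversion itself is just a few multiplications. My helper is part of the PHP code on GitHub; here is the same idea as a Python sketch, assuming a Rekognition-style box with Left/Top/Width/Height given as fractions of the image size:

def ratio_to_pixels(box, image_width, image_height):
    """Convert relative bounding box values to pixel coordinates."""
    return {
        'left': int(box['Left'] * image_width),
        'top': int(box['Top'] * image_height),
        'width': int(box['Width'] * image_width),
        'height': int(box['Height'] * image_height),
    }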

Google

Next came Google. In comparison with Amazon, they do provide examples, which helped me a lot! Or maybe I was just already in the "investigating different SDKs" mindset.
Whatever the case may be, integrating the SDK felt a lot simpler, and I also needed fewer clicks to retrieve the information I was looking for.

IBM

As stated before, IBM doesn't (yet?) provide an SDK for PHP. However, with the provided cURL examples, I had a custom client set up in no time. There's not much that you can do wrong if a cURL example is provided to you!

Microsoft

Looking at Microsoft's code example for PHP (which uses Pear's HTTP_Request2 package), I ended up writing my own client for Microsoft's Face API.
I guess I'm simply a cURL person.

Inter-rater reliability

Before we compare the different face detection APIs, let's scan the images ourselves first! How many faces would a human be able to detect?
If you've already had a look at my dataset, you might have seen a few images containing tricky faces. What do I mean by "tricky"? Well, when you e.g. only see a small part of a face and/or the face is at an uncommon angle.

Time for a little experiment

I went over all images and wrote down how many faces I thought I had detected. I would use this number to calculate a vendor's success rate for an image and see if it was able to detect as many faces as I did.
However, setting the expected number of faces solely by myself seemed a bit too biased to me. I needed more opinions.
This is when I kindly asked three coworkers to go through my images and tell me how many faces they would detect.
The only task I gave them was "Tell me how many faces, and not heads, you're able to detect". I didn't define any rules; I wanted to give them every imaginable freedom for doing this task.

What is a face?

When I went through the images detecting faces, I just counted every face of which at least around a quarter was visible. Interestingly, my coworkers came up with slightly different definitions of a face.

  • Coworker 1: I've also counted faces which I mostly wasn't able to see. But I did see the body, so my mind told me that there is a face
  • Coworker 2: If I was able to see the eyes, nose and mouth, I've counted it as a face
  • Coworker 3: I've only counted faces which I would be able to recognize in another image again

Example image #267885

267885
My coworkers and I each detected 10, 13, 16 and 16 faces in this image. I've decided to continue with the average, thus 14.

It was very interesting to me to see how everyone came up with different techniques regarding face detection.
That being said, I've used the average face count of my results and the ones from my coworkers to set the expected number of faces for each image.

Comparing the results

Now that we have the dataset and the code set up, let’s process all images by all competitors and compare the results.
My FaceDetectionClient class also comes with a handy CSV export which provides some analytical data.

This is the first impression I've received:

                              Amazon        Google        IBM           Microsoft
Total faces detected          99 / 188      76 / 188      74 / 188      33 / 188
                              (52.66 %)     (40.43 %)     (39.36 %)     (17.55 %)
Total processing time (ms)    57007         43977         72004         40417
Average processing time (ms)  1727          1333          2182          1225

Very low success rates?

Amazon was able to detect 52.66 % of the faces defined, Google 40.43 %, IBM 39.36 % and Microsoft a mere 17.55 %.
How come the low success rates? Well, first off, I do have lots of tricky images in my dataset.
And secondly, we should not forget that we, as humans, have a couple of million years' worth of evolutionary context to help us understand what something is.
While many people believe that we've mastered face detection in tech already, there's still room for improvement!

The need for speed

While Amazon was able to detect the most faces, Google's and Microsoft's processing times were clearly faster than the others'. However, on average they still need longer than one second to process one image from our dataset.
Sending the image data from our computer/server to another server surely takes its toll on performance.

Note: We’ll find out in the next part of the series if (local) open source libraries could do the same job faster.

Groups of people with (relatively) small faces

After analyzing the images, Amazon seems to be quite good at detecting faces in groups of people and where the face is (relatively) small.

A small excerpt

Image #   Amazon             Google             IBM                Microsoft
          (faces detected)   (faces detected)   (faces detected)   (faces detected)
109919    15                 10                 8                  8
34692     10                 8                  6                  8
889545    10                 4                  none               none

Example image #889545 by Amazon

amazon-889545
Amazon was able to detect 10 faces in this image, while Google only found 4, IBM 0 and Microsoft 0.

Different angles, incomplete faces

So, does this mean that IBM is simply not as good as its competitors? Not at all. While Amazon might be good at detecting small faces in group photos, IBM has another strength:
Difficult images.

What do I mean by that? Well, images with faces where the head is at an uncommon angle or maybe not shown completely.
Here are three examples from our dataset for which IBM was the sole vendor to detect the face.

Example image #356147 by IBM

ibm-356147
Image with a face only detected by IBM.

Example image #403448 by IBM

ibm-403448
Image with a face only detected by IBM.

Example image #761963 by IBM

ibm-761963
Image with a face only detected by IBM.

Bounding boxes

Yes, the resulting bounding boxes differ as well.
Amazon, IBM and Microsoft are very similar here and return the bounding box of a person's face.
Google is slightly different and focuses not on someone's face but on the complete head (which makes more sense to me?).

Example image #933964 by Google

google-933964
Google returns bounding boxes covering most of the head, not just the face.

Example image #34692 by Microsoft

microsoft-34692
Microsoft (as well as IBM and Amazon) focuses on the face instead of the head.

What is your opinion on this? Should an API return the bounding boxes to the person's face or to the person's head?

False positives

Even though our dataset was quite small (33 images), it contains two images on which face detection failed for some vendors.

Example image #167637 by Amazon

amazon-167637
Find the face!

In this (nice) picture of a band, Amazon and Google both didn’t detect the face of the front man but of his tattoo(!) instead. Microsoft didn't detect any face at all.
Only IBM succeeded and correctly detected the front man’s face (and not his tattoo).
Well played IBM!

Example image #948199 by Google

google-948199
Two-Face, is that you?

In this image Google somehow detected two faces in the same region. Or the network sees something that is invisible to us, which is even scarier.

Wait, there is more!

You can find the complete dataset with 33 source images, 4x 33 processed images and the metadata CSV export on GitHub.
Not only that, if you clone the repository and enter your API keys, you can even process your own dataset!
Last but not least, if you know of any other face detection API, feel free to send me a pull request to include it in the repository!

How come the different results?

As stated in the beginning of this blog post, none of the vendors completely reveal how they implemented face detection.
Let’s pretend for a second that they use the same algorithms and network configuration - they could still end up with different results depending on the training data they used to train their neural network.

Also there might be some wrappers around the neural networks. Maybe IBM simply rotates the image 3 times and processes it 4 times in total to also find uncommon face angles?
We may never find out.

A last note

Please keep in mind that I only focused on face detection. It's not to be confused with face recognition (which can tell if a certain face belongs to a certain person), and I also didn't dive deeper into other features the APIs may provide.
Amazon, for example, tells you if someone is smiling, has a beard or has their eyes open/closed. Google can tell you the likelihood that someone is surprised or wearing headwear. IBM tries to provide you with an approximate age range of a person, including their likely gender. And Microsoft can tell you if a person is wearing any makeup.

The above points are only a few examples of what these vendors can offer you. If you need more than just basic face detection, I highly recommend reading and testing their specs according to your purpose.

Conclusion

So, which vendor is now the best? There is really no right answer to this. Every vendor has its strengths and weaknesses. But for “common” images, Amazon, Google and IBM should do a pretty good job.
Microsoft didn't really convince me though. With 33 out of 188 faces detected, they had the lowest success rate of all four vendors.

Example image #1181562 by Google

google-1181562
For "common" images, Amazon, Google and IBM will be able to detect all faces.

Example image #1181562 by Microsoft

microsoft-1181562
Microsoft, y u no detect faces?

What about OpenCV and other open source alternatives?

This question will be answered in the next part of this series. Feel free to subscribe to our data science RSS feed to receive related updates in the future and thank you so much for reading!

]]>
Zoo Pokedex Part 2: Hands on with Keras and Resnet50 https://www.liip.ch/de/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50 https://www.liip.ch/de/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50 Tue, 07 Aug 2018 00:00:00 +0200 Short Recap from Part 1

In the last blog post I briefly discussed the potential of using deep learning to build a zoo pokedex app that could be used to motivate zoo goers to engage with the animals and the information. We also discussed the ImageNet competition and how deep learning has drastically changed the image recognition game. We went over the two main tricks that deep learning architectures use, namely convolutions and pooling, which allow such networks to perform extremely well. Last but not least, we realized that all you have to do these days is stand on the shoulders of giants by using existing networks (e.g. Resnet50) to write applications with similar state-of-the-art precision. So finally, in this blog post, it's time to put these giants to work for us.

Goal

The goal is to write an image detection app that will be able to distinguish the animals in our zoo. Now, for obvious reasons, I will make our zoo really small, containing only two types of animals:

  • Oryxes and
  • Llamas (why there is a second L in English is beyond my comprehension).
lamavsoryx

Why those animals? Well, they seem fluffy, but mostly because the original ImageNet competition does not contain these animals. So it represents a quite realistic scenario: a zoo has animals that need to be distinguished, but the existing deep learning networks have not been trained for them. I picked these two kinds of animals mostly at random, just to have something to show. (Actually, I checked whether the Zürich Zoo has these so I can take our little app and test it in real life, but that's already part of the third blog post on this topic.)

Getting the data

Getting data is easier than ever in the age of the internet. In the '90s I would probably have had to go to some archive or, even worse, take my own camera and shoot lots and lots of pictures of these animals to use them as training material. Today I can just ask Google to show me some. But wait: if you have actually tried using Google Image search as a resource, you will realize that downloading their images in large amounts is a pain in the ass. The image API is highly limited in terms of what you can get for free, and writing scrapers that download such images is not really fun. That's why I went to the competition and used Microsoft's cognitive services to download images for each animal.

Downloading image data from Microsoft

Microsoft offers quite a convenient image search API via their cognitive services. You can sign up there to get a free tier for a couple of days, which should be enough to get you started. What you basically need is an API key, and then you can already start downloading images to create your datasets.

microsoft
# Code to download images via the Microsoft cognitive API
require 'httparty'
require 'fileutils'

API_KEY = "##############"
SEARCH_TERM = "alpaka"
QUERY = "alpaka"
API_ENDPOINT  = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
FOLDER = "datasets"
BATCH_SIZE = 50
MAX = 1000

# Make the dir
FileUtils::mkdir_p "#{FOLDER}/#{SEARCH_TERM}"

# Make the request
headers = {'Ocp-Apim-Subscription-Key' => API_KEY}
query = {"q": QUERY, "offset": 0, "count": BATCH_SIZE}
puts("Searching for #{SEARCH_TERM}")
response = HTTParty.get(API_ENDPOINT,:query => query,:headers => headers)
total_matches = response["totalEstimatedMatches"]

i = 0
while response["nextOffset"] != nil && i < MAX
    response["value"].each do |image|
        i += 1
        content_url = image["contentUrl"]
        ext = content_url.scan(/^\.|jpg$|gif$|png$/)[0]
        file_name = "#{FOLDER}/#{SEARCH_TERM}/#{i}.#{ext}"
        next if ext == nil
        next if File.file?(file_name)
        begin
            puts("Offset #{response["nextOffset"]}. Downloading #{content_url}")
            r = HTTParty.get(content_url)
            File.open(file_name, 'wb') { |file| file.write(r.body) }
        rescue
            puts "Error fetching #{content_url}"
        end
    end
    query = {"q": SEARCH_TERM, "offset": i+BATCH_SIZE, "count": BATCH_SIZE}
    response = HTTParty.get(API_ENDPOINT,:query => query,:headers => headers)
end

The Ruby code above simply uses the API in batches, downloads llamas and oryxes into their separate directories and names them accordingly. What you don't see is that I went through these folders by hand and removed images that were not really the animal but, for example, a fluffy shoe that showed up in the search results. I also de-duped each folder. You can scan the images quickly on your Mac using the thumbnail preview or use an image browser that you are familiar with to do the job.

Problem with not enough data

Ignoring probable copyright issues (am I allowed to train my neural network on copyrighted material?) and depending on what you want to achieve, you might run into the problem that it's not really that easy to gather 500 or 5000 images of oryxes and llamas. Also, to make things a bit challenging, I tried to see if it was possible to train the neural networks using only 100 examples of each animal, while using roughly 50 examples to validate the accuracy of the networks.

Normally everyone would tell you that you definitely need more image material, because deep learning networks need a lot of data to become useful. But in our case we are going to use two dirty tricks to try to get away with our really small collection: data augmentation and the reuse of already pre-trained networks.

Image data generation

A really neat and handy trick that seems to be prevalent everywhere now is to take the images that you already have and change them slightly and artificially. That means rotating them, changing the perspective, zooming in on them. What you end up with is that instead of having one image of a llama, you'll have 20 pictures of that animal, with every picture being slightly different from the original one. This trick allows you to create more variation without actually having to download more material. It works quite well, but is definitely inferior to simply having more data.

We will be using Keras, a deep learning library on top of TensorFlow, which we have used before in other blog posts to create good sentiment detection. In the domain of image recognition Keras can really show its strength by already having built-in methods to do image data generation for us, without having to involve any third-party tools.

# Creating an image data generator
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)

As you can see above, we have created an image data generator that uses shearing, zooming and horizontal flipping to change our llama pictures. We don't do a vertical flip, for example, because it's rather unrealistic that you will hold your phone upside down. Depending on the type of images (e.g. aerial photography) different transformations might or might not make sense.

# Creating variations to show you some examples
from keras.preprocessing.image import load_img, img_to_array

img = load_img('data/train/alpaka/Alpacca1.jpg')
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
i = 0
for batch in train_datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='alpacca', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # otherwise the generator would loop indefinitely
variations

Now if you want to use that generator in our model directly, you can use the convenient flow_from_directory method, where you can even define the target size, so you don't have to scale down your training images with an external library.

# Flow from directory method
train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size=(sz, sz),
    batch_size=batch_size, class_mode='binary')
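The validation images, which we will use later to measure the accuracy, get their own generator without the augmentation. A sketch, assuming the validation images live in a validation_data_dir folder with the same structure as the training folder:

# Validation data: only the resnet50 preprocessing, no augmentation
validation_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
validation_generator = validation_datagen.flow_from_directory(validation_data_dir,
    target_size=(sz, sz), batch_size=batch_size, class_mode='binary')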

Using Resnet50

In order to finally step onto the shoulders of giants, we can simply import the resnet50 model that we talked about earlier. Here is a detailed description of each layer and here is the matching paper that describes it in detail. While there are different alternatives that you might also use, the resnet50 model has a fairly high accuracy while not being too "big" in comparison to the computationally expensive VGG network architecture.

On a side note: the name "res" comes from residual. A residual can be understood as a subtraction of the features that were learned from the input at each layer. ResNet has a very neat trick that allows deeper networks to learn from residuals by "short-circuiting" them with the deeper layers, i.e. directly connecting the input of an n-th layer to some (n+x)-th layer. This short-circuiting has been proven to make training easier. It does so by helping with the problem of degrading accuracy, where networks that are too deep become exponentially harder to train.
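To make the short-circuiting idea concrete, here is a minimal sketch of a single residual block in Keras. The real ResNet50 blocks additionally use batch normalization and 1x1 convolutions, so this is a simplification:

# A minimal residual block: the block input is added back to its output
from keras.layers import Input, Conv2D, Activation, Add
from keras.models import Model

inputs = Input(shape=(56, 56, 64))
x = Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
x = Conv2D(64, (3, 3), padding='same')(x)
x = Add()([x, inputs])          # the "short-circuit" / skip connection
x = Activation('relu')(x)
residual_block = Model(inputs, x)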

# Importing resnet into keras
from keras.applications.resnet50 import ResNet50
base_model = ResNet50(weights='imagenet')
comparison

As you can see above, importing the network is really dead easy in keras. It might take a while to download the network though. Notice that we are downloading the weights too, not only the architecture.

Training existing models

The next part is the exciting one. Now we finally get to train the existing network on our own data. The simple but ineffective approach would be to download or just re-build the architecture of the successful network and train it with our data. The problem with that approach is that we only have 100 images per class. 100 images per class are not even remotely close to being enough data to train such a network well enough to be useful.

Instead we will try another technique (which I somewhat stole from the great Keras blog): we will freeze all weights of the downloaded network, add three final layers at the end of the network and then train only those.

Freezing the base model

Why is this useful, you might ask? Well, by doing so we can freeze all of the existing layers of the resnet50 network and just train the final layers. This makes sense, since the ImageNet task is about recognizing everyday objects in everyday photographs, and the network is already very good at recognising "basic" features such as legs, eyes, circles, heads, etc. All of this "smartness" is already encoded in the weights (see the last blog post). If we throw these weights away, we will lose these nice smart properties. Instead we can just glue another pooling layer and a dense layer onto the very end of it, followed by a sigmoid activation, which is needed to distinguish between our two classes. That's, by the way, why it says "include_top=False" in the code: in order not to include the initial 1000-class layer that was used for the ImageNet competition. By the way, if you want to read up on the different alternatives to resnet50, you will find them here.

# Adding three layers on top of the network
from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense

base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)

Finally we can now re-train the network with our own image material and hope for it to turn out to be quite useful. I had some trouble finding an optimizer that produced proper results. Usually you will have to experiment with the learning rate to find a configuration whose accuracy improves during the training phase.

# Freezing all the original weights and compiling the network
from keras import optimizers
from keras.models import Model

optimizer = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers: layer.trainable = False
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=3, workers=4,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)

The training shouldn’t take long, even when you are using just a CPU instead of a GPU and the output might look something like this:

training

You’ll notice that we reached an accuracy of 71% which isn’t too bad, given that we have only 100 original images of each class.

Fine-tuning

One thing that we might do now is to unfreeze some of the very last layers in the network and re-train the network again, allowing those layers to change slightly. We do this in the hope that allowing a bit more "wiggle room", while keeping most of the actual weights fixed, might give us better results.

# Make the very last layers trainable
split_at = 140
for layer in model.layers[:split_at]: layer.trainable = False
for layer in model.layers[split_at:]: layer.trainable = True
model.compile(optimizer=optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0), loss='binary_crossentropy', metrics=['accuracy'])    
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=1, workers=3,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)
improvement

And indeed it helped our model to go from 71% accuracy to 82%! You might want to play around with the learning rates a bit or maybe split at a different depth in order to tweak the results. But generally I think that just adding more images would be the easiest way to achieve 90% accuracy.

Confusion matrix

In order to see how well our model is doing we might also compute a confusion matrix, thus calculating the true positives, true negatives, and the false positives and false negatives.

# Calculating confusion matrix
from sklearn.metrics import confusion_matrix
r = next(validation_generator)
probs = model.predict(r[0])
classes = []
for prob in probs:
    if prob < 0.5:
        classes.append(0)
    else:
        classes.append(1)
cm = confusion_matrix(classes, r[1])
cm

As you can see above, I simply took the first batch from the validation generator (i.e. the images of which we know whether they show an alpaca or an oryx) and then used the confusion matrix from scikit-learn to output something. So in the example below we see that 28 resp. 27 images of each class were labeled correctly, while an error was made on 4 resp. 5 images. I would say that's quite a good result, given that we used so little data.

#example output of confusion matrix
array([[28,  5],
       [ 4, 27]])

Use the model to predict images

Last but not least, we can of course finally use the model to predict whether an animal in our little zoo is an oryx or an alpaca.

# Helper function to display images
from keras.preprocessing import image
import numpy as np
import matplotlib.pyplot as plt

def load_image(img_path, show=False):

    img = image.load_img(img_path, target_size=(224, 224))
    img_tensor = image.img_to_array(img)                    # (height, width, channels)
    img_tensor = np.expand_dims(img_tensor, axis=0)         # (1, height, width, channels), add a dimension because the model expects this shape: (batch_size, height, width, channels)
    #img_tensor /= 255.                                      # imshow expects values in the range [0, 1]

    if show:
        plt.imshow(img_tensor[0]/255)                           
        plt.axis('off')
        plt.show()

    return img_tensor

# Load two sample images
oryx = load_image("data/valid/oryx/106.jpg", show=True)
alpaca = load_image("data/valid/alpaca/alpaca102.jpg", show=True)
model.predict(alpaca)
model.predict(oryx)
prediction

As you can see in the output, our model successfully labeled the alpaca as an alpaca, since the value was less than 0.5, and the oryx as an oryx, since the value was greater than 0.5. Hooray!

Conclusion or What’s next?

I hope this blog post was useful to you and showed you that you don't really need much in order to get started with deep learning for image recognition. I know that our example zoo pokedex is really small at this point, but I don't see a reason (apart from the lack of time and resources) why it should be a problem to scale out from our 2 animals to 20 or 200.

On the technical side, now that we have a model running that's kind of useful, it would be great to find out how to use it on a smartphone, e.g. the iPhone, to finally have a pokedex that we can really try out in the wild. I will cover that bit in the third part of the series, showing you how to export existing models to Apple mobile phones using the CoreML technology. As always, I am looking forward to your comments and corrections, and I'd like to point you to the IPython notebook that you can download here.

]]>
Deploy your Nuxt.js app to platform.sh https://www.liip.ch/de/blog/deploy-your-nuxt-js-app-to-platform-sh https://www.liip.ch/de/blog/deploy-your-nuxt-js-app-to-platform-sh Mon, 06 Aug 2018 00:00:00 +0200 Nuxt.js is Vue's counterpart to React's Next.js. It's a "framework for creating Universal Vue.js Applications." Getting started with Nuxt.js is rather straightforward, as the guides help a lot. platform.sh is a cloud hosting provider we use a lot at Liip. Configuring platform.sh to serve any kind of app is also pretty straightforward, as there are lots of guides for all kinds of apps.

I started building a microsite. Nothing too fancy. As I was familiar with Vue, I wanted to give Nuxt.js a try and created this app as a single page application. So I created a skeleton of the app, included a header, a bit of navigation and an image or two, and was ready to deploy the first version of it somewhere so stakeholders could actually have a look at it. I've already used platform.sh for various other apps before, so I figured it would be a good fit for a Vue SPA.

Since this was my first Nuxt.js app, I tried to figure out how to deploy it to platform.sh, but didn't find any resources. I decided to share the steps and config needed in order to deploy it.

Vue rendered

Nuxt's documentation is pretty straightforward when it comes to deployment. There are essentially three commands that need to be run in order to get a fresh copy of a Nuxt app running:

npm install
npm run build
npm start

And these commands are exactly what is needed in order to deploy the app. Most importantly: there's no need for any special Nuxt config. The out-of-the-box config should be enough.

To configure an app for platform.sh, three files are needed:

  • ./.platform/routes.yaml - Available routes for the app
  • ./.platform/services.yaml - Attached services, such as databases, search platforms, etc.
  • ./.platform.app.yaml - The main configuration file

First of all, the app must be configured. I'll call the app node, use nodejs:8.11 as its type and give it 128MB of disk space:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

Now the build process needs to be added. This is done by adding a build hook:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

# Build hook
hooks:
  build: |
    npm install
    npm run build

Afterwards, platform.sh needs to know how to start the app and what kind of locations it needs to serve. The finished .platform.app.yaml now looks like the following:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

# Build hook
hooks:
  build: |
    npm install
    npm run build

# Web config
web:
  commands:
    start: npm start
  locations:
    '/':
      passthru: true

In the file .platform/routes.yaml, we also need to add a default route that passes everything it receives straight to the Nuxt process:

# .platform/routes.yaml

"https://{default}/":
    type: upstream
    upstream: "node:http"

The file .platform/services.yaml can be left empty.

That's it. Now we can go on to deploy the Nuxt app to platform.sh:

git remote add platform [...]
git push -u platform

Static pages

Static pages function a bit differently. They are generated by Nuxt during the build process and are served as static files by platform.sh. A starting point for such a configuration can be found in the platform.sh documentation.

A little bit of adjustment is needed, though.

The same starting config for name, type and disk size can be used:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

Now, instead of running npm run build in the build hook, we let Nuxt generate static files via npm run generate:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

hooks:
  build: |
    npm install
    npm run generate

... and let platform.sh serve everything in the dist/ folder as a static page:

# .platform.app.yaml

name: node
type: nodejs:8.11
disk: 128

hooks:
  build: |
    npm install
    npm run generate

web:
  commands:
    start: sleep infinity
  locations:
    '/':
      root: dist
      index:
         - index.html
      rules:
          \.(css|js|gif|jpe?g|png|ttf|eot|woff2?|otf|html|ico|svg?)$:
              allow: true
          ^/robots\.txt$:
              allow: true

Note the start command: by letting the whole app sleep for an infinite amount of time, we're not actually executing anything. Only the static files are served.

The file .platform/routes.yaml can stay the same:

# .platform/routes.yaml

"https://{default}/":
    type: upstream
    upstream: "node:http"

Also, the file .platform/services.yaml stays empty. The actual deployment then happens the same way:

git remote add platform [...]
git push -u platform

Takeaway thoughts

Platform.sh and Nuxt.js really play well together. Combining the two was less complex than I originally thought, and the sheer speed of both makes development, for me, loads of fun. Almost as much fun as sharing knowledge and learning new things.

]]>
Words and the design process, InVision Design Talk, our learnings and readings https://www.liip.ch/de/blog/words-and-design-invision-designtalk-our-learnings https://www.liip.ch/de/blog/words-and-design-invision-designtalk-our-learnings Tue, 24 Jul 2018 00:00:00 +0200 Thanks to InVision, we got a week of great free talks!

InVision offered four talks about UX copy: about process, inclusive UX copy, how UX writers and designers can collaborate better, and how to improve forms with great copy.

Biz Sanford held the first talk. She manages Shopify’s voice and tone, sets content standards, and teaches her workmates how to write their own content.
Her talk was about consistent interface content, which is a core element of a well-designed user experience. She explained how to incorporate content throughout the design process.
Her main advice was to be specific. Any lorem ipsum text or scribble should be banished, from sketching to wireframing.

My key learnings from Biz's inspiring talk

Step 1: Sketching

When sketching, we don't find the exact words that we will use in the final copy. Our content will go through feedback rounds before it is final. However, we should already choose appropriate words. We should be especially careful with the following components that are key to the functionality:

  • headings for pages and sections,
  • key verbs and nouns,
  • buttons and link text.
    Words are essential to the user experience; they lead users to do what they want. Sketching with words will help you have a better overview of what needs to be designed. It is useful to work with a diverse team to gather different words: don't stick to your own jargon.

Step 2: Wireframing

Her idea to break down all the content elements in a table, like Google Sheets or Excel, is most interesting! It makes sure we don't forget a piece of the scenario. I find it especially useful for multilingual websites. The table provides an overview of the naming of all elements, their translations, and any design needed (like an icon). It can be shared with all team members. It ensures a consistent naming of the elements in all languages.

Step 3: Content in low fidelity mockups

At this stage, we use realistic content and real data. To show that it is not a finalised version of the content, we can use a funny font (like Comic Sans). It helps every team member understand how it works without too many explanations. We want to gather robust feedback on flow and functionality rather than on wording or typos.
My favourite advice she gave was to use the wording of your users. You can talk to the team members who have direct contact with your customers, for example. I find that a team often uses specific jargon, maybe legal or technical, and often does not realise that their users speak a different language.

Step 4: Content in high fidelity mockups

At Shopify, the project team organises polish reviews. The team sits together and 'plays' with the product to check that everything is right.
It is definitely something I would like to start here at Liip! It's useful to have team members who did not write the copy do some re-reading. When we are too focused on our work, sometimes mistakes slip our notice, and external advice is the key to getting back on track.

Questions & Answers with Biz

It was a live talk, so we could ask questions and comment. Here are a few themes that came up.
Content is often dealt with at the end of a project. How can we be strong advocates for content?
Designers are content's best allies. Designers can encourage using real content, and using it early. People will follow. The usefulness of content is self-explanatory when it's already incorporated in the design.
Consistency is essential.
Users don't care that one team did this and another team did that. Consistency builds credibility and trust. A good option is to create a glossary to which all team members can contribute. At best, the glossary is shared with the marketing team too.
Does the layout fit the content or does the content fit the layout?
Neither! Both of them shape each other, which is why content should be involved as early as possible. Designers and content strategists should collaborate all along the way.

Further readings on content at Shopify and useful tools

If you missed Biz's talk and my little summary piqued your curiosity, read her blog post Words and the design process - Greetings from a friendly content strategist. She explains everything in detail, with sketches and images. The content of that blog post is very similar to the content of her talk.
The content team at Shopify shares many learnings. Read their blog to learn more about content and design.
For example, I recommend Product content at each stage of a project - How content strategists help teams build better products. The blog post helps you pinpoint where and when you can add value in a project.

Julien, who participated in the meet-up, wrote a great blog post (in French) to share his personal learnings: La place de la Copy dans le processus de création.

Useful tools to test the readability

To test the readability of our English content, we can use the Hemingway App.
To test the readability of our German content, we can use the website Psychometrica.
Do you have such a tool for French?

invision-talk-setting

Cosy setting to enjoy the talk =)

Watching the talk together and sharing learnings

Since it was a remote talk, we created a cosy atmosphere at Liip to have a chance to discuss and share our learnings. I very much enjoyed our evening! Thanks to all participants and my teammates who make fun and projects possible every day!

]]>
LiipRokkaImagineBundle 1.0.0 Release https://www.liip.ch/de/blog/liiprokkaimaginebundle-1-0-0-release https://www.liip.ch/de/blog/liiprokkaimaginebundle-1-0-0-release Mon, 23 Jul 2018 00:00:00 +0200 Previously, such a switch could cost you lots of effort, but with LiipRokkaImagineBundle we have reduced it to a minimum. It allows you to configure LiipImagineBundle in such a way that it uses the rokka.io service as storage. You just have to create a rokka account and adjust a few settings in your application config. Check the installation details in our documentation and try out how easy it is.
We've already tested this extension against different open source platforms. For example, the installation on Sylius takes only 3-5 minutes.

If you like the LiipRokkaImagineBundle extension, a GitHub star will be very much appreciated. Please also do not hesitate to create tickets or pull requests if you notice any issues or would like to make an improvement to the new extension.

]]>
Simplified Raiffeisen Login https://www.liip.ch/de/blog/raiffeisen-login https://www.liip.ch/de/blog/raiffeisen-login Fri, 20 Jul 2018 00:00:00 +0200 Simplified user guidance through a central login
Logging in has been really simple for Raiffeisen customers since April 2018: the login on www.raiffeisen.ch includes access to the Raiffeisen newsletter, the MemberPlus portal and the Piazza app. The strategy was an all-in-one login.

Five sub-projects on the way to the new solution
Within 10 months, the Liip team implemented the new central login as a single sign-on in very close collaboration with Raiffeisen.
In the sub-projects, we resolved all dependencies: the newsletter administration and registration have been reworked, the customer service centre has a new support tool, and the connection to MemberPlus is new and integrated into the Piazza app.

Technologically up to date
In the 13-month project, which included both the conception and the implementation, we brought the login solution technologically up to date: the front-end implementation is based on Angular 5 and PHP technology. The upgrade to Symfony 3.4 was carried out and the deployment was done via OpenShift.
Despite the technical complexity, the go-live at the end of April was very successful and no errors were found.

Very close collaboration in one team
Liip's focus was the front end, while Raiffeisen concentrated on the back end.
The extensive work was tackled in a joint Scrum team: Raiffeisen took on the Product Owner role and Liip the ScrumMaster role; the developers came from both companies.
The newsletter, one of the smaller sub-projects, was implemented in support mode with Kanban, again with Raiffeisen as story owner.

]]>
Poke-Zoo - How to use deep learning image recognition to tell oryxes apart from llamas in a zoo https://www.liip.ch/de/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts https://www.liip.ch/de/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts Wed, 18 Jul 2018 00:00:00 +0200 We've all witnessed the hype in 2016 when people started hunting Pokemon in "real life" with the app Pokemon GO. It was one of the apps with the fastest-growing user base and, for a while, a higher addiction rate than crack - correction: I mean Candy Crush. Comparing it to technologies like the telephone or email, it took only 19 days to reach 50 million users, versus 75 years for the telephone.

Connecting the real with the digital world

You might be wondering why I am reminiscing about old apps; we have certainly all moved on since the Pokemon GO hype in 2016 and are doing other serious things now. True, but I think that "collecting" virtual things bound to real-life locations was a great idea and that we will want to build more of it in the future. That's why Pokemon is the starting point for this blog post. In fact, if you are young enough to have watched the Pokemon series, you are probably familiar with the idea of the Pokedex.

pokedex

The idea

The Pokedex was a small device that Ash (the main character) could use to look up information about certain Pokemon in the animated series. He used it now and then to look up some facts about them. We have seen how popular Pokemon GO became by connecting the real with the digital world, so why not take the idea of the Pokedex and apply it in real-world scenarios, or:

What if we had such an app to distinguish not Pokemon but animals in the zoo?

The Zoo-Pokedex

Imagine a scenario where kids have an app on their parents' mobile phones - the Zoo-Pokedex. They start it up when entering a zoo and then go exploring. When they are at an enclosure, they point the phone's camera at it and try to film the animal. The app recognizes which animal they are seeing and gives them additional information on it as a reward.

Instead of perceiving the zoo as an educational place where you have to go from cage to cage, observe the animal and absorb the info material, you could send them out there and let them "capture" all the animals with their Zoo-Pokedex.

pokedexzoo

Let’s have a look at the classic ways of learning about animals in a zoo:

  • Reading a plaque in the zoo feels boring and dated
  • Reading an info booklet you got at the cashier feels even worse
  • Bringing your own book about animals might be fun when comparing the pictures of animals in the book with the real ones, but there is no additional information
  • Having a QR code at the cage that you need to scan will never feel exciting or fun
  • Having a list of animals in my app that I can tap on to get more info could be fun, but more for parents in order to appear smart in front of their kids by giving them facts about the animal

Now imagine the Zoo-Pokedex: you really need to go exploring the zoo in order to get information. In cases where the animal's area is big and it can retreat, you need to wait in front of it to take a picture. That takes endurance and perseverance. It might even be the case that you don't get to see it and have to come back. When the animal appears in front of you, you'll need to be quick - maybe even an element of surprise and excitement is there - you need to get that one picture of the animal in order to check off the challenge. Speaking of challenges, why not make it a challenge to have seen every animal in the zoo? That would definitely mean you need to come back multiple times, take your time and go home having ticked off 4-5 animals per visit. This experience encourages you to come back and try again next time. And each time you learn something, you go home with a sense of accomplishment.

That would definitely be quite interesting, but how could such a device work? Well, we would definitely use the phone's camera, and we could train a deep learning network to recognize the animals that are present in the zoo.

So imagine a kid walking up to an area, trying to spot the animal in order to point their mobile phone at it, and then magically a green check mark appears next to it. We could display some additional info material, like where the animals originally come from, what they eat, when they sleep etc., and that information would feel much more entertaining than just reading it off a boring info plaque.

How to train the Zoo-Pokedex to distinguish new animals

Nice idea, you say, but how am I going to build that magical device that recognizes animals, especially the “weird” ones, e.g. the oryx in the title? :) The answer is, of course, deep learning.

In recent years you have probably noticed the rise of deep learning in different areas of machine learning and its practical applications in your everyday life. In fact, I have covered a couple of these practical applications on our blog, such as state-of-the-art sentiment detection, survival rates for structured data, and automatic speech recognition and text-to-speech applications.

Deep learning image categorization task

The area we need for our little Zoo-Pokedex is image categorization. Image categorization has advanced tremendously in recent years, due to deep learning outperforming all other machine learning approaches (see below). One good indicator of this movement is the yearly ImageNet competition, in which machine learning algorithms compete to find the best way of determining what can be seen in an image. The task is simple: there are 1000 categories of everyday objects such as cats, elephants and tea kettles, and millions of images that need to be mapped to one of these categories. The algorithm with the lowest error rate wins. Below is an example of the output on the sample images. You’ll notice that the algorithm displays the label it thinks the image belongs to.

imagenet

This ILSVRC competition has been going on for a couple of years now, and while the improvements have been astonishing every year, deep learning appeared with a big bang on the horizon in 2012 and 2013. As you can see in the image below, the number of state-of-the-art deep learning solutions exploded and outperformed all other approaches in this area. It even goes so far that the algorithms’ ability to tell the contents apart is better than that of a competing human group. This super-human ability of deep learning networks is what the hype is all about.

solutions

How does it work?

In this blog post I don’t want to be too technical, but just show you how two simple concepts - convolution (kernels) and pooling - are applied in a smart way to achieve outstanding results in image recognition tasks with deep learning. I don’t want to go into the details of how deep learning actually learns, i.e. updating weights via backpropagation, but abstract all of this away from you. In fact, if you have 20 minutes and are a visual learner, I definitely recommend the video below, which does an extremely good job of explaining the concepts behind it:

Instead I will quickly cover the two basic tricks that are used to make things really work.

We’ll start by looking at a common representation of a deep learning network above, and you’ll notice that two words appear a lot there, namely convolution and pooling. While it seems obvious that the image data has to travel through these layers from left to right, it would be nice to know what these layers actually do.

Convolutions and Kernels

If you are not a native speaker, you’ve probably never heard the word convolution before and might be quite puzzled when you hear it. For me it also sounded like some magic procedure that apparently does something very complicated and somehow makes deep learning work :).

After getting into the field I realized that it’s basically an image transformation that is at least 20 years old (see e.g. the Prentice Hall book “Computer Vision” by Shapiro) and present in your everyday image editing software. Things like sharpening an image, blurring it or finding edges are basically convolutions. It’s a process of sliding a small matrix, e.g. 3x3, over each pixel of your image, multiplying the pixel and its neighbours by the matrix entries, summing the results up and collecting them in a new image.

To make this concept more understandable, I stole some examples of how a 3x3 matrix, also called a kernel, transforms an image after being applied to every pixel.

In the image below, the kernel gives you the top edges in your image. The numbers in the grey boxes represent the grayscale image values (from 0 for black to 255 for white) and the small numbers after the x show what each value is multiplied by before everything is added together. If you change these numbers, you get another transformation.

top-edge

Here is another set of numbers in the 3x3 matrix that will blur your image.

blur
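
If code speaks to you more than pictures, here is a minimal sketch (my own illustration, not taken from any framework) of what “applying a kernel” means: a plain Kotlin function that slides a 3x3 kernel over a grayscale image stored as a 2D array, plus a hand-tuned blur kernel like the one above. Border pixels are simply left untouched to keep the example short.

fun convolve(image: Array<IntArray>, kernel: Array<DoubleArray>): Array<IntArray> {
  // image holds grayscale values from 0 (black) to 255 (white).
  val height = image.size
  val width = image[0].size
  val output = Array(height) { IntArray(width) }
  for (y in 1 until height - 1) {
    for (x in 1 until width - 1) {
      var sum = 0.0
      // Multiply the pixel and its eight neighbours by the kernel entries and add them up.
      for (ky in -1..1) {
        for (kx in -1..1) {
          sum += image[y + ky][x + kx] * kernel[ky + 1][kx + 1]
        }
      }
      output[y][x] = sum.coerceIn(0.0, 255.0).toInt()
    }
  }
  return output
}

// A hand-tuned blur kernel: every pixel in the 3x3 neighbourhood contributes equally.
val blurKernel = arrayOf(
  doubleArrayOf(1.0 / 9, 1.0 / 9, 1.0 / 9),
  doubleArrayOf(1.0 / 9, 1.0 / 9, 1.0 / 9),
  doubleArrayOf(1.0 / 9, 1.0 / 9, 1.0 / 9)
)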

Normally such “filters” are created by hand-tuning these numbers to achieve the desired result. With some logical thinking you can easily come up with filters that sharpen or blur an image and then apply them to it. But how are these applied in the context of deep learning?

With deep learning we do things the other way round: we let the neural network find filters that are useful with regard to the final result. For example, to tell a zebra apart from an elephant, it would really help to have a filter that detects diagonal edges. If the image has diagonal edges, e.g. the stripes of the zebra, it’s probably not an elephant. So we train the network on our training images of zebras and elephants and let it learn these filters or kernels on its own. If the emerging kernels are helpful for the task, they tend to stay; if not, they keep updating themselves until they become useful.

A layer that applies such filters, kernels or convolutions is called a convolutional layer. And now comes another nice property: if you keep stacking such layers on top of each other, each of these layers will find its own filters that are helpful. On top of that, the filters become more and more complex and able to detect more detailed features.

layer

In the image above (which is from a seminal paper), you see grey boxes and images. A great way to show these filters is to show the activations, i.e. the convolutions, which are the grey boxes. The images are samples that “trigger” these filters the most - or, said the other way round, these are images that these filters detect well.

For example, in the first layer you’ll notice that the network detects mostly vertical, horizontal and diagonal edges. In the second layer it’s already a bit “smarter” and able to detect round things, e.g. eyes or corners of frames. In the third layer it goes further still and detects not only round things but things that look like car tires, for example. This layering often goes on for many layers - some networks have over 200 of them. That’s why they are called deep. Now you know. Usually, adding more and more of these layers makes the network better at detecting things, but it also makes it slower and sometimes less able to generalize to things it has not seen yet.

Pooling

The second word that you see a lot in the architecture diagrams above is pooling. Here the trick is really simple: you look at a couple of pixels next to each other, e.g. a 2x2 area, and simply take the biggest value - also called max-pooling. In the image below this trick has been applied to each coloured 2x2 area, and the output is a much smaller image. Now why are we doing this?

The answer is simple: to become size-invariant. By repeatedly scaling the image down like this, the network can detect a zebra that is really close to the camera as well as one that is only visible in the far distance.

pooling
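
Again purely as an illustration of my own, a 2x2 max-pooling step boils down to a few lines of Kotlin:

// Take the biggest value of each 2x2 block, halving the width and height of the image.
fun maxPool2x2(image: Array<IntArray>): Array<IntArray> {
  val height = image.size / 2
  val width = image[0].size / 2
  return Array(height) { y ->
    IntArray(width) { x ->
      maxOf(
        maxOf(image[2 * y][2 * x], image[2 * y][2 * x + 1]),
        maxOf(image[2 * y + 1][2 * x], image[2 * y + 1][2 * x + 1])
      )
    }
  }
}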

Putting things together

After this small excursion into the two main principles behind state-of-the-art deep learning networks, we have to ask how we are going to use these tricks to detect our animals in the zoo.

While a few years ago you would have had to write a lot of code and hire a whole machine learning team for this task, today you can stand on the shoulders of giants. Thanks to the ImageNet competitions (and, I guess, thanks to Google, Microsoft and other research teams constantly publishing new research) we can use some of these pretrained networks to do the job for us. What does this mean?

The networks that are often used in these competitions can be obtained freely (in fact they even come pre-bundled with deep learning frameworks such as https://github.com/pytorch/pytorch), and you can use these networks without any tuning to categorize your images into the 1000 categories used in the competition. As you can see in the image below, the bigger the network in terms of layers, the better it performs - but also the slower it is and the more data it needs to be trained.

comparison

Outlook - Part 2: How to train state-of-the-art image recognition networks to categorize new material

The cool thing is that in the next blog post we will take these pretrained networks and teach them new tricks - in our case, to tell apart a llama from an oryx for our Zoo-Pokedex. So basically we will train these networks to recognize things they have never been trained on. Obviously we will need training data, and we have to find a way to teach the networks new stuff without “destroying” their property of being really good at detecting common things.

Finally, I hope this blog post leaves you with at least one takeaway: demystifying deep learning networks in the image recognition domain. So whenever you see those weird architecture drawings of image recognition networks with steps saying “convolution” and “pooling”, you’ll hopefully know that this magic sauce is not that magic after all. It’s just a very smart way of applying those rather old techniques to achieve outstanding results.

It's never "just a WebView" https://www.liip.ch/de/blog/its-never-just-a-webview https://www.liip.ch/de/blog/its-never-just-a-webview Wed, 11 Jul 2018 00:00:00 +0200 Thomas already talked about how the web and the app ecosystems are different. They don't have the same goals, and should aim for a different user experience. Here I will focus on the technical side of embedding a website into a native app using WebView on Android and WKWebView on iOS.

Websites have extra UI that you don't want in your app

Websites always have extra content which is not needed when wrapping them in an app. They have a header title, a navigation menu, and a footer with extra links. Depending on how "native" you want your app to appear, you will show a native navigation bar, a custom navigation flow and certainly not a footer under each screen.

If you are lucky, you can ask the website developer to create a special template for the app that removes those extra features and only shows the main content. Otherwise you'll have to inject JavaScript to hide this content, as sketched below.
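
A rough sketch of the second option on Android could run a bit of JavaScript once the page has finished loading. The CSS selectors ("header", "footer", ".main-nav") are placeholders, not the real ones of any particular site:

webview.webViewClient = object : WebViewClient() {
  override fun onPageFinished(view: WebView, url: String) {
    // Hide the parts of the page we don't want to show in the app.
    view.evaluateJavascript(
      "['header', 'footer', '.main-nav'].forEach(function (s) {" +
      "  var el = document.querySelector(s);" +
      "  if (el) { el.style.display = 'none'; }" +
      "});",
      null
    )
  }
}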

If the website contains a special "app" template, make sure you always use it

We built an app where the website had a special template. Each page could be loaded with the extra parameter mobile=true, like http://www.liip.ch/?mobile=true. This worked great, but the links contained in the pages did not have this extra parameter. If we had simply allowed link clicks without filtering, the user would have seen the non-app pages. We had to catch every link the user clicked and append the extra parameter "manually", as in the sketch below. This is quite easy for GET parameters, but it can get quite tricky when POSTing a form.
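
On Android, a hedged sketch of that filtering could use shouldOverrideUrlLoading to re-append the parameter on every navigation (the WebResourceRequest variant requires API 24+; the parameter name is the one from our project, adapt it to yours):

webview.webViewClient = object : WebViewClient() {
  override fun shouldOverrideUrlLoading(view: WebView, request: WebResourceRequest): Boolean {
    val url = request.url
    if (url.getQueryParameter("mobile") == null) {
      // Re-load the same page with the "app" template parameter appended.
      val appUrl = url.buildUpon().appendQueryParameter("mobile", "true").build()
      view.loadUrl(appUrl.toString())
      return true
    }
    // The parameter is already there, let the WebView load the page.
    return false
  }
}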

Making sure users cannot go anywhere

By default a WebView will follow every link. This means that as long as the web page shows a link, the user will be able to click on it. If your page links to Google or Wikipedia, the user can go anywhere from within the app. That can be confusing.

It is easier to block every link and specifically allow those that we know, in a "whitelist" fashion, as shown below. This is particularly important because the webpage can change without the app developer being notified. New links can appear on the website and capsize the whole navigation.
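
A minimal sketch of such a whitelist on Android, assuming a hypothetical list of allowed hosts:

val allowedHosts = setOf("www.liip.ch", "liip.ch")

webview.webViewClient = object : WebViewClient() {
  override fun shouldOverrideUrlLoading(view: WebView, request: WebResourceRequest): Boolean {
    val host = request.url.host
    // Returning true blocks the navigation; false lets the WebView follow the link.
    return host == null || host !in allowedHosts
  }
}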

WebViews take a lot of memory

WebViews use a lot of RAM compared to a native view. When a WebView is hidden, for example when we show another screen or put the app in the background, it is very likely that the system will kill the Android Activity that contains the WebView, or the whole iOS application.

Because it takes so much memory, the WebView is a prime candidate to be killed when the system needs to make space for other apps.

When the Activity or application which contains the WebView is restored, the view has lost its context. This means that if the user entered content in a form, everything is gone. Furthermore, if the user navigated within the website, the WebView can’t remember which page the user was on.

You can mitigate some of the inconvenience by handling state restoration on Android and iOS. Going as far as remembering the state inside a WebView will cause a lot of distress for the developer :)

WebView content is a blackbox

Unless you inject JavaScript to know what is inside, you cannot know what is displayed to the user. You don't know which areas are clickable, you don't know (easily) if the view is scrollable, etc...

WebViews don't handle file downloads

On iOS, there is no real concept of a file system, even with the new Files app. If your website offers PDF downloads, for example, the WebView simply does nothing. One option is to catch the URLs that point to PDFs and open Safari to view them.

On Android, you can use setDownloadListener to be notified when the WebView detects a file that should be downloaded instead of displayed. You then have to handle the download yourself, for example by using DownloadManager.

webview.setDownloadListener { url, _, contentDisposition, mimetype, _ ->
  val uri = Uri.parse(url)
  val request = DownloadManager.Request(uri)
  // Derive a sensible file name from the URL and the server's headers.
  val filename = URLUtil.guessFileName(url, contentDisposition, mimetype)
  request.allowScanningByMediaScanner()
  // Show a system notification once the download has finished.
  request.setNotificationVisibility(DownloadManager.Request.VISIBILITY_VISIBLE_NOTIFY_COMPLETED)
  request.setDestinationInExternalPublicDir(Environment.DIRECTORY_DOWNLOADS, filename)
  // Hand the request over to the system's DownloadManager.
  val dm = context?.getSystemService(Context.DOWNLOAD_SERVICE) as DownloadManager
  dm.enqueue(request)
}

WebViews don't handle non-http protocols such as mailto:

You will have to catch any non-http protocol loads and define what the application should do.
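
A hedged Android sketch of handing such links over to the system (in a real app you would fold this into the same WebViewClient as the checks above; ACTION_VIEW lets Android pick the mail or phone app):

webview.webViewClient = object : WebViewClient() {
  override fun shouldOverrideUrlLoading(view: WebView, request: WebResourceRequest): Boolean {
    val url = request.url
    if (url.scheme == "mailto" || url.scheme == "tel") {
      // Let the system find an app that can handle the link, e.g. the mail client.
      view.context.startActivity(Intent(Intent.ACTION_VIEW, url))
      return true
    }
    return false
  }
}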

The Android back button is not handled

If the user presses the back button, the standard action (leaving the Activity, ...) will happen. It will not trigger a "previous page" action as the user would expect. You have to handle it yourself.

webview.setOnKeyListener { _, keyCode, event ->
  // Navigate back inside the WebView's history before letting the system handle the back key.
  if (keyCode == KeyEvent.KEYCODE_BACK && event.action == KeyEvent.ACTION_UP && webview.canGoBack()) {
    webview.goBack()
    return@setOnKeyListener true
  }
  return@setOnKeyListener false
}

There is no browser UI to handle previous page, reload, loading spinner, etc...

If the website has more than one page, you need to offer a way to go back and forth in the navigation history. This is particularly important if you removed the website's header/footer/menu. On Android, users can still use the back button (if you enabled it), but on iOS there is no equivalent.

You should also offer the possibility to reload the page. Showing a loading indicator, so that users know when the page has finished loading just like in a native application or a standard browser, helps as well (see the sketch below).
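
For the loading indicator on Android, a rough sketch could hook into WebChromeClient's progress callback ("progressBar" is an assumed view in your layout, not something the WebView provides):

webview.webChromeClient = object : WebChromeClient() {
  override fun onProgressChanged(view: WebView, newProgress: Int) {
    // newProgress goes from 0 to 100; hide the indicator once the page is fully loaded.
    progressBar.progress = newProgress
    progressBar.visibility = if (newProgress < 100) View.VISIBLE else View.GONE
  }
}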

There is no default way to display errors

When a loading error occurs, for example because there is no network connection, the WebView doesn’t handle it for you.

On iOS, the view will not change. If the first page fails to load, you will have a white screen.

On Android, you will have an ugly default error message like this one:

webview-error-cropped
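
One way out, sketched here for Android, is to intercept the error and show your own view instead (this variant of onReceivedError requires API 23+; showErrorScreen() is an assumed helper that displays a native error layout):

webview.webViewClient = object : WebViewClient() {
  override fun onReceivedError(view: WebView, request: WebResourceRequest, error: WebResourceError) {
    // Only react to errors of the page itself, not of every image or script it loads.
    if (request.isForMainFrame) {
      showErrorScreen()
    }
  }
}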

WebViews don't know how to open target="_blank" links

WebViews don't know how to handle links that are supposed to open a new browser window. You have to handle them yourself. You can, for example, decide to stop the "new window" from opening and load the page in the same WebView, as in the sketch below. By default, nothing will happen for those links.
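
One possible approach on Android, sketched here with no claim to being the definitive solution: enable multiple-window support, then catch the new window in WebChromeClient and redirect its URL into the existing WebView.

webview.settings.setSupportMultipleWindows(true)

webview.webChromeClient = object : WebChromeClient() {
  override fun onCreateWindow(view: WebView, isDialog: Boolean, isUserGesture: Boolean, resultMsg: Message): Boolean {
    // Use a throwaway WebView only to find out which URL the new window wants to open.
    val temp = WebView(view.context)
    temp.webViewClient = object : WebViewClient() {
      override fun shouldOverrideUrlLoading(v: WebView, request: WebResourceRequest): Boolean {
        // Load the target="_blank" URL in the original WebView instead of a new window.
        webview.loadUrl(request.url.toString())
        return true
      }
    }
    (resultMsg.obj as WebView.WebViewTransport).webView = temp
    resultMsg.sendToTarget()
    return true
  }
}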

iOS and security

iOS is, rightfully, very conservative regarding security. Apple added App Transport Security (ATS) in iOS 9 to prevent loading insecure content. If your website does not use a modern TLS setup, you will have to disable ATS, which is never a good sign.

It is hard to make code generic

Since every page can be different and have different needs, it is hard to make the code generic. For one client, where we show two sub-pages of their website on different screens, we have two WebView handlers because of those differing needs.

Once you have thought through all these things and believe you can start coding your webview, you will discover that iOS and Android handle them differently.

There is a strong dependency on the website

Once you have faced all these challenges and your app is ready to ship, there is one last thing that you cannot control: you are displaying a website that you did not code and that is most likely not managed by someone in your company.

If the website changes, there is a high chance that the webmaster won’t think of telling you, mainly because they don’t realise that a single change on the website can be enough to mess up your app completely.

Conclusion

WebViews can be used to wrap an existing website quickly, but it is never "just" a WebView. There is some work to do to deliver the high quality that we promise at Liip. My advice for you:

  • Use defensive programming.
  • List all features of each page with the client, and define the steps clearly, like for any native app.
  • Get in touch with the website developer and collaborate. Make sure they tell you when the website changes.
How to change things in your company https://www.liip.ch/de/blog/how-to-change-things-in-your-company https://www.liip.ch/de/blog/how-to-change-things-in-your-company Wed, 11 Jul 2018 00:00:00 +0200 Do you sometimes want to change things in your company? Do you face processes that are slow or error-prone? Do you see things that are missing or should be done in a new way?
I moved from being a UX Designer to taking care of changes like these within Liip. In the beginning it often took me forever to have an impact, but over time my process became more stable and reliable. Today I’m able to deliver first results within a month or two, and this is how I do it.

Form a team with diverse skills that can implement solutions

Like everything else, change means a lot of work. You have to write, develop, design, communicate and organise lots of tasks. I often lost time because I staffed my projects with volunteers who cared about the topic but didn’t have the skills needed to implement the ideas. Today I form and book a core team with diverse skills (a developer, a designer, a subject-matter expert) before I start the project.

Formulate a goal and face risks to know where you are going

Finding a common vision was often time-consuming, and talking about risks sometimes drained energy and enthusiasm. After I read the excellent book “Sprint” by Jake Knapp, it went much faster and easier. Many of the following steps are inspired by this book.
On the first day, we (the core team) ask ourselves “Why are we doing this? Where do we want to be in five years?” and formulate a long-term goal. We answer “How could we fail?” and formulate the emerging fears and risks as challenges.
This usually takes about one hour and the team is still motivated and energised afterwards.

Make a map to understand the scope of your challenge

Change often feels epic and overwhelming to me, especially in the beginning. Where should I start in this huge, complex topic?
So we start by drawing a map of the process we are about to change or improve. That’s sometimes quite difficult to do, but in the end we get an overview of our challenge, who is involved and which steps the different roles need to take.

goal-map

Ask customers and experts to unlock their knowledge

We show the map to affected employees, clients and experts. By interviewing them about the challenge at hand, we unlock their knowledge and experience. This leads to a complete and more differentiated picture of the challenge. Everything is captured on stickies and clustered into topics at the end of the morning.

Pick a specific target to focus your energies on

We vote for the most important insights from the interviews, the most relevant challenges and the most critical steps in the process. In the end we decide what to focus on and where to put our energy. Out of these decisions, we synthesise a design challenge: a problem statement that we want to solve. Out of an open and complex topic, we pick one specific target with a lot of potential.

pick-target

Turn ideas into detailed sketches to choose the best ones

In my first projects I let people create ideas on a whole process or topic and write them on stickies with only a sentence or two. The results were broad and often not thought through. It was hard to compare them, and more than once we failed to implement the winning ideas.
Today I let the team members sketch out their ideas more carefully and draw them out in solo work with the 4 Step Sketch technique (also from the Sprint book). We have fewer solutions, but they are more concrete, and it’s easier to evaluate them and to implement the winning ideas.

ideation

Use a service blueprint to stitch the ideas into a coherent solution

Often the winning solutions don’t fit together neatly and have different levels of granularity. To come up with a coherent solution and not forget anything, we use a service blueprint. We define how our solutions will be delivered and what material, actions and infrastructure are needed in every step of the process, all mapped on a huge sheet with different lanes.
Out of this blueprint we write user stories of what needs to be done and prioritise them.

blueprint

Block time for the whole team to implement solutions fast

I usually work on changes with volunteers who do this next to their daily job. It’s hard for them to make time for an internal project and still finish their tasks on time. So we started to block 4-5 half days in advance, where the whole team is together in a room but everyone works individually on their own tasks. This way we can implement and deliver solutions much faster and work on a predictable timeline.

Deliver a first result within a month

When I started my work as an internal Service Designer and Change Agent, it took me forever to tackle problems in my company. With this process, I have become a lot faster and can deliver a result in a predictable timeframe.
If a small team is willing to invest 3-5 days into changing something, we can understand the problem, define the most relevant step and deliver a solution within one month. We don’t produce groundbreaking innovations in this time, but at least a first step.
More often than not, the change process continues once it is started and the team keeps producing more solutions over time.

Steal it, if you like

If you can use any part of this process or the whole thing, please do. Let me know if you have questions or need more explanations, or tell me how your own change process went: zahida.huber@liip.ch
