<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Kirby" -->
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">

  <channel>
    <title>Mot-cl&#233;: machine learning &#183; Blog &#183; Liip</title>
    <link>https://www.liip.ch/fr/blog/tags/machine+learning</link>
    <generator>Kirby</generator>
    <lastBuildDate>Tue, 23 Oct 2018 00:00:00 +0200</lastBuildDate>
    <atom:link href="https://www.liip.ch" rel="self" type="application/rss+xml" />

        <description>Articles du blog Liip avec le mot-cl&#233; &#8220;machine learning&#8221;</description>
    
        <language>fr</language>
    
        <item>
      <title>Real time numbers recognition (MNIST) on an iPhone with CoreML from A to Z</title>
      <link>https://www.liip.ch/fr/blog/numbers-recognition-mnist-on-an-iphone-with-coreml-from-a-to-z</link>
      <guid>https://www.liip.ch/fr/blog/numbers-recognition-mnist-on-an-iphone-with-coreml-from-a-to-z</guid>
      <pubDate>Tue, 23 Oct 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h1>Creating a CoreML model from A-Z in less than 10 Steps</h1>
<p>This is the third part of our deep learning on mobile phones series. In part one I showed you <a href="https://www.liip.ch/en/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts">the two main tricks, convolutions and pooling, used to train deep learning networks</a>. In part two I showed you <a href="https://www.liip.ch/en/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50">how to train existing deep learning networks like ResNet50 to detect new objects</a>. In part three I will now show you how to train a deep learning network, convert it to the CoreML format and deploy it on your mobile phone! </p>
<p>TLDR: I will show you how to create your own iPhone app from A-Z that recognizes handwritten numbers: </p>
<figure><img src="https://liip.rokka.io/www_inarticle/812493/output.gif" alt=""></figure>
<p>Let’s get started!</p>
<h2>1. How to start</h2>
<p>To have a fully working example I thought we’d start with a toy dataset like the <a href="https://en.wikipedia.org/wiki/MNIST_database">MNIST set of handwritten digits</a> and train a deep learning network to recognize those. Once it’s working nicely on our PC, we will port it to an iPhone X using the <a href="https://developer.apple.com/documentation/coreml">CoreML standard</a>. </p>
<h2>2. Getting the data</h2>
<pre><code class="language-python"># Importing the dataset with Keras and transforming it
from keras.datasets import mnist
from keras import backend as K
from keras.utils import np_utils  # needed for the one hot encoding below

def mnist_data():
    # input image dimensions
    img_rows, img_cols = 28, 28
    (X_train, Y_train), (X_test, Y_test) = mnist.load_data()

    if K.image_data_format() == 'channels_first':
        X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
        X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
        X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)

    # rescale [0,255] --&gt; [0,1]
    X_train = X_train.astype('float32')/255
    X_test = X_test.astype('float32')/255

    # transform to one hot encoding
    Y_train = np_utils.to_categorical(Y_train, 10)
    Y_test = np_utils.to_categorical(Y_test, 10)

    return (X_train, Y_train), (X_test, Y_test)

(X_train, Y_train), (X_test, Y_test) = mnist_data()</code></pre>
<h2>3. Encoding it correctly</h2>
<p>When working with image data we have to decide how to encode it. Since Keras is a high-level library that can run on multiple “backends” such as <a href="https://www.tensorflow.org">Tensorflow</a>, <a href="http://deeplearning.net/software/theano/">Theano</a> or <a href="https://www.microsoft.com/en-us/cognitive-toolkit/">CNTK</a>, we first have to find out how our backend encodes the data. It can either be encoded “channels first” or “channels last”; the latter is the default in Tensorflow, the <a href="https://keras.io/backend/">default Keras backend</a>. So in our case, using Tensorflow, an input batch is a tensor of shape (batch_size, rows, cols, channels): first the batch_size, then the 28 rows of the image, then the 28 columns, and then a 1 for the number of channels, since our image data is grey-scale.  </p>
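<p>To double check which layout your backend expects, it helps to look at the resulting tensor shapes. A minimal sketch (using NumPy, not part of the original pipeline) illustrating the two layouts:</p>
<pre><code class="language-python"># illustrating "channels first" vs "channels last" tensor shapes
import numpy as np

# two fake grey-scale images of 28x28 pixels
batch = np.zeros((2, 28, 28))

# "channels last" (the Tensorflow default): (batch_size, rows, cols, channels)
channels_last = batch.reshape(2, 28, 28, 1)

# "channels first" (e.g. Theano): (batch_size, channels, rows, cols)
channels_first = batch.reshape(2, 1, 28, 28)

print(channels_last.shape)   # (2, 28, 28, 1)
print(channels_first.shape)  # (2, 1, 28, 28)</code></pre>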
<p>We can take a look at the first six images that we have loaded with the following snippet:</p>
<pre><code class="language-python"># plot first six training images
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.cm as cm
import numpy as np

(X_train, y_train), (X_test, y_test) = mnist.load_data()

fig = plt.figure(figsize=(20,20))
for i in range(6):
    ax = fig.add_subplot(1, 6, i+1, xticks=[], yticks=[])
    ax.imshow(X_train[i], cmap='gray')
    ax.set_title(str(y_train[i]))</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/7cce04/numbers.png" alt=""></figure>
<h2>4. Normalizing the data</h2>
<p>We see that there are white numbers on a black background, each thickly written just in the middle and they are quite low resolution - in our case 28 pixels x 28 pixels. </p>
<p>You may have noticed that above we rescale each image pixel by dividing it by 255. This results in pixel values between 0 and 1, which is quite useful for any kind of training. This is what an image’s pixel values look like before the transformation:</p>
<pre><code class="language-python"># visualize one number with pixel values
def visualize_input(img, ax):
    ax.imshow(img, cmap='gray')
    width, height = img.shape
    thresh = img.max()/2.5
    for x in range(width):
        for y in range(height):
            ax.annotate(str(round(img[x][y],2)), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if img[x][y]&lt;thresh else 'black')

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
visualize_input(X_train[0], ax)</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/6d0772/detail.png" alt=""></figure>
<p>As you can see, each grey pixel has a value between 0 and 255, where 255 is white and 0 is black. Notice that here <code>mnist.load_data()</code> loads the original data into X_train[0]. In our custom mnist_data() function we transform every pixel intensity into a value between 0 and 1 by calling <code>X_train = X_train.astype('float32')/255</code>. </p>
<h2>5. One hot encoding</h2>
<p>Originally the data is encoded in such a way that the Y vector contains the number that the X vector (the pixel data) represents. So if the image looks like a 7, the Y vector simply contains the number 7. We transform this into a one hot encoding - a vector of ten entries that is 1 at the index of the number and 0 everywhere else - because we want to map our output to 10 output neurons in our network that fire when the corresponding number is recognized. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/46a2ef/onehot.png" alt=""></figure>
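<p>To make the idea concrete, here is a minimal one hot encoder in plain Python (the helper name is my own; in the code above Keras does the same job with <code>np_utils.to_categorical</code>):</p>
<pre><code class="language-python"># a label such as 7 becomes a vector with a 1 at index 7
def one_hot(label, num_classes=10):
    vector = [0] * num_classes
    vector[label] = 1
    return vector

print(one_hot(7))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]</code></pre>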
<h2>6. Modeling the network</h2>
<p>Now it is time to define a convolutional network to distinguish those numbers. Using the <a href="https://www.liip.ch/en/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts">convolution and pooling tricks from part one of this series</a> we can model a network that will be able to distinguish numbers from each other. </p>
<pre><code class="language-python"># defining the model
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
def network():
    model = Sequential()
    input_shape = (28, 28, 1)
    num_classes = 10

    model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(500, activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(num_classes, activation='softmax'))

    # summarize the model
    # model.summary()
    return model </code></pre>
<p>So what did we do there? We started with a <a href="https://keras.io/layers/convolutional/">convolution</a> with a kernel size of 3, meaning the window is 3x3 pixels; the input shape is our 28x28 pixels. We followed this layer with a <a href="https://keras.io/layers/pooling/">max pooling layer</a>. Here the pool_size is 2, so we downscale everything by 2, and the input to the next convolutional layer is 14x14. We repeated this two more times, ending up with a 3x3 feature map after the final pooling layer. We then use a <a href="https://keras.io/layers/core/#dropout">dropout layer</a>, randomly setting 30% of the input units to 0 to prevent overfitting during training. Finally we flatten the layers (in our case 3x3x32 = 288 values) and connect them to a dense layer with 500 nodes. After this step we add another dropout layer and connect it to our final dense layer with 10 nodes, which corresponds to our number of classes (the numbers 0 to 9). </p>
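<p>The shrinking feature map sizes are easy to verify with a bit of arithmetic - a small sketch of the calculation (mine, not from the original code):</p>
<pre><code class="language-python"># "same" padding keeps the size; each 2x2 max pooling halves it (rounded down)
size = 28
for _ in range(3):
    size = size // 2
    print(size)  # 14, then 7, then 3

# flattened input to the dense layer: 3 x 3 pixels x 32 filters
print(size * size * 32)  # 288</code></pre>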
<h2>7. Training the model</h2>
<pre><code class="language-python">#Training the model
import keras

model = network()
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=512, epochs=6, verbose=1,validation_data=(X_test, Y_test))

score = model.evaluate(X_test, Y_test, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])</code></pre>
<p>We first compile the network by defining a loss function and an optimizer: in our case we select categorical_crossentropy, because we have multiple categories (the numbers 0-9). There are a number of optimizers that <a href="https://keras.io/optimizers/#usage-of-optimizers">Keras offers</a>, so feel free to try out a few and stick with what works best for your case. I’ve found that AdaDelta (an advanced form of AdaGrad) works fine for me. </p>
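<p>For intuition on the loss we just selected, here is categorical crossentropy for a single sample in plain Python (a sketch of the formula, not the Keras implementation):</p>
<pre><code class="language-python">import math

# with a one hot encoded y_true, only the predicted probability
# of the correct class contributes to the loss
def categorical_crossentropy(y_true, y_pred):
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # the label "7"
confident = [0.01] * 10
confident[7] = 0.91                      # mostly right
uniform = [0.1] * 10                     # pure guessing

print(categorical_crossentropy(y_true, confident))  # ~0.09
print(categorical_crossentropy(y_true, uniform))    # ~2.30</code></pre>
<p>The loss drops as the predicted probability of the correct class rises, which is exactly what the optimizer pushes towards.</p>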
<figure><img src="https://liip.rokka.io/www_inarticle/42b4b8/train.png" alt=""></figure>
<p>So after training I’ve got a model that has an accuracy of 98%, which is quite excellent given the rather simple network architecture. In the screenshot you can also see that the accuracy was increasing with each epoch, so everything looks good to me. We now have a model that can predict the numbers 0-9 quite well from their 28x28 pixel representation. </p>
<h2>8. Saving the model</h2>
<p>Since we want to use the model on our iPhone, we have to convert it to a format that our iPhone understands. There is actually an ongoing initiative from Microsoft, Facebook and Amazon (and others) to harmonize all the different deep learning network formats into an interchangeable open neural network exchange format that you can use on any device. It’s called <a href="https://onnx.ai">ONNX</a>. </p>
<p>Yet as of today, Apple devices only work with the CoreML format. In order to convert our Keras model to CoreML, Apple luckily provides a very handy helper library called <a href="https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html">coremltools</a> that we can use to get the job done. It is able to convert scikit-learn, Keras and XGBoost models to CoreML, thus covering quite a few everyday applications. Install it with “pip install coremltools” and you can use it right away. </p>
<pre><code class="language-python">import coremltools

coreml_model = coremltools.converters.keras.convert(model,
                                                    input_names="image",
                                                    image_input_names='image',
                                                    class_labels=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
                                                    )</code></pre>
<p>The most important parameters are class_labels, which define the classes the model predicts, and input_names / image_input_names. By setting them to <code>image</code>, XCode will automatically recognize that this model takes in an image and tries to predict something from it. Depending on your application it makes a lot of sense to study the <a href="https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html">documentation</a>, especially to make sure that the RGB channels are encoded in the same order (parameter is_bgr) and that the model correctly assumes all inputs are values between 0 and 1 (parameter image_scale). </p>
<p>The only thing left is to add some metadata to your model. With this you are helping other developers greatly, since they don’t have to guess how your model works and what it expects as input. </p>
<pre><code class="language-python">#entering metadata
coreml_model.author = 'plotti'
coreml_model.license = 'MIT'
coreml_model.short_description = 'MNIST handwriting recognition with a 3 layer network'
coreml_model.input_description['image'] = '28x28 grayscaled pixel values between 0-1'
coreml_model.save('SimpleMnist.mlmodel')

print(coreml_model)</code></pre>
<h2>9. Use it to predict something</h2>
<p>After saving the model to a CoreML model we can test whether it works correctly on our machine. For this we feed it an image and check whether it predicts the label correctly. You can use the MNIST training data, or you can snap a picture with your phone and transfer it to your PC to see how well the model handles real-life data. </p>
<pre><code class="language-python">#Use the core-ml model to predict something
from PIL import Image  
import numpy as np
model =  coremltools.models.MLModel('SimpleMnist.mlmodel')
im = Image.fromarray((np.reshape(mnist_data()[0][0][12]*255, (28, 28))).astype(np.uint8),"L")
plt.imshow(im)
predictions = model.predict({'image': im})
print(predictions)</code></pre>
<p>It works, hooray! Now it's time to include it in an XCode project. </p>
<h1>Porting our model to XCode in 10 Steps</h1>
<p>Let me start by saying: I am by no means an XCode or mobile developer. I have studied <a href="https://github.com/markmansur/CoreML-Vision-demo">quite a few</a> <a href="https://sriraghu.com/2017/06/15/computer-vision-in-ios-object-recognition/">super</a> <a href="https://www.raywenderlich.com/577-core-ml-and-vision-machine-learning-in-ios-11-tutorial">helpful tutorials</a>, <a href="https://www.pyimagesearch.com/2018/04/23/running-keras-models-on-ios-with-coreml/">walkthroughs</a> and <a href="https://www.youtube.com/watch?v=bOg8AZSFvOc">videos</a> on how to create a simple mobile phone app with CoreML and have used those to create my app. I can only say a big thank you and kudos to the community for being so open and helpful. </p>
<h2>1. Install XCode</h2>
<p>Now it's time to really get our hands dirty. Before you can do anything you have to have XCode. So download it via <a href="https://itunes.apple.com/us/app/xcode/id497799835?mt=12">Apple-Store</a> and install it. In case you already have it, make sure to have at least version 9 and above. </p>
<h2>2. Create the Project</h2>
<p>Start XCode and create a single view app. Name your project accordingly; I named mine “numbers”. Select a place to save it. You can leave “create git repository on my mac” checked. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/26a145/single.png" alt=""></figure>
<h2>3. Add the CoreML model</h2>
<p>We can now add the CoreML model that we created using the coremltools converter. Simply drag the model into your project directory. Make sure to drag it into the correct folder (see screenshot). You can use the option “add as Reference”: this way, whenever you update your model, you don’t have to drag it into your project again. XCode should automatically recognize your model and realize that it is a model to be used for images. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/d4115c/addmodel.png" alt=""></figure>
<h2>4. Delete the view or storyboard</h2>
<p>Since we are going to use just the camera and display a label, we don’t need a fancy graphical user interface - or in other words, a view layer. Since the storyboard in Swift corresponds to the view in the MVC pattern, we are simply going to delete it. In the project settings under deployment info, make sure to delete the Main Interface too (see screenshot) by setting it to blank.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/8f4709/storyboard.png" alt=""></figure>
<h2>5. Create the root view controller programmatically</h2>
<p>Instead we are going to create the root view controller programmatically by replacing the <code>func application</code> in AppDelegate.swift with the following code:</p>
<pre><code class="language-swift">// create the view root controller programmatically
func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -&gt; Bool {
    // create the user interface window, make it visible
    window = UIWindow()
    window?.makeKeyAndVisible()

    // create the view controller and make it the root view controller
    let vc = ViewController()
    window?.rootViewController = vc

    // return true upon success
    return true
}</code></pre>
<h2>6. Build the view controller</h2>
<p>Finally it is time to build the view controller. We will use UIKit - a lib for creating buttons and labels, AVFoundation - a lib to capture the camera on the iPhone, and Vision - a lib to handle our CoreML model. The last one is especially handy if you don’t want to resize the input data yourself. </p>
<p>In the ViewController we are going to inherit from UI and AV functionalities, so we need to override some methods later to make it functional. </p>
<p>The first thing we will do is to create a label that will tell us what the camera is seeing. By overriding the <code>viewDidLoad</code> function we will trigger the capturing of the camera and add the label to the view. </p>
<p>In the function <code>setupCaptureSession</code> we will create a capture session, grab the first camera (which is the front facing one) and capture its output into <code>captureOutput</code> while also displaying it on the <code>previewLayer</code>. </p>
<p>In the function <code>captureOutput</code> we will finally make use of our CoreML model that we imported before. Make sure to hit Cmd+B - build, when importing it, so XCode knows it's actually there. We will use it to predict something from the image that we captured. We will then grab the first prediction from the model and display it in our label. </p>
<pre><code class="language-swift">// define the ViewController
import UIKit
import AVFoundation
import Vision

class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    // create a label to hold the predicted number
    let label: UILabel = {
        let label = UILabel()
        label.textColor = .white
        label.translatesAutoresizingMaskIntoConstraints = false
        label.text = "Label"
        label.font = label.font.withSize(40)
        return label
    }()

    override func viewDidLoad() {
        // call the parent function
        super.viewDidLoad()       
        setupCaptureSession() // establish the capture
        view.addSubview(label) // add the label
        setupLabel()
    }

    func setupCaptureSession() {
        // create a new capture session
        let captureSession = AVCaptureSession()

        // find the available cameras
        let availableDevices = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: AVMediaType.video, position: .back).devices

        do {
            // select the first available camera (the back camera)
            if let captureDevice = availableDevices.first {
                captureSession.addInput(try AVCaptureDeviceInput(device: captureDevice))
            }
        } catch {
            // print an error if the camera is not available
            print(error.localizedDescription)
        }

        // setup the video output to the screen and add output to our capture session
        let captureOutput = AVCaptureVideoDataOutput()
        captureSession.addOutput(captureOutput)
        let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
        previewLayer.frame = view.frame
        view.layer.addSublayer(previewLayer)

        // buffer the video and start the capture session
        captureOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
        captureSession.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // load our CoreML MNIST model
        guard let model = try? VNCoreMLModel(for: SimpleMnist().model) else { return }

        // run an inference with CoreML
        let request = VNCoreMLRequest(model: model) { (finishedRequest, error) in

            // grab the inference results
            guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }

            // grab the highest confidence result
            guard let Observation = results.first else { return }

            // create the label text components
            let predclass = "\(Observation.identifier)"

            // set the label text
            DispatchQueue.main.async(execute: {
                self.label.text = "\(predclass) "
            })
        }

        // create a Core Video pixel buffer which is an image buffer that holds pixels in main memory
        // Applications generating frames, compressing or decompressing video, or using Core Image
        // can all make use of Core Video pixel buffers
        guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // execute the request
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }

    func setupLabel() {
        // constrain the label in the center
        label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true

        // constrain the the label to 50 pixels from the bottom
        label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -50).isActive = true
    }
}</code></pre>
<p>Make sure you have changed the model reference to the name of your own model, otherwise you will get build errors. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/b4364b/modeldetails.png" alt=""></figure>
<h2>7. Add Privacy Message</h2>
<p>Finally, since we are going to use the camera, we need to inform the user that we are going to do so, and thus add a privacy message “Privacy - Camera Usage Description”  in the Info.plist file under Information Property List. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/29ab1e/privacy.png" alt=""></figure>
<h2>8. Add a build team</h2>
<p>In order to deploy the app on your own iPhone, you will need to <a href="https://developer.apple.com/programs/enroll/">register with the Apple developer program</a>. There is no need to pay any money to do so; <a href="https://9to5mac.com/2016/03/27/how-to-create-free-apple-developer-account-sideload-apps/">you can also register without any fees</a>. Once you are registered, you can select the team (Apple calls it this way) that you signed up with in the project properties. </p>
<h2>9. Deploy on your iPhone</h2>
<p>Finally it's time to deploy the app on your iPhone. You will need to connect it via USB and unlock it. Once it's unlocked, select the destination under Product - Destination - Your iPhone. Then the only thing left is to run it on your phone: select Product - Run (or simply hit CMD + R) in the menu and XCode will build and deploy the project on your iPhone. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/7cc4f5/destination.png" alt=""></figure>
<h2>10. Try it out</h2>
<p>After jumping through so many hoops, it is finally time to try out our app. When you start it for the first time, it will ask you to allow it to use your camera (after all, we placed that privacy info there). Then make sure to hold your iPhone sideways, since the orientation matters given how we trained the network. We have not used any augmentation techniques, so our model is unable to recognize numbers that are “lying on their side”. We could improve our model by applying these techniques, as I have shown in <a href="https://www.liip.ch/en/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50">this blog article</a>.</p>
<p>A second thing you might notice is that the app always recognizes some number, as there is no “background” class. To fix this, we could additionally train the model on some random images that we classify as the background class. This way our model would be better equipped to tell whether it is seeing a number or just some random background. </p>
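<p>A sketch of how such background training data could be generated (the shapes and the eleventh class index are my assumption; the final dense layer would then need 11 nodes instead of 10):</p>
<pre><code class="language-python">import numpy as np

num_background = 100
# random grey-scale 28x28 images, already scaled to [0, 1] like the MNIST data
X_background = np.random.rand(num_background, 28, 28, 1).astype('float32')

# label them all with a new class index 10 ("background"), one hot over 11 classes
Y_background = np.zeros((num_background, 11), dtype='float32')
Y_background[:, 10] = 1.0

print(X_background.shape)  # (100, 28, 28, 1)
print(Y_background.shape)  # (100, 11)</code></pre>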
<figure><img src="https://liip.rokka.io/www_inarticle/812493/output.gif" alt=""></figure>
<h2>Conclusion or the famous “so what”</h2>
<p>Obviously this is a very long blog post. Yet I wanted to get all the necessary info into one place in order to show other mobile devs how easy it is to create your own deep learning computer vision applications. In our case at Liip it will most certainly boil down to a collaboration between our <a href="https://www.liip.ch/en/work/data">data services team</a> and our mobile developers in order to get the best of both worlds. </p>
<p>In fact we are currently innovating together by creating an app that <a href="https://www.liip.ch/en/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50">will be able to recognize</a> <a href="https://www.liip.ch/en/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts">animals in a zoo</a>, and we are working on another small fun game that lets two people doodle against each other: you are given a task, such as “draw an apple”, and the person who draws the apple fastest in a way that the deep learning model recognizes wins. </p>
<p>Beyond such fun innovation projects the possibilities are endless, but they always depend on the context of the business and the users. Obviously the saying “if you have a hammer, every problem looks like a nail” applies here too: not every app will benefit from having computer vision on board, and not all apps using computer vision are <a href="https://www.theverge.com/2017/6/26/15876006/hot-dog-app-android-silicon-valley">useful ones</a>, as some of you might know from the famous Silicon Valley episode. </p>
<p>Yet there are quite a few nice examples of apps that use computer vision successfully: </p>
<ul>
<li><a href="http://leafsnap.com">Leafsnap</a> lets you distinguish different types of leaves. </li>
<li><a href="https://www.aipoly.com">Aipoly</a> helps visually impaired people to explore the world.</li>
<li><a href="http://www.snooth.com/iphone-app/">Snooth</a> gets you more info on your wine by taking a picture of the label.</li>
<li><a href="https://www.theverge.com/2017/2/8/14549798/pinterest-lens-visual-discovery-shazam">Pinterest</a> has launched a visual search that allows you to search for pins that match the product that you captured with your phone. </li>
<li><a href="http://www.caloriemama.ai">Caloriemama</a> lets you snap a picture of your food and tells you how many calories it has. </li>
</ul>
<p>As usual, the code that you have seen in this blog post is <a href="https://github.com/plotti/mnist-to-coreml">available online</a>. Feel free to experiment with it. I am looking forward to your comments and I hope you enjoyed the journey. P.S. I would like to thank Stefanie Taepke for proofreading and for her helpful comments, which made this post more readable.</p>
                  <enclosure url="http://liip.rokka.io/www_card_2/d6f619/p1013593.jpg" length="5538521" type="image/jpeg" />
          </item>
        <item>
      <title>Zoo Pokedex Part 2: Hands on with Keras and Resnet50</title>
      <link>https://www.liip.ch/fr/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50</link>
      <guid>https://www.liip.ch/fr/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50</guid>
      <pubDate>Tue, 07 Aug 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h3>Short Recap from Part 1</h3>
<p>In the <a href="https://www.liip.ch/en/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts">last blog post</a> I briefly discussed the potential of using deep learning to build a zoo pokedex app that could motivate zoo goers to engage with the animals and the information. We also discussed the <a href="http://image-net.org">imagenet competition</a> and how deep learning has drastically changed the image recognition game. We went over the two main tricks that deep learning architectures use, namely convolutions and pooling, which allow such networks to perform extremely well. Last but not least, we realized that all you have to do these days is stand on the shoulders of giants by using existing networks (e.g. ResNet50) to write applications with similar state-of-the-art precision. So in this blog post it’s finally time to put these giants to work for us.</p>
<h3>Goal</h3>
<p>The goal is to write an image detection app that will be able to distinguish the animals in our zoo. For obvious reasons I will keep our zoo really small, containing only two types of animals:</p>
<ul>
<li>Oryxes and</li>
<li>Llamas (why there is a second L in English is beyond my comprehension).</li>
</ul>
<figure><img src="https://liip.rokka.io/www_inarticle/8c74f3/lamavsoryx.jpg" alt=""></figure>
<p>Why those animals? Well, they seem fluffy, but mostly because the original imagenet competition does not contain them. So it represents a quite realistic scenario: a zoo has animals that need to be distinguished, but the existing deep learning networks have not been trained for them. I picked these two animals mostly at random, just to have something to show. (Actually, I checked that the Zürich Zoo has them, so I can take our little app and test it in real life - but that's already part of the third blog post on this topic.)</p>
<h3>Getting the data</h3>
<p>Getting data is easier than ever in the age of the internet. In the nineties I would probably have had to go to some archive, or even worse, take my own camera and shoot lots and lots of pictures of these animals to use as training material. Today I can just ask Google to show me some. But wait - if you have actually tried using Google Image search as a resource, you will realize that downloading its images in large amounts is a pain in the ass. The image API is highly limited in terms of what you can get for free, and writing scrapers that download such images is not really fun. That's why I turned to the competition and used Microsoft's cognitive services to download images for each animal. </p>
<h3>Downloading image data from Microsoft</h3>
<p>Microsoft offers quite a convenient image search API via their <a href="https://azure.microsoft.com/en-us/services/cognitive-services/">cognitive services</a>. You can sign up there to get a free tier for a couple of days, which should be enough to get you started. All you basically need is an API key, and then you can start downloading images to create your datasets. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/b79e82/microsoft.jpg" alt=""></figure>
<pre><code class="language-ruby "># Code to download images via Microsoft cognitive api
require 'httparty'
require 'fileutils'

API_KEY = "##############"
SEARCH_TERM = "alpaka"
QUERY = "alpaka"
API_ENDPOINT  = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
FOLDER = "datasets"
BATCH_SIZE = 50
MAX = 1000

# Make the dir
FileUtils::mkdir_p "#{FOLDER}/#{SEARCH_TERM}"

# Make the request
headers = {'Ocp-Apim-Subscription-Key' =&gt; API_KEY}
query = {"q": QUERY, "offset": 0, "count": BATCH_SIZE}
puts("Searching for #{SEARCH_TERM}")
response = HTTParty.get(API_ENDPOINT,:query =&gt; query,:headers =&gt; headers)
total_matches = response["totalEstimatedMatches"]

i = 0
while response["nextOffset"] != nil &amp;&amp; i &lt; MAX
    response["value"].each do |image|
        i += 1
        content_url = image["contentUrl"]
        ext = content_url[/jpg$|gif$|png$/] # extract the file extension from the URL
        file_name = "#{FOLDER}/#{SEARCH_TERM}/#{i}.#{ext}"
        next if ext == nil
        next if File.file?(file_name)
        begin
            puts("Offset #{response["nextOffset"]}. Downloading #{content_url}")
            r = HTTParty.get(content_url)
            File.open(file_name, 'wb') { |file| file.write(r.body) }
        rescue
            puts "Error fetching #{content_url}"
        end
    end
    query = {"q": SEARCH_TERM, "offset": i+BATCH_SIZE, "count": BATCH_SIZE}
    response = HTTParty.get(API_ENDPOINT,:query =&gt; query,:headers =&gt; headers)
end</code></pre>
<p>The Ruby code above simply uses the API in batches, downloads llamas and oryxes into separate directories and names the files accordingly. What you don’t see is that I went through these folders by hand and removed images that were not really the animal but, for example, a fluffy shoe that showed up in the search results. I also de-duped each folder. You can scan the images quickly on your Mac using the thumbnail preview, or use an image browser that you are familiar with to do the job. </p>
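<p>By the way, exact duplicates can also be caught automatically before the manual pass. A minimal sketch using only the Python standard library (the folder path is hypothetical; near-duplicates would need perceptual hashing instead):</p>
<pre><code class="language-python"># Find byte-identical files in a download folder
import hashlib
import os

def find_exact_duplicates(folder):
    # Group files by the MD5 hash of their content. Any group with
    # more than one path is a set of byte-identical duplicates.
    by_hash = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, 'rb') as f:
            digest = hashlib.md5(f.read()).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) != 1}</code></pre>
<p>This only catches byte-identical downloads; the fluffy-shoe false positives still need a human eye.</p>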
<h3>Problem with not enough data</h3>
<p>Ignoring probable copyright issues (am I allowed to train my neural network on copyrighted material?) and depending on what you want to achieve, you might run into the problem that it’s not that easy to gather 500 or 5000 images of oryxes and llamas. Also, to make things a bit challenging, I tried to see if it was possible to train the neural networks using only 100 examples of each animal, while using roughly 50 examples to validate the accuracy of the networks. </p>
<p>Normally everyone would tell you that you definitely need more image material, because deep learning networks need a lot of data to become useful. But in our case we are going to use two dirty tricks to get away with our really small collection: data augmentation and reuse of already pre-trained networks. </p>
<h3>Image data generation</h3>
<p>A really handy trick that is prevalent everywhere now is to take the images that you already have and change them slightly in an artificial way: rotating them, changing the perspective, zooming in on them. What you end up with is that instead of one image of a llama, you’ll have 20 pictures of that animal, each one slightly different from the original. This trick lets you create more variation without actually having to download more material. It works quite well, but is definitely inferior to simply having more data.  </p>
<p>We will be using <a href="http://keras.io">Keras</a>, a deep learning library on top of TensorFlow that we have used before in <a href="https://www.liip.ch/en/blog/tensorflow-and-tflearn-or-can-deep-learning-predict-if-dicaprio-could-have-survived-the-titanic">other</a> blog posts, for example to <a href="https://www.liip.ch/en/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks">create a good sentiment detection</a>. In the domain of image recognition Keras can really show its strength, since it already has built-in methods to do image data generation for us, without having to involve any third-party tools. </p>
<pre><code class="language-python"># Creating an image data generator
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)</code></pre>
<p>As you can see above, we have created an image data generator that uses shearing, zooming and horizontal flipping to change our llama pictures. We don’t do a vertical flip, for example, because it's rather unrealistic that you will hold your phone upside down. Depending on the type of images (e.g. aerial photography) different transformations might or might not make sense.</p>
<pre><code class="language-python"># Creating variations to show you some examples
from keras.preprocessing.image import load_img, img_to_array

img = load_img('data/train/alpaka/Alpacca1.jpg')
x = img_to_array(img)
x = x.reshape((1,) + x.shape)  # add a batch dimension
i = 0
for batch in train_datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='alpacca', save_format='jpeg'):
    i += 1
    if i &gt; 20:
        break  # otherwise the generator would loop indefinitely</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/31a080/variations.png" alt=""></figure>
<p>Now, if you want to use that generator in our model directly, you can use the convenient flow-from-directory method, where you can even define the target size, so you don’t have to scale down your training images with an external library. </p>
<pre><code class="language-python"># Flow from directory method
train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size=(sz, sz),
    batch_size=batch_size, class_mode='binary')</code></pre>
<h3>Using Resnet50</h3>
<p>In order to finally stand on the shoulders of giants, we can simply import the resnet50 model that we talked about earlier. <a href="http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006">Here</a> is a detailed description of each layer and <a href="https://arxiv.org/pdf/1512.03385.pdf">here is the matching paper</a> that describes it in detail. While there are <a href="https://keras.io/applications/">different alternatives that you might also use</a>, the resnet50 model has a fairly high accuracy while not being too “big”, in comparison to the computationally expensive <a href="http://www.robots.ox.ac.uk/~vgg/">VGG</a> network architecture.</p>
<p>On a side note: the name “res” comes from residual. A residual can be understood as a subtraction of the features that were learned from the input at each layer. ResNet has a very neat trick that allows deeper networks to learn from residuals by “short-circuiting” them to deeper layers, i.e. directly connecting the input of an n-th layer to some (n+x)-th layer. This short-circuiting has been proven to make training easier: it helps with the problem of degrading accuracy, where networks that are too deep become exponentially harder to train. </p>
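<p>To make the short-circuit idea concrete, here is a toy sketch of a residual connection in plain numpy (not the real ResNet50 code, just the arithmetic: the block output is its input plus the learned residual):</p>
<pre><code class="language-python"># A toy residual block in numpy
import numpy as np

def toy_layer(x, w):
    # stand-in for a convolutional layer: linear map followed by ReLU
    return np.maximum(0.0, x.dot(w))

def residual_block(x, w1, w2):
    # two layers whose output is added back onto the input:
    # the block only needs to learn the residual f(x) = y - x,
    # and gradients can flow through the identity path unchanged
    f = toy_layer(toy_layer(x, w1), w2)
    return f + x  # the short-circuit / skip connection</code></pre>
<p>Note that if the two layers learn nothing (all-zero weights), the block simply passes its input through unchanged - which is exactly what makes very deep stacks of such blocks trainable.</p>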
<pre><code class="language-python"># Importing resnet into keras
from keras.applications.resnet50 import ResNet50
base_model = ResNet50(weights='imagenet')</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/3b54cc/comparison.jpg" alt=""></figure>
<p>As you can see above, importing the network is really dead easy in keras. It might take a while to download the network though. Notice that we are downloading the weights too, not only the architecture.</p>
<h3>Training existing models</h3>
<p>The next part is the exciting one: we finally get to train the existing network on our own data. The simple but ineffective approach would be to download or re-build the architecture of a successful network and train it with our data. The problem with that approach is that we only have 100 images per class - not even remotely close to enough data to train those networks well enough to be useful. </p>
<p>Instead we will try another technique (which I somewhat stole from the <a href="https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html">great keras blog</a>): We will freeze all weights of the downloaded network and add three final layers at the end of the network and then train those. </p>
<h3>Freezing the base model</h3>
<p>Why is this useful, you might ask? By freezing all of the existing layers of the resnet50 network we only have to train the final layers. This makes sense, since the imagenet task is about recognizing everyday objects in everyday photographs, and the network is already very good at recognising “basic” features such as legs, eyes, circles, heads, etc. All of this “smartness” is encoded in the weights (see the last blog post), and if we threw those weights away we would lose these nice properties. Instead we glue another pooling layer and a dense layer onto the very end of it, followed by a sigmoid activation layer that's needed to distinguish between our two classes. That's, by the way, why it says “include_top=False” in the code: in order to not include the final 1000-classes layer that was used for the imagenet competition. If you want to read up on the different alternatives to resnet50 you will find them <a href="https://keras.io/applications/">here</a>.</p>
<pre><code class="language-python"># Adding three layers on top of the network
base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)</code></pre>
<p>Finally we can now re-train the network with our own image material and hope for it to turn out quite useful. I’ve had some trouble finding an optimizer that produced proper results. Usually you will have to experiment with the learning rate to find a configuration whose accuracy improves steadily during the training phase.</p>
<pre><code class="language-python"># Freezing all the original weights and compiling the network
from keras import optimizers
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input

# the validation generator reads from its own folder, without augmentation
validation_generator = ImageDataGenerator(preprocessing_function=preprocess_input).flow_from_directory(
    validation_data_dir, target_size=(sz, sz), batch_size=batch_size, class_mode='binary')

optimizer = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers: layer.trainable = False
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=3, workers=4,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)</code></pre>
<p>The training shouldn’t take long, even when you are using just a CPU instead of a GPU and the output might look something like this:</p>
<figure><img src="https://liip.rokka.io/www_inarticle/dc208a/training.png" alt=""></figure>
<p>You’ll notice that we reached an accuracy of 71% which isn’t too bad, given that we have only 100 original images of each class. </p>
<h3>Fine-tuning</h3>
<p>One thing that we might do now is unfreeze some of the very last layers in the network and re-train it again, allowing those layers to change slightly. We do this in the hope that a bit more “wiggle-room”, without touching most of the actual weights, might give us better results. </p>
<pre><code class="language-python "># Make the very last layers trainable
split_at = 140
for layer in model.layers[:split_at]: layer.trainable = False
for layer in model.layers[split_at:]: layer.trainable = True
model.compile(optimizer=optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0), loss='binary_crossentropy', metrics=['accuracy'])    
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=1, workers=3,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/2e0324/improvement.png" alt=""></figure>
<p>And indeed, it helped our model to go from 71% accuracy to 82%! You might want to play around with the learning rates a bit, or maybe split at a different depth, in order to tweak the results. But generally I think that just adding more images would be the easiest way to reach 90% accuracy.  </p>
<h3>Confusion matrix</h3>
<p>In order to see how well our model is doing, we can also compute a confusion matrix, i.e. count the true positives, true negatives, false positives and false negatives. </p>
<pre><code class="language-python"># Calculating confusion matrix
from sklearn.metrics import confusion_matrix
r = next(validation_generator)
probs = model.predict(r[0])
classes = []
for prob in probs:
    if prob &lt; 0.5:
        classes.append(0)
    else:
        classes.append(1)
cm = confusion_matrix(r[1], classes) # sklearn expects (y_true, y_pred)
cm</code></pre>
<p>As you can see above, I simply took the first batch from the validation generator (images of which we know whether they show an alpaca or an oryx) and then used the <a href="http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py">confusion matrix from scikit-learn</a>. In the example below we see that 28 resp. 27 images of each class were labeled correctly, while 4 resp. 5 images were misclassified. I would say that’s quite a good result, given that we used so little data.</p>
<pre><code class="language-python">#example output of confusion matrix
array([[28,  5],
       [ 4, 27]])</code></pre>
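<p>With a bit of arithmetic, such a matrix also gives you the overall accuracy: the correct predictions sit on the diagonal, everything else is an error. A minimal sketch in plain Python, using the counts from the output above:</p>
<pre><code class="language-python"># Accuracy from a confusion matrix
def accuracy_from_confusion(cm):
    # correct predictions are on the diagonal
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

cm = [[28, 5], [4, 27]]
print(accuracy_from_confusion(cm))  # 55 correct out of 64 images</code></pre>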
<h3>Use the model to predict images</h3>
<p>Last but not least, we can of course finally use the model to predict whether an animal in our little zoo is an oryx or an alpaca. </p>
<pre><code class="language-python"># Helper function to display images
def load_image(img_path, show=False):

    img = image.load_img(img_path, target_size=(224, 224))
    img_tensor = image.img_to_array(img)                    # (height, width, channels)
    img_tensor = np.expand_dims(img_tensor, axis=0)         # (1, height, width, channels), add a dimension because the model expects this shape: (batch_size, height, width, channels)
    #img_tensor /= 255.                                      # imshow expects values in the range [0, 1]

    if show:
        plt.imshow(img_tensor[0]/255)                           
        plt.axis('off')
        plt.show()

    return img_tensor

# Load two sample images
oryx = load_image("data/valid/oryx/106.jpg", show=True)
alpaca = load_image("data/valid/alpaca/alpaca102.jpg", show=True)
model.predict(alpaca)
model.predict(oryx)</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/6d2129/prediction.png" alt=""></figure>
<p>As you can see in the output, our model successfully labeled the alpaca as an alpaca since the value was less than 0.5 and the oryx as an oryx, since the value was &gt; 0.5. Hooray! </p>
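<p>Since the network ends in a single sigmoid unit, turning its output into a class name is just a 0.5 threshold. A tiny helper sketch (the class order here is an assumption - Keras' flow_from_directory assigns indices alphabetically, so alpaca would be 0 and oryx 1):</p>
<pre><code class="language-python"># Map a sigmoid output to a class name
CLASS_NAMES = ['alpaca', 'oryx']  # index 0 and 1, assumed alphabetical order

def to_label(prob):
    # round() maps sigmoid outputs below 0.5 to class 0 and the rest to class 1
    return CLASS_NAMES[int(round(prob))]</code></pre>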
<h3>Conclusion or What’s next?</h3>
<p>I hope this blog post was useful to you and showed that you don’t really need much in order to get started with deep learning for image recognition. I know that our example zoo pokedex is really small at this point, but I don’t see a reason (apart from a lack of time and resources) why it should be a problem to scale from our 2 animals to 20 or 200. </p>
<p>On the technical side, now that we have a model running that’s kind of useful, it would be great to find out how to use it on a smartphone, e.g. the iPhone, to finally have a pokedex that we can really try out in the wild. I will cover that bit in the third part of the series, showing you how to export existing models to Apple mobile phones using the <a href="https://developer.apple.com/machine-learning/">CoreML</a> technology. As always I am looking forward to your comments and corrections, and I'd point you to the ipython notebook that you can download <a href="https://github.com/plotti/zoo/blob/master/Zoo%20prediction.ipynb">here</a>.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/460c46/adorable-adult-animals-1040396.jpg" length="2036526" type="image/jpeg" />
          </item>
        <item>
      <title>Poke-Zoo - How to use deep learning image recognition to tell oryxes apart from llamas in a zoo</title>
      <link>https://www.liip.ch/fr/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts</link>
      <guid>https://www.liip.ch/fr/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts</guid>
      <pubDate>Wed, 18 Jul 2018 00:00:00 +0200</pubDate>
<description><![CDATA[<p>We’ve all witnessed the hype in 2016 when people started hunting pokemons in “real life” with the app Pokémon GO. It was one of the apps with the <a href="http://www.businessofapps.com/data/pokemon-go-statistics/">fastest rise</a> in user base, and for a while with a higher addiction rate than crack - correction: I mean Candy Crush. Compared to technologies like the telephone or email, <a href="http://blog.interactiveschools.com/blog/50-million-users-how-long-does-it-take-tech-to-reach-this-milestone">it took only 19 days to reach 50 million users</a>, vs. 75 years for the telephone. </p>
<h3>Connecting the real with the digital world</h3>
<p>You might be wondering why I am reminiscing about old apps; we have certainly all moved on since the Pokemon GO hype in 2016 and are doing other serious things now. True, but I think the idea of “collecting” virtual things that are bound to real-life locations was a great one, and that we will want to build more of it in the future. That’s why Pokemon is the starting point for this blog post. In fact, if you are young enough to have watched the Pokemon series, you are probably familiar with the idea of the pokedex. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/22bd5c/pokedex.jpg" alt=""></figure>
<h3>The idea</h3>
<p>The pokedex was a small device that Ash (the main character) could use to look up information about certain pokemons in the animated series. He used it now and then to look up some facts about them. Having seen how popular Pokemon GO became by connecting the real with the digital world, why not take the idea of the pokedex and apply it in a real world scenario, or:</p>
<p><strong><em> What if we had such an app to distinguish not pokemons but animals in the zoo? </em></strong></p>
<h2>The Zoo-Pokedex</h2>
<p>Imagine a scenario where kids have an app on their parents’ mobile phones - the zoo-pokedex. They start it up when entering a zoo, and then they go exploring. When they are at a cage they point the phone’s camera at it and try to film the animal. The app recognizes which animal they are seeing and gives them additional information on it as a reward. </p>
<p>Instead of perceiving the zoo as an educational place where you have to go from cage to cage, observe the animal and absorb the info material, you could send the kids out there and let them “capture” all the animals with their Zoo-Pokedex. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/c89545/pokedexzoo.jpg" alt=""></figure>
<p>Let’s have a look at the classic ways of learning about animals in a zoo:</p>
<ul>
<li>Reading a plaque in the zoo feels boring and dated</li>
<li>Reading an info booklet you got at the cashier feels even worse</li>
<li>Bringing your own book about animals might be fun, when comparing the pictures of animals in the book with the real ones, but there is no additional information</li>
<li>Having a QR code at the cage that you need to scan will never feel exciting or fun</li>
<li>Having a list of animals in my app that I can tap on to get more info could be fun, but more for parents in order to appear smart before their kids giving them facts about the animal</li>
</ul>
<p>Now imagine the zoo-pokedex: you really need to go exploring the zoo in order to get information. In cases where the animal's area is big and it can retreat, you need to wait in front of it to take a picture. That takes endurance and perseverance. It might even be the case that you don’t get to see it and have to come back. When the animal appears in front of you, you’ll need to be quick - maybe there is even an element of surprise and excitement - you need to get that one picture of the animal in order to check off the challenge. Speaking of challenges, why not make it a challenge to see every animal in the zoo? That would definitely mean you need to come back multiple times, take your time and go home having ticked off 4-5 animals per visit. This experience encourages you to come back and try again next time. And each time you learn something, you go home with a sense of accomplishment.  </p>
<p>That would definitely be quite interesting, but how could such a device work? Well, we would definitely use the phone’s camera, and we could train a deep learning network to recognize the animals that are present in the zoo. </p>
<p>So imagine a kid walking up to an area and then trying to spot the animal in order to point his mobile phone to it and then magically a green check-mark appears next to it. We could display some additional info material like where they are originally from, what they eat, when they sleep etc.., but definitely those infos would feel much more entertaining than just reading them off a boring info plaque.</p>
<h3>How to train the Pokedex to distinguish new animals</h3>
<p>Nice idea, you say, but how am I going to make that magical device that will recognize animals, especially the “weird” ones, e.g. the oryx in the title :) ? The answer is …. of course …. deep learning. </p>
<p>In recent years you have probably noticed the rise of deep learning in different areas of machine learning and noticed their practical applications in your everyday life. In fact I have covered a couple of these practical applications such as <a href="https://www.liip.ch/en/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks">state of the art sentiment detection</a> or <a href="https://www.liip.ch/en/blog/tensorflow-and-tflearn-or-can-deep-learning-predict-if-dicaprio-could-have-survived-the-titanic">survival rates for structured data</a> or <a href="https://www.liip.ch/en/blog/betti-bossi-recipe-assistant-prototype-with-automatic-speech-recognition-asr-and-text-to-speech-tts-on-socket-io">automatic speech recognition</a> and <a href="https://www.liip.ch/en/blog/recipe-assistant-prototype-with-asr-and-tts-on-socket-io-part-3-developing-the-prototype">text to speech applications</a> in our blog. </p>
<h3>Deep learning image categorization task</h3>
<p>The area we need for our little zoo-pokedex is image categorization. Image categorization has advanced tremendously in the last years, due to deep learning outperforming all other machine learning approaches (see below). One good indicator of this movement is the yearly <a href="http://www.image-net.org">imagenet competition</a>, in which machine learning algorithms compete to find the best way of determining what can be seen on an image. The task is simple: there are 1000 categories of everyday objects such as cats, elephants and tea-kettles, and millions of images that need to be mapped to one of these categories. The algorithm that makes the lowest error wins. Below is an example of the output on some sample images. You’ll notice that the algorithm displays the label of the category it thinks the image belongs to. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/351c6b/imagenet.jpg" alt=""></figure>
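<p>For completeness: these competitions are typically scored by top-5 error, where a prediction counts as correct if the true label is among the model's five highest-scoring categories. A small from-scratch sketch (category names and scores are made up):</p>
<pre><code class="language-python"># Top-5 error, the usual imagenet metric
def top5_error(predictions, true_labels):
    # predictions: one {category: score} dict per image
    misses = 0
    for scores, truth in zip(predictions, true_labels):
        # the five categories with the highest scores
        top5 = sorted(scores, key=scores.get, reverse=True)[:5]
        if truth not in top5:
            misses += 1
    return misses / len(true_labels)</code></pre>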
<p>This ILSVRC competition has been going on for a couple of years now, and while the yearly improvements have been astonishing, deep learning appeared with a big bang on the horizon in 2012 and 2013. As you can see in the image below, the number of state-of-the-art deep learning solutions exploded and outperformed all other approaches in this area. It even goes so far that these algorithms tell the contents apart better than a competing human group. This super-human ability of deep learning networks is what the hype is all about. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/46be78/solutions.jpg" alt=""></figure>
<h3>How does it work?</h3>
<p>In this blog post I don’t want to be too technical, but just show you how two easy concepts, convolution (kernels) and pooling, are applied in a smart way to achieve outstanding results in image recognition tasks with deep learning. I don’t want to go into the details of how deep learning actually learns, i.e. the updating of weights and backpropagation, but abstract all of this away. In fact, if you have 20 minutes and are a visual learner, I definitely recommend the video below, which does an extremely good job of explaining the concepts:</p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//youtube.com/embed/aircAruvnKk" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<p>Instead I will quickly cover the two basic tricks that are used to make things really work. </p>
<p>We’ll start by looking at a common representation of a deep learning network, and you’ll notice that two words appear a lot there: convolution and pooling. While it seems obvious that the image data has to travel through these layers from left to right, it would be good to know what these layers actually do. </p>
<h3>Convolutions and Kernels</h3>
<p>If you are not a native speaker, you’ve probably never heard the word convolution before and might be quite puzzled when you hear it. For me it also sounded like some magic procedure that does something very complicated and apparently makes deep learning work :). </p>
<p>After getting into the field I realized that it's basically an image transformation that is almost 20 years old (see e.g. the Prentice Hall book Computer Vision by Shapiro) and present in your everyday image editing software. Things like sharpening an image, blurring it or finding edges are basically convolutions. It's a process of sliding a small, e.g. 3x3, matrix over each pixel of your image, multiplying it with the neighbouring pixels and collecting the results of that manipulation in a new image.</p>
<p>To make this concept more understandable I stole some <a href="http://setosa.io/ev/image-kernels/">examples</a> of how a 3x3 matrix, also called a kernel, transforms an image after being applied to every pixel in your image. </p>
<p>In the image below the kernel gives you the top edges in your image. The numbers in the grey boxes represent the gray image values (from 0 black to 255 white), and the little numbers after the x represent how these values are multiplied before being added together. If you change these numbers you get another transformation. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/edbbdd/top-edge.jpg" alt=""></figure>
<p>Here is another set of numbers in the 3x3 matrix that will blur your image. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/1ae7fe/blur.jpg" alt=""></figure>
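<p>To demystify this further, the whole operation fits in a few lines of numpy. A from-scratch sketch in "valid" mode (no padding; strictly speaking this computes cross-correlation, which is also what deep learning frameworks do under the name convolution):</p>
<pre><code class="language-python"># Applying a kernel to an image with plain numpy
import numpy as np

def convolve2d(image, kernel):
    # slide the kernel over the image, multiply each window
    # element-wise with the kernel and sum the result
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# a 3x3 box-blur kernel: every output pixel is the average of its neighbourhood
blur = np.full((3, 3), 1.0 / 9.0)</code></pre>
<p>Swap in the edge-detection numbers from the figure above and you get the edge image instead; the mechanism stays exactly the same.</p>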
<p>Normally the way to create such “filters” is to hand-tune these numbers to achieve the desired results. With some logical thinking you can easily come up with filters that sharpen or blur an image and then apply them. But how are filters used in the context of deep learning?</p>
<p>With deep learning we do things the other way round: we let the neural network find filters that are useful with regard to the final result. For example, to tell a zebra apart from an elephant it would be really useful to have a filter that detects diagonal edges: if the image has diagonal edges, e.g. the stripes of the zebra, it's probably not an elephant. So we train the network on our training images of zebras and elephants and let it learn these filters, or kernels, on its own. If the emerging kernels are helpful for the task they tend to stay; if not, they keep updating themselves until they become useful. </p>
<p>A layer that applies such filters, or kernels, or convolutions, is called a convolutional layer. And now comes another cool property: if you keep stacking such layers on top of each other, each of them will find its own helpful filters. On top of that, the filters become more and more complicated and able to detect more detailed features.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/9904f6/layer.jpg" alt=""></figure>
<p>In the image above (which is from a seminal <a href="https://arxiv.org/pdf/1311.2901.pdf">paper</a>), you see gray boxes and images. A great way to show these filters is to show the activations, or convolutions, which are the gray boxes. The images are samples that “trigger” these filters the most. Or, said the other way round, these are images that these filters detect well. </p>
<p>For example, in the first layer you’ll notice that the network detects mostly vertical, horizontal and diagonal edges. The second layer is already a bit “smarter” and able to detect round things, e.g. eyes or corners of frames. The third layer is smarter still and able to detect not only round things but things that look like car tires, for example. This layering often goes on for many layers - some networks have over 200 of them. That's why they are called deep. Now you know. Usually adding more and more of these layers makes the network better at detecting things, but it also makes it slower and sometimes less able to generalize to things it has not seen yet.  </p>
<h3>Pooling</h3>
<p>The second word that you see a lot in those architectures is pooling. Here the trick is really simple: you look at a couple of pixels next to each other, e.g. a 2x2 area, and simply take the biggest value - also called max-pooling. In the image below this trick has been applied to each colored 2x2 area, and the output is a much smaller image. Now why are we doing this?</p>
<p>The answer is simple: in order to be size invariant. We scale the image down multiple times in order to be able to detect a zebra that is really close to the camera as well as one that is only visible in the far distance. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/52da95/pooling.jpg" alt=""></figure>
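<p>The 2x2 max-pooling described above is almost a one-liner in numpy. A from-scratch sketch, assuming the image height and width are divisible by two (frameworks handle padding and strides more generally):</p>
<pre><code class="language-python"># 2x2 max-pooling with plain numpy
import numpy as np

def max_pool_2x2(image):
    # group the pixels into non-overlapping 2x2 blocks and keep only
    # the largest value of each block, halving both dimensions
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))</code></pre>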
<h3>Putting things together</h3>
<p>After the small excursion into the two main principles of inner workings of state of the art deep learning networks we have to ask the question of how we are going to use these tricks to detect our animals in the zoo. </p>
<p>While a few years ago you would have had to write a lot of code and hire a whole machine learning team for this task, today you can stand on the shoulders of giants. Thanks to the Imagenet competitions (and, I guess, thanks to Google, Microsoft and other research teams constantly putting out new research) we can use some of these pretrained networks to do the job for us. What does this mean?</p>
<p>The networks that are often used in these competitions can be obtained freely (in fact they even come <a href="https://github.com/pytorch/pytorch">pre-bundled with the deep-learning frameworks</a>), and you can use these networks without any tuning to categorize your images into the 1000 categories used in the competition. As you can see in the image below, the bigger the network in terms of layers, the better it performs, but also the slower it is and the more data it needs to be trained. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/3b54cc/comparison.jpg" alt=""></figure>
<h3>Outlook - Part 2 How to train state of the art image recognition networks to categorize new material</h3>
<p>The cool thing is that in the next blog post we will take these pretrained networks and teach them new tricks - in our case, to tell apart a llama from an oryx for our zoo pokedex. So basically we will train these networks to recognize things they have never been trained on. Obviously we will need training data, and we have to find a way to teach them new stuff without “destroying” their property of being really good at detecting common things. </p>
<p>Finally, after that blog post I hope to leave you with at least one takeaway: demystifying deep learning networks in the image recognition domain. So whenever you see those weird architecture drawings of image recognition deep learning networks with steps saying “convolution” and “pooling”, you’ll hopefully know that this magic sauce is not that magic after all. It’s just a very smart way of applying these rather old techniques to achieve outstanding results.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/b498f0/animals-assorted-background-953211.jpg" length="541852" type="image/jpeg" />
          </item>
        <item>
      <title>Recipe Assistant Prototype with ASR and TTS on Socket.IO - Part 3 Developing the prototype</title>
      <link>https://www.liip.ch/fr/blog/recipe-assistant-prototype-with-asr-and-tts-on-socket-io-part-3-developing-the-prototype</link>
      <guid>https://www.liip.ch/fr/blog/recipe-assistant-prototype-with-asr-and-tts-on-socket-io-part-3-developing-the-prototype</guid>
      <pubDate>Tue, 12 Jun 2018 00:00:00 +0200</pubDate>
<description><![CDATA[<p>Welcome to part three of three in our mini blog post series on how to build a recipe assistant with automatic speech recognition (ASR) and text to speech (TTS) to deliver a hands-free cooking experience. In the first blog post we gave you a hands-on <a href="https://www.liip.ch/en/blog/betti-bossi-recipe-assistant-prototype-with-automatic-speech-recognition-asr-and-text-to-speech-tts-on-socket-io">market overview</a> of existing SaaS and open source TTS solutions; in the second post we put the user in the center by covering the <a href="https://www.liip.ch/en/blog/recipe-assistant-prototype-with-asr-and-tts-on-socket-io-part-2-ux-workshop">usability aspects of dialog-driven apps</a> and how to create a good conversation flow. Finally it's time to get our hands dirty and show you some code. </p>
<h3>Prototyping with Socket.IO</h3>
<p>Although we envisioned the final app to be a mobile app running on a phone, it was much faster for us to build a small Socket.IO web application that basically mimics how an app might work on a mobile device. Although Socket.IO is not the newest tool in the shed, it was great fun to work with because it is really easy to set up. All you need is a JS library on the HTML side that you tell to connect to the server, which in our case is a simple Python Flask micro-webserver app.</p>
<pre><code class="language-html">&lt;!-- socket.io integration in the HTML webpage --&gt;
...
&lt;script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.1.0/socket.io.js"&gt;&lt;/script&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;script&gt;
$(document).ready(function(){
    var socket = io.connect('http://' + document.domain + ':' + location.port);
    socket.on('connect', function() {
        console.log("Connected recipe");
        socket.emit('start');
    });
    ...</code></pre>
<p>The code above connects to our flask server and emits the start message, signaling that our audio service can start reading the first step. Depending on different messages we can quickly alter the DOM or do other things in almost real time, which is very handy.</p>
<p>To make it work on the server side in the flask app all you need is a <a href="https://flask-socketio.readthedocs.io">python library</a> that you integrate in your application and you are ready to go:</p>
<pre><code class="language-python"># socket.io in flask
from flask_socketio import SocketIO, emit
socketio = SocketIO(app)

...

#listen to messages 
@socketio.on('start')
def start_thread():
    global thread
    if not thread.isAlive():
        print("Starting Thread")
        thread = AudioThread()
        thread.start()

...

#emit some messages
socketio.emit('ingredients', {"ingredients": "xyz"})
</code></pre>
<p>In the code excerpt above we start a thread that will be responsible for handling our audio processing. It starts when the web server receives the start message from the client, signalling that the client is ready to lead a conversation with the user. </p>
<h3>Automatic speech recognition and state machines</h3>
<p>The main part of the application is simply a while loop in the thread that listens to what the user has to say. Whenever we change the state of our application, it displays the next recipe step and reads it out loud. We’ve sketched out the flow of the states in the diagram below. It is a mostly linear conversation flow, with the only difference that we sometimes branch off to remind the user to preheat the oven or to take things out of it. This way we can save the user time, or at least offer a kind of convenience that a “classic” recipe on paper doesn’t provide. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/dc0509/flow.png" alt=""></figure>
<p>The automatic speech recognition (see below) works with <a href="https://wit.ai">wit.ai</a> in the same manner as shown in my recent <a href="https://www.liip.ch/en/blog/speech-recognition-with-wit-ai">blog post</a>. Have a look there to read up on the technology behind it and to find out how the RecognizeSpeech class works. In a nutshell, we record 2 seconds of audio locally, send it over a REST API to <a href="https://wit.ai">wit.ai</a> and wait for it to be turned into text. While this is convenient from a developer’s side - not having to write a lot of code and being able to use a service - the downside is reduced usability for the user: it introduces roughly 1-2 seconds of lag for sending the data, processing it and receiving the results. Ideally the ASR should take place on the mobile device itself to introduce as little lag as possible. </p>
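<p>The REST part of that hidden class boils down to a single upload. A hedged sketch (the /speech endpoint and headers follow wit.ai's HTTP API; the token is a placeholder, and the JSON key holding the transcript has varied across API versions, "_text" in older ones and "text" later):</p>

```python
import requests

WIT_TOKEN = "xxxxxx"  # placeholder token, as elsewhere in this post

def RecognizeSpeech(filename):
    """Send a locally recorded WAV file to wit.ai and return the transcript."""
    headers = {
        "Authorization": "Bearer " + WIT_TOKEN,
        "Content-Type": "audio/wav",
    }
    # Stream the raw WAV bytes to the speech endpoint.
    with open(filename, "rb") as f:
        resp = requests.post("https://api.wit.ai/speech", headers=headers, data=f)
    body = resp.json()
    # Key name depends on the API version pinned by the token.
    return body.get("_text") or body.get("text", "")
```

<p>The 1-2 seconds of lag mentioned above is exactly this HTTP roundtrip plus server-side processing.</p>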
<pre><code class="language-python">#abbreviated main thread

self.states = ["people","ingredients","step1","step2","step3","step4","step5","step6","end"]
while not thread_stop_event.isSet():
    socketio.emit("showmic") # show the microphone symbol in the frontend signalling that the app is listening
    text = recognize.RecognizeSpeech('myspeech.wav', 2) #the speech recognition is hidden here :)
    socketio.emit("hidemic") # hide the mic, signaling that we are processing the request

    if self.state == "people":
        ...
        if intro_not_played:
            self.play(recipe["about"])
            self.play(recipe["persons"])
            intro_not_played = False
        persons = re.findall(r"\d+", text)
        if len(persons) != 0:
            self.state = self.states[self.states.index(self.state)+1]
        ...
    if self.state == "ingredients":
        ...
        if intro_not_played:
            self.play(recipe["ingredients"])
            intro_not_played = False
        ...
        if "weiter" in text:
            self.state = self.states[self.states.index(self.state)+1]
        elif "zurück" in text:
            self.state = self.states[self.states.index(self.state)-1]
        elif "wiederholen" in text:
            intro_not_played = True #repeat the loop
        ...
</code></pre>
<p>As we can see above, depending on the state that we are in, we play the right TTS audio to the user and then progress to the next state. Each step also listens for whether the user wants to go forward (weiter), backward (zurück) or repeat the step (wiederholen), in case they misheard. </p>
<p>The first prototype solution shown above is not perfect though, as we are not using a wake-up word. Instead we periodically offer the user a chance to give input. The main drawback is that when the user speaks at a moment we don't expect it, we might not record it and consequently be unable to react. Additionally, sending audio back and forth to the cloud creates a rather sluggish experience. I would be much happier to have the ASR part on the client directly, especially since we are mainly listening for 3-4 navigational words. </p>
<h3>TTS with Slowsoft</h3>
<p>Finally, you may have noticed that there is a play method in the code above. That's where the TTS is hidden. As you see below, we first show the speaker symbol in the application, signalling that now is the time to listen. We then send the text to Slowsoft via their API, in our case defining the dialect &quot;CHE-gr&quot; and the speed and pitch of the output.</p>
<pre><code class="language-python">#play function
    def play(self,text):
        socketio.emit('showspeaker')
        headers = {'Accept': 'audio/wav','Content-Type': 'application/json', "auth": "xxxxxx"}
        with open("response.wav", "wb") as f: 
            resp = requests.post('https://slang.slowsoft.ch/webslang/tts', headers = headers, data = json.dumps({"text":text,"voiceorlang":"gsw-CHE-gr","speed":100,"pitch":100}))
            f.write(resp.content)
            os.system("mplayer response.wav")</code></pre>
<p>The text snippets are simply parts of the recipe. I tried to cut them into digestible parts, where each part contains roughly one action. Having an already structured recipe in the <a href="http://open-recipe-format.readthedocs.io/en/latest/topics/tutorials/walkthrough.html">open recipe</a> format helps a lot here, because we don't need to do any manual processing before sending the data. </p>
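<p>Extracting one snippet per action from such a structured recipe is then trivial. A minimal sketch, assuming PyYAML and a small subset of the Open Recipe Format keys (the `recipe_name`/`steps`/`step` names below follow my reading of the format's walkthrough; check the real schema if they differ):</p>

```python
import yaml

# A tiny recipe in (a subset of) the Open Recipe Format, as YAML.
RECIPE_YAML = """
recipe_name: Ofen-Broccoli
steps:
  - step: Ofen auf 220 Grad vorheizen.
  - step: Broccoli in 1 1/2 cm dicke Scheiben schneiden.
  - step: "Backen: ca. 15 Min. in der Mitte des Ofens."
"""

recipe = yaml.safe_load(RECIPE_YAML)
# One digestible snippet per step - ready to hand to the play() method.
snippets = [s["step"] for s in recipe["steps"]]
print(snippets[0])
```

<p>Each element of <em>snippets</em> maps to one state of the conversation flow, so no manual text chopping is needed.</p>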
<h3>Wakeup-word</h3>
<p>We took our prototype for a spin and realized in our experiments that having a wake-up word is a must. We simply couldn’t time our input correctly to enter it while the app was listening, which was a big pain for the user experience. </p>
<p>I know that nowadays smart speakers like Alexa or Google Home provide their own wake-up word, but we wanted to have our own. Is that even possible? Well, you have different options here. You could train a deep network from scratch with <a href="https://www.tensorflow.org/mobile/tflite/">tensorflow-lite</a> or create your own model by following along this tutorial on how to create <a href="https://www.tensorflow.org/tutorials/audio_recognition">simple</a> speech recognition with tensorflow. Yet the main drawback is that you might need a lot (and I mean A LOT, as in 65 thousand samples) of audio samples. That is not really practical for most users. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/088ea4/snowboy.png" alt=""></figure>
<p>Luckily you can also take an existing deep network and train it to understand YOUR wake-up words. That means it will not generalize as well to other persons, but maybe that is not much of a problem. You might as well think of it as a feature: your assistant only listens to you and not your kids :). A solution of this form exists under the name <a href="https://snowboy.kitt.ai">snowboy</a>, where a couple of ex-Googlers created a startup that lets you create your own wake-up words and then download those models. That is exactly what I did for this prototype. All you need to do is go to the snowboy website and provide three samples of your wake-up word; it then computes a model that you can download. You can also use their <a href="http://docs.kitt.ai/snowboy/#restful-api-calls">REST API</a> to do that - the idea being that you can include this phase directly in your application, making it very convenient for users to set up their own wake-up word. </p>
<pre><code class="language-python">#wakeup class 

import snowboydecoder
import sys
import signal

class Wakeup():
    def __init__(self):
        self.detector = snowboydecoder.HotwordDetector("betty.pmdl", sensitivity=0.5)
        self.interrupted = False
        self.wakeup()

    def signal_handler(self, signal, frame):
        self.interrupted = True

    def interrupt_callback(self):
        return self.interrupted

    def custom_callback(self):
        self.interrupted = True
        self.detector.terminate()
        return True

    def wakeup(self):
        self.interrupted = False
        self.detector.start(detected_callback=self.custom_callback, interrupt_check=self.interrupt_callback,sleep_time=0.03)
        return self.interrupted
</code></pre>
<p>All that's needed then is to create a Wakeup class that you can run from any other app you include it in. In the code above you’ll notice that we included our downloaded model (“betty.pmdl”); the rest of the methods are there to interrupt the wakeup method once we hear the wake-up word.</p>
<p>We then included this class in our main application as a blocking call, meaning that whenever we hit the part where we are supposed to listen for the wake-up word, we remain there until we hear it:</p>
<pre><code class="language-python">#integration into main app
...
            #record
            socketio.emit("showear")
            wakeup.Wakeup()
            socketio.emit("showmic")
            text = recognize.RecognizeSpeech('myspeech.wav', 2)
…</code></pre>
<p>You will notice in the code above that we included the <em>wakeup.Wakeup()</em> call, which now waits until the user has spoken the word; only after that do we record 2 seconds of audio and send it for processing with wit.ai. In our testing that improved the user experience tremendously. You can also see that we signal the listening state to the user via graphical clues: a little ear when the app is listening for the wake-up word, and a microphone when it is listening for your commands. </p>
<h3>Demo</h3>
<p>So, finally, time to show you the tech demo. It gives you an idea of how such an app might work and hopefully also gives you a starting point for new ideas and other improvements. While it's definitely not perfect, it does its job and allows me to cook hands-free :). Mission accomplished! </p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//player.vimeo.com/video/270594859" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<h2>What's next?</h2>
<p>In the first part of this blog post series we gave quite an <a href="https://www.liip.ch/en/blog/betti-bossi-recipe-assistant-prototype-with-automatic-speech-recognition-asr-and-text-to-speech-tts-on-socket-io">extensive overview</a> of the current capabilities of TTS systems. While we saw an abundance of options on the commercial side, we sadly didn’t find the same amount of sophisticated projects on the open source side. I hope this imbalance is resolved in the future, especially with the strong IoT movement and the need for this kind of technology as an underlying stack for all kinds of smart assistant projects. Here is an <a href="https://www.kickstarter.com/projects/seeed/respeaker-an-open-modular-voice-interface-to-hack?lang=de">example</a> of a Kickstarter project for a small speaker with built-in open source ASR and TTS.</p>
<p>In the <a href="https://www.liip.ch/en/blog/recipe-assistant-prototype-with-asr-and-tts-on-socket-io-part-2-ux-workshop">second blog post</a>, we discussed the user experience of audio-centered assistants. We realized that going audio-only might not always provide the best user experience, especially when the user is presented with a number of alternatives to choose from. This was especially the case in the exploration phase, where you have to select a recipe, and in the cooking phase, where the user needs to go through the list of ingredients. Given that the <a href="https://www.amazon.de/Amazon-Echo-2nd-Generation-Anthrazit-Stoff-/dp/B06ZXQV6P8">Alexas</a>, <a href="https://www.apple.com/homepod/">Homepods</a> and <a href="https://www.digitec.ch/de/s1/product/google-home-weiss-grau-multiroom-system-6421169">Google Home</a> smart boxes are on their way to take over the audio-based home assistant area, I think their usage will only make sense in a number of very easy-to-navigate domains, as in “Alexa, play me something from Jamiroquai”. In more difficult domains, such as cooking, mobile phones might be an interesting alternative, especially since they are much more portable (they are mobile after all), offer a screen and almost every person already has one. </p>
<p>Finally, in the last part of the series, I have shown you how to integrate a number of solutions - wit.ai for ASR, Slowsoft for TTS, snowboy for the wake-up word, and Socket.IO and Flask for prototyping - to create a nicely working prototype of a hands-free cooking assistant. I have uploaded the code on github, so feel free to play around with it to sketch your own ideas. A next step for us could be taking the prototype to the next level by really building it as an app for the iPhone or Android, and especially improving the speed of the ASR. Here we might use the existing <a href="https://developer.apple.com/machine-learning/">coreML</a> or <a href="https://www.tensorflow.org/mobile/tflite/">tensorflow lite</a> frameworks, or check how well we could use the built-in ASR capabilities of the devices. As a final key takeaway, we realized that building a hands-free recipe assistant is definitely something different from simply having the mobile phone read the recipe out loud for you. </p>
<p>As always I am looking forward to your comments and insights and hope to update you on our little project soon.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/3cb44c/gadget-google-assistant-google-home-1072851.jpg" length="560390" type="image/jpeg" />
          </item>
        <item>
      <title>Recipe Assistant Prototype with Automatic Speech Recognition (ASR) and Text to Speech (TTS) on Socket.IO - Part 1 TTS Market Overview</title>
      <link>https://www.liip.ch/fr/blog/betti-bossi-recipe-assistant-prototype-with-automatic-speech-recognition-asr-and-text-to-speech-tts-on-socket-io</link>
      <guid>https://www.liip.ch/fr/blog/betti-bossi-recipe-assistant-prototype-with-automatic-speech-recognition-asr-and-text-to-speech-tts-on-socket-io</guid>
      <pubDate>Mon, 28 May 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h2>Intro</h2>
<p>In one of our monthly innodays, where we try out new technologies and different approaches to old problems, we had the idea to collaborate with another company: Slowsoft, a provider of text to speech (TTS) solutions. To my knowledge they are the only ones able to generate Swiss German speech synthesis in various Swiss accents. We thought it would be a cool idea to combine this with our existing automatic speech recognition (ASR) expertise and build a cooking assistant that you can operate completely hands-free. So no more touching your phone with your dirty fingers only to check again how many eggs you need for that cake. We decided it would be great to go with some recipes from a famous Swiss cookbook provider. </p>
<h2>Overview</h2>
<p>Generally there are quite a few text to speech solutions out there on the market. In the first of two blog posts I would like to give you a short overview of the available options. In the second blog post I will then describe the insights we arrived at in the UX workshop and how we combined wit.ai with the solution from Slowsoft in a quick and dirty web-app prototype built on Socket.IO and Flask. </p>
<p>But first let us get an overview over existing text to speech (TTS) solutions. To showcase the performance of existing SaaS solutions I've chosen a random recipe from Betty Bossi and had it read by them:</p>
<pre><code class="language-text">Ofen auf 220 Grad vorheizen. Broccoli mit dem Strunk in ca. 1 1/2 cm dicke Scheiben schneiden, auf einem mit Backpapier belegten Blech verteilen. Öl darüberträufeln, salzen.
Backen: ca. 15 Min. in der Mitte des Ofens.
Essig, Öl und Dattelsirup verrühren, Schnittlauch grob schneiden, beigeben, Vinaigrette würzen.
Broccoli aus dem Ofen nehmen. Einige Chips mit den Edamame auf dem Broccoli verteilen. Vinaigrette darüberträufeln. Restliche Chips dazu servieren. </code></pre>
<h3>But first: How does TTS work?</h3>
<p>The classical way works like this: you have to record at least dozens of hours of raw speaker material in a professional studio. Depending on your use case, the material can range from navigation instructions to jokes. The next trick is called &quot;unit selection&quot;: the recorded speech is sliced into a high number (10k - 500k) of elementary components called <a href="https://en.wikipedia.org/wiki/Phone_(phonetics)">phones</a>, in order to be able to recombine them into new words that the speaker has never recorded. The recombination of these components is not an easy task, because their characteristics depend on the neighboring phonemes and on the accentuation or <a href="https://en.wikipedia.org/wiki/Prosody">prosody</a>, which in turn depend a lot on the context. The problem is to find the combination of units that satisfies the input text and the accentuation and that can be joined together without generating glitches. The raw input text is first translated into a phonetic transcription, which then serves as the input for selecting the right units from the database; these are then concatenated into a waveform. Below is a great example from Apple's Siri <a href="https://machinelearning.apple.com/2017/08/06/siri-voices.html">engineering team</a> showing how the slicing takes place. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/3096e9/components.png" alt=""></figure>
<p>Using an algorithm called <a href="https://en.wikipedia.org/wiki/Viterbi_algorithm">Viterbi</a>, the units are then concatenated in the way that creates the lowest &quot;cost&quot; - the cost resulting from selecting the right unit plus that of concatenating two units together. Below is a great conceptual graphic from Apple's engineering blog showing this cost estimation. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/166653/cost.png" alt=""></figure>
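<p>The cost minimization itself is classic dynamic programming. A toy sketch (all unit names and cost numbers are made up for illustration): each phone has candidate recorded units with a "target cost" (how well the unit fits the phone in context), and consecutive units have a "join cost" (how smoothly they concatenate); Viterbi keeps the cheapest path to each candidate.</p>

```python
def viterbi_select(target_costs, join_cost):
    """target_costs: list over phones of {unit: target cost};
    join_cost(u, v): cost of concatenating unit u before unit v.
    Returns the cheapest unit sequence and its total cost."""
    # best[u] = (total cost of cheapest path ending in unit u, that path)
    best = {u: (c, [u]) for u, c in target_costs[0].items()}
    for candidates in target_costs[1:]:
        new_best = {}
        for v, tc in candidates.items():
            # Cheapest way to arrive at v from any previous unit u.
            cost, path = min(
                (best[u][0] + join_cost(u, v) + tc, best[u][1] + [v])
                for u in best
            )
            new_best[v] = (cost, path)
        best = new_best
    cost, path = min(best.values())
    return path, cost

# Two candidate units per phone; joining units cut from the same recording
# (same letter prefix here) is cheap, crossing recordings costs extra.
target_costs = [{"a1": 1, "b1": 2}, {"a2": 1, "b2": 1}, {"a3": 3, "b3": 1}]
join = lambda u, v: 0 if u[0] == v[0] else 2

path, cost = viterbi_select(target_costs, join)
print(path, cost)  # ['b1', 'b2', 'b3'] 4
```

<p>Note how the globally cheapest path takes a locally worse first unit (b1) to avoid expensive joins later - exactly why a greedy choice is not enough and Viterbi is used.</p>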
<p>In contrast to the classical way of TTS, <a href="http://josesotelo.com/speechsynthesis/">new methods based on deep learning</a> have emerged, where deep learning networks are used to predict the unit selection. If you are interested in how the new systems work in detail, I highly recommend the <a href="https://machinelearning.apple.com/2017/08/06/siri-voices.html">engineering blog entry</a> describing how Apple created the Siri voice. As a final note I'd like to add that there is also a format called the <a href="https://de.wikipedia.org/wiki/Speech_Synthesis_Markup_Language">Speech Synthesis Markup Language</a>, which allows users to manually specify the prosody for TTS systems; this can be used, for example, to put an emphasis on certain words, which is quite handy. So enough with the boring theory, let's have a look at the available solutions.</p>
<h2>SaaS / Commercial</h2>
<h3>Google TTS</h3>
<p>When thinking about SaaS solutions, the first thing that comes to mind these days is obviously Google's <a href="https://cloud.google.com/text-to-speech/">TTS solution</a>, which they used to showcase Google's virtual assistant capabilities at this year's Google I/O conference. Have a look <a href="https://www.youtube.com/watch?v=d40jgFZ5hXk">here</a> if you haven't been wowed today yet. When you go to their website I highly encourage you to try out their demo with a German text of your choice. It really works well - the only downside for us was that it's not really Swiss German. I doubt that they will offer it for such a small user group - but who knows. I took a recipe, had Google read it and frankly liked the output. </p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//player.vimeo.com/video/270423560" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<h3>Azure Cognitive Services</h3>
<p>Microsoft also offers TTS as part of their Azure <a href="https://azure.microsoft.com/en-us/services/cognitive-services/speech/">cognitive services</a> (ASR, intent detection, TTS). As with Google, having ASR and TTS from one provider definitely has the benefit of saving us one roundtrip, since normally you would need to perform the following trips:</p>
<ol>
<li>Send audio data from client to server, </li>
<li>Get response to client (dispatch the message on the client)</li>
<li>Send our text to be transformed to speech (TTS) from client to server </li>
<li>Get the response on client. Play it to the user.</li>
</ol>
<p>Having ASR and TTS in one place reduces it to:</p>
<ol>
<li>ASR From client to server. Process it on the server. </li>
<li>TTS response to client. Play it to the user.</li>
</ol>
<p>Judging the speech synthesis quality, I personally think that Microsoft's solution doesn't sound as great as Google's. But have a listen for yourself. </p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//player.vimeo.com/video/270423598" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<h3>Amazon Polly</h3>
<p>Amazon - having placed their bets on Alexa - of course has a sophisticated TTS solution, which they call <a href="https://console.aws.amazon.com/polly/home/SynthesizeSpeech">Polly</a>. I love the name :). To get where they are now, they acquired a startup called Ivona back in 2013, which was producing state-of-the-art TTS solutions at the time. Having tried it, I liked the soft tone and the fluency of the results. Have a listen yourself:</p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//player.vimeo.com/video/270423539" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<h3>Apple Siri</h3>
<p>Apple offers TTS as part of their iOS SDK under the name <a href="https://developer.apple.com/sirikit/">SiriKit</a>. I haven’t had the chance yet to play with it in depth. Wanting to try it out, I made the error of thinking that Apple's TTS solution on the desktop is the same as SiriKit - yet SiriKit is nothing like the built-in TTS on macOS. To have a bit of a laugh on your MacBook, you can produce a really poor TTS on the command line with a single command:</p>
<pre><code class="language-bash">say -v fred "Ofen auf 220 Grad vorheizen. Broccoli mit dem Strunk in ca. 1 1/2 cm dicke Scheiben schneiden, auf einem mit Backpapier belegten Blech verteilen. Öl darüberträufeln, salzen.
Backen: ca. 15 Min. in der Mitte des Ofens."</code></pre>
<p>While the output sounds awful, below is the same text read by Siri on the newest iOS 11.3. That shows you how far TTS systems have evolved in the last years. Sorry for the bad quality, but somehow it seems impossible to turn off the external microphone when recording on an iPhone. </p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//player.vimeo.com/video/270441878" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<h3>IBM Watson</h3>
<p>In this arms race IBM also offers a TTS system, including a way to define the prosody manually using the <a href="https://de.wikipedia.org/wiki/Speech_Synthesis_Markup_Language">SSML markup language standard</a>. I didn't like their output as much as the presented alternatives, since it sounded quite artificial in comparison. But give it a try for yourself.</p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//youtube.com/embed/2Er2xl7MPBo" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<h3>Other commercial solutions</h3>
<p>Finally there are also competitors beyond the obvious ones, such as <a href="https://www.nuance.com">Nuance</a> (formerly ScanSoft - originating from Xerox research). Despite their page promising a <a href="http://ttssamples.syntheticspeech.de/ttsSamples/nuance-zoe-news-1.mp3">lot</a>, I found the quality of the TTS in German to be a bit lacking. </p>
<figure class="embed-responsive embed-responsive--16/9"><iframe src="//player.vimeo.com/video/270423596" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure>
<p>Facebook doesn't offer a TTS solution yet - maybe they have rather put their bets on Virtual Reality instead. Other notable solutions are <a href="http://www.acapela-group.com/">Acapela</a>, <a href="http://www.innoetics.com">Innoetics</a>, <a href="http://www.onscreenvoices.com">TomWeber Software</a>, <a href="https://www.aristech.de/de/">Aristech</a> and <a href="https://slowsoft.ch">Slowsoft</a> for Swiss TTS.</p>
<h2>OpenSource</h2>
<p>Instead of providing the same kind of overview for the open source area, I think it's easier to list a few projects and provide a sample of the synthesis. Many of these projects are academic in nature and often don't give you all the bells and whistles and fancy APIs of the commercial products, but with some dedication they could definitely work for you.  </p>
<ul>
<li><a href="http://espeak.sourceforge.net">Espeak</a>. <a href="http://ttssamples.syntheticspeech.de/ttsSamples/espeak-s1.mp3">sample</a> - My personal favorite. </li>
<li><a href="http://www.speech.cs.cmu.edu/flite/index.html">Flite</a>, a CMU project (a lightweight version of Festival) focused on portability. No sample.</li>
<li><a href="http://mary.dfki.de">Mary</a>. From the German Research Center for Artificial Intelligence (DFKI). <a href="http://ttssamples.syntheticspeech.de/ttsSamples/pavoque_s1.mp3">sample</a></li>
<li><a href="http://tcts.fpms.ac.be/synthesis/mbrola.html">Mbrola</a> from the University of Mons <a href="http://ttssamples.syntheticspeech.de/ttsSamples/de7_s1.mp3">sample</a></li>
<li><a href="http://tundra.simple4all.org/demo/index.html">Simple4All</a> - an EU-funded project. <a href="http://ttssamples.syntheticspeech.de/ttsSamples/simple4all_s1.mp3">sample</a></li>
<li><a href="https://mycroft.ai">Mycroft</a>. More of an open source assistant, but runs on the Raspberry Pi.</li>
<li><a href="https://mycroft.ai/documentation/mimic/">Mimic</a>. Only the TTS from the Mycroft project. No sample available.</li>
<li>Mozilla has published over 500 hours of material in their <a href="https://voice.mozilla.org/de/data">common voice project</a>. Based on this data they offer a deep learning ASR project <a href="https://github.com/mozilla/DeepSpeech">Deep Speech</a>. Hopefully they will offer TTS based on this data too someday. </li>
<li><a href="http://josesotelo.com/speechsynthesis/">Char2Wav</a> from the University of Montreal (who btw. maintain the theano library). <a href="http://josesotelo.com/speechsynthesis/files/wav/pavoque/original_best_bidirectional_text_0.wav">sample</a></li>
</ul>
<p>Overall my feeling is that unfortunately most of the open source systems have not yet caught up with the commercial versions. I can only speculate about the reasons: it might take a significant amount of good raw audio data to produce comparable results, plus a lot of fine-tuning of the final model for each language. For an elaborate overview of TTS systems, especially the ones that work in German, I highly recommend checking out the <a href="http://ttssamples.syntheticspeech.de">extensive list</a> that Felix Burkhardt from the Technical University of Berlin has compiled. </p>
<p>That sums up the market overview of commercial and open source solutions. Overall I was quite amazed how fluent some of these solutions sounded and think the technology is ready to really change how we interact with computers. Stay tuned for the next blog post where I will explain how we put one of these solutions to use to create a hands free recipe reading assistant.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/29d939/baking-bread-knife-brown-162786.jpg" length="2948380" type="image/jpeg" />
          </item>
        <item>
      <title>Progressive web apps, Meteor, Azure and the Data science stack or The future of web development conference.</title>
      <link>https://www.liip.ch/fr/blog/progressive-web-apps-meteor-azure-and-the-data-science-stack-or-the-future-of-web-development-conference</link>
      <guid>https://www.liip.ch/fr/blog/progressive-web-apps-meteor-azure-and-the-data-science-stack-or-the-future-of-web-development-conference</guid>
      <pubDate>Wed, 09 May 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h3>Back to the future</h3>
<p>Although the conference (hosted in Zürich last week in the Crowne Plaza) explicitly had the word future in its title, I found that the new trends often felt a bit like &quot;back to the future&quot;. Why? Because some rather old concepts like plain old SQL, &quot;offline first&quot; or pure JavaScript frameworks are making a comeback in web development - but with a twist. This brings us to the first talk. </p>
<h3>Modern single page apps with meteor</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/ce3196/meteor.png" alt=""></figure>
<p>Timo Horstschaefer from <a href="https://www.ledgy.com">Ledgy</a> showed how to create modern single page apps with <a href="https://www.meteor.com">meteor.js</a>. Although every framework promises to &quot;ship more with less code&quot;, he showed that Ledgy - a mobile app to allocate shares among stakeholders - was actually written in less than 3 months with 13'000 lines of code. In contrast to other web stacks, where the backend is written in one language (e.g. Ruby with Rails, Python with Django) and paired with a js-heavy frontend framework (e.g. React or Angular), meteor does things differently by offering a tightly coupled frontend and backend written purely in JavaScript. The backend is mostly a node component; in their case it is really slim at only 500 lines of code. It is mainly responsible for data consistency and authentication, while all the other logic simply runs in the client. Such client-heavy projects really shine when dealing with shaky Internet connections, because meteor takes care of all the data transmission in the background and catches up on the changes once connectivity is regained. Although meteor seemed to have had a rough patch in the community in 2015 and 2016, it is heading for a strong comeback. The framework is highly opinionated, but I personally really liked the high abstraction level, which seemed to allow the team a blazingly fast time to market. A quite favorable development is that Meteor is trying to open up beyond MongoDB as a database by offering its own GraphQL client (Apollo) that even outshines Facebook's own client, giving developers freedom in their choice of a database solution.</p>
<p>I highly encourage you to have a look at Timo's <a href="http://mypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC/dms/File/Moderne%20Single%20Page-Apps%20mit%20Meteor%20_%20Timo.pdf">presentation.</a> </p>
<h3>The data science stack</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/8b4877/datastack.png" alt=""></figure>
<p>Then it was my turn to present the data science stack. I won't bother you with the contents of my talk, since I've already blogged about it in detail <a href="https://www.liip.ch/en/blog/the-data-science-stack-2018">here</a>. If you still want to have a look at the presentation, you can of course <a href="http://mypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC/dms/File/Liip%20Data%20Stack.pdf">download</a> it. In the talk I offered a very subjective bird's-eye view of how the data-centric perspective touches modern web standards. An interesting feedback from the panel was the question whether such an overview really helps our developers to create better solutions. I personally think that having such maps or collections for orientation helps especially people in junior positions to expand their field of view. It might also help senior staff to look beyond their comfort zone and overcome the saying &quot;if all you have is a hammer, every problem looks like a nail&quot; - that is, using the same set of tools for every project. Yet I think the biggest benefit might be to offer clients a truly unbiased perspective on their options, of which they might have many more than some big vendors would like them to believe. </p>
<h3>From data science stack to data stack</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/ed727f/azure.png" alt=""></figure>
<p>Meinrad Weiss from Microsoft offered a glimpse into the Azure universe, showing us the many options for storing data in the Azure cloud. Some facts were indeed surprising, for example Microsoft being unable to find two data centers that were more than 400 miles apart in Switzerland (apparently the country is too small!), while others, like the majority of clients still operating in the SQL paradigm, were less so. One thing that really amazed me was their &quot;really big&quot; storage solution, meaning everything beyond 40 petabytes: the data is spread across 60 storage blobs that operate independently of the computational resources, which can be scaled on demand on top of the data layer. In comparison to a classical Hadoop stack, where computation and data are baked into one node, here the customer can scale up the computational power temporarily and scale it down again when the computations are finished, saving a bit of money. Billing-wise such solutions are not cheap though - we are talking about an entrance price of roughly 5 digits per month, so not really the typical SME scenario. Have a look at the <a href="http://mypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC/dms/File/AzureAndData_Meinrad_Microsoft.pdf">presentation</a> if you want a quick refresher on current options for big data storage on Microsoft Azure. Another interesting insight was that while a lot of different paradigms have emerged in recent years, Microsoft managed to include them all (e.g. Gremlin graph, Cassandra, MongoDB) in their database services, unifying their interfaces in one SQL endpoint. </p>
<h3>Offline First or progressive web apps</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/7a4898/pwa.png" alt=""></figure>
<p>Nico Martin, a leading web and frontend developer from the <a href="https://sayhelloagency.com">Say Hello</a> agency, showcased how the web is coming back to mobile again. Coming back? Yes, you heard right. If you thought you had been doing mobile first for many years now, you are right to ask why it is coming back. As it turns out (according to a 2017 comScore report), although people are indeed using their mobiles heavily, they are spending 87% of their time inside apps and not browsing the web - which might be surprising. On the other hand, while apps dominate mobile usage, more than 50% of people don't install any new apps on their phone, simply because they are happy with the ones they have. In fact, they spend 80% of their time in their top 3 apps. That poses a really difficult problem for new apps: how can they get a foot in the door given such highly habitualized behavior? One potential answer might be <a href="https://developers.google.com/web/progressive-web-apps/">Progressive Web Apps</a>, a set of standards championed by Google quite a few years ago that seeks to offer highly responsive and fast website behavior that feels almost like a native application. To pull this off, the main idea is a so-called &quot;service worker&quot; - a piece of code that is installed on the phone and continues running in the background - which makes it possible for these web apps, for example, to send notifications to users even while they are not on the website: rather something users know from classical native apps. Another very simple benefit is that you can install these apps on your home screen, and tapping them feels like using an app rather than browsing a website (e.g. there is no browser address bar). Finally, the whole website can operate in offline mode too, thanks to a smart caching mechanism that lets developers decide what to store on the device, in contrast to what the browser cache normally does.
If you feel like trying out one of these apps, I highly recommend <a href="http://mobile.twitter.com">mobile.twitter.com</a>, where Google and Twitter sat together and tried to showcase everything that is possible with this technology. If you are using an Android phone, these apps should work right away, but if you are using an iPhone make sure to have at least the recent iOS 11.3 update, which finally supports progressive web apps on Apple devices. While Apple has slightly opened the door to PWAs, I fear that their lack of support for the major features might have something to do with politics: after all, developers circumventing the app store and interacting with their customers without an intermediary doesn't leave much love for Apple's beloved app store. Have a look at Martin's great <a href="https://slides.nicomartin.ch/pwa-internet-briefing.html">presentation</a>. </p>
<h3>Conclusion</h3>
<p>Although the topics were quite diverse, I definitely enjoyed the conference. A big thanks goes to the organizers of the <a href="http://internet-briefing.ch">Internet Briefing series</a>, who do an amazing job of organizing these conferences on a monthly basis. They are definitely a good way to exchange best practices and eventually learn something new. For me it was the motivation to finally get my hands dirty with progressive web apps, knowing that you don't really need much to make them work. </p>
<p>As usual I am happy to hear your comments on these topics and hope that you enjoyed that little summary.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/25d9a4/abstract-art-colorful-942317.jpg" length="1981006" type="image/jpeg" />
          </item>
        <item>
      <title>Sentiment detection with Keras, word embeddings and LSTM deep learning networks</title>
      <link>https://www.liip.ch/fr/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks</link>
      <guid>https://www.liip.ch/fr/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks</guid>
      <pubDate>Fri, 04 May 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h3>Overview SaaS</h3>
<p>Sentiment detection has become a bit of a commodity. Especially the big 5 vendors offer their own sentiment detection as a service. Google offers an <a href="https://cloud.google.com/natural-language/docs/sentiment-tutorial">NLP API</a> with sentiment detection. Microsoft offers sentiment detection through their <a href="https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/">Azure</a> platform. IBM has come up with a solution called <a href="https://www.ibm.com/watson/services/tone-analyzer/">Tone Analyzer</a> that tries to get the &quot;tone&quot; of a message, which goes a bit beyond sentiment detection. Amazon offers a solution called <a href="https://aws.amazon.com/de/blogs/machine-learning/detect-sentiment-from-customer-reviews-using-amazon-comprehend/">Comprehend</a> that runs on AWS as a Lambda. Facebook surprisingly doesn't offer an API or an open source project here, although they are the ones with user-generated content where people often are not <a href="https://www.nzz.ch/digital/facebook-fremdenfeindlichkeit-hass-kommentare-ld.1945">so nice</a> to each other. Interestingly, they do not offer any assistance to page owners in that specific matter.</p>
<p>Beyond the big 5 there are a few noteworthy companies like <a href="https://aylien.com">Aylien</a> and <a href="https://monkeylearn.com">Monkeylearn</a> that are worth checking out. </p>
<h3>Overview Open Source Solutions</h3>
<p>Of course there are open source solutions and libraries that offer sentiment detection too.<br />
Generally all of these tools offer more than just sentiment analysis. Most of the SaaS solutions outlined above, as well as the open source libraries, cover a vast range of different NLP tasks:</p>
<ul>
<li>part of speech tagging (e.g. &quot;going&quot; is a verb), </li>
<li>stemming (finding the &quot;root&quot; of a word, e.g. am, are, is -&gt; be), </li>
<li>noun phrase extraction (e.g. car is a noun), </li>
<li>tokenization (e.g. splitting text into words, sentences), </li>
<li>word inflections (e.g. what's the plural of atlas), </li>
<li>spelling correction and translation. </li>
</ul>
<p>I'd like to point you to Python's <a href="http://text-processing.com/demo/sentiment/">NLTK library</a>, <a href="http://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis">TextBlob</a>, <a href="https://www.clips.uantwerpen.be/pages/pattern-en#sentiment">Pattern</a>, R's <a href="https://cran.r-project.org/web/packages/tm/index.html">Text Mining</a> package and Java's <a href="http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html">LingPipe</a> library. Finally, I encourage you to have a look at the latest <a href="https://spacy.io">Spacy NLP suite</a>, which doesn't offer sentiment detection per se but has great NLP capabilities. </p>
<p>If you are looking for more options I encourage you to take a look at the full list that I have compiled in our <a href="http://datasciencestack.liip.ch/#nlp">data science stack</a>. </p>
<h3>Let's get started</h3>
<p>So you see, when you need sentiment analysis in your web-app or mobile app you already have a myriad of options to get started. Of course you might build something by yourself if your language is not supported or you have other legal compliances to meet when it comes to data privacy.</p>
<p>Let me walk you through all of the steps needed to build a well-working sentiment detection with <a href="https://keras.io">Keras</a> and <a href="https://de.wikipedia.org/wiki/Long_short-term_memory">long short-term memory networks</a>. Keras is a very popular Python deep learning library, similar to <a href="http://tflearn.org">TFlearn</a>, that allows you to create neural networks without writing too much boilerplate code. LSTM networks are a special form of network architecture that is especially useful for text tasks, which I am going to explain later. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/30a13b/keras.png" alt=""></figure>
<h3>Step 1: Get the data</h3>
<p>Being a big movie nerd, I have chosen to classify IMDB reviews as positive or negative for this example. Conveniently, the IMDB sample already comes with the Keras <a href="https://keras.io/datasets/">datasets</a> library, so you don't have to download anything. If you are interested though, not a lot of people know that IMDB offers its <a href="https://www.imdb.com/interfaces/">own datasets</a> which can be <a href="https://datasets.imdbws.com">downloaded</a> publicly. Among those we are interested in the ones that contain movie reviews which have been labeled by hand as either positive or negative. </p>
<pre><code class="language-python">#download the data
from keras.datasets import imdb 
top_words = 5000 
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)</code></pre>
<p>The code above does a couple of things at once: </p>
<ol>
<li>It downloads the data </li>
<li>It keeps only the 5000 most frequent words in the reviews </li>
<li>It splits the data into a test and a training set. </li>
</ol>
<figure><img src="https://liip.rokka.io/www_inarticle/fb9a1c/processed.png" alt=""></figure>
<p>If you look at the data you will realize it has already been pre-processed. All words have been mapped to integers, and the integers represent the words sorted by their frequency. This is a very common way to represent a dataset in text analysis. So 4 represents the 4th most used word, 5 the 5th most used word and so on... The integer 1 is reserved for the start marker, the integer 2 for an unknown word and 0 for padding. </p>
<p>If you want to peek at the reviews yourself and see what people have actually written, you can reverse the process too:</p>
<pre><code class="language-python">#reverse lookup
import keras
INDEX_FROM = 3 # actual word indices in the dataset are offset by 3
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+INDEX_FROM) for k,v in word_to_id.items()}
word_to_id["&lt;PAD&gt;"] = 0
word_to_id["&lt;START&gt;"] = 1
word_to_id["&lt;UNK&gt;"] = 2
id_to_word = {value:key for key,value in word_to_id.items()}
print(' '.join(id_to_word[id] for id in X_train[0]))</code></pre>
<p>The output might look something like this:</p>
<pre><code class="language-python">&lt;START&gt; this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert &lt;UNK&gt; is an amazing actor and now the same being director &lt;UNK&gt; father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for &lt;UNK&gt; and would recommend it to everyone to watch and the fly &lt;UNK&gt; was amazing really cried at the end it was so sad and you know w</code></pre>
<h3>One-hot encoder</h3>
<p>If you want to do the same with your own text (my example below uses some short work reviews in German), you can use Keras' built-in &quot;one-hot&quot; encoding feature, which allows you to encode your documents as integers. The method is quite useful since it removes any extra marks (e.g. !&quot;#$%&amp;...), splits sentences into words by spaces and transforms the words to lowercase. Despite its name, it hashes each word to an integer, so different words may occasionally collide on the same number. </p>
<pre><code class="language-python">#one hot encode your documents
from numpy import array
from keras.preprocessing.text import one_hot
docs = ['Gut gemacht',
        'Gute arbeit',
        'Super idee',
        'Perfekt erledigt',
        'exzellent',
        'naja',
        'Schwache arbeit.',
        'Nicht gut',
        'Miese arbeit.',
        'Hätte es besser machen können.']
# integer encode the documents
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)</code></pre>
<p>Although the encoding will not be sorted like in our example before (e.g. lower numbers representing more frequent words), this will still give you a similar output:</p>
<pre><code>[[18, 6], [35, 39], [49, 46], [41, 39], [25], [16], [11, 39], [6, 18], [21, 39], [15, 23, 19, 41, 25]]</code></pre>
<h3>Step 2: Preprocess the data</h3>
<p>Since the reviews differ heavily in length, we want to trim each review to 500 words. We need text samples of the same length in order to feed them into our neural network. If reviews are shorter than 500 words we will pad them with zeros. Keras, being super nice, offers a set of <a href="https://keras.io/preprocessing/text/">preprocessing</a> routines that can do this for us easily. </p>
<pre><code class="language-python"># Truncate and pad the review sequences 
from keras.preprocessing import sequence 
max_review_length = 500 
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length) 
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length) </code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/27e1ad/padded.png" alt=""></figure>
<p>As you can see above (I've output the padded array as a pandas dataframe for visibility), a lot of the reviews have padding zeros at the front, which means that the review is shorter than 500 words. </p>
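<p>Under the hood, <code>pad_sequences</code> pads and truncates at the front by default. Just to illustrate that behaviour, here is a tiny toy re-implementation of mine (not the actual Keras code):</p>
<pre><code class="language-python">def pad_pre(seq, maxlen, value=0):
    """Keep the last maxlen items and left-pad with value,
    mimicking the default behaviour of keras pad_sequences."""
    seq = seq[-maxlen:]  # truncate from the front
    return [value] * (maxlen - len(seq)) + seq

print(pad_pre([7, 8, 9], 5))           # [0, 0, 7, 8, 9]
print(pad_pre([1, 2, 3, 4, 5, 6], 5))  # [2, 3, 4, 5, 6]</code></pre>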
<h3>Step 3: Build the model</h3>
<p>Surprisingly, we are already done with the data preparation and can start to build our model. </p>
<pre><code class="language-python"># Build the model 
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
embedding_vector_length = 32 
model = Sequential() 
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length)) 
model.add(LSTM(100)) 
model.add(Dense(1, activation='sigmoid')) 
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
print(model.summary()) </code></pre>
<p>The two most important things in our code are the following:</p>
<ol>
<li>The Embedding layer and </li>
<li>The LSTM Layer. </li>
</ol>
<p>Let's cover what each of them does. </p>
<h3>Word embeddings</h3>
<p>The embedding layer will learn a word embedding for all the words in the dataset. It has three arguments: the input dimension, which is the size of our vocabulary (the 5000 top words); the output dimension, i.e. the size of the vector space in which the words will be embedded - we have chosen 32 dimensions, so a vector of length 32 holds each word's coordinates; and the input length, which is the 500 words of each review. </p>
<p>There are already pre-trained word embeddings (e.g. GloVe or <a href="https://radimrehurek.com/gensim/models/word2vec.html">Word2Vec</a>) that you can <a href="https://nlp.stanford.edu/projects/glove/">download</a>, so that you don't have to train your embeddings all by yourself. Generally, these pre-trained embeddings are based on specialized algorithms that each compute the embedding a bit differently, but we won't cover that here. </p>
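<p>If you go down the pre-trained route, the GloVe download is just a plain text file with one word per line, followed by the values of its vector. A minimal parser for that format might look like this (the sample line here is made up; real files such as glove.6B.50d.txt have 50 or more values per line):</p>
<pre><code class="language-python">import numpy as np

def parse_glove_line(line):
    """Parse one line of a GloVe .txt file into (word, vector)."""
    parts = line.rstrip().split(" ")
    return parts[0], np.array(parts[1:], dtype="float32")

# a made-up line in the GloVe text format
word, vec = parse_glove_line("movie 0.1 -0.2 0.3")
print(word, vec.shape)  # movie (3,)</code></pre>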
<p>How can you imagine what an embedding actually is? Generally, words that have a similar meaning in context should be embedded next to each other. Below is an example of word embeddings in a two-dimensional space:</p>
<figure><img src="https://liip.rokka.io/www_inarticle/88d44e/embeddings.png" alt=""></figure>
<p>Why should we even care about word embeddings? Because it is a really useful trick. If we were to feed our reviews into a neural network and just one-hot encode them, we would have very sparse representations of our texts. Why? Let us have a look at the sentence &quot;I do my job&quot; in a &quot;bag of words&quot; representation with a vocabulary of 1000 words: a vector that holds 1000 entries (each column is one word) has four ones in it (one for <strong>I</strong>, one for <strong>do</strong>, one for <strong>my</strong> and one for <strong>job</strong>) and 996 zeros. So it would be very sparse. This makes learning difficult, because we would need 1000 input neurons, each representing the occurrence of one word in our sentence. </p>
<p>In contrast, with a word embedding we can fold these 1000 words into as many dimensions as we want, in our case 32. This means that we just have an input vector of 32 values instead of 1000. So the word &quot;I&quot; would be some vector with values (0.4, 0.5, 0.2, ...) and the same would happen with the other words. With a word embedding like this, we just need 32 input neurons. </p>
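<p>To make the sparse-versus-dense contrast concrete, here is a small numpy sketch. The embedding matrix is random here purely to show the shapes (in Keras it would be learned during training), and the word indices are made up:</p>
<pre><code class="language-python">import numpy as np

vocab_size, embedding_dim = 1000, 32
rng = np.random.default_rng(0)

# in Keras this matrix is learned during training; random here
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

sentence = [14, 27, 3, 512]  # "I do my job" as made-up word indices

# bag of words: one sparse 1000-dim vector for the whole sentence
bow = np.zeros(vocab_size)
bow[sentence] = 1            # four ones, 996 zeros

# embedding: one dense 32-dim vector per word
dense = embedding_matrix[sentence]

print(int(bow.sum()), dense.shape)  # 4 (4, 32)</code></pre>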
<h3>LSTMs</h3>
<p>Recurrent neural networks are networks used for &quot;things&quot; that happen sequentially, one after the other (e.g. time series, but also words). Long Short-Term Memory networks (LSTM) are a specific type of Recurrent Neural Network (RNN) capable of learning the relationships between elements in an input sequence. In our case the elements are words. So our next layer is an LSTM layer with 100 memory units.</p>
<p>LSTM networks maintain a state and so overcome the vanishing gradient problem in recurrent neural networks (basically the problem that when you make a network deep enough, the information for learning will &quot;vanish&quot; at some point). I do not want to go into detail about how they actually work, but <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this post</a> delivers a great visual explanation of the building blocks of LSTMs.</p>
<p>So the output of the embedding layer is a 500 by 32 matrix: each of the 500 words is represented by its position in those 32 dimensions, and this sequence of 500 words is what we feed into the LSTM network. </p>
<p>Finally, we have a dense layer with one node and a sigmoid activation as the output. </p>
<p>Since we only need a binary decision - whether the review is positive or negative - we will use binary_crossentropy as the loss function. The optimizer is the standard adam, and the metric is the standard accuracy. </p>
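<p>For intuition, the sigmoid activation and the binary cross-entropy loss are simple enough to compute by hand. This is a toy sketch of the math, not the Keras implementation:</p>
<pre><code class="language-python">import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

def binary_crossentropy(y_true, y_pred):
    """The loss is small when the predicted probability matches the label."""
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

p = sigmoid(2.0)  # a fairly confident "positive" prediction
print(round(p, 3))                          # 0.881
print(round(binary_crossentropy(1, p), 3))  # small loss if the true label is 1
print(round(binary_crossentropy(0, p), 3))  # large loss if the true label is 0</code></pre>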
<p>By the way, if you want you can build a sentiment analysis without LSTMs, then you simply need to replace it by a flatten layer:</p>
<pre><code class="language-python">#Replace LSTM by a flatten layer
#model.add(LSTM(100)) 
model.add(Flatten()) </code></pre>
<h3>Step 4: Train the model</h3>
<p>After defining the model Keras gives us a summary of what we have built. It looks like this:</p>
<pre><code class="language-python">#Summary from Keras
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
=================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None</code></pre>
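<p>The parameter counts in this summary can be verified by hand. An LSTM has four gates, and each gate has weights for the input, weights for the recurrent state, and a bias:</p>
<pre><code class="language-python">top_words, embedding_dim, lstm_units = 5000, 32, 100

# embedding: one 32-dim vector per vocabulary word
embedding_params = top_words * embedding_dim  # 160000

# LSTM: 4 gates, each with (input + recurrent) weights and a bias per unit
lstm_params = 4 * ((embedding_dim + lstm_units) * lstm_units + lstm_units)  # 53200

# dense: 100 weights plus 1 bias
dense_params = lstm_units + 1  # 101

print(embedding_params + lstm_params + dense_params)  # 213301</code></pre>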
<p>To train the model we simply call the fit function, supply it with the training data and also tell it which data to use for validation. That is really useful because we get everything in one call. </p>
<pre><code class="language-python">#Train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64) </code></pre>
<p>The training of the model might take a while, especially when you are running it on the CPU instead of the GPU. During training, what you want to observe is the loss function: it should steadily go down, which shows that the model is improving. We will let the model see the dataset 3 times, defined by the epochs parameter. The batch size defines how many samples the model sees at once - in our case 64 reviews. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/1868ba/training.png" alt=""></figure>
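<p>As a quick sanity check of what these numbers mean: the IMDB training split contains 25,000 reviews, so we can work out how many weight updates the model performs in total:</p>
<pre><code class="language-python">import math

train_samples = 25000  # size of the IMDB training split
batch_size = 64
epochs = 3

updates_per_epoch = math.ceil(train_samples / batch_size)
print(updates_per_epoch)           # 391
print(updates_per_epoch * epochs)  # 1173</code></pre>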
<p>To observe the training you can fire up TensorBoard, which runs in the browser and gives you a lot of different analytics, in particular the loss curve in real time. To do so, type in your console:</p>
<pre><code class="language-bash">tensorboard --logdir=/tmp</code></pre>
<h3>Step 5: Test the model</h3>
<p>Once we have finished training the model we can easily test its accuracy. Keras provides a very handy function to do that:</p>
<pre><code class="language-python">#Evaluate the model
scores = model.evaluate(X_test, y_test, verbose=0) 
print("Accuracy: %.2f%%" % (scores[1]*100))</code></pre>
<p>In our case the model achieved an accuracy of around 90%, which is excellent given the difficult task. By the way, if you are wondering what the results would have been with the Flatten layer: also around 90%. So in this case I would apply <a href="https://en.wikipedia.org/wiki/Occam%27s_razor">Occam's razor</a> and, when in doubt, go with the simpler model.</p>
<h3>Step 6: Predict something</h3>
<p>Of course, in the end we want to use our model in an application, so we want to use it to create predictions. In order to do so we need to translate our sentence into the corresponding word integers and then pad it to match our data. We can then feed it into our model and see whether it thinks we liked or disliked the movie.</p>
<pre><code class="language-python">#predict sentiment from reviews
bad = "this movie was terrible and bad"
good = "i really liked the movie and had fun"
for review in [good, bad]:
    tmp = [word_to_id[word] for word in review.split(" ")]
    tmp_padded = sequence.pad_sequences([tmp], maxlen=max_review_length)
    print("%s. Sentiment: %s" % (review, model.predict(tmp_padded)[0][0]))
# output:
# i really liked the movie and had fun. Sentiment: 0.715537
# this movie was terrible and bad. Sentiment: 0.0353295</code></pre>
<p>A value close to 0 means the sentiment was negative and a value close to 1 means it's a positive review. You can also use &quot;model.predict_classes&quot; to directly get the positive and negative classes. </p>
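<p>The class decision is just a threshold on the sigmoid output. Here is a hypothetical helper of mine showing what &quot;model.predict_classes&quot; essentially does, applied to the two predictions from above:</p>
<pre><code class="language-python">def to_class(probability, threshold=0.5):
    """Collapse a sigmoid output into a hard positive/negative label."""
    return "positive" if probability >= threshold else "negative"

print(to_class(0.715537))   # positive
print(to_class(0.0353295))  # negative</code></pre>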
<h3>Conclusion or what’s next?</h3>
<p>So we have built quite a cool sentiment analysis for IMDB reviews that predicts whether a movie review is positive or negative with 90% accuracy. With this we are already <a href="https://en.wikipedia.org/wiki/Sentiment_analysis">quite close</a> to industry standards. It means that in comparison to a <a href="https://www.liip.ch/en/blog/whats-your-twitter-mood">quick prototype</a> that a colleague of mine built a few years ago, we could potentially improve on it now. The big benefit of our self-built solution over a SaaS solution on the market is that we own our data and model. We can now deploy this model on our own infrastructure and use it as often as we like. Google or Amazon never get to see sensitive customer data, which might be relevant for certain business cases. We can also train it on German or even Swiss German, given that we find a nice dataset - or simply build one ourselves. </p>
<p>As always I am looking forward to your comments and insights! As usual you can download the Ipython notebook with the code <a href="https://github.com/plotti/keras_sentiment/blob/master/Imdb%20Sentiment.ipynb">here</a>.</p>
<p>P.S. The people from monkeylearn contacted me and pointed out that they have written quite an extensive introduction to sentiment detection here: <a href="https://monkeylearn.com/sentiment-analysis/">https://monkeylearn.com/sentiment-analysis/</a> so I point you to that in case you want to read up on the general concepts.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/674f1c/clamp-clips-close-up-160824.jpg" length="2751344" type="image/jpeg" />
          </item>
        <item>
      <title>Tensorflow and TFlearn or can deep learning predict if DiCaprio could have survived the Titanic?</title>
      <link>https://www.liip.ch/fr/blog/tensorflow-and-tflearn-or-can-deep-learning-predict-if-dicaprio-could-have-survived-the-titanic</link>
      <guid>https://www.liip.ch/fr/blog/tensorflow-and-tflearn-or-can-deep-learning-predict-if-dicaprio-could-have-survived-the-titanic</guid>
      <pubDate>Wed, 25 Apr 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<p>Getting your foot into deep learning might feel weird, since there is so much going on at the same time. </p>
<ul>
<li>First, there are myriads of frameworks like <a href="http://tensorflow.org">tensorflow</a>, <a href="https://caffe2.ai">caffe2</a>, <a href="http://torch.ch">torch</a>, <a href="http://www.deeplearning.net/software/theano/">theano</a> and <a href="https://github.com/Microsoft/cntk">Microsoft's open source deep learning toolkit CNTK</a>. </li>
<li>Second, there are dozens of different ideas on how networks can be put to work, e.g. <a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">recurrent neural networks</a>, <a href="https://en.wikipedia.org/wiki/Long_short-term_memory">long short-term memory networks</a>, <a href="https://de.wikipedia.org/wiki/Generative_Adversarial_Networks">generative adversarial networks</a> and <a href="https://de.wikipedia.org/wiki/Convolutional_Neural_Network">convolutional neural networks</a>. </li>
<li>And then finally there are even more frameworks on top of these frameworks such as <a href="https://keras.io">keras</a>, <a href="http://tflearn.org">tflearn</a>. </li>
</ul>
<p>In this blogpost I thought I'd just take the two subjectively most popular choices, Tensorflow and Tflearn, and show you how they work together. We won't put much emphasis on the network layout; it's going to be a plain vanilla fully connected network with two hidden layers. </p>
<p>Tensorflow is the low-level library for deep learning. If you wanted to, you could use just this library, but then you would need to write much more boilerplate code. Since I am not such a big fan of boilerplate (hi Java), we are not going to do this. Instead we will use Tflearn. Tflearn started out as a separate open-source library that provides an abstraction on top of Tensorflow. Last year Google integrated that project very tightly with Tensorflow to make the learning curve less steep and the handling more convenient. </p>
<p>I will use both to predict the survival rate in a commonly known data set called the <a href="https://www.kaggle.com/c/titanic">Titanic Dataset</a>. Beyond this dataset there are of course <a href="http://scikit-learn.org/stable/datasets/index.html">myriads</a> of such classical sets, of which the most popular is the <a href="https://www.kaggle.com/uciml/iris">Iris dataset</a>. It is handy to know these datasets, since a lot of tutorials are built around them: when you are trying to figure out how something works, you can google for one of those sets together with the method you are trying to apply. They are a bit like the hello world of programming. Finally, since these sets are well studied, you can try the methods shown in blogposts on other datasets and compare your results with others. But let’s focus on our Titanic dataset first.</p>
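<p>Loading one of these classic sets is usually a one-liner; for example the Iris dataset ships with scikit-learn (a quick aside, independent of the Titanic example):</p>
<pre><code class="language-python">from sklearn.datasets import load_iris

# The classic Iris "hello world" dataset: 150 flowers, 4 features each
iris = load_iris()
print(iris.data.shape)    # (150, 4)
print(iris.target_names)  # the three iris species</code></pre>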
<h3>Goal: Predict survivors on the Titanic</h3>
<p>The most famous shipwreck in history, the Titanic sank after colliding with an iceberg on 15 April 1912. Of the 2224 passengers and crew, roughly 1502 died, because there were not enough lifeboats for everybody. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/781d2e/titanic-infographic-11.jpg" alt="Titanic"></figure>
<p>Now we could say it was sheer luck who survived and who sank, but we could also be a bit more provocative and say that some groups of people were more likely to survive than others, such as women, children or ... the upper class. </p>
<p>Now making such crazy assumptions about the upper class is not worth a dime if we cannot back them up with data. In our case, instead of doing boring descriptive statistics, we will train a machine learning model with Tensorflow and Tflearn that will predict the survival chances of Leo DiCaprio and Kate Winslet for us. </p>
<h3>Step 0: Prerequisites</h3>
<p>To follow along in this tutorial you will obviously need the Titanic data set (which can be downloaded automatically by Tflearn) and both a working Tensorflow and Tflearn installation. <a href="http://tflearn.org/installation/">Here</a> is a good tutorial on how to install both. Below is a quick recipe for installing both on the Mac, although it will surely go out of date soon (e.g. new versions etc.):</p>
<pre><code class="language-bash">sudo pip3 install https://ci.tensorflow.org/view/tf-nightly/job/tf-nightly-mac/TF_BUILD_IS_OPT\=OPT,TF_BUILD_IS_PIP\=PIP,TF_BUILD_PYTHON_VERSION\=PYTHON3,label\=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tf_nightly-1.head-py3-none-any.whl</code></pre>
<p>If you happen to have a MacBook with an NVIDIA graphics card, you can also install Tensorflow with GPU support (your computations will then run in parallel on the graphics card, which is much faster). Before attempting this, please check your graphics card in &quot;About this Mac&quot; first. The chances that your MacBook has one are slim.</p>
<pre><code class="language-bash">sudo pip3 install https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-1.1.0-py2-none-any.whl</code></pre>
<p>Finally install TFlearn - in this case the bleeding edge version:</p>
<pre><code class="language-bash">sudo pip3 install git+https://github.com/tflearn/tflearn.git</code></pre>
<p>If you are having problems with the install <a href="https://www.tensorflow.org/install/install_mac#CommonInstallationProblems">here</a> is a good troubleshooting page to sort you out. </p>
<p>To get started in an IPython notebook or a Python file, we need to load all the necessary libraries first. We will use numpy and pandas to make our life a bit easier and sklearn to split our dataset into a train and a test set. Finally we will obviously also need tflearn and the datasets. </p>
<pre><code class="language-python">#import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split 
import tflearn
from tflearn.data_utils import load_csv
from tflearn.datasets import titanic</code></pre>
<h3>Step 1: Load the data</h3>
<p>The Titanic dataset is stored in a CSV file. Since this toy dataset comes with TFLearn, we can use the TFLearn load_csv() function to load the data from the CSV file into a python list. By specifying 'target_column' we indicate that the labels, i.e. the thing we try to predict (survived or not), are located in the first column. We then store our data in a pandas dataframe to inspect it more easily (e.g. df.head()), and then split it into a train and a test dataset. </p>
<pre><code class="language-python"># Download the Titanic dataset
titanic.download_dataset('titanic_dataset.csv')

# Load CSV file, indicate that the first column represents labels
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

# Make a df out of it for convenience
df = pd.DataFrame(data)

# Do a test / train split
X_train, X_test, y_train, y_test = train_test_split(df, labels, test_size=0.33, random_state=42)
X_train.head()</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/703967/dataset.png" alt="dataframe"></figure>
<p>Studying the data frame you also see that we have a couple of pieces of information for each passenger. In this case I took a look at the first entry:</p>
<ul>
<li>Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)</li>
<li>name (e.g. Allen, Miss. Elisabeth Walton)</li>
<li>gender (e.g. female/male)</li>
<li>age (e.g. 29)</li>
<li>number of siblings/spouses aboard (e.g. 0)</li>
<li>number of parents/children aboard (e.g. 0)</li>
<li>ticket number (e.g. 24160) and</li>
<li>passenger fare (e.g. 211.3375)</li>
</ul>
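<p>To make these numeric columns easier to read while exploring, you could optionally attach names to them. This is purely for inspection (the values below are the first entry from the list above, and the column names are my own labels); the preprocessing in the next step keeps working on the numeric indices:</p>
<pre><code class="language-python">import pandas as pd

# Name the columns of the first entry, for inspection only
columns = ['pclass', 'name', 'gender', 'age', 'sibsp', 'parch', 'ticket', 'fare']
first = pd.DataFrame([[1, 'Allen, Miss. Elisabeth Walton', 'female', 29, 0, 0, 24160, 211.3375]],
                     columns=columns)
print(first.T)  # one passenger, one row per feature</code></pre>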
<h3>Step 2: Transform</h3>
<p>Since the ticket number comes in as a string, we could encode it as a category. But since we don’t know which ticket numbers Leo and Kate had, let’s just remove it as a feature. Similarly, the name of a passenger as a plain string is not going to be relevant either without preprocessing. To keep things short in this tutorial, we are simply going to remove both columns. We also want to dichotomize or <a href="http://pbpython.com/categorical-encoding.html">label-encode</a> the gender of each passenger, mapping male to 1 and female to 0. Finally we want to transform the data frame back into a numpy float32 array, because that's what our network expects. To achieve all this I wrote a small function that does those things on a pandas dataframe:</p>
<pre><code class="language-python"># Transform the data
def preprocess(r):
    # Drop the name (column 1) and the ticket number (column 6)
    r = r.drop([1, 6], axis=1, errors='ignore')
    # Label-encode the gender (column 2): female -> 0, male -> 1
    r[2] = r[2].astype('category')
    r[2] = r[2].cat.codes
    # The network expects float32 input
    for column in r.columns:
        r[column] = r[column].astype(np.float32)
    return r.values
X_train = preprocess(X_train)
pd.DataFrame(X_train).head()</code></pre>
<p>We see that after the transformation the gender in the data frame is encoded as zeros and ones. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/738ba2/transformed.png" alt="transformed"></figure>
<h3>Step 3: Build the network</h3>
<p>Now we can finally build the deep learning network which is going to learn from the data. First of all, we specify the shape of our input data. Each input sample has a total of 6 features, and we will process samples in batches to save memory. The None parameter means an unknown dimension, so we can change the total number of samples that are processed in a batch. Our data input shape is therefore [None, 6]. Finally, we build a three-layer neural network with this simple sequence of statements. </p>
<pre><code class="language-python">net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)</code></pre>
<p>If you want to visualize this network you can use Tensorboard, although there won't be much to see (see below): Tensorflow does not draw all the nodes and edges but rather abstracts whole layers into one box. To have a look at it, start Tensorboard in your console; it will then become available at <a href="http://localhost:6006">http://localhost:6006</a>. Make sure to use a Chrome browser when you are looking at the graphs; Safari crashed for me. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/6e070b/tensorboard.png" alt="tensorboard"></figure>
<pre><code class="language-bash">sudo tensorboard --logdir=/tmp</code></pre>
<p>What we basically have are 6 nodes, which are our inputs. These inputs are connected to 32 nodes, which are all fully connected to another 32 nodes, which are then connected to our 2 output nodes: one for survival, the other for death. The activation function <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax</a> defines when a node &quot;fires&quot;. It is one option among others like <a href="https://de.wikipedia.org/wiki/Sigmoidfunktion">sigmoid</a> or <a href="https://en.wikipedia.org/wiki/Rectifier">relu</a>. Below you see a schematic I drew with graphviz based on a dot file, which you can download <a href="https://github.com/plotti/titanic_with_tflearn">here</a>. Instead of 32 nodes in the hidden layers I only drew 8, but you hopefully get the idea. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/265e2c/graph-txt.png" alt="Graph"></figure>
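<p>To illustrate what the softmax activation does in that last layer, here is a minimal numpy sketch (not part of the tutorial code): it squashes the raw scores of the two output nodes into probabilities that sum to one.</p>
<pre><code class="language-python">import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then normalize
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

# Hypothetical raw scores for the two output nodes
probs = softmax(np.array([2.0, 0.5]))
print(probs)  # two probabilities that sum to 1 (up to floating point)</code></pre>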
<h3>Step 4: Train it</h3>
<p>TFLearn provides a wrapper called Deep Neural Network (DNN) that automatically performs neural network classifier tasks, such as training, prediction, save/restore and more. I think this is pretty handy. We will run the training for 20 epochs, which means that the network sees all the data 20 times, with a batch size of 32, meaning it takes in 32 samples at once. We will create one model without <a href="https://en.wikipedia.org/wiki/Cross-validation">cross validation</a> and one with it, to see which one performs better. </p>
<pre><code class="language-python"># Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(X_train, y_train, n_epoch=20, batch_size=32, show_metric=True)
# With cross validation if you want
model2 = tflearn.DNN(net)
model2.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True, validation_set=0.1) </code></pre>
<h3>Step 5: Evaluate it</h3>
<p>Finally we've got our model and can now see how well it really performs. This is easy to do with:</p>
<pre><code class="language-python">#Evaluation
X_test = preprocess(X_test)
metric_train = model.evaluate(X_train, y_train)
metric_test = model.evaluate(X_test, y_test)
metric_train_1 = model2.evaluate(X_train, y_train)
metric_test_1 = model2.evaluate(X_test, y_test)
print('Model 1 Accuracy on train set: %.9f' % metric_train[0])
print("Model 1 Accuracy on test set: %.9f" % metric_test[0])
print('Model 2 Accuracy on train set: %.9f' % metric_train_1[0])
print("Model 2 Accuracy on test set: %.9f" % metric_test_1[0])</code></pre>
<p>The output gave me very similar results on the train set (0.78) and the test set (0.77) for both the normal and the cross-validated model. So for this small example the cross validation does not really seem to make a difference. Both models do a fairly good job at predicting the survival rate of the Titanic passengers. </p>
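<p>If you want to see what that accuracy number means mechanically: it is simply the fraction of samples where the predicted class (the argmax over the two output probabilities) matches the label. A small numpy sketch with made-up predictions:</p>
<pre><code class="language-python">import numpy as np

# Hypothetical network outputs (rows sum to 1) and one-hot labels
pred = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([[1, 0], [0, 1], [1, 0]])

accuracy = np.mean(np.argmax(pred, axis=1) == np.argmax(labels, axis=1))
print(accuracy)  # 1.0, since all three argmaxes match</code></pre>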
<h3>Step 6: Use it to predict</h3>
<p>We can finally see what Leonardo DiCaprio's (Jack) and Kate Winslet's (Rose) survival chances really were when they boarded that ship. To do so I modeled both by their attributes. Jack, for example, boarded third class (today called the economy class), was male, 19 years old, had no siblings or parents on board and paid only $5 for his passenger fare. Rose of course traveled first class, was female, 17 years old, had one sibling and two parents on board and paid $100 for her ticket. </p>
<pre><code class="language-python"># Let's create some data for DiCaprio and Winslet
dicaprio = [3, 'Jack Dawson', 'male', 19, 0, 0, 'N/A', 5.0000]
winslet = [1, 'Rose DeWitt Bukater', 'female', 17, 1, 2, 'N/A', 100.0000]
# Preprocess the data (wrap the rows in a dataframe so our preprocess function from above works)
dicaprio, winslet = preprocess(pd.DataFrame([dicaprio, winslet]))
# Predict surviving chances (class 1 results)
pred = model.predict([dicaprio, winslet])
print("DiCaprio Surviving Rate:", pred[0][1])
print("Winslet Surviving Rate:", pred[1][1])</code></pre>
<p>The output gives us:</p>
<ul>
<li><strong>DiCaprio Surviving Rate: 0.128768</strong></li>
<li><strong>Winslet Surviving Rate: 0.903721</strong></li>
</ul>
<figure><img src="https://liip.rokka.io/www_inarticle/31dc67/sinking.jpg" alt="Rigged game"></figure>
<h3>Conclusion, what's next?</h3>
<p>So after all we know it was a rigged game. Given his background, DiCaprio really had low chances of surviving this disaster. While we didn't really learn anything new about the outcome of the movie, I hope that you enjoyed this quick intro to Tensorflow and Tflearn, which are not really hard to get into and don't have to end in disaster. </p>
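<p>For a problem this small, a plain logistic regression baseline would in fact have done the job in a few lines. Here is a scikit-learn sketch on synthetic data standing in for the preprocessed Titanic arrays (the "survival" rule below is invented purely for illustration):</p>
<pre><code class="language-python">import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)
# Synthetic stand-in for the six preprocessed Titanic features
X = rng.rand(200, 6)
# Made-up linear "survival" rule so the classes are learnable
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)
print('train accuracy: %.2f' % clf.score(X, y))</code></pre>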
<p>In our example there was really no need to pull out the big guns; a simple regression or any other machine learning method would have worked fine. Tensorflow, TFlearn or Keras really shine, though, when it comes to image, text and audio recognition tasks. With the very popular Keras library we are able to reduce the boilerplate for these tasks even more, which I will cover in a future blog post. In the meantime I encourage you to play around with neural networks in your browser with these <a href="https://playground.tensorflow.org">excellent</a> <a href="https://teachablemachine.withgoogle.com">examples</a>. I am looking forward to your comments and hope that you enjoyed this fun little blog post. If you want, you can download the IPython notebook for this example <a href="https://github.com/plotti/titanic_with_tflearn/blob/master/Titanic%20example.ipynb">here</a>.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/4aeba2/istock-458924911.jpg" length="1407052" type="image/jpeg" />
          </item>
        <item>
      <title>The Data Science Stack 2018</title>
      <link>https://www.liip.ch/fr/blog/the-data-science-stack-2018</link>
      <guid>https://www.liip.ch/fr/blog/the-data-science-stack-2018</guid>
      <pubDate>Mon, 16 Apr 2018 00:00:00 +0200</pubDate>
<description><![CDATA[<p>More than one year ago I sat down and went through my various github stars and browser bookmarks to compile what I then called the Data Science Stack. It was basically an exhaustive collection of tools, some of which I use on a daily basis, while others I have only heard of. The outcome was a big PDF poster which you can download <a href="https://www.liip.ch/en/blog/data-stack">here</a>. </p>
<p>The good thing about it was that every tool I had in mind could be found there somewhere, and like on a map I could instantly see which category it belonged to. As a bonus I was able to identify my personal white spots on the map. The bad thing about it was that as soon as I had compiled the list, it was out of date. So I transferred the collection into a google sheet, and whenever a new tool emerged on my horizon I added it there. Since then - in almost a year - I have added over 102 tools to it. </p>
<h2>From PDF to Data Science Stack website</h2>
<p>While it would be OK to release another PDF of the stack year after year, I thought it might be a better idea to turn this into a website where everybody can add tools to it.<br />
So without further ado I present to you the <a href="http://datasciencestack.liip.ch">http://datasciencestack.liip.ch</a> page. Its goal is still to provide orientation like the PDF, but without ever becoming stale. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/2dd1be/front.png" alt="frontpage"></figure>
<p><strong>Adding Tools: </strong>Adding tools to my google sheet felt a bit lonesome, so I asked others internally to add tools whenever they found new ones too. Finally, when moving away from the old google sheet and opening our collection process to everybody, I added a little button on the website that allows everyone to add tools themselves to the appropriate category. Just send us the name, a link and a quick description, and we will add it after a quick sanity check. The goal is to gather user-generated input too! I am also thinking about turning the website into a “github awesome” repository, so that adding tools can be done in a more programmer-friendly way. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/855d2c/add.png" alt="adding tools for everyone"></figure>
<p><strong>Search:</strong> When entering new tools, I realized that I was not sure whether a tool already existed on the page, and since tools are hidden away after the first five, the CTRL+F approach didn’t really work. That's why the website now has a little search box to check whether a tool is already in our list. If it is not, just add it to the appropriate category. </p>
<p><strong>Mailing List:</strong> If you are a busy person and want to stay on top of things, I would not expect you to regularly check back and search for changed entries. This is why I decided to send out a quarterly mailing that contains the new tools we have added since our last data science stack update. This helps you to quickly reconnect to this important topic and maybe also to discover a data science gem you have not heard of yet. </p>
<p><strong>JSON download:</strong> Some people asked me for the raw data of the PDF and at that time I was not able to give it to them quickly enough. That's why I added a json route that allows you to simply download the whole collection as a json file and create your own visualizations / maps or stacks with the tools that we have collected. Maybe something cool is going to come out of this. </p>
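<p>As a sketch of what you could do with such a JSON export (the schema here is invented for illustration; the real route's format may differ):</p>
<pre><code class="language-python">import json
from collections import Counter

# Hypothetical excerpt of the stack export
raw = '''[
  {"name": "tensorflow", "category": "Analysis"},
  {"name": "keras", "category": "Analysis"},
  {"name": "airflow", "category": "Data Processing"}
]'''

tools = json.loads(raw)
per_category = Counter(t["category"] for t in tools)
print(per_category)  # number of tools per stack category</code></pre>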
<p><strong>Communication:</strong> Scanning through such a big list of options can sometimes feel a bit overwhelming, especially since we don’t really provide any additional info or orientation on the site. That’s why I added multiple ways of contacting us, in case you are just right now searching for a solution for your business. I took the liberty to also link our blog posts that are tagged with machine learning at the bottom of the page, because often we make use of the tools in these. </p>
<p><strong>Zebra integration:</strong> Although it's nowhere visible on the website, I have hooked up the data science stack to our internal “technology database” system called Zebra (actually Zebra does a lot more, but for us the technology part is relevant). Whenever someone enters a new technology into our technology db, it is automatically added for review to the data science stack. Like this we are basically tapping into the collective knowledge of all the employees of our company. The screenshot below gives a glimpse of our tech db on Zebra, capturing not only the tools themselves but also the common feelings towards them. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/680599/zebra.png" alt="Zebra integration"></figure>
<h2>Insights from collecting tools for one more year</h2>
<p>Furthermore, I would like to provide you with the questions that guided me in researching each area and the insights that I gathered in the year of maintaining this list. Below you see a little chart showing to which categories I have added the most tools in the last year. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/caabb8/graphs2.png" alt="overview"></figure>
<h3>Data Sources</h3>
<p>One of the remaining questions for us is: which tools offer good and legally compliant ways to capture user interaction? Instead of treating Google Analytics as the norm, we are always on the lookout for new and fresh solutions in this area. Besides heatmap analytics, another new category I added is “Tag Management”. Regarding the classic website analytics solutions, I was quite surprised that there are still quite a lot of new solutions popping up. I added a whole lot of solutions, and entirely new categories like mobile analytics and app store analytics, after discovering the great github awesome list of analytics solutions <a href="https://github.com/onurakpolat/awesome-analytics">here</a>.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/2aff10/sources2.png" alt="data sources"></figure>
<h3>Data Processing</h3>
<p>How can we initially clean or transform the data? How and where can we store the logs that are created by these transformation events? And where do we get additional valuable data? Here I’ve added quite a few tools in the ETL area and in the message queue category. It looks like I will eventually need to split up the “message queue” category into multiple ones, because it feels like that one drawer in the kitchen where everything ends up in a big mess. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/732417/processing.png" alt="data processing"></figure>
<h3>Database</h3>
<p>What options are out there to store the data? How can we search through it? How can we access data sources efficiently? Here I mainly added a few specialized solutions, such as databases focused on storing time series or graph/network data. I might have missed something, but I feel that there is no new paradigm shift on the horizon right now (like graph-oriented, nosql, column-oriented or newsql dbs once were). It is probably the area of big data where most of the new tools emerged. An awesome list that goes beyond our collection can be found <a href="https://github.com/onurakpolat/awesome-bigdata">here</a>.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/64a802/database.png" alt="database"></figure>
<h3>Analysis</h3>
<p>Which stats packages are available to analyze the data? What frameworks are out there to do machine learning, deep learning, computer vision, natural language processing? Obviously, the high momentum of deep learning led to many new entries in this category. In the “general” category I’ve added quite a few entries, showing that there is still a huge momentum in the various areas of machine learning beyond deep learning alone. Interestingly, I did not find any new stats software packages, probably hinting that the paradigm of these one-size-fits-all solutions is over. The party is probably taking place in the cloud, where the big five have constantly added more and more specialized machine learning solutions, for example for text, speech, image, video or chatbot/assistant related tasks, just to name a few. At least those were the areas where I added most of the new tools. Going beyond the focus on python, there is the awesome <a href="https://github.com/josephmisiti/awesome-machine-learning">list</a> that covers solutions for almost every programming language. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/ab1df5/analysis.png" alt="analysis"></figure>
<h3>Visualization, Dashboards, and Applications</h3>
<p>What happens with the results? What options do we have to visually communicate them? How do we turn those visualizations into dashboards or entire applications? Which additional ways to communicate with users besides reports/emails are out there? Surprisingly I’ve only added a few new entries here, maybe because I happened to be quite thorough at researching this area last year, or simply because the time of js visualizations popping up left and right has cooled off a bit and the existing solutions are maturing instead. Yet this awesome <a href="https://github.com/fasouto/awesome-dataviz">list</a> shows that development in this area is still far from cooling off. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/cd8663/viz.png" alt="visualization"></figure>
<h3>Business Intelligence</h3>
<p>What solutions exist that try to integrate data sourcing, data storage, analysis and visualization in one package? What BI solutions are out there for big data? Are there platforms/solutions that offer more of a flexible data-scientist approach (e.g. free choice of methods, models, transformations)? Here I have mostly added solutions that are platforms in the cloud; it seems only logical to offer fewer and fewer desktop-oriented BI solutions, due to the restrained computational power and the high complexity of maintaining BI systems on premise. Although business intelligence solutions are less community- and open-source-driven than the other stacks, there are also <a href="https://github.com/thenaturalist/awesome-business-intelligence">awesome lists</a> where people curate those solutions. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/7fa6f4/bi.png" alt="business intelligence"></figure>
<p>You might have noticed that I tried to slip a github awesome list into almost every category to encourage you to look more in depth into each area. If you want to spend days of your life discovering awesome things, I strongly suggest you check out the collections of awesome lists <a href="https://github.com/jnv/lists">here</a> or <a href="https://github.com/sindresorhus/awesome">here</a>.</p>
<h3>Conclusion or what's next?</h3>
<p>I realized that keeping the list up to date in some areas seems almost impossible, while others gradually mature over time and the number of new tools in those areas is easy to keep track of. I also had to recognize that maintaining an exhaustive and always up-to-date list across those 5 broad categories is quite a challenge. That's why I went out to get help. I looked for people in our company who are particularly interested in one of these areas and nominated them technology ambassadors for their part of the stack. Their task will be to add new tools whenever they pop up on their horizon. </p>
<p>I have also come to the conclusion that the stack is quite useful for offering customers a bit of an overview at the beginning of a journey. It adds value to simply know which popular solutions are out there and to start digging around yourself. Yet separating the more mature tools from the experimental ones, or knowing which open source solutions have a good community behind them, is quite a hard task for somebody without experience. Somehow it would be great to highlight “the pareto principle” in this stack by pointing out only a handful of solutions and saying: you will be fine when you use those. Yet I also have to acknowledge that this will not replace a good consultation in the long run. </p>
<p>Already looking towards the improvement of this collection, I think that each tool needs some sort of scoring: while there are plain vanilla tools that are mature and do the job, there are also highly specialized, very experimental tools that offer help in a very niche area only. While this information is somewhat buried in my head, it would be good to make it explicit on the website. Here I highly recommend what Thoughtworks has come up with in their <a href="https://www.thoughtworks.com/radar">technology radar</a>. Although their radar goes well beyond our little domain of data services, it offers a great way to differentiate tools, namely into four categories: </p>
<ul>
<li>Adopt: We feel strongly that the industry should be adopting these items. We see them when appropriate on our projects. </li>
<li>Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk. </li>
<li>Assess: Worth exploring with the goal of understanding how it will affect your enterprise. </li>
<li>Hold: Proceed with caution.</li>
</ul>
<figure><img src="https://liip.rokka.io/www_inarticle/37daaf/radar.png" alt="Technology radar"></figure>
<p>Assessing tools according to these criteria is no easy task; Thoughtworks does it by nominating a high-profile jury that votes regularly on these tools. With 4500 employees, I am sure that their assessment is a representative sample of the industry. For us and our stack, a first step would be to adopt this differentiation, fill it out myself and then get other Liipers to vote on these categories. To a certain degree we have already started this task internally in our tech db, where each employee records a common feeling towards a tool. </p>
<p>Concluding this blogpost, I realized that the simple task of “just” having a list with relevant tools for each area seemed quite easy at the start. The more I think about it, and the more experience I collect in maintaining this list, the more I realize that such a list eventually grows into a knowledge and technology management system. While such systems have their benefits (e.g. in onboarding or in quickly finding experts in an area), I feel that turning this list into one would mean walking down a rabbit hole from which I might never re-emerge. Let’s see what the next year will bring.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/2dd1be/front.jpg" length="4200904" type="image/png" />
          </item>
        <item>
      <title>The Facebook Scandal - or how to predict psychological traits from Facebook likes.</title>
      <link>https://www.liip.ch/fr/blog/the-facebook-scandal-or-how-to-predict-psychological-traits-from-facebook-likes</link>
      <guid>https://www.liip.ch/fr/blog/the-facebook-scandal-or-how-to-predict-psychological-traits-from-facebook-likes</guid>
      <pubDate>Fri, 06 Apr 2018 00:00:00 +0200</pubDate>
<description><![CDATA[<p>The facebook scandal - or the tempest in a teapot - seems to be everywhere these days. Because of it, Facebook has lost billions in stock market value, governments on both sides of the Atlantic have opened investigations, and a social movement is calling on users to #DeleteFacebook. While the <a href="http://www.zeit.de/wirtschaft/2018-03/plattformkapitalismus-internetplattformen-regulierung-facebook-cambridge-analytica">press</a> around the <a href="https://www.usatoday.com/story/opinion/2018/03/22/facebook-mark-zuckerberg-not-our-friends-cambridge-analytica-column/448767002/">world</a> and also in <a href="https://www.nzz.ch/meinung/facebook-unschuldig-am-pranger-ld.1369562">Switzerland</a> is discussing back and forth how big Facebook’s role in the current data misuse really was, I thought it would be great to answer three central questions, which even I had after reading a couple of articles: </p>
<ol>
<li>It seems that CA obtained the data somewhat semi-legally, but everyone seems very upset about that. How did they do it, and can you do it, too? :)</li>
<li>People feel manipulated with big data and big buzzwords, but practically speaking, how much can one deduce from our facebook likes? </li>
<li>How does the whole thing work? Predicting my innermost psychological traits from facebook likes seems almost like wizardry. </li>
</ol>
<p>To answer these questions I have structured the article around a theoretical and a practical part. In the theoretical part I'd like to give you an introduction to psychology on Facebook, explaining the research behind it, showing how such data was collected initially and finally highlighting its implications for political campaigning. In the practical part I will walk you through a step-by-step example that shows how machine learning can deduce psychological traits from facebook data, showing you the methods, the little tricks and the actual results. </p>
<h3>Big 5 Personality Traits</h3>
<p>Back in 2016 an article was frantically shared in the DACH region. Under its quite sensationalist title <a href="https://www.dasmagazin.ch/2016/12/03/ich-habe-nur-gezeigt-dass-es-die-bombe-gibt/?reduced=true">Ich habe nur gezeigt, dass es die Bombe gibt</a> (&quot;I only showed that the bomb exists&quot;), the article claimed that our most &quot;inner&quot; personality traits - often called the Big 5 or <a href="https://en.wikipedia.org/wiki/Big_Five_personality_traits">OCEAN</a> - can be extracted from our Facebook usage and are used to manipulate us in political campaigns. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/3c894b/taits.png" alt="personality traits"></figure>
<p>This article itself was based on the <a href="http://www.pnas.org/content/early/2015/01/07/1418680112">research paper</a> of Michal Kosinski and other Cambridge researchers who studied the correlation between Facebook likes and our personality traits, while their <a href="http://www.pnas.org/content/110/15/5802">older research</a> on this topic goes back as far as 2013. Although some of these researchers ended up in the eye of the storm of this scandal, they are definitely not the <a href="https://www.sciencedirect.com/science/article/pii/S0191886917307328">only ones</a> studying this interesting field. </p>
<p>In the OCEAN definition our personality traits are: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism. While psychologists normally have to ask people around 100 questions to determine these traits - you can do a test yourself <a href="https://www.psychometrictest.org.uk/big-five-personality/">online</a> to get an idea - the article discussed a way in which Facebook data could be used to infer them instead. The researchers showed that &quot;(i) computer predictions based on a generic digital footprint (Facebook Likes) are more accurate (r = 0.56) than those made by the participants’ Facebook friends using a personality questionnaire (r = 0.49)&quot;. This basically means that, using a machine learning approach based on your Facebook data, they were able to train a model that described you better than a friend of yours could. They also found that the more Facebook likes they had for a single person, the better the model predicted those traits. So, as you might have expected, having more data on users pays off. The picture below shows a bird's-eye view of their approach.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/cca203/approach.png" alt="the original approach from the paper"></figure>
<p>They also came to the conclusion that &quot;computers outpacing humans in personality judgment, presents significant opportunities and challenges in the areas of psychological assessment, marketing, and privacy.&quot; In other words, they were hinting that their approach might be quite usable beyond the academic domain. </p>
<h3>Cambridge Analytica</h3>
<p>This is where Cambridge Analytica (CA) comes into play, which originally got its name from its cooperation with the psychometrics department in Cambridge. Founded in 2014, its goal was to put such psychometric results to use in the domain of political campaigns. Making use of psychometric insights is nothing new per se: since at least the 1980s, different actors have been using <a href="https://de.wikipedia.org/wiki/Kategorie:Sozialwissenschaftliche_Klassifikation">various psychometric classifications</a> for various applications. For example, the <a href="https://de.wikipedia.org/wiki/Sinus-Milieus">Sinus-Milieus</a> are quite popular in the DACH region, mainly in marketing. </p>
<p>The big dramatic shift compared to these “old” survey-based approaches is that CA did two things differently: firstly, they focused specifically on Facebook as a data source, and secondly, they used Facebook as the primary platform for their interventions. Instead of running long and expensive questionnaires, they were able to collect the data easily, and they could reach their audience in a highly individualized way. </p>
<h3>How did Cambridge Analytica get the data?</h3>
<p>Cambridge researchers had originally created a Facebook app to collect data for the research paper mentioned above. Users filled out a personality test on Facebook and then agreed that the app could collect their profile data. There would be nothing wrong with that, except that at the time Facebook (and the users) also allowed the app to collect the profiles of the users' friends, who had never participated in the first place. I am not sure which of the many such apps (“mypersonality”, “thisisyourdigitallife” or others) was used to gather the data, but the result was that with this snowball approach CA quickly collected data on roughly 50 million users. This data became the basis for CA's work. And CA was heading towards using this data to change the way political campaigns work forever. </p>
<h3>Campaigns: From Mass communication to individualized communication</h3>
<p>I would argue that the majority of people are still used to classic political campaigns that are broadcast to us via billboard ads or TV ads and have one message for everybody, thus following the old paradigm of <a href="https://en.wikipedia.org/wiki/Mass_communication">mass communication</a>. While such messages have a target group (e.g. women over 40), it is still hard to reach exactly those people, since so many other people are reached too, making this approach rather costly and inefficient.</p>
<p>With the internet era, things changed quickly: in today's online advertising world, a very detailed target group can be reached easily (via Google Ads or Facebook advertising), and each group can potentially receive a custom marketing message. Such <a href="https://de.wikipedia.org/wiki/Programmatic_Advertising">programmatic advertising</a> surely disrupted the advertising world, but CA's idea was to use this mechanism not for advertising but for political campaigns. While quite a lot of <a href="https://www.washingtonpost.com/politics/cruz-campaign-paid-750000-to-psychographic-profiling-company/2015/10/19/6c83e508-743f-11e5-9cbb-790369643cf9_story.html">details</a> are known about those campaigns and some <a href="https://en.wikipedia.org/wiki/Cambridge_Analytica">shady</a> practices have come to light, here I want to focus on the idea behind them. The really smart thing about these campaigns is that a political candidate can appeal to many divergent groups of voters at the same time! They could appear safety-loving to risk-averse persons and risk-loving to young entrepreneurs simultaneously. So the same mechanisms that are used to convince people to buy a product are now being used to convince people to vote for a politician - and each of these voters might do so for individual reasons. Great, isn't it? </p>
<figure><img src="https://liip.rokka.io/www_inarticle/64dd5a/dr-dave-inked.jpg" alt="doctor or tattoo artist?"></figure>
<p>This brings the theoretical part to an end. We now know the hows and whys behind Cambridge Analytica and why this new form of political campaign matters. It's time to find out how they were able to infer personality traits from Facebook likes. I have to disappoint you on one point though: I won’t cover how to run a political campaign on Facebook in the practical part.</p>
<h3>Step 1: Get, read in and view the data</h3>
<p>What most people probably don’t know is that the <a href="http://mypersonality.org">website</a> of one of the initial projects still offers a big sample of the collected (and of course anonymized) data for research purposes. I downloaded it and put it to use for our small example. After reading in the data from the CSV files we see that we have three tables.</p>
<pre><code class="language-python">#reading in the three tables
import pandas as pd

users = pd.read_csv("users.csv")
likes = pd.read_csv("likes.csv")
ul = pd.read_csv("users-likes.csv")</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/19d73b/users.png" alt="users table"></figure>
<p>The users table, with 110 thousand users, contains demographic attributes (such as age and gender) as well as the measurements on the five personality traits: openness to experience (ope), conscientiousness (con), extraversion (ext), agreeableness (agr) and neuroticism (neu).</p>
<figure><img src="https://liip.rokka.io/www_inarticle/6e471c/likes.png" alt="pages table"></figure>
<p>The likes table contains a list of all 1.5 million pages that have been liked by users. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/965ac0/edges.png" alt="edges"></figure>
<p>The edges table contains all 10 million edges between users and pages. In fact, seen from a network research perspective, our “small sample” network is already quite big by normal <a href="https://en.wikipedia.org/wiki/Social_network_analysis">social network analysis</a> standards. </p>
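<p>Before building any matrix, the edge list alone already lets us gauge activity: counting how often a user or page appears gives its degree, which is exactly what the pruning step later relies on. A minimal sketch with a made-up mini edge list (the column names follow the CSVs above):</p>

```python
import pandas as pd

# hypothetical mini edge list in the same shape as users-likes.csv
ul = pd.DataFrame({
    "userid": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "likeid": ["p1", "p2", "p1", "p2", "p3", "p1"],
})

# degree of each user (pages liked) and of each page (users who liked it)
user_degree = ul["userid"].value_counts()
page_degree = ul["likeid"].value_counts()
print(user_degree.to_dict())  # {'u2': 3, 'u1': 2, 'u3': 1}
print(page_degree.to_dict())  # {'p1': 3, 'p2': 2, 'p3': 1}
```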
<h3>Step 2: Merge Users with Likes in one dataframe aka the adjacency matrix</h3>
<p>In step 2 we want to create a network between users and pages. To do this we have to convert the edge list into a so-called <a href="https://en.wikipedia.org/wiki/Adjacency_matrix">adjacency matrix</a> (see the example image below). In our case we use a sparse format, since we have roughly 10 million edges. The conversion, as shown below, transforms our edge list into categorical integer codes which are then used to create the matrix. By doing this in an ordered way, the rows of the resulting matrix still match the rows of our users table, which saves us a lot of cumbersome lookups. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/a4b915/network4.png" alt="from edge list to adjacency matrix"></figure>
<pre><code class="language-python">#transforming the edge list into a sparse adjacency matrix
import numpy as np
from scipy.sparse import csc_matrix

# categorical codes, ordered to match the rows of the users and likes tables
rows = pd.Categorical(ul["userid"], categories=users["userid"]).codes
cols = pd.Categorical(ul["likeid"], categories=likes["likeid"]).codes
ones = np.ones(len(rows), dtype=np.uint32)
sparse_matrix = csc_matrix((ones, (rows, cols)))
</code></pre>
<p>If you are impatient like me, you might have tried to use this matrix directly to predict the users' traits. I tried it, and the results were rather unsatisfactory: we have far too many very “noisy” features, which give us models that predict almost nothing about our personality traits. The next two steps can be considered feature engineering, or simply good tricks that work well with network data. </p>
<h3>Trick 1: Prune the network</h3>
<p>To obtain more meaningful results, we want to prune this network and retain only the most active users and pages. Theoretically - while working with a library like <a href="https://networkx.github.io">networkx</a> - we could simply say: </p>
<p>«Let's throw out all users with a <a href="https://en.wikipedia.org/wiki/Network_science">degree</a> of less than 5 and all pages with a degree of less than 20.» This would give us a network of users that are highly active and of pages that seem to be highly relevant. In social network research, computing <a href="https://en.wikipedia.org/wiki/Degeneracy">k-cores</a> gives you a similar pruning effect. </p>
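<p>For intuition, here is a minimal pure-Python sketch of the k-core idea (networkx ships a ready-made <code>k_core</code> function for real use; the toy graph and threshold below are made up for illustration):</p>

```python
def k_core(edges, k):
    """Repeatedly drop nodes whose degree is below k until all survivors have degree >= k."""
    edges = set(edges)
    while True:
        degree = {}
        for a, b in edges:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        keep = {n for n, d in degree.items() if d >= k}
        pruned = {(a, b) for a, b in edges if a in keep and b in keep}
        if pruned == edges:  # stable: nothing left to remove
            return sorted(keep)
        edges = pruned

# toy bipartite graph: users u*, pages p*
toy = [("u1", "p1"), ("u1", "p2"), ("u2", "p1"),
       ("u2", "p2"), ("u3", "p1"), ("u4", "p3")]
print(k_core(toy, 2))  # ['p1', 'p2', 'u1', 'u2']
```

<p>Note that the loop has to iterate: removing a low-degree node can push a neighbour below the threshold, which is exactly why the real pruning code below also runs until nothing changes.</p>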
<p>But since our network is quite big, I used a poor man's version of that pruning approach: a while loop that throws out rows (in our case users) where the sum of their edges is less than 50, and columns (pages) where the sum of their edges is less than 150. This is equivalent to throwing out users that like fewer than 50 pages and pages that have fewer than 150 likes. Since removing a page or user might violate one of our conditions again, we keep reducing the network as long as columns or rows still need to be removed; once both conditions are met, the loop stops. While pruning, we also update our users and likes tables (via boolean filtering), so we can still track which users like which pages and vice versa. </p>
<pre><code class="language-python"># pruning the network into the most relevant users and pages
print(sparse_matrix.shape)
min_degree = 50  # threshold for users; pages use 3x this value
while True:
    i = sum(sparse_matrix.shape)
    columns_bool = (np.sum(sparse_matrix, axis=0) &gt; 3 * min_degree).getA1()
    sparse_matrix = sparse_matrix[:, columns_bool]  # keep popular pages
    likes = likes[columns_bool]
    rows_bool = (np.sum(sparse_matrix, axis=1) &gt; min_degree).getA1()
    sparse_matrix = sparse_matrix[rows_bool]  # keep active users
    users = users[rows_bool]
    print(sparse_matrix.shape)
    print(users.shape)
    if sum(sparse_matrix.shape) == i:  # nothing removed in this pass: done
        break</code></pre>
<p>This process quite significantly reduces our network: we end up with roughly 19 thousand users and 8-9 thousand pages (i.e. attributes) each. What I will not show here is my second failed attempt at predicting the psychological traits: while the results were slightly better thanks to the improved signal-to-noise ratio, they were still quite unsatisfactory. That's where our second, very classic trick comes into play: <a href="https://en.wikipedia.org/wiki/Dimensionality_reduction">dimensionality reduction</a>.</p>
<h3>Trick 2: Dimensionality reduction or in our case SVD</h3>
<p>The logic behind dimensionality reduction can be explained intuitively: when we analyse users' attributes, we often find attributes that describe the &quot;same thing&quot; (latent variables or factors). Factors are artificial categories that emerge by combining &quot;similar&quot; attributes that behave in a very similar way (e.g. your age and your amount of wrinkles). In our case these attributes are the pages a user liked. So there might be a page called &quot;Britney Spears&quot; and another one called &quot;Britney Spears Fans&quot;, and all users who like the first also like the second. Intuitively we would want both pages to &quot;behave&quot; like one, so we kind of merge them into one page. </p>
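<p>This &quot;two pages behave like one&quot; intuition can be checked on a toy like-matrix: if two page columns are identical, the matrix loses a dimension, which an SVD reveals as a vanishing singular value (the matrix below is made up for illustration):</p>

```python
import numpy as np

# rows = users, columns = pages; page 0 and page 1 are liked by the same users
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# two singular values carry all the signal; the third is numerically zero,
# so the two duplicate pages collapse into a single factor
print(np.round(s, 3))
```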
<p>A number of methods are available for such approaches - although they all work a little differently - the most used being <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">Principal component analysis</a>, <a href="https://en.wikipedia.org/wiki/Singular-value_decomposition">Singular value decomposition</a> and <a href="https://en.wikipedia.org/wiki/Linear_discriminant_analysis">Linear discriminant analysis</a>. These methods allow us to &quot;summarize&quot; or &quot;compress&quot; the dataset into as many dimensions as we want. As a benefit, the dimensions come sorted, with the most important ones first. </p>
<p>So instead of looking at a couple of thousand pages per user, we can group them into, say, 5 &quot;buckets&quot;. Each bucket will contain pages that are similar with regard to how users perceive them. Finally we can correlate these factors with the users' personality traits. Scikit-learn offers a great way to perform a PCA on a big dataset with its incremental PCA method, which even works with datasets that don’t fit into RAM. An even more popular approach (often used in recommender systems) is SVD, which is fast, easy to compute and yields good results. </p>
<pre><code class="language-python">#performing dimensionality reduction
from sklearn.decomposition import IncrementalPCA, TruncatedSVD

svd = TruncatedSVD(n_components=5) # SVD
#ipca = IncrementalPCA(n_components=5, batch_size=10) # PCA alternative
df = svd.fit_transform(sparse_matrix)
#df = ipca.fit_transform(sparse_matrix)</code></pre>
<p>In the code above I reduced the thousands of pages to just 5 features for visualization purposes. We can now do a pairwise comparison between the personality traits and the 5 factors we computed, and visualize it as a heatmap. </p>
<pre><code class="language-python">#generating a heatmap of user traits vs factors
import seaborn as sns

# after pruning, the rows of `users` are already aligned with the matrix rows
tmp = users.iloc[:,1:9].values # remove userid, convert to np array
combined = pd.DataFrame(np.concatenate((tmp, df), axis=1)) # one big df
combined.columns=["gender","age","political","ope","con","ext","agr","neu","fac1","fac2","fac3","fac4","fac5"]
heatmap = combined.corr().iloc[8:13, 0:8] # factors (rows) vs user attributes (columns)
sns.heatmap(heatmap, annot=True)</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/bf7686/matrix.png" alt="heatmap"></figure>
<p>In the heatmap above we see that factor 3 seems to be quite strongly positively correlated with the user's openness. We also see that factor 1 is negatively correlated with age, so the older you get, the less you probably visit pages from this area. Generally, though, the correlations between some factors and traits are not very high (e.g. for agreeableness). </p>
<h3>Step 3: Finally build a machine learning model to predict personality traits</h3>
<p>Armed with our new features, we can come back and try to build a model that finally does what I promised: namely, predict the user's traits based solely on those factors. What I am not showing here is the experimentation involved in choosing the right model for the job. After trying out a few models like <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html">Linear Regression</a>, <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html">Lasso</a> or <a href="http://scikit-learn.org/stable/modules/tree.html">decision trees</a>, the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLars.html">LassoLars model</a> with cross-validation worked quite well. In all approaches I split the data into a training set (90% of the data) and a test set (10% of the data), to be able to measure the accuracy of the models on unseen data. I also applied some poor man's hyperparameter tuning, running all predictions with different values of k for the SVD dimensionality reduction. </p>
<pre><code class="language-python">#training and testing the model
from scipy.stats import pearsonr
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

out = []
out.append(["k","trait","mse","r2","corr"])
for k in [2,5,10,20,30,40,50,60,70,80,90,100]:
    print("Hyperparameter SVD dim k: %s" % k)
    svd = TruncatedSVD(n_components=k)
    sparse_matrix_svd = svd.fit_transform(sparse_matrix)
    df_svd = pd.DataFrame(sparse_matrix_svd)
    df_svd.index = users["userid"]
    total = 0
    for target in ["ope","con","ext","agr","neu"]:
        y = users[target]
        y.index = users["userid"]
        tmp = pd.concat([y,df_svd],axis=1)
        data = tmp[tmp.columns.difference([target])]
        X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.1)
        clf = LassoLarsCV(cv=5, precompute=False)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        corr = pearsonr(y_test, y_pred)[0]
        total += r2  # accumulate R2 over the five traits
        print('   Target %s Corr score: %.2f. R2 %s. MSE %s' % (target, corr, r2, mse))
        out.append([k, target, mse, r2, corr])
    print(" k %s. Total R2 %s" % (k, total))</code></pre>
<p>To see which number of dimensions gave us the best results, we can simply look at the printout or visualize it nicely with seaborn, as below. I found that solutions with 90 dimensions gave quite good results. A more production-ready way of doing this is <a href="http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html">GridSearch</a>, but I wanted to keep the amount of code for this example minimal. </p>
<pre><code class="language-python">#visualizing the hyperparameter search
gf = pd.DataFrame(columns=out[0], data=out[1:])  # skip only the header row
g = sns.factorplot(x="k", y="r2", data=gf, size=10, kind="bar", palette="muted")</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/dda61b/hyper.png" alt="results"></figure>
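<p>For completeness, here is a rough sketch of what the GridSearch variant could look like: a pipeline chains the SVD and the regressor, so the number of SVD components becomes an ordinary hyperparameter. The synthetic data here just stands in for the real like-matrix and a trait column:</p>

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LassoLarsCV
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# synthetic stand-in for the pruned like-matrix and one personality trait
rng = np.random.default_rng(0)
X = sparse_random(300, 50, density=0.2, random_state=0, format="csr")
y = rng.normal(size=300)

# the SVD dimensionality is searched like any other hyperparameter
pipe = Pipeline([("svd", TruncatedSVD(random_state=0)),
                 ("reg", LassoLarsCV(cv=3))])
grid = GridSearchCV(pipe, {"svd__n_components": [2, 5, 10]}, cv=3, scoring="r2")
grid.fit(X, y)
print(grid.best_params_)
```

<p>On the real data the candidate list would be the same k values as in the loop above; GridSearch then handles the cross-validated scoring and bookkeeping for us.</p>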
<h3>Step 4: Results</h3>
<p>So now we can finally look at how the model performed on each trait when using 90 dimensions in the SVD. We see that we are not great at explaining the users' traits: our <a href="https://en.wikipedia.org/wiki/Coefficient_of_determination">R^2 score</a> shows that for most traits we can explain only roughly 10-20% of the variance using Facebook likes. Among all traits, openness seems to be the most predictable attribute. That seems to make sense, as open people are probably more willing to share their likes on Facebook. While it might feel a bit disappointing that we were not able to predict 100% of a user's psychological traits, you should know that in a typical <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2927808/">re-test</a> of psychological traits, researchers are also only able to explain roughly 60-70% of the traits again. So we did not do too badly after all. Last but not least, it's worth mentioning that if we can predict roughly 10-20% per trait, times 5 traits, overall we still know quite a bit about the user. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/8fdbf2/results-lasso.png" alt="results"></figure>
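<p>To connect these numbers with the paper's correlation figures: for a well-calibrated linear prediction, R^2 is roughly the squared correlation, so the paper's r = 0.56 corresponds to about 31% explained variance, and our 10-20% corresponds to r between roughly 0.32 and 0.45. A synthetic sketch of this relationship (all numbers made up):</p>

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n, rho = 50_000, 0.45  # rho: correlation between trait and its noisy signal
trait = rng.normal(size=n)  # standardized "true" trait scores
signal = rho * trait + np.sqrt(1 - rho**2) * rng.normal(size=n)
pred = rho * signal  # calibrated linear prediction based on the signal

r2 = r2_score(trait, pred)
corr = np.corrcoef(trait, pred)[0, 1]
print(round(r2, 2), round(corr, 2))  # R^2 lands close to rho**2 = 0.2025
```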
<h3>Conclusion or what's next?</h3>
<p>From my point of view, this small example shows two things: </p>
<p>Firstly, we learned that it is hard to predict our personality traits well from JUST Facebook likes. In this example we were only able to predict at most 20% of a personality trait (in our case, openness). While there are of course myriad ways to improve our model (e.g. by using more data, different types of data, smarter features or better methods), quite a bit of variance might still remain unexplained - not too surprising in psychology or the social sciences. While I have not shown it here, the easiest way to improve our model would simply be to allow demographic attributes as features too: knowing a user's age and gender would improve the model by roughly another 10-20%. But that would feel rather like one of those old-fashioned approaches. What we could do instead is use the user's likes to predict their gender and age; but let's save that for another blog post. </p>
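<p>Mechanically, the &quot;easiest improvement&quot; just mentioned amounts to stacking demographic columns next to the SVD factors before training (a sketch with made-up arrays; in the real pipeline the columns would come from the users table):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
svd_factors = rng.normal(size=(4, 3))  # stand-in for the SVD feature matrix
age = np.array([[23.0], [31.0], [45.0], [19.0]])
gender = np.array([[0.0], [1.0], [1.0], [0.0]])

# scale age so it lives on a scale comparable to the factors
age_scaled = (age - age.mean()) / age.std()
X_plus = np.hstack([svd_factors, age_scaled, gender])  # enriched feature matrix
print(X_plus.shape)  # (4, 5)
```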
<p>Secondly, we should not forget that even knowing a person's personality very well might not translate into highly effective political campaigns able to swing a user's vote. After all, that is what CA promised its customers, and that's what the media fears. Yet the general approach of running political campaigns the same way as marketing campaigns is probably here to stay. Only the future will show how effective such campaigns really are. After all, Facebook is not an island: although you might see Facebook ads showing your candidate in the best light for you, there are still the old-fashioned broadcast media (which still account for the majority of media consumption today), where watching an hour-long interview with your favorite candidate might cast a completely different light on them. I am looking forward to seeing whether old virtues such as authenticity and sincerity might not give better mileage than personalized Facebook ads.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/16dc35/apps-blur-button-267350.jpg" length="1501643" type="image/jpeg" />
          </item>
    
  </channel>
</rss>
