<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Kirby" -->
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">

  <channel>
    <title>Mot-cl&#233;: keras &#183; Blog &#183; Liip</title>
    <link>https://www.liip.ch/fr/blog/tags/keras</link>
    <generator>Kirby</generator>
    <lastBuildDate>Tue, 07 Aug 2018 00:00:00 +0200</lastBuildDate>
    <atom:link href="https://www.liip.ch" rel="self" type="application/rss+xml" />

        <description>Articles du blog Liip avec le mot-cl&#233; &#8220;keras&#8221;</description>
    
        <language>fr</language>
    
        <item>
      <title>Zoo Pokedex Part 2: Hands on with Keras and Resnet50</title>
      <link>https://www.liip.ch/fr/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50</link>
      <guid>https://www.liip.ch/fr/blog/zoo-pokedex-part-2-hands-on-with-keras-and-resnet50</guid>
      <pubDate>Tue, 07 Aug 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h3>Short Recap from Part 1</h3>
<p>In the <a href="https://www.liip.ch/en/blog/poke-zoo-or-making-deep-learning-tell-oryxes-apart-from-lamas-in-a-zoo-part-1-the-idea-and-concepts">last blog post</a> I briefly discussed the potential of using deep learning to build a zoo pokedex app that could be used to motivate zoo goers to engage with the animals and the information. We also discussed the <a href="http://image-net.org">imagenet competition</a> and how deep learning has drastically changed the image recognition game. We went over the two main tricks that deep learning architectures do, namely convolutions and pooling, that allow such deep learning networks to perform extremely well. Last but not least we realized that all you have to do these days is to stand on the shoulders of giants by using the existing networks (e.g. Resnet50)  to be able to write applications that have a similar state of the art precision.  So finally in this blog post it’s time to put these giants to work for us.</p>
<h3>Goal</h3>
<p>The goal is to write an image detection app that can distinguish the animals in our zoo. For obvious reasons I will keep our zoo really small, containing only two types of animals:</p>
<ul>
<li>Oryxes and</li>
<li>Llamas (why there is a second L in English is beyond my comprehension).</li>
</ul>
<figure><img src="https://liip.rokka.io/www_inarticle/8c74f3/lamavsoryx.jpg" alt=""></figure>
<p>Why these animals? Well, they seem fluffy, but mostly because the original imagenet competition does not contain them. So this represents a quite realistic scenario: a zoo has animals that need to be distinguished, but the existing deep learning networks have not been trained on them. I picked these two kinds of animals more or less at random, just to have something to show. (Actually, I checked that the Zürich Zoo has both, so I can take our little app and test it in real life, but that's already part of the third blog post on this topic.)</p>
<h3>Getting the data</h3>
<p>Getting data is easier than ever in the age of the internet. In the nineties I would probably have had to go to some archive or, even worse, take my own camera and shoot lots and lots of pictures of these animals to use as training material. Today I can just ask Google to show me some. But wait: if you have actually tried using Google Image Search as a resource, you will realize that downloading images in large quantities is a pain in the ass. The image API is highly limited in terms of what you can get for free, and writing scrapers that download such images is not really fun. That's why I turned to the competition and used Microsoft's cognitive services to download images for each animal. </p>
<h3>Downloading image data from Microsoft</h3>
<p>Microsoft offers quite a convenient image search API via their <a href="https://azure.microsoft.com/en-us/services/cognitive-services/">cognitive services</a>. You can sign up there to get a free tier for a couple of days, which should be enough to get you started. What you basically need is an API key, and then you can already start downloading images to create your datasets. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/b79e82/microsoft.jpg" alt=""></figure>
<pre><code class="language-ruby "># Code to download images via Microsoft cognitive api
require 'httparty'
require 'fileutils'

API_KEY = "##############"
SEARCH_TERM = "alpaka"
QUERY = "alpaka"
API_ENDPOINT  = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
FOLDER = "datasets"
BATCH_SIZE = 50
MAX = 1000

# Make the dir
FileUtils::mkdir_p "#{FOLDER}/#{SEARCH_TERM}"

# Make the request
headers = {'Ocp-Apim-Subscription-Key' =&gt; API_KEY}
query = {"q": QUERY, "offset": 0, "count": BATCH_SIZE}
puts("Searching for #{SEARCH_TERM}")
response = HTTParty.get(API_ENDPOINT,:query =&gt; query,:headers =&gt; headers)
total_matches = response["totalEstimatedMatches"]

i = 0
while response["nextOffset"] != nil &amp;&amp; i &lt; MAX
    response["value"].each do |image|
        i += 1
        content_url = image["contentUrl"]
        ext = content_url.scan(/jpg$|gif$|png$/).first
        next if ext.nil?
        file_name = "#{FOLDER}/#{SEARCH_TERM}/#{i}.#{ext}"
        next if File.file?(file_name)
        begin
            puts("Offset #{response["nextOffset"]}. Downloading #{content_url}")
            r = HTTParty.get(content_url)
            File.open(file_name, 'wb') { |file| file.write(r.body) }
        rescue
            puts "Error fetching #{content_url}"
        end
    end
    query = {"q": SEARCH_TERM, "offset": i+BATCH_SIZE, "count": BATCH_SIZE}
    response = HTTParty.get(API_ENDPOINT,:query =&gt; query,:headers =&gt; headers)
end</code></pre>
<p>The Ruby code above simply uses the API in batches and downloads llamas and oryxes into separate directories, naming the files accordingly. What you don't see is that I went through these folders by hand and removed images that were not really the animal but, for example, a fluffy shoe that showed up in the search results. I also de-duplicated each folder. You can scan the images quickly on your Mac using the thumbnail preview, or use an image browser that you are familiar with to do the job. </p>
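<p>If your folders get large, the de-duplication step can also be scripted. Below is a minimal sketch (a hypothetical helper, not part of the original workflow) that removes exact duplicates by comparing file hashes:</p>
<pre><code class="language-python"># Remove exact duplicate files in a folder by comparing MD5 hashes
import hashlib
import os

def dedupe(folder):
    seen = {}
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        digest = hashlib.md5(open(path, 'rb').read()).hexdigest()
        if digest in seen:
            os.remove(path)       # keep the first copy, drop the rest
            removed.append(name)
        else:
            seen[digest] = name
    return removed</code></pre>
<p>Note that this only catches byte-identical copies; near-duplicates (e.g. resized versions of the same photo) still need a manual pass or perceptual hashing.</p>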
<h3>Problem with not enough data</h3>
<p>Ignoring probable copyright issues (am I allowed to train my neural network on copyrighted material?) and depending on what you want to achieve, you might run into the problem that it's not really that easy to gather 500 or 5000 images of oryxes and llamas. Also, to make things a bit challenging, I tried to see if it was possible to train the neural networks using only 100 examples of each animal, while using roughly 50 examples to validate the accuracy of the networks. </p>
<p>Normally everyone would tell you that you definitely need more image material, because deep learning networks need a lot of data to become useful. But in our case we are going to use two dirty tricks to try to get away with our really small collection: data augmentation and reuse of already pre-trained networks. </p>
<h3>Image data generation</h3>
<p>A really handy trick that seems to be prevalent everywhere now is to take the images that you already have and change them slightly in an artificial way: rotating them, changing the perspective, zooming in on them. What you end up with is that instead of having one image of a llama, you'll have 20 pictures of that animal, each slightly different from the original one. This trick allows you to create more variation without actually having to download more material. It works quite well, but is definitely inferior to simply having more data. </p>
<p>We will be using <a href="http://keras.io">Keras</a>, a deep learning library on top of TensorFlow, which we have used before in <a href="https://www.liip.ch/en/blog/tensorflow-and-tflearn-or-can-deep-learning-predict-if-dicaprio-could-have-survived-the-titanic">other</a> blog posts to <a href="https://www.liip.ch/en/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks">create a good sentiment detection</a>. In the domain of image recognition Keras can really show its strength, with built-in methods that do the image data generation for us, without having to involve any third-party tools. </p>
<pre><code class="language-python"># Creating an image data generator
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)</code></pre>
<p>As you can see above, we have created an image data generator that uses shearing, zooming and horizontal flipping to change our llama pictures. We don't do a vertical flip, for example, because it's rather unrealistic that you will hold your phone upside down. Depending on the type of images (e.g. aerial photography), different transformations might or might not make sense.</p>
<pre><code class="language-python"># Creating variations to show you some examples
from keras.preprocessing.image import load_img, img_to_array

img = load_img('data/train/alpaka/Alpacca1.jpg')
x = img_to_array(img)
x = x.reshape((1,) + x.shape)  # add a batch dimension
i = 0
for batch in train_datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='alpacca', save_format='jpeg'):
    i += 1
    if i &gt; 20:
        break  # otherwise the generator would loop indefinitely</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/31a080/variations.png" alt=""></figure>
<p>Now if you want to use such a generator in your model directly, you can use the convenient flow-from-directory method, where you can even define the target size, so you don't have to scale down your training images with an external library. </p>
<pre><code class="language-python"># Flow from directory method (sz is the target image size, e.g. 224)
train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size=(sz, sz),
    batch_size=batch_size, class_mode='binary')</code></pre>
<h3>Using Resnet50</h3>
<p>In order to finally step on the shoulder of giants we can simply import the resnet50 model, that we talked about earlier. <a href="http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006">Here</a> is a detailed description of each layer and <a href="https://arxiv.org/pdf/1512.03385.pdf">here is the matching paper</a> that describes it in detail. While there are <a href="https://keras.io/applications/">different alternatives that you might also use</a> the resnet50 model has a fairly high accuracy, while not being too “big” in comparison to the computationally expensive <a href="http://www.robots.ox.ac.uk/~vgg/">VGG</a> network architecture.</p>
<p>On a side note: the name “res” comes from residual. A residual can be understood as the subtraction of features that were learned from the input at each layer. ResNet has a very neat trick that allows deeper networks to learn from residuals by “short-circuiting” them with the deeper layers, i.e. directly connecting the input of an n-th layer to some (n+x)-th layer. This short-circuiting has been proven to make training easier. It does so by helping with the problem of degrading accuracy, where networks that are too deep become exponentially harder to train. </p>
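<p>To make the short-circuit idea concrete, here is a toy sketch in plain numpy (an illustration of the principle, not the real ResNet code): the layer only has to learn the residual F(x), and the input x is added back unchanged:</p>
<pre><code class="language-python"># Toy residual ("short-circuit") connection: output = F(x) + x
import numpy as np

def layer_f(x, weight):
    # some learned transformation, here just a linear map
    return weight * x

x = np.array([1.0, 2.0, 3.0])
residual = layer_f(x, 0.1)   # the layer only needs to model the (small) residual
out = residual + x           # the identity is passed through untouched
print(out)                   # [1.1 2.2 3.3]</code></pre>
<p>Because the identity path is always present, even an untrained layer (residual of zero) leaves the signal intact, which is why very deep stacks of such blocks remain trainable.</p>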
<pre><code class="language-python"># importing resnet into keras
from keras.applications.resnet50 import ResNet50
base_model = ResNet50(weights='imagenet')</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/3b54cc/comparison.jpg" alt=""></figure>
<p>As you can see above, importing the network is really dead easy in Keras. It might take a while to download the network, though. Notice that we are downloading the weights too, not only the architecture.</p>
<h3>Training existing models</h3>
<p>The next part is the exciting one: we finally get to train the existing network on our own data. The simple but ineffective approach would be to download, or just re-build, the architecture of the successful network and train it with our data. The problem with that approach is that we only have 100 images per class, which is not even remotely close to enough data to train such a network well enough to be useful. </p>
<p>Instead we will try another technique (which I somewhat stole from the <a href="https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html">great keras blog</a>): We will freeze all weights of the downloaded network and add three final layers at the end of the network and then train those. </p>
<h3>Freezing the base model</h3>
<p>Why is this useful, you might ask? By freezing all of the existing layers of the resnet50 network we only have to train the final layers. This makes sense, since the imagenet task is about recognizing everyday objects in everyday photographs, and the network is already very good at recognising “basic” features such as legs, eyes, circles, heads, etc. All of this “smartness” is already encoded in the weights (see the last blog post); if we threw these weights away, we would lose these nice smart properties. Instead we can glue another pooling layer and a dense layer onto the very end, followed by a sigmoid activation layer that is needed to distinguish between our two classes. That is, by the way, why the code says “include_top=False”: it leaves out the original 1000-class layer that was used for the imagenet competition. If you want to read up on the different alternatives to resnet50, you will find them <a href="https://keras.io/applications/">here</a>.</p>
<pre><code class="language-python"># Adding three layers on top of the network
from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense

base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)</code></pre>
<p>Finally we can re-train the network with our own image material and hope for it to turn out quite useful. I've had some trouble finding an optimizer that produced proper results. Usually you will have to experiment with the learning rate to find a configuration whose accuracy actually improves during the training phase.</p>
<pre><code class="language-python"># freezing all the original weights and compiling the network
from keras import optimizers
from keras.models import Model

optimizer = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers: layer.trainable = False
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=3, workers=4,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)</code></pre>
<p>The training shouldn’t take long, even when you are using just a CPU instead of a GPU, and the output might look something like this:</p>
<figure><img src="https://liip.rokka.io/www_inarticle/dc208a/training.png" alt=""></figure>
<p>You’ll notice that we reached an accuracy of 71% which isn’t too bad, given that we have only 100 original images of each class. </p>
<h3>Fine-tuning</h3>
<p>One thing that we can do now is unfreeze some of the very last layers in the network and re-train it, allowing those layers to change slightly. We do this in the hope that a bit more “wiggle room”, without touching most of the actual weights, gives us better results. </p>
<pre><code class="language-python "># Make the very last layers trainable
split_at = 140
for layer in model.layers[:split_at]: layer.trainable = False
for layer in model.layers[split_at:]: layer.trainable = True
model.compile(optimizer=optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0.0), loss='binary_crossentropy', metrics=['accuracy'])    
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=1, workers=3,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/2e0324/improvement.png" alt=""></figure>
<p>And indeed it helped our model to go from 71% accuracy to 82%! You might want to play around with the learning rates a bit, or maybe split at a different depth, in order to tweak the results. But generally I think that just adding more images would be the easiest way to reach 90% accuracy. </p>
<h3>Confusion matrix</h3>
<p>In order to see how well our model is doing, we can also compute a confusion matrix, i.e. count the true positives, true negatives, false positives and false negatives. </p>
<pre><code class="language-python"># Calculating confusion matrix
from sklearn.metrics import confusion_matrix
r = next(validation_generator)
probs = model.predict(r[0])
classes = []
for prob in probs:
    if prob &lt; 0.5:
        classes.append(0)
    else:
        classes.append(1)
cm = confusion_matrix(r[1], classes)  # true labels first, then predictions
cm</code></pre>
<p>As you can see above, I simply took the first batch from the validation generator (images for which we know whether they show an alpaca or an oryx) and then used the <a href="http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py">confusion matrix from scikit-learn</a> on the predictions. In the example below we see that 28 resp. 27 images of each class were labeled correctly, while 4 resp. 5 images were mislabeled. I would say that's quite a good result, given that we used so little data.</p>
<pre><code class="language-python">#example output of confusion matrix
array([[28,  5],
       [ 4, 27]])</code></pre>
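<p>From this matrix you can also read off the overall accuracy of the batch: the diagonal holds the correct predictions, so dividing its sum by the total gives the fraction of correctly labeled images. A quick sketch with the numbers from above:</p>
<pre><code class="language-python"># Accuracy derived from the confusion matrix above
import numpy as np

cm = np.array([[28, 5],
               [4, 27]])
accuracy = np.trace(cm) / cm.sum()  # correct predictions / all predictions
print(accuracy)  # roughly 0.86</code></pre>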
<h3>Use the model to predict images</h3>
<p>Last but not least, we can of course finally use the model to predict whether an animal in our little zoo is an oryx or an alpaca. </p>
<pre><code class="language-python"># Helper function to display images
import numpy as np
import matplotlib.pyplot as plt
from keras.preprocessing import image

def load_image(img_path, show=False):

    img = image.load_img(img_path, target_size=(224, 224))
    img_tensor = image.img_to_array(img)                    # (height, width, channels)
    img_tensor = np.expand_dims(img_tensor, axis=0)         # (1, height, width, channels), add a dimension because the model expects this shape: (batch_size, height, width, channels)
    #img_tensor /= 255.                                      # imshow expects values in the range [0, 1]

    if show:
        plt.imshow(img_tensor[0]/255)
        plt.axis('off')
        plt.show()

    return img_tensor

# Load two sample images
oryx = load_image("data/valid/oryx/106.jpg", show=True)
alpaca = load_image("data/valid/alpaca/alpaca102.jpg", show=True)
model.predict(alpaca)
model.predict(oryx)</code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/6d2129/prediction.png" alt=""></figure>
<p>As you can see in the output, our model successfully labeled the alpaca as an alpaca, since its value was less than 0.5, and the oryx as an oryx, since its value was &gt; 0.5. Hooray! </p>
<h3>Conclusion or What’s next?</h3>
<p>I hope this blog post was useful to you and showed that you don't really need much to get started with deep learning for image recognition. I know that our example zoo pokedex is really small at this point, but I don't see a reason (apart from a lack of time and resources) why it should be a problem to scale from our 2 animals to 20 or 200. </p>
<p>On the technical side, now that we have a model running that's kind of useful, it would be great to find out how to use it on a smartphone, e.g. the iPhone, to finally have a pokedex that we can really try out in the wild. I will cover that bit in the third part of the series, showing you how to export existing models to Apple mobile phones using the <a href="https://developer.apple.com/machine-learning/">CoreML</a> technology. As always I am looking forward to your comments and corrections, and I point you to the ipython notebook that you can download <a href="https://github.com/plotti/zoo/blob/master/Zoo%20prediction.ipynb">here</a>.</p>
                  <enclosure url="http://liip.rokka.io/www_card_2/460c46/adorable-adult-animals-1040396.jpg" length="2036526" type="image/jpeg" />
          </item>
        <item>
      <title>Sentiment detection with Keras, word embeddings and LSTM deep learning networks</title>
      <link>https://www.liip.ch/fr/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks</link>
      <guid>https://www.liip.ch/fr/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks</guid>
      <pubDate>Fri, 04 May 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h3>Overview SaaS</h3>
<p>Sentiment detection has become a bit of a commodity; especially the big 5 vendors offer their own sentiment detection as a service. Google offers an <a href="https://cloud.google.com/natural-language/docs/sentiment-tutorial">NLP API</a> with sentiment detection. Microsoft offers sentiment detection through their <a href="https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/">Azure</a> platform. IBM has come up with a solution called <a href="https://www.ibm.com/watson/services/tone-analyzer/">Tone Analyzer</a>, which tries to capture the &quot;tone&quot; of a message and goes a bit beyond sentiment detection. Amazon offers a solution called <a href="https://aws.amazon.com/de/blogs/machine-learning/detect-sentiment-from-customer-reviews-using-amazon-comprehend/">Comprehend</a> that runs on AWS as a lambda. Facebook surprisingly doesn't offer an API or an open source project here, although they are the ones with user-generated content, where people are often not <a href="https://www.nzz.ch/digital/facebook-fremdenfeindlichkeit-hass-kommentare-ld.1945">so nice</a> to each other. Interestingly, they do not offer any assistance to page owners in that specific matter.</p>
<p>Beyond the big 5 there are a few noteworthy companies like <a href="https://aylien.com">Aylien</a> and <a href="https://monkeylearn.com">Monkeylearn</a> that are worth checking out. </p>
<h3>Overview Open Source Solutions</h3>
<p>Of course there are open source solutions and libraries that offer sentiment detection too.<br />
Generally, all of these tools offer more than just sentiment analysis. Most of the SaaS solutions outlined above, as well as the open source libraries, cover a vast range of different NLP tasks:</p>
<ul>
<li>part-of-speech tagging (e.g. &quot;going&quot; is a verb), </li>
<li>stemming (finding the &quot;root&quot; of a word, e.g. am, are, is -&gt; be), </li>
<li>noun phrase extraction (e.g. finding &quot;the car&quot; in a sentence), </li>
<li>tokenization (e.g. splitting text into words or sentences), </li>
<li>word inflections (e.g. what's the plural of atlas), </li>
<li>spelling correction and translation. </li>
</ul>
<p>I'd like to point you to Python's <a href="http://text-processing.com/demo/sentiment/">NLTK library</a>, <a href="http://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis">TextBlob</a>, <a href="https://www.clips.uantwerpen.be/pages/pattern-en#sentiment">Pattern</a> or R's <a href="https://cran.r-project.org/web/packages/tm/index.html">Text Mining</a> module and Java's <a href="http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html">LingPipe</a> library. Finally, I encourage you to have a look at the latest <a href="https://spacy.io">Spacy NLP suite</a>, which doesn't offer sentiment detection per se but has great NLP capabilities. </p>
<p>If you are looking for more options I encourage you to take a look at the full list that I have compiled in our <a href="http://datasciencestack.liip.ch/#nlp">data science stack</a>. </p>
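<p>To give you a feeling for what the very simplest of these tools do under the hood, here is a naive lexicon-based scorer in plain Python (a toy illustration only, with a made-up mini lexicon; the libraries above are considerably more sophisticated):</p>
<pre><code class="language-python"># Naive lexicon-based sentiment scoring (toy example)
LEXICON = {"good": 1, "great": 1, "brilliant": 1,
           "bad": -1, "awful": -1, "boring": -1}

def sentiment(text):
    # sum the polarity of every known word; unknown words count as 0
    words = text.lower().split()
    return sum(LEXICON.get(w, 0) for w in words)

print(sentiment("a brilliant and great movie"))  # 2
print(sentiment("awful boring film"))            # -2</code></pre>
<p>Such word-counting approaches fall apart on negation and context (&quot;not good&quot;), which is exactly where the learned models we build below earn their keep.</p>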
<h3>Let's get started</h3>
<p>So you see, when you need sentiment analysis in your web app or mobile app, you already have a myriad of options to get started. Of course you might build something yourself if your language is not supported or if you have legal compliance requirements to meet when it comes to data privacy.</p>
<p>Let me walk you through all of the steps needed to make a well working sentiment detection with <a href="https://keras.io">Keras</a> and <a href="https://de.wikipedia.org/wiki/Long_short-term_memory">long short-term memory networks</a>. Keras is a very popular Python deep learning library, similar to <a href="http://tflearn.org">TFlearn</a>, that allows you to create neural networks without writing too much boilerplate code. LSTM networks are a special form of network architecture, especially useful for text tasks, which I am going to explain later. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/30a13b/keras.png" alt=""></figure>
<h3>Step 1: Get the data</h3>
<p>Being a big movie nerd, I have chosen to classify IMDB reviews as positive or negative for this example. As a bonus, the IMDB sample already comes with the Keras <a href="https://keras.io/datasets/">datasets</a> library, so you don't have to download anything. If you are interested though, not a lot of people know that IMDB offers its <a href="https://www.imdb.com/interfaces/">own datasets</a> which can be <a href="https://datasets.imdbws.com">downloaded</a> publicly. Among those we are interested in the ones that contain movie reviews which have been marked by hand as either positive or negative. </p>
<pre><code class="language-python">#download the data
from keras.datasets import imdb 
top_words = 5000 
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)</code></pre>
<p>The code above does a couple of things at once: </p>
<ol>
<li>It downloads the data </li>
<li>It keeps only the 5000 most frequent words across the reviews </li>
<li>It splits the data into a test and a training set. </li>
</ol>
<figure><img src="https://liip.rokka.io/www_inarticle/fb9a1c/processed.png" alt=""></figure>
<p>If you look at the data you will realize it has already been pre-processed. All words have been mapped to integers, and the integers represent the words sorted by their frequency. This is a very common way to represent a dataset in text analysis. So 4 represents the 4th most used word, 5 the 5th most used word, and so on... The integer 1 is reserved for the start marker, the integer 2 for unknown words, and 0 for padding. </p>
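<p>The frequency-based encoding itself is easy to reproduce. Here is a small plain-Python sketch (just to illustrate the scheme, with indices 0-2 reserved as described above):</p>
<pre><code class="language-python"># Encode words as frequency ranks, reserving 0 (padding), 1 (start), 2 (unknown)
from collections import Counter

texts = ["the movie was great", "the movie was boring"]
counts = Counter(word for t in texts for word in t.split())
# the most frequent word gets the smallest free index, i.e. 3
word_to_id = {w: i + 3 for i, (w, _) in enumerate(counts.most_common())}
encoded = [1] + [word_to_id.get(w, 2) for w in "the movie was awesome".split()]
print(encoded)  # [1, 3, 4, 5, 2] -- "awesome" is unknown, so it becomes 2</code></pre>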
<p>If you want to peek at the reviews yourself and see what people have actually written, you can reverse the process too:</p>
<pre><code class="language-python"># reverse lookup
import keras
INDEX_FROM = 3  # the default index_from of imdb.load_data
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+INDEX_FROM) for k,v in word_to_id.items()}
word_to_id["&lt;PAD&gt;"] = 0
word_to_id["&lt;START&gt;"] = 1
word_to_id["&lt;UNK&gt;"] = 2
id_to_word = {value:key for key,value in word_to_id.items()}
print(' '.join(id_to_word[id] for id in X_train[0]))</code></pre>
<p>The output might look like something like this:</p>
<pre><code class="language-python">&lt;START&gt; this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert &lt;UNK&gt; is an amazing actor and now the same being director &lt;UNK&gt; father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for &lt;UNK&gt; and would recommend it to everyone to watch and the fly &lt;UNK&gt; was amazing really cried at the end it was so sad and you know w</code></pre>
<h3>One-hot encoder</h3>
<p>If you want to do the same with your own text (e.g. my example uses some work reviews), you can use Keras' built-in &quot;one-hot&quot; encoder, which allows you to encode your documents with integers. The method is quite useful since it removes any extra marks (e.g. !&quot;#$%&amp;...), splits sentences into words by space and transforms the words to lowercase. </p>
<pre><code class="language-python">#one hot encode your documents
from numpy import array
from keras.preprocessing.text import one_hot
docs = ['Gut gemacht',
        'Gute arbeit',
        'Super idee',
        'Perfekt erledigt',
        'exzellent',
        'naja',
        'Schwache arbeit.',
        'Nicht gut',
        'Miese arbeit.',
        'Hätte es besser machen können.']
# integer encode the documents
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)</code></pre>
<p>Although the encoding will not be sorted by frequency like in our example before (e.g. lower numbers representing more frequent words), this will still give you a similar output:</p>
<pre><code>[[18, 6], [35, 39], [49, 46], [41, 39], [25], [16], [11, 39], [6, 18], [21, 39], [15, 23, 19, 41, 25]]</code></pre>
<h3>Step 2: Preprocess the data</h3>
<p>Since the reviews differ heavily in length, we want to trim each review to its first 500 words. We need text samples of the same length in order to feed them into our neural network. If reviews are shorter than 500 words, we pad them with zeros. Keras, being super nice, offers a set of <a href="https://keras.io/preprocessing/text/">preprocessing</a> routines that do this for us easily. </p>
<pre><code class="language-python"># Truncate and pad the review sequences 
from keras.preprocessing import sequence 
max_review_length = 500 
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length) 
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length) </code></pre>
<figure><img src="https://liip.rokka.io/www_inarticle/27e1ad/padded.png" alt=""></figure>
<p>As you see above (I've just output the padded array as a pandas dataframe for visibility), a lot of the reviews are padded with 0 at the front, which means that the review is shorter than 500 words. </p>
<h3>Step 3: Build the model</h3>
<p>Surprisingly, we are already done with the data preparation and can start to build our model. </p>
<pre><code class="language-python"># Build the model
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())</code></pre>
<p>The two most important things in our code are the following:</p>
<ol>
<li>The Embedding layer and </li>
<li>The LSTM Layer. </li>
</ol>
<p>Let's cover what both are doing. </p>
<h3>Word embeddings</h3>
<p>The embedding layer will learn a word embedding for all the words in the dataset. It has three arguments: the input dimension, which is the size of our vocabulary (the 5000 top words); the output dimension, i.e. the vector space in which the words will be embedded, in our case 32 dimensions, so a vector of length 32 holds each word's coordinates; and the input length, the 500 words of each review. </p>
<p>There are already pre-trained word embeddings (e.g. GloVe or <a href="https://radimrehurek.com/gensim/models/word2vec.html">Word2Vec</a>) that you can <a href="https://nlp.stanford.edu/projects/glove/">download</a>, so you don't have to train your embeddings all by yourself. These word embeddings are based on specialized algorithms that each do the embedding a bit differently, but we won't cover that here. </p>
<p>How can you imagine what an embedding actually is? Generally, words that have a similar meaning in context should be embedded next to each other. Below is an example of word embeddings in a two-dimensional space:</p>
<figure><img src="https://liip.rokka.io/www_inarticle/88d44e/embeddings.png" alt=""></figure>
<p>Why should we even care about word embeddings? Because it is a really useful trick. If we were to feed our reviews into a neural network and just one-hot encode them, we would have very sparse representations of our texts. Why? Let's look at the sentence &quot;I do my job&quot; in a &quot;bag of words&quot; representation with a vocabulary of 1000: a vector that holds 1000 entries (each column is one word) has four ones in it (one for <strong>I</strong>, one for <strong>do</strong>, one for <strong>my</strong> and one for <strong>job</strong>) and 996 zeros. So it is very sparse, which makes learning from it difficult, because we would need 1000 input neurons, each representing the occurrence of one word in our sentence. </p>
<p>In contrast, with a word embedding we can fold these 1000 words into as many dimensions as we want, in our case 32. This means that we just have an input vector of 32 values instead of 1000. So the word &quot;I&quot; becomes some vector of values like (0.4, 0.5, 0.2, ...), and the same happens with the other words. With a word embedding like this, we just need 32 input neurons.</p>
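<p>Conceptually, the learned embedding is just a lookup table: a matrix with one 32-dimensional row per vocabulary word. A minimal sketch with random values standing in for the learned ones:</p>
<pre><code class="language-python">import numpy as np

vocab_size, embedding_dim = 1000, 32
rng = np.random.default_rng(42)

# in Keras this matrix is learned during training; here it is random
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

# looking up the vector for word id 0 ("I" in our toy vocabulary)
vector_for_i = embedding_matrix[0]
print(vector_for_i.shape)  # (32,) -- dense, instead of 1000 sparse inputs</code></pre>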
<h3>LSTMs</h3>
<p>Recurrent neural networks are networks used for &quot;things&quot; that happen in sequence, one after the other (e.g. time series, but also words). Long Short-Term Memory networks (LSTM) are a specific type of Recurrent Neural Network (RNN) capable of learning the relationships between elements in an input sequence. In our case the elements are words. So our next layer is an LSTM layer with 100 memory units.</p>
<p>LSTM networks maintain a state, and so overcome the vanishing gradient problem in recurrent neural networks (basically the problem that when you make a network deep enough, the signal for learning will &quot;vanish&quot; at some point). I do not want to go into detail about how they actually work, but <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">this post</a> delivers a great visual explanation. Below is a schematic overview of the building blocks of LSTMs.</p>
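<p>For the curious, the gating mechanism can be sketched in a few lines of NumPy. This is a bare single-cell step with random, untrained weights, just to show the shapes and the role of the gates; it is not how Keras implements it internally:</p>
<pre><code class="language-python">import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_in, n_units = 32, 100          # matches our model: 32-dim words, 100 units
rng = np.random.default_rng(0)
# one weight matrix per gate, acting on [h_prev, x] concatenated
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(n_units, n_units + n_in))
                  for _ in range(4))
bf = bi = bc = bo = np.zeros(n_units)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)                    # forget gate: what to drop
    i = sigmoid(Wi @ z + bi)                    # input gate: what to add
    c = f * c_prev + i * np.tanh(Wc @ z + bc)   # new cell state (the "memory")
    o = sigmoid(Wo @ z + bo)                    # output gate
    h = o * np.tanh(c)                          # new hidden state (the output)
    return h, c

# run a toy sequence of 5 word vectors through the cell
h = c = np.zeros(n_units)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c)
print(h.shape)  # (100,)</code></pre>
<p>The cell state c is what carries information across the whole sequence, which is why the gradient does not vanish the way it does in a plain RNN.</p>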
<p>So the output of the embedding layer is a 500 by 32 matrix: each word is represented by its position in those 32 dimensions, and the sequence of 500 words is what we feed into the LSTM network.</p>
<p>Finally, we have a dense layer with a single node and a sigmoid activation as the output.</p>
<p>Since the only decision we need is whether the review is positive or negative, we use binary_crossentropy for the loss function. The optimizer is the standard one (adam) and the metric is the standard accuracy metric.</p>
<p>By the way, if you want you can build a sentiment analysis without LSTMs; then you simply need to replace the LSTM with a Flatten layer:</p>
<pre><code class="language-python">#Replace LSTM by a flatten layer
#model.add(LSTM(100)) 
model.add(Flatten()) </code></pre>
<h3>Step 4: Train the model</h3>
<p>After defining the model Keras gives us a summary of what we have built. It looks like this:</p>
<pre><code class="language-python">#Summary from Keras
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
=================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None</code></pre>
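<p>As a sanity check, the parameter counts in this summary can be reproduced by hand. The embedding layer holds one 32-dimensional vector per vocabulary word (so the vocabulary must contain 5,000 words), an LSTM has four weight sets (the three gates plus the candidate cell state), each acting on the previous hidden state and the input plus a bias, and the dense layer has one weight per LSTM unit plus a bias:</p>
<pre><code class="language-python">vocab_size, embedding_dim = 5000, 32
lstm_units = 100

embedding_params = vocab_size * embedding_dim  # 160000
# 4 gates, each with weights for [h_prev, x] plus a bias vector
lstm_params = 4 * (lstm_units * (lstm_units + embedding_dim) + lstm_units)  # 53200
dense_params = lstm_units + 1                  # 101

print(embedding_params + lstm_params + dense_params)  # 213301</code></pre>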
<p>To train the model we simply call the fit function, supply it with the training data, and also tell it which data it can use for validation. That is really handy because we get everything in one call.</p>
<pre><code class="language-python">#Train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)</code></pre>
<p>The training of the model might take a while, especially when you are only running it on the CPU instead of a GPU. While the model trains, what you want to watch is the loss: it should be steadily going down, which shows that the model is improving. We make the model see the dataset 3 times, defined by the epochs parameter. The batch size defines how many samples the model sees at once, in our case 64 reviews.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/1868ba/training.png" alt=""></figure>
<p>To observe the training you can fire up TensorBoard, which runs in the browser and gives you a lot of different analytics, in particular the loss curve in real time. To do so, type in your console:</p>
<pre><code class="language-bash">sudo tensorboard --logdir=/tmp</code></pre>
<h3>Step 5: Test the model</h3>
<p>Once we have finished training the model we can easily test its accuracy. Keras provides a very handy function to do that:</p>
<pre><code class="language-python">#Evaluate the model
scores = model.evaluate(X_test, y_test, verbose=0) 
print("Accuracy: %.2f%%" % (scores[1]*100))</code></pre>
<p>In our case the model achieved an accuracy of around 90%, which is excellent given the difficult task. By the way, if you are wondering what the result would have been with the Flatten layer: it is also around 90%. So in this case I would apply <a href="https://en.wikipedia.org/wiki/Occam%27s_razor">Occam's razor</a> and, when in doubt, go with the simpler model.</p>
<h3>Step 6: Predict something</h3>
<p>Of course, in the end we want to use our model in an application, so we want to use it to create predictions. In order to do so we need to translate our sentence into the corresponding word integers and then pad it to match our data. We can then feed it into our model and see whether it thinks we liked or disliked the movie.</p>
<pre><code class="language-python">#predict sentiment from reviews
bad = "this movie was terrible and bad"
good = "i really liked the movie and had fun"
for review in [good, bad]:
    # translate the words into their integer ids
    tmp = []
    for word in review.split(" "):
        tmp.append(word_to_id[word])
    # pad_sequences already returns a 2D array, ready for predict
    tmp_padded = sequence.pad_sequences([tmp], maxlen=max_review_length)
    print("%s. Sentiment: %s" % (review, model.predict(tmp_padded)[0][0]))

#Output:
#i really liked the movie and had fun. Sentiment: 0.715537
#this movie was terrible and bad. Sentiment: 0.0353295</code></pre>
<p>In this case a value close to 0 means the sentiment is negative and a value close to 1 means it's a positive review. You can also use &quot;model.predict_classes&quot; to get just the classes, positive or negative.</p>
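<p>Turning the raw probability into a label is just a threshold at 0.5, which is essentially what predict_classes does for a sigmoid output. A tiny helper (the function name is mine, not part of Keras):</p>
<pre><code class="language-python">def to_label(score, threshold=0.5):
    """Map a sigmoid output to a sentiment label."""
    return "positive" if score >= threshold else "negative"

print(to_label(0.715537))   # positive
print(to_label(0.0353295))  # negative</code></pre>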
<h3>Conclusion or what’s next?</h3>
<p>So we have built quite a cool sentiment analysis for IMDB reviews that predicts whether a movie review is positive or negative with 90% accuracy. With this we are already <a href="https://en.wikipedia.org/wiki/Sentiment_analysis">quite close</a> to industry standards. This means that, in comparison to a <a href="https://www.liip.ch/en/blog/whats-your-twitter-mood">quick prototype</a> that a colleague of mine built a few years ago, we could potentially improve on it now. The big benefit of our self-built solution compared with a SaaS solution on the market is that we own our data and model. We can now deploy this model on our own infrastructure and use it as often as we like. Google or Amazon never get to see sensitive customer data, which might be relevant for certain business cases. We can also train it on German or even Swiss German text, provided we find a nice dataset, or simply build one ourselves.</p>
<p>As always I am looking forward to your comments and insights! As usual, you can download the IPython notebook with the code <a href="https://github.com/plotti/keras_sentiment/blob/master/Imdb%20Sentiment.ipynb">here</a>.</p>
<p>P.S. The people from MonkeyLearn contacted me and pointed out that they have written quite an extensive introduction to sentiment detection here: <a href="https://monkeylearn.com/sentiment-analysis/">https://monkeylearn.com/sentiment-analysis/</a>, so I point you to that in case you want to read up on the general concepts.</p>
                  <enclosure url="http://liip.rokka.io/www_card_2/674f1c/clamp-clips-close-up-160824.jpg" length="2751344" type="image/jpeg" />
          </item>
    
  </channel>
</rss>
