<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Kirby" -->
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">

  <channel>
    <title>Mot-cl&#233;: analytics &#183; Blog &#183; Liip</title>
    <link>https://www.liip.ch/fr/blog/tags/analytics</link>
    <generator>Kirby</generator>
    <lastBuildDate>Thu, 26 Jul 2018 00:00:00 +0200</lastBuildDate>
    <atom:link href="https://www.liip.ch" rel="self" type="application/rss+xml" />

        <description>Articles du blog Liip avec le mot-cl&#233; &#8220;analytics&#8221;</description>
    
        <language>fr</language>
    
        <item>
      <title>6 Tips for SEO Writing</title>
      <link>https://www.liip.ch/fr/blog/seo-writing</link>
      <guid>https://www.liip.ch/fr/blog/seo-writing</guid>
      <pubDate>Thu, 26 Jul 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h2>SEO writing is anything but easy</h2>
<p>We all want to reach the top of Google &amp; other search engines with the help of SEO, because organic traffic is free and users trust organic search results more than ads. After all, they’ve found the result themselves.</p>
<p>However, writing for search engines and in line with SEO rules is not an easy task. Writers used to focus only on content, but now we also have to take technical SEO requirements into account and focus on a keyword and supporting keywords. A challenging process. Our six tips show how to write search engine optimized copy that will make your content stand out from the crowd.</p>
<h2>SEO advice 1: first content, then keywords</h2>
<p>If you try and fit as many SEO keywords into a text as possible, it soon starts sounding quite stiff. So we recommend going ‘old school’ when writing a text: introduction, main section, conclusion. Make sure that your text is tailored to the reading habits of web users. Once the text is finished, it’s time to weave in your SEO keyword and the supporting keywords. SEO writing therefore comes at the end, as one more step in the process.</p>
<h2>SEO advice 2: use synonyms</h2>
<p>Thankfully, our colleagues at Google &amp; other search engines are clever engineers, and are making their search engines increasingly smart. Their algorithms can now recognise synonyms and contexts. This means that you can use alternative words and even long-winded descriptions for your keyword and supporting keywords when getting your text SEO ready.</p>
<h2>SEO advice 3: play with formats</h2>
<p>Who reads all the way to the end of a text nowadays? A long article might make sense for specialist topics, but our experience is that people look for entertainment. Since search engines love what’s popular, text alone is not a good idea (retention time and page views per session are key metrics). Here are a few formatting ideas to stop your SEO from becoming too boring.</p>
<ul>
<li>Advice article (like this one)</li>
<li>Photos with captions</li>
<li>Infographic with written explanation</li>
<li>Interview</li>
<li>Article with ‘decorative materials’ like titles, subtitles, quotations and info boxes</li>
<li>Q&amp;A article (buzzword: featured snippets in search engines)</li>
</ul>
<p>Whatever presentation you choose, test the mobile view as well. Make sure that your SEO content is still easy and enjoyable to read on mobile devices.</p>
<h2>SEO advice 4: dip into your bag of tricks</h2>
<p>What every SEO content manual says but is often forgotten: body text isn’t the only place you can put SEO keywords. Make sure that the keyword is in the URL, title tag, all image and video names, and above all in the heading (‘H1’). This means that your keyword has already appeared at least five times without overdoing the text. However, with all these technical tricks, don’t forget the copy itself – so make sure that the SEO keyword appears particularly frequently in the first few paragraphs of text. Google &amp; other search engines will love you for it.</p>
<h2>SEO advice 5: do the maths</h2>
<p>As a rule of thumb, an SEO keyword should make up 3-5% of the text. There are various online tools you can use to check keyword density; this is <a href="https://seo-semantix.de/keyword-tool">one we like</a>.</p>
<p>The statistics for this text are as follows:<br />
Main keyword:  ‘SEO’ used 34 times = 4%<br />
Supporting keyword: ‘keyword’ used 17 times = 2%</p>
<p>However, this is quite a long text. A 300-word text is enough for a search engine to work with. This means that the keyword needs to appear roughly 9 to 15 times. Instead of using a calculation tool, we recommend simply highlighting the keywords in the text and then counting them up – this way you keep an overview, and you know how much redrafting you need to do to keep the search engine happy.</p>
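<p>If you prefer a script to do the counting, here is a minimal sketch in Python. It uses naive whole-word matching (the example sentence and numbers are made up purely for illustration; the 3-5% target is the rule of thumb from above):</p>

```python
import re

def keyword_density(text, keyword):
    # Split into words and count case-insensitive whole-word matches
    words = re.findall(r"\w+", text.lower())
    count = sum(1 for word in words if word == keyword.lower())
    return count, count / len(words)

# toy example: 2 occurrences of "SEO" in a 12-word snippet
text = "SEO writing is hard. Good SEO copy weaves the keyword in naturally."
count, density = keyword_density(text, "SEO")
```

<p>Multi-word keywords and stemmed variants (‘keywords’ vs. ‘keyword’) would need a smarter matcher, but for a quick sanity check this is enough.</p>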
<h2>SEO advice 6: don’t forget the translation</h2>
<p>Give your translation company the right SEO keywords – because SEO keywords don’t come from dictionaries, they come from SEO specialists. And don't forget to keep an eye on what the translation company is doing, since translators are not SEO specialists. Make sure that the translation sticks as closely to your guidelines (and these tips) as the original text does.</p>
<h2>Learnings</h2>
<p>The good news for all fans of relevant, exciting content: high-quality material wins the day. Google clearly emphasised this by <a href="https://searchengineland.com/google-panda-is-now-part-of-googles-core-ranking-signals-240069">integrating a quality algorithm</a> into the search engine’s core algorithm with Panda. And that’s positive news, because it rewards good work and not only SEO technicalities. However, good performance is much more than just SEO writing. If you want to know what else it takes, you had best <a href="https://www.liip.ch/en/work/seo?gclid=EAIaIQobChMI5Jju5fXn2wIVzMqyCh1HDAvMEAAYASAAEgI6VPD_BwE">talk to this person</a>.</p>
<h2>Checklist</h2>
<ul>
<li>
<p><strong>SEO advice 1: content first, then keywords</strong><br />
Have you written your text? Now you can add in your keyword to work on your SEO.</p>
</li>
<li>
<p><strong>SEO advice 2: use synonyms</strong><br />
Write the ‘right’ text and use paraphrases for your keyword.</p>
</li>
<li>
<p><strong>SEO advice 3: play with formats</strong><br />
Use formats that suit your users (and your keyword) so they truly engage with your content. </p>
</li>
<li>
<p><strong>SEO advice 4: dip into your bag of tricks</strong><br />
Weave your keyword in everywhere – URL, title tag, H1-Hn, alt text, image names, image captions.</p>
</li>
<li>
<p><strong>SEO advice 5: do the maths</strong><br />
3-5% keyword density is perfect for search engines.</p>
</li>
<li>
<p><strong>SEO advice 6: don’t forget the translation</strong><br />
Give your translation company your keyword and supporting keywords – they won’t automatically know them.</p>
</li>
</ul>
<h3>The experts behind this article</h3>
<p>Thanks to <a href="https://www.liip.ch/en/team/fabian-ryf">Fabian</a>, <a href="https://www.liip.ch/en/team/christoph-meier">Christoph</a>,  <a href="https://www.liip.ch/en/team/jenny-zehnder">Jenny</a> and  <a href="https://www.liip.ch/en/team/benoit-pointet">Benoît</a> for content and copy cleverness, and to <a href="https://www.liip.ch/en/team/jeremie-fontana">Jérémie</a> for the  visual. This article would not have been possible without you!</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/2272ac/content-seo-checklist.jpg" length="610722" type="image/jpeg" />
          </item>
        <item>
      <title>The Data Stack &#8211; Download the most complete overview of the data centric landscape.</title>
      <link>https://www.liip.ch/fr/blog/data-stack</link>
      <guid>https://www.liip.ch/fr/blog/data-stack</guid>
      <pubDate>Mon, 13 Feb 2017 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>(Web) developers are used to stacks – most prominent among them probably the LAMP stack or the more recent MEAN stack. There are plenty around, but on the other hand I have not heard too many data scientists talking about data stacks – maybe because we think that in a lot of cases all you need is some Python, a CSV file, pandas, and scikit-learn to do the job.</p>
<p>But when we sat down recently with our team, I realized that we indeed use a myriad of different tools, frameworks, and SaaS solutions. I thought it would be useful to organize them into a meaningful data stack. I have not only included the tools we are using; I also sat down and started researching. It turned into an extensive list, aka the <strong>data stack PDF</strong>. This poster will:</p>
<ul>
<li>provide an overview of solutions available in the 5 layers (Sources, Processing, Storage, Analysis, Visualization)</li>
<li>offer you a way to discover new tools and</li>
<li>offer orientation in a very densely populated area</li>
</ul>
<p>So without further ado, here is my data stack overview: <a href="http://bit.ly/data_stack">click to open the PDF</a>. Feel free to share it with your friends too.</p>
<figure><a href="http://bit.ly/data_stack"><img src="https://liip.rokka.io/www_inarticle/d702df/liip-data-stack.jpg" alt=""></a></figure>
<p>Liip data stack version 1.0</p>
<h2><a href="http://liip.to/data_stack">Click here to get notified by email when I release version 2.0 of the data stack.</a></h2>
<p>Let me lay out some of the questions that guided me in each area, and throw in my two cents on each of them:</p>
<ul>
<li><strong>Data Sources:</strong>  Where does our data usually come from? For us, it's websites with sophisticated event tracking. But for some projects the data has to be scraped, comes from social media outlets or comes from <a href="https://blog.liip.ch/archive/2016/10/17/counting-people-stairs-particle-photon-node-js.html">IoT devices</a>.</li>
<li><strong>Data Processing:</strong>  How can we initially clean or transform the data? How and where can we store the logs that those events create? And where do we take additional valuable data from?</li>
<li><strong>Database:</strong>  What options are out there to store the data? How can we search through it? How can we access big data sources efficiently?</li>
<li><strong>Analysis:</strong>  Which stats packages are available to analyze the data? Which frameworks are out there to do machine learning, deep learning, computer vision, natural language processing?</li>
<li><strong>Visualization, Dashboards, and Applications:</strong>  What happens with the results? What options do we have to visually communicate them? How do we turn those visualizations into dashboards or whole applications? Which additional ways of communicating with the user beside reports/emails are out there?</li>
<li><strong>Business Intelligence:</strong>  What solutions are out there that try to integrate data sourcing, data storage, analysis and visualization in one package? Which BI solutions are out there for big data? Are there platforms/solutions that offer more of a flexible data-scientist approach?</li>
</ul>
<h3>My observations when compiling the list:</h3>
<h4>Data Sources</h4>
<ul>
<li>For scrapers, there are actually quite a lot of open source projects out there that work really well, probably because those are used mostly by developers.</li>
<li>While there are quite a few software-as-a-service solutions with slightly different focuses, capturing website data is in most cases done via Google Analytics, although Piwik offers a nice on-premise alternative.</li>
<li>We have been <a href="https://blog.liip.ch/archive/2016/10/17/counting-people-stairs-particle-photon-node-js.html">experimenting quite a bit</a> with IoT devices and analytics, and it turns out that there are quite a few integrated data-collection and analysis software-as-a-service solutions out there, although you are always able to use your own (see later) solutions.</li>
<li>For social media data, the data either comes from the platforms themselves via an API (which is probably the default for most projects), or from one of the convenient data providers that allow you to ingest social media data across all platforms.</li>
</ul>
<h4>Data Processing</h4>
<ul>
<li>While there are excellent open source logging services like Graylog or Logstash, it can sometimes save a lot of time to use the pricey SaaS solutions, because they have solved all the quirks and tiny problems that open source solutions sometimes have.</li>
<li>While there are some quite old and mature open source solutions (e.g. RabbitMQ or Kafka) in the message queues or streams category, it turned out that there are a lot of new open source stream analytics solutions (Impala, Flink or Flume) on the market, and almost all of the big four (Microsoft, Google, Facebook, Amazon) offer their own approaches.</li>
<li>The data cleansing or transformation category is quite a mixed bag. While on one hand there are a number of very mature industry-standard solutions (e.g. Talend), there are also alternatives for end users that let them simply clean their data without any programming knowledge (e.g. Trifacta or OpenRefine).</li>
</ul>
<h4>Databases</h4>
<ul>
<li>Databases: If you haven't followed the development in the database area closely, like me, you might think that solutions fall either into the SQL (e.g. MySQL) or the NoSQL (e.g. MongoDB) bucket. But apparently a LOT has been going on here; probably the most notable are the graph-based databases (e.g. Neo4j) and the column-oriented databases (e.g. HANA or MonetDB) that offer much better performance for BI tasks. There are also some recent, highly promising experimental solutions like databases on the GPU (e.g. MapD) or ones that only sample the whole dataset (e.g. BlinkDB).</li>
<li>The distributed big data ecosystem: It is mostly populated by mature projects from the Apache foundation that integrate quite well into the Hadoop ecosystem. Worth mentioning are of course the distributed machine learning solutions for large-scale processing like Spark or Mahout, which are really handy. There are also a lot of mature options like Cloudera or Hortonworks that offer out-of-the-box integrations.</li>
<li>In-memory databases or search: Of course the first thing that comes to mind is elastic(search), which has proved over the years to be a reliable solution. Overall the area is populated by quite a lot of stable open source projects (e.g. Lucene or Solr), while on the other hand you can now directly tap into search as a service (e.g. AzureSearch or CloudSearch) from the major vendors. The most interesting projects I will try to follow are the fast in-memory database Exasol and its “competitor” VoltDB.</li>
</ul>
<h4>Analysis / ML Frameworks</h4>
<ul>
<li>Deep learning frameworks: On one hand, you will find low-level frameworks like TensorFlow, Torch, and Theano. On the other hand, there are also high-level alternatives that build upon those, like TFLearn (which has now been integrated into TensorFlow) or Keras, which allow you to make progress faster with less coding, but without control over all the details. Finally, there are also alternatives to hosting these solutions yourself, in services like the Google ML platform.</li>
<li>Statistics software packages: While a long time ago you could only choose from commercial solutions like SPSS, Matlab or SAS, nowadays there is really a myriad of open source solutions out there, and whole ecosystems have developed around languages like Python, R and Julia. But even without programming you can analyze data quite efficiently with tools like RapidMiner, Orange or Rattle. For me, nothing beats the combination of pandas and an IPython notebook.</li>
<li>General ML libraries: I put the focus here mainly on the Python ecosystem, although the <a href="https://blog.liip.ch/archive/2015/10/08/machine-learning-on-google-analytics.html">other ones</a> are probably just as diverse. With SciPy, NumPy and scikit-learn we've got a one-stop shop for all your ML needs, but nowadays there are also libraries that take care of hyperparameter optimization (e.g. REP) or model selection (AutoML). So here too you can choose your level of immersion yourself.</li>
<li>Computer vision: While you will find a lot of open source libraries that rely on OpenCV, a myriad of awesome SaaS solutions (e.g. Google CV, Microsoft CV) from big vendors have popped up in the last few years. These will probably beat everything you might hastily build over a weekend, but they are going to cost you a bit. The deep learning movement has really made computer vision, object detection etc. accessible to anyone.</li>
<li>Natural language processing: Here I noticed a similar movement. We used NLP libraries to process social media data (e.g. <a href="https://blog.liip.ch/archive/2016/06/07/whats-your-twitter-mood.html">sentiment</a> analysis) and found that there are really great open source projects and libraries out there. While there are various options for text processing (e.g. natural for Node.js, NLTK for Python, or CoreNLP from Stanford), it is deep learning and the SaaS products built upon it that have really made natural language processing available to anyone. I am very impressed with the results of these tools, although I doubt that in the next few years we will come anywhere close to computers really understanding us. After all, it's the holy grail of AI.</li>
</ul>
<h4>Dashboards / Visualization</h4>
<ul>
<li>Visualization: I was really surprised by how many JS libraries are out there that let you do the fanciest data visualizations in the browser. It's great to have solid libraries like ggplot or matplotlib, or fancy ones like bokeh or seaborn, but if you want to communicate your results to the user periodically, you will need to go through the browser or mobile. I guess we have to thank the strong D3 community for the great developments in this area. There are also a lot of awesome SaaS and open source solutions that go way beyond just visualization, like Shiny for R or Redash, which feel more like business intelligence solutions.</li>
<li>Dashboards: I am personally a big fan of dashing.io because it is simply free and it's in Ruby, but Plotly has really surprised me as a very useful tool for creating a dashboard without hassle. There is a myriad of SaaS solutions out there that I stumbled upon when researching this field, and which I will have to try. I am not sure if they will all live up to the shiny expectations that those websites sell.</li>
<li>Bot frameworks: Although I think of bots or agents more as a way of interacting with a user, I have put them into the visualization area because they didn't fit anywhere else. P-Brain.ai, Wit.ai or Botpress turn out to be a really fast way to get started when you just want to build a (Slack) bot. Given the hype around them, I am however not sure if chatbots will be able to deliver the right results.</li>
</ul>
<h4>Business Intelligence</h4>
<ul>
<li>Business Intelligence: I thought I more or less knew the alternatives that are out there. But having researched a bit, boy was I surprised to find how much there actually is. Basically every one of the big four vendors has a very mature solution. Yet I found it really hard to distinguish between the different SaaS solutions – maybe it's because of the marketing talk, or maybe because they all just do the same thing. It's interesting to see how business intelligence solutions are offering the capabilities of the aforementioned data stack, but given the variety of solutions in each layer, I think more and more people will be tempted to pick and choose instead of buying an expensive all-in-one solution. There are however open source alternatives, of which some feel quite mature (e.g. Kibana or Metabase) while others are quite small but really useful (e.g. Blazer). Also, don't judge me too hard for putting Tableau in there: some may say it's just a visualization tool, others perceive it as a BI solution – I think the boundaries are really blurry in this terrain.</li>
<li>BI on Hadoop: I had to introduce this category because I discovered that a lot of solutions are particularly tailored to working on the Hadoop stack. It's great to see that there are options out there and I am eager to explore this terrain in the future.</li>
<li>Data Science Platforms: What I also noticed is that data scientists are becoming a target group of integrated business intelligence solutions, or data science platforms. I had some experience with BigML and Snowplow before, but it turns out that there are a lot of different platforms popping up that might make your life much easier – for example when it comes to deploying your models (like Yhat) or having a totally automated way of learning models (e.g. DataRobot). I am really excited to see what will pop up here in the future.</li>
</ul>
<p>What I realized is that the task of creating an overview of the different tools and solutions in the data-centric area will never be complete. Even while writing this blog post I had to add 14 more tools to the list. And I am aware that I might have missed some major tools out there, simply because it's hard to be unbiased when researching.</p>
<p>That is why I created a little email list that you can sign up to, and I will send you the updated version of this stack sometime this year. So sign up to stay up to date (I promise I will not spam you), and let me know in the comments about new solutions, how you would have segmented this field, or what your favorite tools are.</p>
<p><a href="http://liip.to/data_stack">Click here to get notified by email when I release version 2.0 of the data stack.</a></p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/f8f743/book-stack-books-bookshop-264635.jpg" length="459561" type="image/jpeg" />
          </item>
        <item>
      <title>Competitor Intelligence &#8211; Real-time Market Research</title>
      <link>https://www.liip.ch/fr/blog/competitors-intelligence</link>
      <guid>https://www.liip.ch/fr/blog/competitors-intelligence</guid>
      <pubDate>Fri, 22 Jul 2016 00:00:00 +0200</pubDate>
      <description><![CDATA[<p>Anyone who wants a competitive advantage today has to keep a close eye on their competitors.</p>
<p>Data crawlers and SaaS solutions such as <a href="http://www.boomerangcommerce.com/">Boomerang Commerce</a> support companies in collecting and preparing this data. With it, price differences, product segments and promotions can be compared, and disadvantages counteracted.</p>
<p>Competitor intelligence accordingly means that a shop monitors the activities and the pricing of its competitors. This is often done with crawlers that feed the required data into a central database, from which the corresponding analyses are run.</p>
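<p>As a toy illustration of what such an analysis can feed into (all names and numbers here are made up, not taken from any real shop or tool): once the crawler has filled the central database with competitor prices, even a very crude dynamic pricing rule can be expressed in a few lines of Python.</p>

```python
# Crude dynamic-pricing sketch: slightly undercut the cheapest competitor,
# but leave our price alone if we are already the cheapest.
def suggest_price(our_price, competitor_prices, undercut=0.99):
    cheapest = min(competitor_prices)
    if our_price > cheapest:
        return round(cheapest * undercut, 2)
    return our_price

# prices crawled for the same product from three competitor shops (made-up numbers)
crawled = [19.90, 21.50, 18.75]
new_price = suggest_price(20.00, crawled)
```

<p>Real systems add margin floors, demand signals and product matching on top, but the core loop – crawl, store, compare, adjust – stays the same.</p>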
<p>When Groupon Goods was looking for a Director of Pricing, competitive intelligence was the first item on the job description. The role covers price comparison, high-level assortment strategy and general retail trends – all based on competitor data.</p>
<p>Groupon shows that competitive intelligence is much more than pure data analysis. With competitor intelligence, you not only analyse your own data, but also build an overview of the entire market – that is, competitor prices and product availability. This lets a company react much faster to rising demand and automatically adjust its own prices to that demand – keyword “dynamic pricing”.</p>]]></description>
          </item>
        <item>
      <title>Predicting how long the b&#246;&#246;gg is going to burn this year with a bit of eyeballing and machine learning.</title>
      <link>https://www.liip.ch/fr/blog/predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning</link>
      <guid>https://www.liip.ch/fr/blog/predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning</guid>
      <pubDate>Wed, 13 Apr 2016 00:00:00 +0200</pubDate>
      <description><![CDATA[<p>Tradition has it that if the böögg explodes after a short time, there will be a lot of summer days; if it takes longer, we will have more rainy days. It reminds me a bit of <a href="https://en.wikipedia.org/wiki/Groundhog_Day">Groundhog Day</a>. If you want to know more about the böögg, you should check out the <a href="https://de.wikipedia.org/wiki/Sechsel%C3%A4uten">Wikipedia page</a>.</p>
<p>Now people have started to bet on how long it will take for the böögg to explode this year. There is even a <a href="https://www.zuerich.com/en/bang">website</a> that lets you bet on it and win something. <strong>My first instinct was to enter a random number (13 min 06 seconds), but then I thought – isn't there a way to predict it better than with our gut feeling?</strong> Well, it turns out there is – since we live in 2016 and have <a href="https://data.stadt-zuerich.ch">open data</a> on all kinds of things. Using this data, what is the prediction for this year?</p>
<h3>590 seconds – approximately 10 minutes.</h3>
<p>We will have to wait until Monday to see if this prediction is right – but I can show you now how I got to it with a bit of eyeballing and machine learning. (Actually our dataset is so small that we wouldn't have to use any of the tools that I will show you, but it's still fun.)</p>
<h3>Step 1: Get some böögg data</h3>
<p>Thanks to a very helpful <a href="https://github.com/philshem/Sechselaeuten-data">GitHub</a> user, we have an already pre-parsed representation of the 56 years of data that the wiki has to offer on the explosion times of the böögg. To import it with <a href="http://pandas.pydata.org">pandas</a> you need only a few lines:</p>
<pre><code>import pandas as pd

source = "https://raw.githubusercontent.com/philshem/Sechselaeuten-data/master/boeoegg_burn_duration.csv"
table = pd.read_csv(source, sep=',', encoding='latin1')</code></pre>
<p>If you use <a href="http://jupyter.org">Jupyter</a> notebooks, you will get a nice table of the data that looks like this:</p>
<table class="dataframe" border="1"><thead><tr><th></th>
<th>year</th>
<th>burn_duration_seconds</th>
</tr></thead><tbody><tr><th>0</th>
<td>1952</td>
<td>360</td>
</tr><tr><th>1</th>
<td>1953</td>
<td>480</td>
</tr><tr><th>2</th>
<td>1956</td>
<td>240</td>
</tr><tr><th>3</th>
<td>1958</td>
<td>480</td>
</tr><tr><th>4</th>
<td>1959</td>
<td>480</td>
</tr></tbody></table>
<h3>Step 2: Get some weather data</h3>
<p>Well, now in order to predict how long it's going to take this year, the first thing that came to mind was weather data. If it rains, it probably won't burn that well, and if it's cold that doesn't help either. MeteoSchweiz offers some <a href="http://www.meteoswiss.admin.ch/product/output/climate-data/homogenous-monthly-data-processing/data/homog_mo_SMA.txt">open data</a>. You can get it and parse it from here (btw. precipitation = “Niederschlag”):</p>
<table class="dataframe" border="1"><thead><tr><th></th>
<th>Year</th>
<th>Month</th>
<th>Temperature</th>
<th>Precipitation</th>
</tr></thead><tbody><tr><th>0</th>
<td>1864</td>
<td>1</td>
<td>-6.6</td>
<td>25.7</td>
</tr><tr><th>1</th>
<td>1864</td>
<td>2</td>
<td>-1.5</td>
<td>32.9</td>
</tr><tr><th>2</th>
<td>1864</td>
<td>3</td>
<td>4.5</td>
<td>51.0</td>
</tr><tr><th>3</th>
<td>1864</td>
<td>4</td>
<td>6.8</td>
<td>46.9</td>
</tr><tr><th>4</th>
<td>1864</td>
<td>5</td>
<td>12.3</td>
<td>78.4</td>
</tr></tbody></table>
<h3>Step 3: Merge it.</h3>
<p>Now we have to find out on which day the Sechseläuten took place each year. Luckily someone posted a short Python <a href="https://en.wikipedia.org/wiki/Sechseläuten">snippet</a> on Wikipedia that helps us with this. It gives us the correct month, which lets us look up in our table how cold it was and approximately how much precipitation (Niederschlag) there was on those days. The merged table looks like this:</p>
<table class="dataframe" border="1"><thead><tr><th></th>
<th>year</th>
<th>burn_duration_seconds</th>
<th>Temperature</th>
<th>Precipitation</th>
</tr></thead><tbody><tr><th>0</th>
<td>1952</td>
<td>360</td>
<td>10.4</td>
<td>103.6</td>
</tr><tr><th>1</th>
<td>1953</td>
<td>480</td>
<td>9.2</td>
<td>85.6</td>
</tr><tr><th>2</th>
<td>1956</td>
<td>240</td>
<td>6.6</td>
<td>95.6</td>
</tr><tr><th>3</th>
<td>1958</td>
<td>480</td>
<td>5.4</td>
<td>102.4</td>
</tr><tr><th>4</th>
<td>1959</td>
<td>480</td>
<td>9.5</td>
<td>65.3</td>
</tr></tbody></table>
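<p>For the curious, the join itself is essentially one pandas call. A sketch with toy stand-ins for the two tables above (assuming the month of interest is April, which is when Sechseläuten takes place):</p>

```python
import pandas as pd

# toy stand-ins for the böögg table and the MeteoSchweiz table above
boeoegg = pd.DataFrame({"year": [1952, 1953],
                        "burn_duration_seconds": [360, 480]})
weather = pd.DataFrame({"Year": [1952, 1952, 1953],
                        "Month": [3, 4, 4],
                        "Temperature": [5.1, 10.4, 9.2],
                        "Precipitation": [40.0, 103.6, 85.6]})

# keep only the April rows, then join on the year
april = weather[weather["Month"] == 4]
merged = boeoegg.merge(april, left_on="year", right_on="Year")[
    ["year", "burn_duration_seconds", "Temperature", "Precipitation"]]
```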
<h3>Step 4: Eyeball the data</h3>
<p>So we are almost done. Now it's time to have a look at what we find in the data. Pandas and matplotlib are great for that. In only two lines they let us create a <a href="https://www.google.ch/search?q=scatterplot+matrix&amp;client=safari&amp;rls=en&amp;source=lnms&amp;tbm=isch&amp;sa=X&amp;ved=0ahUKEwibmvjd7YvMAhWHiw8KHdsbC5MQ_AUIBygB&amp;biw=2048&amp;bih=989">scatterplot matrix</a> that shows us all the correlations there are amongst the variables.</p>
<pre><code>from pandas.tools.plotting import scatter_matrix
scatter_matrix(table, alpha=0.8, figsize=(6, 6), diagonal='kde') #kde = Kernel Density Estimation</code></pre>
<figure><a href="https://www.liip.ch/content/4-blog/20160413-predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning/scatterplot-matrix.png"><img src="https://liip.rokka.io/www_inarticle/b5c28d244dd0830c9a32357720b3cda23cf18260/scatterplot-matrix.jpg" alt="scatterplot-matrix"></a></figure>
<p>The plots on the diagonal are kernel density estimates (kind of fancy histograms), and in each cell you get a scatterplot of how well one variable (e.g. year) correlates with another (e.g. temperature). (Global warming, anyone?) We might choose to have a more detailed look at the correlations between temperature, year, precipitation and the burning time. Let's see if that helps us understand what is going on. (Btw. the blueish area corresponds to how much confidence we have in the line we put through the data – the narrower it is, the more we “trust” it.)</p>
<figure><a href="https://www.liip.ch/content/4-blog/20160413-predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning/year.png"><img src="https://liip.rokka.io/www_inarticle/f86bd1198ae7d99581ca3004e27a1fd5714a3169/year.jpg" alt="year"></a></figure>
<figure><a href="https://www.liip.ch/content/4-blog/20160413-predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning/rain.png"><img src="https://liip.rokka.io/www_inarticle/16f1ae05f98e310950fbf1215e1d21f8450cd440/rain.jpg" alt="rain"></a></figure>
<figure><a href="https://www.liip.ch/content/4-blog/20160413-predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning/temp.png"><img src="https://liip.rokka.io/www_inarticle/a32f1ddf99e31381e89812462a77440eb960b8b6/temp.jpg" alt="temp"></a></figure>
<p>By eyeballing we see that in recent times the böögg takes longer and longer to explode. We also find that the less humidity or rain there is, the faster it tends to explode. And finally we see that the hotter it is on that day, the faster the böögg tends to explode. Great – this already gives us an idea of what is going on.</p>
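<p>If you want numbers to back up the eyeballing, pandas computes the pairwise Pearson correlations behind the scatterplot matrix in a single call. Shown here on a tiny toy version of the merged table:</p>

```python
import pandas as pd

# tiny toy version of the merged table from step 3
table = pd.DataFrame({
    "year": [1952, 1953, 1956, 1958, 1959],
    "burn_duration_seconds": [360, 480, 240, 480, 480],
    "Temperature": [10.4, 9.2, 6.6, 5.4, 9.5],
    "Precipitation": [103.6, 85.6, 95.6, 102.4, 65.3],
})

# pairwise Pearson correlation of every column with every other column
corr = table.corr()
```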
<h3>Step 5: Build a model</h3>
<p>We could build a <a href="https://en.wikipedia.org/wiki/Linear_regression">linear regression model</a> to predict the data (something you probably still know from high school), but we could also use something fancier like <a href="https://en.wikipedia.org/wiki/Lasso_(statistics)">lasso regression</a>, a form of machine learning linear regression that tends to put the emphasis on the “true” contributors (independent variables) that drive the outcome (dependent variable). Normally we would split the data into a training and a test set to measure how good our model is (and also use cross-validation to find the correct parameters), but to keep this post short we won't do any of that. Building the model is essentially one line of Python:</p>
<pre><code>from sklearn.linear_model import Lasso

# X holds the independent variables (year, temperature, rain); y is the vector of burning times we are trying to predict
lasso_model = Lasso(alpha=0.01, fit_intercept=True).fit(X, y)</code></pre>
<figure><a href="https://www.liip.ch/content/4-blog/20160413-predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning/lasso_coefficients.png"><img src="https://liip.rokka.io/www_inarticle/c99d77137eb83210ceef06ada9710f66f79c802c/lasso-coefficients.jpg" alt="lasso_coefficients"></a></figure>
<p>We see that the lasso regression especially emphasizes the effect of the temperature on the burning time, while precipitation and year don't seem to matter as much and have an inverse effect on it – matching what we already saw in the graphs above.</p>
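<p>The train/test split and cross-validation we skipped above can be sketched in a few lines. Everything here is synthetic so the snippet is self-contained: the data is made up, and <code>LassoCV</code> picks the regularization strength alpha by cross-validation instead of hard-coding 0.01.</p>
<pre><code>import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real table: columns are year, temperature, rain;
# the burning time is driven mostly by temperature, plus some noise.
rng = np.random.RandomState(0)
X = np.column_stack([
    rng.randint(1965, 2016, size=50).astype(float),  # year
    rng.uniform(5, 25, size=50),                     # temperature in deg C
    rng.uniform(0, 10, size=50),                     # precipitation in mm
])
y = 2000 - 60 * X[:, 1] + 15 * X[:, 2] + rng.normal(0, 30, size=50)

# Hold out part of the data to measure how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LassoCV(cv=5).fit(X_train, y_train)
print("chosen alpha:", model.alpha_)
print("held-out R^2:", model.score(X_test, y_test))</code></pre>
<p>On real data the held-out score is what would tell us whether the 10-minute prediction below deserves any trust.</p>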
<h3>Step 6: Predict stuff</h3>
<p>Well, now it's time to ask the wise model to make the prediction for this year. We need to guess this year's temperature and precipitation, so let's look at what meteo predicts. The homepage states: “Der Umzug für Volk und Stände und das Anzünden des Bööggs um 18 Uhr sind die Höhepunkte des zöiftigen Zürcher Frühlingsfestes.” (roughly: “The parade and the lighting of the Böögg at 6 pm are the highlights of Zurich's traditional guild spring festival.”) So looking at the chart around 18 o'clock we see that we will have around 13°C and no precipitation.</p>
<figure><a href="https://www.liip.ch/content/4-blog/20160413-predicting-how-long-the-boogg-is-going-to-burn-this-year-with-a-bit-of-eyeballing-and-machine-learning/meteo.png"><img src="https://liip.rokka.io/www_inarticle/7ec2417c5bca2d6b1bbd8f24eaeae903e26f898b/meteo.jpg" alt="meteo"></a></figure>
<p>The last remaining step is to enter the parameters inside the model and see what it predicts:</p>
<pre><code>print("The böögg will burn this year for %s seconds." % lasso_model.predict([[2016, 13, 0]])[0])
The böögg will burn this year for 590.667449568 seconds.</code></pre>
<p>So finally we arrived at the answer: 590 seconds, or approximately 10 minutes. That's what we could enter into the <a href="https://www.zuerich.com/en/bang">contest</a> to see if we win, but I guess our chances are rather slim, since 10 minutes is the rough guess that everybody gives, and our model isn't very good at what it's doing (but finding that out is something for another blog post :) ). On a side note: finding out whether the böögg is actually good at predicting how many summer days we will have this year might also be a fun topic :)</p>
<p>I hope you found this post entertaining and maybe lost a bit of your fear of working with Python, pandas, matplotlib or scikit-learn.</p>
<p>Drop <a href="mailto:thomas.ebermann@liip.ch">us</a> a line if you feel you might want to experiment with the Ipython Notebook or want to talk to us about your website, app or business data.</p>
<p>In any case, see you on Monday @ the böögg to find out what will happen!</p>
<p>Plotti</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/b6261b/bildschirmfoto-2018-01-21-um-12-21-24.jpg" length="2431892" type="image/png" />
          </item>
        <item>
      <title>Why Piwik matters now</title>
      <link>https://www.liip.ch/fr/blog/why-piwik-matters-now</link>
      <guid>https://www.liip.ch/fr/blog/why-piwik-matters-now</guid>
      <pubDate>Wed, 21 Oct 2015 00:00:00 +0200</pubDate>
<description><![CDATA[<p><a href="https://piwik.org/">Piwik</a> is an open-source web analytics solution that has been around for quite a few years now and has seen a recent <a href="https://github.com/piwik/piwik/graphs/contributors">revival</a> with the advent of Piwik 2.</p>
<p>It offers <strong>all the necessary tools to capture, collect, process and analyse traffic data</strong>. Yes, it has an API; yes, fancy reports, segments, dashboards and goals; yes, custom variables too, …</p>
<p>Although I have immense respect for the product team behind Google Analytics, I must admit that Piwik brings three features that Google Analytics doesn't match.</p>
<h2>1. Data ownership and control</h2>
<p>Piwik is at its best when self-hosted on a server you control. It then provides <strong>full control over the location, access, life and <a href="http://piwik.org/faq/troubleshooting/faq_42/">death</a> of visitor data</strong>. A must-have for many organisations on the Old Continent.</p>
<h2>2. Single path analysis</h2>
<p>Piwik supports single-user path analysis through its <a href="https://piwik.org/docs/user-profile/">visitor profile</a> feature, something most competing solutions don't provide. Single-user path analysis is often the only way to get a lively picture of individual customer behaviour on a website, i.e. to <strong>get beyond the statistical representation of the behaviour flow</strong>.</p>
<p>When done correctly (i.e. when <a href="https://piwik.org/docs/privacy/">anonymizing visitors' IPs</a>), single path analysis is respectful of privacy.</p>
<h2>3. Content performance analytics</h2>
<p>As the web continues to evolve from static pages to a fluid sequence of <em>content moments</em>, measuring content performance will become more and more important.</p>
<p>Piwik offers a simple and elegant solution for <a href="http://piwik.org/docs/content-tracking/">content performance</a>, something other solutions treat as an advanced practice, if not a dirty hack.</p>
<h2>Give Piwik a try</h2>
<p><a href="http://piwik.org/faq/new-to-piwik/faq_17/">Try it online</a> or get a tour of its  <a href="http://piwik.org/features/">features</a>.</p>]]></description>
          </item>
    
  </channel>
</rss>
