<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Kirby" -->
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">

  <channel>
    <title>Mot-cl&#233;: data science &#183; Blog &#183; Liip</title>
    <link>https://www.liip.ch/fr/blog/tags/data+science</link>
    <generator>Kirby</generator>
    <lastBuildDate>Wed, 09 May 2018 00:00:00 +0200</lastBuildDate>
    <atom:link href="https://www.liip.ch" rel="self" type="application/rss+xml" />

        <description>Articles du blog Liip avec le mot-cl&#233; &#8220;data science&#8221;</description>
    
        <language>fr</language>
    
        <item>
      <title>Progressive web apps, Meteor, Azure and the Data science stack or The future of web development conference.</title>
      <link>https://www.liip.ch/fr/blog/progressive-web-apps-meteor-azure-and-the-data-science-stack-or-the-future-of-web-development-conference</link>
      <guid>https://www.liip.ch/fr/blog/progressive-web-apps-meteor-azure-and-the-data-science-stack-or-the-future-of-web-development-conference</guid>
      <pubDate>Wed, 09 May 2018 00:00:00 +0200</pubDate>
      <description><![CDATA[<h3>Back to the future</h3>
<p>Although the conference (hosted in Zürich last week at the Crowne Plaza) explicitly had the word future in its title, I found that the new trends often felt a bit like &quot;back to the future&quot;. Why? Because some rather old concepts such as plain old SQL, &quot;offline first&quot; or pure JavaScript frameworks are making a comeback in web development - but with a twist. This brings us straight to the first talk. </p>
<h3>Modern single page apps with meteor</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/ce3196/meteor.png" alt=""></figure>
<p>Timo Horstschaefer from <a href="https://www.ledgy.com">Ledgy</a> showed how to create modern single-page apps with <a href="https://www.meteor.com">meteor.js</a>. Although every framework promises to &quot;ship more with less code&quot;, he showed that Ledgy - a mobile app to allocate shares among stakeholders - was actually written in less than three months with 13'000 lines of code. Unlike other web stacks, where a backend written in one language (e.g. Ruby with Rails, Python with Django) is paired with a JS-heavy frontend framework (e.g. React or Angular), Meteor does things differently by offering a tightly coupled frontend and backend written purely in JavaScript. The backend is mostly a node component; in their case it is really slim at only 500 lines of code. It is mainly responsible for data consistency and authentication, while all the other logic simply runs in the client. Such client-centric projects really shine when dealing with shaky Internet connections, because Meteor handles all data transmission in the background and catches up on the changes once the connection is regained. Although Meteor seems to have had a rough patch in the community in 2015 and 2016, it is heading for a strong comeback. The framework is highly opinionated, but I personally really liked the high abstraction level, which seemed to allow the team a blazingly fast time to market. A quite favorable development is that Meteor is opening up beyond MongoDB by offering its own GraphQL client (Apollo), which even outshines Facebook's own client, giving developers freedom in their choice of database.</p>
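<p>To make the shared-code idea concrete, here is a minimal sketch in the spirit of Meteor's methods API. The collection, method and field names are my own invention, not taken from Ledgy's codebase; the point is simply that the same validation function can run on both client and server.</p>

```javascript
// Hypothetical allocation check, shared by client and server code.
function validateAllocation(allocation) {
  return Number.isInteger(allocation.shares) && allocation.shares > 0;
}

// Method wiring - only active inside a real Meteor app.
if (typeof Meteor !== 'undefined') {
  Meteor.methods({
    'allocations.insert'(allocation) {
      if (!validateAllocation(allocation)) {
        throw new Meteor.Error('invalid-allocation');
      }
      Allocations.insert(allocation); // hypothetical Mongo collection
    },
  });
}
```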
<p>I highly encourage you to have a look at Timo's <a href="http://mypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC/dms/File/Moderne%20Single%20Page-Apps%20mit%20Meteor%20_%20Timo.pdf">presentation.</a> </p>
<h3>The data science stack</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/8b4877/datastack.png" alt=""></figure>
<p>Then it was my turn to present the data science stack. I won't bother you with the contents of my talk, since I've already blogged about it in detail <a href="https://www.liip.ch/en/blog/the-data-science-stack-2018">here</a>. If you still want to have a look at the presentation, you can of course <a href="http://mypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC/dms/File/Liip%20Data%20Stack.pdf">download</a> it. In the talk I offered a very subjective bird's-eye view on how a data-centric perspective touches modern web standards. An interesting piece of feedback from the panel was the question whether such an overview really helps our developers create better solutions. I personally think that having such maps or collections for orientation especially helps people in junior positions to expand their field of view. It might also help senior staff to look beyond their comfort zone and overcome the saying &quot;if all you have is a hammer, every problem looks like a nail&quot; - that is, using the same set of tools for every project. Yet I think the biggest benefit might be to offer clients a truly unbiased perspective on their options, of which they might have many more than some big vendors would like them to believe. </p>
<h3>From data science stack to data stack</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/ed727f/azure.png" alt=""></figure>
<p>Meinrad Weiss from Microsoft offered a glimpse of the Azure universe, showing us the many options for storing data in the Azure cloud. Some facts were indeed surprising - for example, Microsoft being unable to find two data-center locations in Switzerland that were more than 400 miles apart (apparently the country is too small!) - while others, like the majority of clients still operating in the SQL paradigm, were less so. One thing that really amazed me was their &quot;really big&quot; storage solution, meaning everything beyond 40 petabytes: the data is spread across 60 storage blobs that operate independently of the computational resources, which can be scaled on demand on top of the data layer. In contrast to a classical Hadoop stack, where computation and data are baked into one node, here the customer can scale up computational power temporarily and scale it down again once the computations have finished, saving some money. Such solutions are not cheap, though - we are talking about a roughly five-digit monthly entrance price, so not really the typical SME scenario. Have a look at the <a href="http://mypage.netlive.ch/demandit/files/M_D0861CC4DCEF62DFADC/dms/File/AzureAndData_Meinrad_Microsoft.pdf">presentation</a> if you want a quick refresher on current options for big data storage on Microsoft Azure. Another interesting insight was that while a lot of different paradigms have emerged in recent years, Microsoft managed to include them all (e.g. Gremlin graph, Cassandra, MongoDB) in their database services, unifying their interfaces in one SQL endpoint. </p>
<h3>Offline First or progressive web apps</h3>
<figure><img src="https://liip.rokka.io/www_inarticle/7a4898/pwa.png" alt=""></figure>
<p>Nico Martin, a leading web and frontend developer from the <a href="https://sayhelloagency.com">Say Hello</a> agency, showcased how the web is coming back to mobile again. Coming back? Yes, you heard right. If you thought you had been doing mobile first for many years now, you are right to ask why it is coming back. As it turns out (according to a 2017 comScore report), although people are indeed using their mobiles heavily, they spend 87% of their time inside apps rather than browsing the web - which might be surprising. On the other hand, while apps dominate mobile usage, more than 50% of people don't install any new apps on their phone, simply because they are happy with the ones they have. In fact, they spend 80% of their time in their top three apps. That poses a really difficult problem for new apps: how can they get a foot in the door against such highly habitual behavior? One potential answer might be <a href="https://developers.google.com/web/progressive-web-apps/">progressive web apps</a>, a standard defined by Apple and Google quite a few years ago that seeks to offer a highly responsive and fast website experience that feels almost like a native application. The main idea is that a so-called &quot;service worker&quot; - a piece of code that is installed on the device and continues running in the background - makes it possible for these web apps to, for example, send notifications to users while they are not on the website; something users otherwise know from classical native apps. Another rather trivial benefit is that you can install these apps on your home screen, and tapping them feels like using an app rather than browsing a website (e.g. there is no browser address bar). Finally, the whole website can operate in offline mode too, thanks to a smart caching mechanism that lets developers decide what to store on the device, in contrast to what the browser cache normally does.
If you feel like trying one of these apps, I highly recommend <a href="http://mobile.twitter.com">mobile.twitter.com</a>, where Google and Twitter sat down together to showcase everything that is possible with this new technology. If you are using an Android phone, these apps should work right away; if you are on an Apple phone, make sure you have at least iOS 11.3, which finally supports progressive web apps on Apple devices. While Apple has slightly opened the door to PWAs, I fear that their lack of support for the major features might have something to do with politics: developers circumventing the App Store and interacting with their customers without an intermediary doesn’t leave much love for Apple’s beloved App Store. Have a look at Martin's great <a href="https://slides.nicomartin.ch/pwa-internet-briefing.html">presentation</a>. </p>
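<p>To give an idea of how little code a basic service worker needs, here is a minimal caching sketch. The cache name and pre-cached asset list are made up for illustration; the key point is that the developer, not the browser, decides what gets stored, which is exactly the difference to the regular browser cache:</p>

```javascript
// Hypothetical cache name and asset list - adjust for your own app.
const CACHE_NAME = 'my-pwa-v1';
const PRECACHE = ['/', '/app.css', '/app.js'];

// Pure helper: only same-origin GET requests are served from the cache.
function shouldServeFromCache(method, url, origin) {
  return method === 'GET' && url.startsWith(origin);
}

// Event wiring - only runs inside a real service worker context.
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  // On install, pre-cache the app shell so it works offline.
  self.addEventListener('install', (event) => {
    event.waitUntil(caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE)));
  });
  // On fetch, answer from the cache first and fall back to the network.
  self.addEventListener('fetch', (event) => {
    if (!shouldServeFromCache(event.request.method, event.request.url, self.location.origin)) return;
    event.respondWith(
      caches.match(event.request).then((hit) => hit || fetch(event.request))
    );
  });
}
```

Registering it from the page is a one-liner: `navigator.serviceWorker.register('/sw.js')`.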
<h3>Conclusion</h3>
<p>Overall, although the topics were quite diverse, I definitely enjoyed the conference. A big thanks goes to the organizers of the <a href="http://internet-briefing.ch">Internet Briefing series</a>, who do an amazing job of putting on these conferences every month. They are definitely a good way to exchange best practices and maybe even learn something new. For me it was the motivation to finally get my hands dirty with progressive web apps, knowing that you don't really need much to make them work.  </p>
<p>As usual I am happy to hear your comments on these topics and hope that you enjoyed that little summary.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/25d9a4/abstract-art-colorful-942317.jpg" length="1981006" type="image/jpeg" />
          </item>
        <item>
      <title>The Data Science Stack 2018</title>
      <link>https://www.liip.ch/fr/blog/the-data-science-stack-2018</link>
      <guid>https://www.liip.ch/fr/blog/the-data-science-stack-2018</guid>
      <pubDate>Mon, 16 Apr 2018 00:00:00 +0200</pubDate>
<description><![CDATA[<p>More than a year ago I sat down and went through my various GitHub stars and browser bookmarks to compile what I then called the Data Science Stack. It was basically an exhaustive collection of tools, some of which I use on a daily basis, while others I had only heard of. The outcome was a big PDF poster which you can download <a href="https://www.liip.ch/en/blog/data-stack">here</a>. </p>
<p>The good thing about it was that every tool I had in mind could be found there somewhere, and like a map I could instantly see which category it belonged to. As a bonus, I was able to identify my personal white spots on the map. The bad thing about it was that as soon as I had compiled the list, it was out of date. So I transferred the collection into a Google sheet, and whenever a new tool appeared on my horizon I added it there. Since then - in almost a year - I have added 102 tools to it. </p>
<h2>From PDF to Data Science Stack website</h2>
<p>While it would be OK to release another PDF of the stack year after year, I thought it might be a better idea to turn it into a website where everybody can add tools.<br />
So without further ado I present you the <a href="http://datasciencestack.liip.ch">http://datasciencestack.liip.ch</a> page. Its goal is still to provide orientation like the PDF did, but without ever becoming stale. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/2dd1be/front.png" alt="frontpage"></figure>
<p><strong>Adding tools: </strong>Adding tools to my Google sheet felt a bit lonesome, so I asked others internally to add tools whenever they found new ones too. Now, moving away from the old Google sheet and opening our collection process to everybody, I have added a little button on the website that lets anyone submit tools to the appropriate category. Just send us the name, link and a quick description, and we will add it after a quick sanity check. The goal is to gather user-generated input too! I am also thinking about turning the website into a “github awesome” repository, so that adding tools can be done in a more programmer-friendly way. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/855d2c/add.png" alt="adding tools for everyone"></figure>
<p><strong>Search:</strong> When entering new tools, I was never sure whether a tool already existed on the page, and since tools are hidden away after the first five, the CTRL+F approach didn’t really work. That's why the website now has a little search box to check whether a tool is already in our list. If not, just add it to the appropriate category. </p>
<p><strong>Mailing List:</strong> If you are a busy person and want to stay on top of things, I would not expect you to regularly check back and search for changed entries. This is why I decided to send out a quarterly mailing that contains the new tools we have added since our last data science stack update. This helps you to quickly reconnect to this important topic and maybe also to discover a data science gem you have not heard of yet. </p>
<p><strong>JSON download:</strong> Some people asked me for the raw data of the PDF, and at the time I was not able to give it to them quickly enough. That's why I added a JSON route that lets you simply download the whole collection as a JSON file and create your own visualizations, maps or stacks with the tools we have collected. Maybe something cool will come out of this. </p>
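<p>As a sketch of what you could do with such an export, here is a small helper that counts the tools per category. The endpoint path and the field names in the snippet are assumptions for illustration - check the actual JSON route for its real shape:</p>

```javascript
// Count how many tools each category contains.
function countToolsByCategory(tools) {
  const counts = {};
  for (const tool of tools) {
    counts[tool.category] = (counts[tool.category] || 0) + 1;
  }
  return counts;
}

// Hypothetical usage against the live site (needs a fetch-capable runtime):
// fetch('https://datasciencestack.liip.ch/json')
//   .then((response) => response.json())
//   .then((tools) => console.log(countToolsByCategory(tools)));
```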
<p><strong>Communication:</strong> Scanning through such a big list of options can sometimes feel a bit overwhelming, especially since we don’t really provide any additional information or orientation on the site. That’s why I added multiple ways of contacting us, in case you are searching for a solution for your business right now. I took the liberty of also linking our blog posts tagged with machine learning at the bottom of the page, because we often make use of these tools in them. </p>
<p><strong>Zebra integration:</strong> Although it's nowhere visible on the website, I have hooked up the data science stack to our internal “technology database” system, called Zebra (Zebra actually does a lot more, but for us the technology part is what matters). Whenever someone enters a new technology into our technology DB, it is automatically added for review to the data science stack. This way we are basically tapping into the collective knowledge of all the employees of our company. The screenshot below gives a glimpse of our tech DB on Zebra, capturing not only the tool itself but also the common feelings towards it. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/680599/zebra.png" alt="Zebra integration"></figure>
<h2>Insights from collecting tools for one more year</h2>
<p>Furthermore, I would like to share the questions that guided my research in each area and the insights I gathered during the year of maintaining this list. Below you see a little chart showing the categories to which I added the most tools in the last year. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/caabb8/graphs2.png" alt="overview"></figure>
<h3>Data Sources</h3>
<p>One of the remaining questions for us is which tools offer good and legally compliant ways to capture user interactions. Instead of taking Google Analytics as the norm, we are always on the lookout for new and fresh solutions in this area. Besides heatmap analytics, another new category I added is “Tag Management”. Regarding classic website analytics solutions, I was quite surprised that there are still quite a lot of new ones popping up. I added a whole lot of solutions, and entirely new categories such as mobile analytics and app store analytics, after discovering the great GitHub awesome list of analytics solutions <a href="https://github.com/onurakpolat/awesome-analytics">here</a>.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/2aff10/sources2.png" alt="data sources"></figure>
<h3>Data Processing</h3>
<p>How can we initially clean or transform the data? How and where can we store the logs created by these transformation events? And where can we obtain additional valuable data? Here I’ve added quite a few tools in the ETL area and in the message queue category. It looks like I will eventually need to split the “message queue” category into several, because it feels like that one drawer in the kitchen where everything ends up in a big mess. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/732417/processing.png" alt="data processing"></figure>
<h3>Database</h3>
<p>What options are out there to store the data? How can we search through it? How can we access data sources efficiently? Here I mainly added a few specialized solutions, such as databases focused on storing time series or graph/network data. I might have missed something, but I feel there is no new paradigm shift on the horizon right now (as graph-oriented, NoSQL, column-oriented or NewSQL databases once were). It is probably in the area of big data where most of the new tools emerged. An awesome list that goes beyond our collection can be found <a href="https://github.com/onurakpolat/awesome-bigdata">here</a>.</p>
<figure><img src="https://liip.rokka.io/www_inarticle/64a802/database.png" alt="database"></figure>
<h3>Analysis</h3>
<p>Which stats packages are available to analyze the data? What frameworks are out there for machine learning, deep learning, computer vision and natural language processing? Obviously, the high momentum of deep learning led to many new entries in this category. In the “general” category I’ve also added quite a few entries, showing that there is still huge momentum in the various areas of machine learning beyond deep learning alone. Interestingly, I did not find any new stats software packages, probably hinting that the era of those one-size-fits-all solutions is over. The party is probably taking place in the cloud, where the big five have constantly added more and more specialized machine learning services - for example for text, speech, image, video or chatbot/assistant related tasks, to name just a few. At least those were the areas where I added most of the new tools. Going beyond the focus on Python, there is an awesome <a href="https://github.com/josephmisiti/awesome-machine-learning">list</a> that covers solutions for almost every programming language. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/ab1df5/analysis.png" alt="analysis"></figure>
<h3>Visualization, Dashboards, and Applications</h3>
<p>What happens with the results? What options do we have to communicate them visually? How do we turn those visualizations into dashboards or entire applications? Which additional ways to communicate with users besides reports and emails are out there? Surprisingly, I’ve only added a few new entries here - be it because I happened to be quite thorough in researching this area last year, or simply because the time of JS visualization libraries popping up left and right has cooled off a bit and the existing solutions are maturing instead. Yet this awesome <a href="https://github.com/fasouto/awesome-dataviz">list</a> shows that development in this area is still far from cooling off. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/cd8663/viz.png" alt="visualization"></figure>
<h3>Business Intelligence</h3>
<p>What solutions exist that try to integrate data sourcing, data storage, analysis and visualization in one package? What BI solutions are out there for big data? Are there platforms that offer more of a flexible data-scientist approach (e.g. free choice of methods, models and transformations)? Here I mainly added cloud platforms; it seems only logical to offer fewer and fewer desktop-oriented BI solutions, given their constrained computational power and the high complexity of maintaining BI systems on premise. Although business intelligence solutions are less community- and open-source-driven than the other parts of the stack, there are also <a href="https://github.com/thenaturalist/awesome-business-intelligence">awesome lists</a> where people curate those solutions. </p>
<figure><img src="https://liip.rokka.io/www_inarticle/7fa6f4/bi.png" alt="business intelligence"></figure>
<p>You might have noticed that I tried to slip a GitHub awesome list into almost every category to encourage you to look more in depth into each area. If you want to spend days of your life discovering awesome things, I strongly suggest you check out the collections of awesome lists <a href="https://github.com/jnv/lists">here</a> or <a href="https://github.com/sindresorhus/awesome">here</a>.</p>
<h3>Conclusion or what's next?</h3>
<p>I realized that keeping the list up to date in some areas seems almost impossible, while others gradually mature over time and the number of new tools is easy to keep track of. I also had to recognize that maintaining an exhaustive and always up-to-date list across these five broad categories is quite a challenge. That's why I went out to get help: I looked for people in our company with a particular interest in one of these areas and nominated them technology ambassadors for that part of the stack. Their task will be to add new tools whenever they pop up on their horizon. </p>
<p>I have also come to the conclusion that the stack is quite useful for offering customers a bit of an overview at the beginning of a journey. It adds value to simply know what popular solutions are out there and to start digging around yourself. Yet separating mature tools from experimental ones, or knowing which open source solutions have a good community behind them, is quite a hard task for somebody without experience. It would be great to highlight the “pareto principle” in this stack by pointing to only a handful of solutions and saying you will be fine using those. Yet I also have to acknowledge that this will not replace a good consultation in the long run. </p>
<p>Already looking towards improving this collection, I think each tool needs some sort of scoring: while there are plain vanilla tools that are mature and do the job, there are also highly specialized, very experimental tools that offer help in a narrow niche only. While this information is somewhat buried in my head, it would be good to make it explicit on the website. Here I highly recommend what ThoughtWorks has come up with in their <a href="https://www.thoughtworks.com/radar">technology radar</a>. Although their radar goes well beyond our little domain of data services, it offers a great way to differentiate tools, namely into four categories: </p>
<ul>
<li>Adopt: We feel strongly that the industry should be adopting these items. We see them when appropriate on our projects. </li>
<li>Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk. </li>
<li>Assess: Worth exploring with the goal of understanding how it will affect your enterprise. </li>
<li>Hold: Proceed with caution.</li>
</ul>
<figure><img src="https://liip.rokka.io/www_inarticle/37daaf/radar.png" alt="Technology radar"></figure>
<p>Assessing tools according to these criteria is no easy task - ThoughtWorks does it by nominating a high-profile jury that votes regularly on these tools. With 4,500 employees, their assessments surely form a representative sample of the industry. For us and our stack, a first step would be to adopt this differentiation, fill it in myself and then get other Liipers to vote on these categories. To a certain degree we have already started this internally in our tech DB, where each employee records a common feeling towards a tool. </p>
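<p>As a first step towards such a scoring, the assessments could be represented as plainly as this sketch (the ring names come from the ThoughtWorks radar; the tool entries are placeholders, not actual votes):</p>

```javascript
// The four ThoughtWorks radar rings.
const RINGS = ['adopt', 'trial', 'assess', 'hold'];

// Placeholder assessments - real entries would come from Liipers' votes.
const assessments = [
  { tool: 'scikit-learn', ring: 'adopt' },
  { tool: 'SomeNicheDb', ring: 'assess' },
];

// List all tools that fall into a given ring.
function toolsInRing(list, ring) {
  if (!RINGS.includes(ring)) {
    throw new Error('unknown ring: ' + ring);
  }
  return list.filter((entry) => entry.ring === ring).map((entry) => entry.tool);
}
```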
<p>Concluding this blog post, I realize that the simple task of “just” keeping a list of relevant tools for each area seemed quite easy at the start. The more I think about it, and the more experience I collect in maintaining this list, the more I realize that such a list eventually grows into a knowledge and technology management system. While such systems have their benefits (e.g. in onboarding or in quickly finding experts in an area), I feel that turning this list into one would mean walking down a rabbit hole from which I might never re-emerge. Let’s see what the next year will bring.</p>]]></description>
                  <enclosure url="http://liip.rokka.io/www_card_2/2dd1be/front.jpg" length="4200904" type="image/jpeg" />
          </item>
    
  </channel>
</rss>
