<?xml version="1.0" encoding="utf-8"?><rss version="2.0">
  <channel>
    <title>Liip Blog</title>
    <link>https://www.liip.ch/en/blog</link>
    <lastBuildDate>Mon, 23 Mar 2026 12:07:23 +0100</lastBuildDate>
            <item>
      <title>Iframes are still odd</title>
      <link>https://www.liip.ch/en/blog/iframes-are-still-odd</link>
      <guid>https://www.liip.ch/en/blog/iframes-are-still-odd</guid>
      <pubDate>Mon, 23 Mar 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<h2>The Challenge</h2>
<p>The application does one - rather complicated - task, with lots of business logic. There was no way we could rewrite it to include it directly into the website code. And because the application comes with its own Javascript and CSS, we decided to use an iframe to embed the application with clean isolation.</p>
<p>The company maintaining that application provided us with a version - running as a Docker container - where they had stripped all extra elements like the navigation, so that it would visually fit within the website. There was however no way for us to customise anything within the application.</p>
<h2>iframe security</h2>
<p>The promise of an iframe is to keep a clean security boundary between embedding page and embedded content. This means that it is by design not possible to call Javascript across the boundary. </p>
<p>Because injecting iframes could potentially trick a user into submitting data to an attacker (clickjacking), as the iframe may be from a different origin than the main page. Thus, to even render the iframe, the browser checks the Content-Security-Policy (CSP) HTTP header. That header has a field frame-src to control what may be included as an iframe. With this, I allowed the domain of the application to be included in iframes.</p>
<pre><code>Content-Security-Policy: frame-src https://my-embed.com;</code></pre>
<p>But not only does the including page need to allow an iframe. The page to be embedded also needs to allow being included with the frame-ancestors attribute of the CSP header. As we run the application Docker image under our control, I was able to add that header in the proxy that runs before the Docker image:</p>
<pre><code>Content-Security-Policy: frame-ancestors https://my-website.com;</code></pre>
<p>Several things to note:</p>
<ul>
<li>If you have other CSP rules, merge them with the rules, nginx will overwrite the header and not add to it</li>
<li>Both options also support "self" to allow embedding resp. being embedded with the same webserver</li>
<li>Prior to the CSP becoming a standard, there was an unofficial header <code>X-Frame-Options</code>, which is still supported by browsers</li>
<li><code>Content-Security-Policy</code> must be an actual HTTP header, <code>&lt;meta http-equiv=”...”&gt;</code> is ignored for <code>Content-Security-Policy</code> (and also ignored for <code>X-Frame-Options</code>).</li>
</ul>
<h2>Size of the iframe element</h2>
<p>Now we get to the weird parts. To prevent multiple scrollbars, we need the iframe element to be exactly big enough for the embedded page. If it is too small, there is an additional scrollbar (or hidden content). If it is too large, there is odd whitespace.</p>
<p>The size of the element needs to be set on the iframe, owned by the parent. The dimensions of the content are however only known by the embedded application. HTML / CSS do not provide any means to let the parent page declare that it wants the iframe to have the “necessary size”. </p>
<p>We ended up with a really convoluted way, which seems the only way to achieve this: sending messages from the child page to the parent. This problem spawned dedicated javascript libraries like <a href="https://github.com/davidjbradshaw/iframe-resizer">iframe-resizer</a>. We ended up reimplementing the logic in the React application, as it was so small that a dedicated library felt like overkill. Following the tutorial at <a href="https://github.com/craigfrancis/iframe-height/">iframe-height</a> (which also has some interesting background on the discussion about iframes in the Whatwg), we came up with this code for the containing website:</p>
<pre><code class="language-js">// register an event listener for messages
window.addEventListener('message', receiveMessage, false);

// handle a message
function receiveMessage(event) {
    const origin = event.origin || event.originalEvent.origin;
    // we configure the expected domain to allow for this additional sanity check
    if (expectedDomain !== origin) {
      return;
    }
    if (!event.data.request || 'iframeResize' !== event.data.request) {
      return;
    }
    // the id is known in the js class. we need to find the element that needs to be resized
    const iframe = document.getElementById(`iframe-${id}`);
    if (iframe) {
      // pad the height a bit to avoid unnecessary tiny scrolling
      iframe.style.height = (event.data.height + 20) + 'px';
      // width could be handled the same way if necessary - in our case the width is fix
    }
}</code></pre>
<p>Now we need to make the embedded content send a message with its height. The Javascript for that is a bit verbose to allow for different browsers, but not complicated either:</p>
<pre><code class="language-js">(
    function(document, window) {
      if (undefined === parent || !document.addEventListener) {
        return;
      }
      function init() {
        let owner = null;
        const width = Math.max(document.body.scrollWidth, document.body.offsetWidth, document.documentElement.clientWidth, document.documentElement.scrollWidth, document.documentElement.offsetWidth);
        const height = Math.max(document.body.scrollHeight, document.body.offsetHeight, document.documentElement.clientHeight, document.documentElement.scrollHeight, document.documentElement.offsetHeight);
        if (parent.postMessage) {
          owner = parent;
        } else if (parent.contentWindow &amp;&amp; parent.contentWindow.postMessage) {
          owner = parent.contentWindow;
        } else {
          return;
        }
        owner.postMessage({
          'request' : 'iframeResize',
          'width' : width,
          'height' : height
        }, '*');
      }

      if (document.readyState !== 'loading') {
        window.setTimeout(init);
      } else {
        document.addEventListener('DOMContentLoaded', init);
      }
      // this is needed to also adjust the iframe if something (e.g. the Javascript of the application) changes the size of the iframe without an actual page reload.
      const observer = new ResizeObserver(init);
      observer.observe(document.body);
    }
  )(document, window);"</code></pre>
<h3>iframes with same origin</h3>
<p>If the iframe comes from the same origin (= domain) as the parent page, Javascript can cross the boundary. From parent to child, there is a <code>contentWindow</code> property on the <code>iframe</code> element. From child to parent, there is <code>window.parent</code>. With same origin, those elements expose all things the window usually has. For different origins, they only expose the function <code>postMessage</code> for the secure separation.</p>
<h2>Injecting content with Nginx</h2>
<p>Remember how I said we can’t modify the application? That still holds true. If we would have loaded both applications from the same domain, we could have had the parent page add a listener inside the iframe to directly update dimensions as needed. But the application contained absolute paths for its assets, so providing it from a subfolder of the same domain would have been tricky and we had to run it on a separate domain. </p>
<p>I ended up injecting the above snippet of Javascript in the Nginx proxy that sits in front of the Docker container:</p>
<pre><code>proxy_set_header Accept-Encoding ""; # make sure we get plain response for substitution to work
...
sub_filter_last_modified on;
sub_filter "&lt;/body&gt;" "&lt;script language='javascript'&gt;${script}&lt;/script&gt;&lt;/body&gt;";
sub_filter_once on;
...
proxy_pass https://my-embed.com$request_uri;</code></pre>
<p>Now the embedded iframe communicates its size to the containing page, which adjusts the iframe size accordingly.</p>
<p>(Note that Nginx does not execute these statements in order. The sub_filter instructions apply to the response, wihle proxy_set_header and proxy_pass apply to the request.)</p>
<h2>Alternatives</h2>
<p>Web Components are a more lightweight solution to combine separate sources into one website. If what you need to integrate is just an element and not a whole application, they might be a better fit. My collegue Falk wrote about <a href="https://www.liip.ch/en/blog/web-components-the-good-the-bad-and-the-ugly">Web Components</a> last week.</p>
<hr />
<h2>Bonus: Access control for the iframe content</h2>
<p>Because the application is not under our control, we need to manage access to it. We told the supplier of the application to remove access control and simply allow items to be created and edited by ID. Of course, this means that the application must never be directly exposed to the internet, but only reachable through the proxy.</p>
<p>On the embedding side, we track which user is allowed what ids, and have Nginx do a pre-flight check against the website to get the access decision:</p>
<pre><code># at the beginning of the location for the main request to the embeded application
auth_request /auth;

location /auth {
    # preflight authorization request with symfony
    fastcgi_pass phpfcgi;
    include /usr/local/openresty/nginx/conf/fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /app/public/index.php;
    # forward the original request URI to allow our main application to verify access to the specific resource
    fastcgi_param REQUEST_URI /embed-check$request_uri;
    # body is not forwarded. we have to remove content length separately, otherwise PHP-FPM will wait for the body until the auth request times out.
    fastcgi_pass_request_body off;
    fastcgi_param CONTENT_LENGTH "";
    fastcgi_param CONTENT_TYPE "";
    internal;
}</code></pre>
<p>If the call at <code>/embed-check/...</code> returns a 2xx status, Nginx continues with the request, otherwise it returns the response with the status code to the client, allowing for example to redirect to the login page. In my case, i return an empty response with status 204 if the user is allowed to access the specific resource.</p>
<p>On Symfony side, I use Symfony security to make sure the user is logged in. And then parse the path to know which item in the application the request wants to access, and check if the user has access. This leaks knowledge about the URL design of the embedded application, which is not avoidable for granular access control.</p>]]></description>
    </item>
        <item>
      <title>Preventing Context Pollution for AI Agents</title>
      <link>https://www.liip.ch/en/blog/preventing-context-pollution-for-ai-agents</link>
      <guid>https://www.liip.ch/en/blog/preventing-context-pollution-for-ai-agents</guid>
      <pubDate>Wed, 18 Mar 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>Context pollution happens when the context window fills up with information that is irrelevant to the current task. The more an agent has to juggle, the more likely it loses track of what it was actually doing.</p>
<p>Here are practical techniques to prevent it.</p>
<h2>Session Hygiene</h2>
<p>Start a fresh session for each task. This is the simplest technique and the easiest to get right. If earlier research is needed, write it into a temporary handoff file and let a new session pick up from there.</p>
<h2>Streamline Tool Calling</h2>
<p>Every tool call adds tokens to the context. Poorly built tools add a lot of them. To keep the context lean:</p>
<ul>
<li>Choose tools and MCPs that are well built and optimize token usage</li>
<li>Make sure via prompting that the right tools are used from the start</li>
</ul>
<p>A single bloated tool response can waste more context than an entire conversation turn.</p>
<h2>Subagents</h2>
<p>Agents can spawn other agents that run in their own context. This isolates work and keeps the parent context clean. It helps most when building large features where individual parts are independent.</p>
<p>The easiest way to use subagents is to prompt something like:</p>
<pre><code>Split the current plan into tasks, use a subagent for each task.</code></pre>
<h2>Persistent Tasks</h2>
<p>I built an MCP for Claude Code called <code>deliverables-mcp</code> that lets an agent create persistent tasks per codebase. Tasks are stored in <code>.claude/deliverables.jsonl</code> and persist across sessions.</p>
<p>This allows:</p>
<ul>
<li>Starting a new session before implementing each task</li>
<li>Running subagents in parallel based on tasks dependencies</li>
<li>Restarting a failed task in a clean session</li>
</ul>
<p>The tool replaces Claude Code's internal tasks and is deliberately called "deliverables" for two reasons:</p>
<ol>
<li>To avoid confusing the agent with two tools both called "tasks"</li>
<li>Deliverables are typically larger than just a task, which is a sweet spot for AI agents. Not so small that handoff cost dominates, but small enough that context problems are rare.</li>
</ol>
<p>You can check out <code>deliverables-mcp</code> on <a href="https://github.com/FalkZ/deliverables-mcp">GitHub</a>.</p>]]></description>
    </item>
        <item>
      <title>The ConfIAnce Chatbot, one year later</title>
      <link>https://www.liip.ch/en/blog/the-confiance-chatbot-one-year-later</link>
      <guid>https://www.liip.ch/en/blog/the-confiance-chatbot-one-year-later</guid>
      <pubDate>Tue, 17 Mar 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>A little less than a year ago, we introduced <a href="https://www.liip.ch/en/blog/confiance-the-first-llm-chatbot-for-general-medicine-in-switzerland">the ConfIAnce chatbot</a>. In collaboration with the Geneva University Hospitals (HUG), we developed this conversational agent to provide easier, interactive access to medical information about common chronic diseases. The content is produced and validated by the medical institution itself.</p>
<p>An article published by the team behind the project in the latest issue of the  Revue Médicale Suisse provides a first assessment one year after its public launch.</p>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/9661ee/rms-confiance.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/9661ee/rms-confiance.jpg 2x"></figure>
<h2>An official chatbot instead of unreliable answers online</h2>
<p>Primary care, which is essential for a well-functioning healthcare system, is facing a growing shortage, even in urban areas. Without easy access to their primary care physician, many patients turn to the internet to search for answers. Unfortunately, the information they find is often inaccurate or even potentially harmful.</p>
<p>In this context, a well-designed AI solution can help deliver <strong>the right information at the right time</strong>.</p>
<p>This is why we supported HUG in developing a <strong>RAG-based chatbot</strong> (Retrieval Augmented Generation). ConfIAnce is not the first chatbot designed for patients. However, it stands out thanks to its institutional roots, its use of locally validated medical content, and the control layers implemented to ensure reliable responses.</p>
<p>To guarantee safety, the system integrates monitoring mechanisms, including matching, groundedness checks, harmfulness detection, automated testing, and semantic routing.</p>
<h2>Keeping control of the tool to ensure quality</h2>
<p>One key challenge is maintaining control over the system, which requires monitoring capabilities. To achieve this, automated tests are run on all answers generated in response to user questions.</p>
<p>These tests measure the factual consistency of the chatbot’s responses compared with the knowledge base.</p>
<p>In addition, adjustable routing rules allow administrators to maintain human oversight by filtering and directing questions appropriately. Administrators can also immediately take the chatbot offline if a malfunction is suspected.</p>
<p>Topics that are frequently asked about but are insufficiently covered in the source documents are identified. These are then developed further to enrich the knowledge base as part of a continuous improvement process.</p>
<h2>Even the best tool is only useful if people use it</h2>
<p>For the chatbot to be useful, patients need to use it. To support adoption, HUG ran a public information campaign that promoted the tool while setting realistic expectations about what it can do.</p>
<p>ConfIAnce is <strong>not a medical device meant to replace a consultation</strong>. Instead, it provides informational support for questions related to the most common chronic diseases affecting adults.</p>
<p>In February 2025, ConfIAnce was released in beta. Between early February and the end of October 2025: 3,823 users interacted with the chatbot, generating 5,969 conversations  with <strong>11,781 questions</strong> (about two questions per conversation)</p>
<p><strong>Feedback</strong> provided directly through the chatbot is <strong>75% positive</strong>.</p>
<h2>Strong acceptance for a different kind of chatbot</h2>
<p>Chatbots in healthcare journeys are generally well accepted by patients thanks to their constant availability and ease of use. However, studies highlight recurring issues: inconsistent response quality and a lack of transparency regarding sources.</p>
<p>These are precisely the aspects that differentiate ConfIAnce from many other medical chatbots.</p>
<p>Designed to <strong>support, not replace, the relationship between patients and physicians</strong>, ConfIAnce helps primary care doctors by freeing up time so they can focus on practising medicine with the human-centred approach that motivated them to choose this profession.</p>
<p>The chatbot, developed in the specific context of HUG and its information resources, could be adapted to other institutional settings.<br />
For such a project to succeed, access to high-quality data is essential, as was the case here. Beyond that, control layers, automated testing, and user feedback enable continuous improvement and ensure the safety and relevance needed to build trust.</p>]]></description>
    </item>
        <item>
      <title>Making LiipGPT Accessible: Our Journey to WCAG AA Compliance</title>
      <link>https://www.liip.ch/en/blog/making-liipgpt-accessible</link>
      <guid>https://www.liip.ch/en/blog/making-liipgpt-accessible</guid>
      <pubDate>Mon, 16 Mar 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>After focusing on themability for our chatbot <a href="https://www.liipgpt.ch/" rel="noreferrer" target="_blank">LiipGPT</a> (most recently showcased in the <a href="https://zuericitygpt.ch/" rel="noreferrer" target="_blank">Z&uuml;riCityGPT Relaunch</a>), we turned our attention to accessibility with the goal of reaching WCAG AA compliance. As we do with many features, we first examined how industry leaders like ChatGPT, Perplexity, and Claude handle accessibility. While we found room for improvement across the board, this inspired us to think about how we could do better.</p>
<p>Our accessibility journey followed four main steps: automatic scans and quick-fixes, keyboard navigation, mobile zoom optimization, and screen reader experience.</p>
<h2>Automatic Scans and Quick-Fixes</h2>
<p>We started with automated accessibility testing using browser extensions like <a href="https://chromewebstore.google.com/detail/ibm-equal-access-accessib/lkcagbfjnkomcinoddgooolagloogehp" rel="noreferrer" target="_blank">IBM Equal Access Accessibility Checker</a> and <a href="https://www.deque.com/axe/devtools/extension" rel="noreferrer" target="_blank">axe DevTools</a>. These tools helped us identify common issues: missing labels, insufficient color contrast, improper semantic HTML, and missing ARIA attributes. While automated scans only catch about 40% of accessibility issues, they provided a solid foundation for our work.</p>
<h2>Keyboard Navigation</h2>
<p>Proper keyboard navigation is fundamental to accessibility. Ensuring basic Tab navigation works across the app is straightforward, but more complex components like <a href="https://www.w3.org/WAI/ARIA/apg/patterns/tabs/examples/tabs-automatic/" rel="noreferrer" target="_blank">tabs</a>, <a href="https://www.w3.org/WAI/ARIA/apg/patterns/menubar/" rel="noreferrer" target="_blank">menus</a>, and <a href="https://www.w3.org/WAI/ARIA/apg/patterns/dialog-modal/" rel="noreferrer" target="_blank">modals</a> require advanced keyboard interactions: arrow keys, Escape key handling, and focus management that follow official W3C guidelines. Users who rely on keyboard navigation have learned to expect these specific patterns, and deviating from them creates confusion and frustration. Rather than building these patterns from scratch, we leveraged <a href="https://bits-ui.com/" rel="noreferrer" target="_blank">Bits UI</a>, a headless UI library that implements these accessibility guidelines correctly.</p>
<p>Beyond individual components, we implemented focus loops and focus restoration at the application level to keep users oriented as they move through different stages of the chat interface.</p>
<h2>Mobile Zoom Optimization</h2>
<p>During user testing for <a href="https://meinplatz.ch/" rel="noreferrer" target="_blank">meinplatz.ch</a> with users who have disabilities, we observed something striking: many users navigate websites on mobile devices with 200% or more zoom, holding their devices just 10cm from their eyes. This insight highlighted a critical gap in most chatbot implementations.</p>
<p>Most chatbots use fixed-position elements: a chat input at the bottom and often a header at the top. When users zoom in significantly, these fixed elements can consume the entire viewport, making the interface unusable. Unfortunately, reliably detecting user zoom levels is impossible in browsers. Our solution: use Intersection Observer to detect when the header or footer take up more space than expected, then dynamically remove their fixed positioning to restore usability.</p>
<figure class="video"><video autoplay controls loop muted playsinline><source src="https://www.liip.ch/media/pages/blog/making-liipgpt-accessible/343fac4583-1769071651/chatgpt-zoom.mp4" type="video/mp4"></video><figcaption>Fixed-position elements are problematic on zoomed viewports.</figcaption></figure> 
<figure class="video"><video autoplay controls loop muted playsinline><source src="https://www.liip.ch/media/pages/blog/making-liipgpt-accessible/6900992c5e-1769071651/liipgpt-zoom.mp4" type="video/mp4"></video><figcaption>Solution:Revert fixed elements to static positioning when zoom is detected.</figcaption></figure> 
<h2>Screen Reader Experience</h2>
<p>Screen reader accessibility isn't automatic: it requires careful design. We focused on providing clear context through proper page structure (landmarks and headings), ensuring users always understand where they are and what's happening, and provide them shortcut to the key parts of the app.</p>
<h4>Providing Context</h4>
<p>We implemented a comprehensive outline structure with landmarks for main navigation, settings, and input areas. Each message includes proper headings and labels, and we added a skip link after the chat input (at the bottom of the page) to help users quickly return to the top.</p>
<h4>Web Component Challenges</h4>
<p>Working with web components introduced unique challenges. VoiceOver is particularly sensitive to how libraries are implemented. We worked closely with the bits-ui team (who were very responsive to bug reports) and implemented local portals for dropdown menus to avoid VoiceOver navigation issues, for example.</p>
<h4>Managing Announcements</h4>
<p>One of the trickiest challenges was managing VoiceOver announcements when multiple events occur simultaneously. Since queuing announcements doesn't work reliably, we carefully sequenced events and merged related announcements. For example, when a user clicks "select all options" for a list, individual announcements for each option would normally fire and override each other. Instead, we cancel those separate announcements and replace them with a single clear announcement summarizing everything that happened (all items selected or deselected, reset to the predefined set of items, etc.).</p>
<p>Since the chat is a SPA without page reloads, it was also important to announce all changes that are only visually visible, for example: light/dark mode switch, language switch, etc.</p>
<h4>Chat Flow for Screen Readers</h4>
<p>We designed the chat experience specifically for screen reader users:</p>
<ul>
<li>The input field includes both a placeholder and an aria-label with the page title, providing context on page load since the input auto-focuses and users skip over the initial page content.</li>
<li>When a response is being generated, we announce this clearly, providing the same feedback that a visual loading indicator would.</li>
<li>Once a response is ready, it's read without markdown formatting (no bold, no links, etc.) to maintain a natural reading flow.</li>
<li>After reading a response, we make users aware that they can ask another question directly or navigate to the last message's options to provide feedback or view references. We dynamically add this interactive section of the last message (where users are most likely to interact) to the document outline, creating a quick navigation shortcut.</li>
<li>Chat history is structured as articles with labels, making it easy to navigate past conversations.</li>
</ul>
<figure class="video"><video autoplay controls loop muted playsinline><source src="https://www.liip.ch/media/pages/blog/making-liipgpt-accessible/912fd0d7cd-1769018708/screenreader.mp4" type="video/mp4"></video><figcaption>Solution:Navigating the chatbot using VoiceOver screen reader on macOS.</figcaption></figure> 
<h2>Try It Yourself</h2>
<p>You can experience these improvements with <a href="https://www.bs.ch/alva" rel="noreferrer" target="_blank">Alva</a>, the chatbot of the Basel-Stadt administration. Try navigating with <a href="https://www.google.com/search?q=how+to+navigate+a+website+with+voiceover" rel="noreferrer" target="_blank">VoiceOver (MacOS)</a> or <a href="https://www.google.com/search?q=how+to+navigate+a+website+with+nva+screen+reader" rel="noreferrer" target="_blank">NVA (Windows)</a>, use only your keyboard, or zoom in significantly on a mobile device.</p>
<h2>An ongoing journey</h2>
<p>Our next goal is to integrate automated accessibility testing into our CI pipeline. However, as mentioned earlier, automated scans only catch around 40% of accessibility issues. This means we'll still need to carefully plan and test each new feature manually. Nothing replaces human testing when it comes to accessibility—automated tools can flag missing labels or contrast issues, but they can't evaluate whether an interface is actually usable for someone navigating with a screen reader or keyboard.</p>
<p>Accessibility is an ongoing journey, not a destination. We're committed to making LiipGPT usable for everyone, and we'll continue refining our approach based on real-world feedback.</p>
<h2>Need Help with your Accessibility?</h2>
<p>We offer accessibility audits to help you identify and fix issues in your own applications. If you're looking to improve the accessibility of your product, <a href="https://www.liip.ch/en/contact">get in touch with us</a>, we would be happy to help.</p>]]></description>
    </item>
        <item>
      <title>Insights on AI and Open Source for government at Drupal4Gov</title>
      <link>https://www.liip.ch/en/blog/insights-on-ai-and-open-source-for-government-at-drupal4gov</link>
      <guid>https://www.liip.ch/en/blog/insights-on-ai-and-open-source-for-government-at-drupal4gov</guid>
      <pubDate>Wed, 11 Mar 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>The Drupal4Gov conference was packed with interesting talks. Here, you’ll find here my personal highlights. I was also there to showcase our work on the Kanton Basel-Stadt/Alva/blökkli project. We already wrote about it, but I’ll share with you the current status and new features.</p>
<h2>GovNL: From months to minutes to build sites</h2>
<p>GovNL combines open source Drupal components and an open design system to run many Dutch government sites in a way that is accessible and scalable. It reduces time to build new sites from <strong>3 months to about 10 minutes</strong>, pretty impressive to say the least. This is a strong use case of designing for reuse at large scale.</p>
<h2>European Commission: Coordination is key to scaling</h2>
<p>The European Commission already runs no less than <strong>770 sites</strong> and invests heavily in the Drupal ecosystem! What stood out for me was how much they <strong>focus on coordination</strong>—making sure the right content is published through the right channel across that landscape. Open source program offices (OSPOs) were established as a way to drive open source agendas both at government level and inside organisations.</p>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/ac3271/drupal4gov2026-josef.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/ac3271/drupal4gov2026-josef.jpg 2x"></figure>
<h2>Kanton Basel-Stadt website and Alva: a blueprint for local public administration</h2>
<p>Just before lunch break, it was time for me to present the <strong>different AI use cases we implemented</strong> for <a href="https://www.liip.ch/en/work/projects/basel-stadt">Kanton Basel-Stadt</a>. The canton set new standards with the bs.ch relaunch with user-centred design, topic-based access instead of internal org structure, and <a href="https://www.bs.ch/alva">Alva</a> as <strong>the first AI-based chatbot for a Swiss canton</strong>. The stack is based on open source components and Liip heavily contributed to open source as part of this relaunch. We use Drupal as CMS, Nuxt/Vue, the <a href="https://blokk.li/">bl&ouml;kkli</a> editor for the headless frontend and Elasticsearch for search. Content is produced by a cross-department editorial team following a clear content strategy.</p>
<h2>AI to support the public and editors of the website</h2>
<p>The talk was an opportunity to share figures more than 18 months after the go-live. Today, Alva handles <strong>over 10,000 questions per month</strong>, with about <strong>1.36 questions per conversation and +44% growth since Alva 2.0.</strong> API integrations let the chatbot answer questions based on real-time information. Alva is also used heavily by internal users from the canton as well as the public. The chatbot always shows and validates its sources, which is central to creating trust. </p>
<p>On the editing side, blökkli is working hard to simplify texts. By using <strong>the blökkli editor</strong> with integrated AI, editors can now run a readability audit, see proposed simplifications side-by-side and accept or adapt them. Alva and the AI features on bs.ch are continuously developed further to provide editors and citizens trustworthy AI technology.</p>
<h2>AI assisted technologies at the French Government</h2>
<p>Another inspiring talk to watch was Use-cases of AI in the Services Publics+ platform at the French government. With more than 140.000 experiences shared and over a million of reactions, the system uses AI assisted technology to help State service provide feedback to citizens. They leverage speech to text and real-time summaries as an enabling technology. C’est magnifique!</p>
<h2>The EU trusts open source more than ever</h2>
<p>The European Union introduced <strong>Website Evidence Collector</strong> that scans sites for security issues and is open source. It was notable that they publish it under the <strong>EUPL</strong> (European Union Public Licence), which emphasises interoperability between countries and licences and supports multilingual collaboration. I wonder if Switzerland has something similar? </p>
<p>Not only does the EU trust open source for security they also provide through Interoperable Europe a new portal there’s a useful <strong>Licensing Assistant</strong>. You can <a href="https://interoperable-europe.ec.europa.eu/collection/eupl/solution/licensing-assistant/find-and-compare-software-licenses">find and compare software licences</a> and run a <a href="https://interoperable-europe.ec.europa.eu/collection/eupl/solution/licensing-assistant/compatibility-checker">compatibility checker</a> to see if different open source licences can be combined and whether there are legal complications. </p>
<h2>Using open source is not enough, we need champions</h2>
<p>Finally, <strong>Tiffany Farris</strong> from strategy consultancy <a href="https://www.palantir.net/">Palantir.net</a> (not to be confused with the infamous Palantir Technologies) stressed that <strong>using open source is good, but not enough</strong>. You need <strong>champions</strong> in the organisation who put contribution and ecosystem health on the agenda. Designing for reuse should be a core principle. From a US perspective, procurement is a problem: open source usage has grown, but support mechanisms often haven’t. Treating open source as “free” can lead to contracts going to vendors who brand their work as open source without actually supporting a thriving ecosystem. She proposed concrete <strong>public money, public code</strong>-style amendments to public procurement to better support the ecosystem. This was really an inspiring conclusion to an intense day of learning and exchanges.</p>
<p>You can <a href="https://www.youtube.com/playlist?list=PLNubpNMwP36QH5Y3RlbOiV4f9hjlrxCOo">watch the playlist</a> if you would like to dive deeper into the presentations from Drupal4Gov EU 2026.</p>]]></description>
    </item>
        <item>
      <title>Web Components: The Good, the Bad, and the Ugly</title>
      <link>https://www.liip.ch/en/blog/web-components-the-good-the-bad-and-the-ugly</link>
      <guid>https://www.liip.ch/en/blog/web-components-the-good-the-bad-and-the-ugly</guid>
      <pubDate>Wed, 11 Mar 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<h1>Introduction</h1>
<p>We created a fully themeable chat UI that can be embedded in any website and has no effect on the parent page. <a href="https://www.bs.ch/alva">Kanton Basel-Stadts Alva</a> and <a href="https://ramms.ch/">RAMMS' Rocky AI</a> are instances of that UI.</p>
<p>This blog post will show you what we learned about creating web components that do not influence the parent page. Here are the good, the bad and the ugly when working with web components.</p>
<h1>The Good</h1>
<p>These are the good parts of web components. They will lay the foundation for why you might use them.</p>
<h2>Portability</h2>
<p>Every system that can handle HTML can handle web components. A simple tag and a script will integrate it into any web framework. It doesn't even need to be a JavaScript framework.</p>
<pre><code class="language-html">&lt;body&gt;
  &lt;your-webcomponent&gt;&lt;/your-webcomponent&gt;
  &lt;script src="path/to/your-webcomponent.js"&gt;&lt;/script&gt;
&lt;/body&gt;</code></pre>
<h2>Native Feel</h2>
<p>IFrames are another way to embed UI into a page, and they are arguably easier to use. But the main difference is that web components feel more native to the page, since they directly integrate into the parent page's layout. This means you can use transparency, intrinsic sizing (size based on the web component's contents), and seamless event communication with the parent page.</p>
<h2>Slots</h2>
<p>With slots you can provide content that will be added at a specified point inside your web component.</p>
<p>In our chat UI, we used a slot to let integrators provide a custom loading spinner. This spinner needs to be visible immediately, before the full theme loads asynchronously.</p>
<h2>Shadow DOM - Isolating Styles</h2>
<p>A robust way to ensure that your styles do not affect the parent page is to use the Shadow DOM. Shadow DOM is a web component feature to add a boundary for styles. Styles applied inside the Shadow DOM never apply to the parent page.</p>
<h3>Caveat: Inheritable CSS Properties</h3>
<p>There is one exception to the isolation where CSS properties of the parent page apply to the web component.</p>
<p>These are the properties that pierce through the boundary:</p>
<ul>
<li>Inheritable CSS properties like <code>color</code>, <code>font-family</code>, <code>line-height</code></li>
<li>CSS custom properties like <code>--my-var</code></li>
</ul>
<p>In practice, we have found it helps to fully specify the common properties like fonts and color on every element. That way you will never be surprised by different styles on integration.</p>
<h1>Vite</h1>
<p>For bundling web components, we can highly recommend Vite. There are a lot of neat tricks you can apply while bundling. Here are the Vite features we used for our web component.</p>
<h2>Inlining Assets</h2>
<p>Vite's <a href="https://vite.dev/guide/assets#explicit-inline-handling">explicit inline handling</a> feature allowed us to inline our external CSS files into the JS bundle.</p>
<pre><code class="language-ts">import cssContentString from "./index.css?inline";</code></pre>
<p>This feature will not only inline the raw content of the imported <code>index.css</code>. It will also resolve all CSS imports, apply PostCSS transforms, and even work with CSS preprocessors like SASS. While inlined CSS is not the most efficient for browsers to render, the benefit is that we can ship a single JS file.</p>
<h2>Library Mode</h2>
<p>The Vite <a href="https://vite.dev/guide/build#library-mode">library mode</a> provides you with fine-grained control of how the bundle should behave. To enable the library mode just add the <code>build.lib</code> option in your Vite config.</p>
<h1>The Bad</h1>
<p>Not everything about web components is great though. Here are the bad parts.</p>
<h2>SSR - Hard to Get Working</h2>
<p>Server-side rendering will almost certainly not work. The rest of the page can still be rendered server-side, but the web component will only show up as a <code>&lt;your-webcomponent&gt;&lt;/your-webcomponent&gt;</code> tag. Its contents will only be rendered in the browser.</p>
<p>There is one <a href="https://lit.dev/docs/ssr/overview/">experimental package by Lit Labs</a> that tries to solve this, but we never tried it.</p>
<h2>Tailwind - Not a Great Fit</h2>
<p>Tailwind feels like a natural choice for web components, but it does not play well with them.</p>
<p>The core issue is twofold. First, Tailwind ships its own CSS reset (called Preflight), which overrides default browser styles. When injected into a page that does not use Tailwind, it potentially breaks the page. Shadow DOM could isolate this reset, but Tailwind is fundamentally not designed to work inside a Shadow DOM. Here is the <a href="https://github.com/tailwindlabs/tailwindcss/discussions/1935">discussion</a> if you are interested.</p>
<p>There are some hacky workarounds, but we tried them and had no success getting them to work reliably.</p>
<p>Our recommendation is to only use Tailwind if you are guaranteed that the parent page also uses it, and then use web components without Shadow DOM.</p>
<h1>The Ugly</h1>
<h2>Verbose Web Components API</h2>
<p>The native web component API is verbose and hard to read. A simple counter component, for example, requires manually defining a class, attaching a shadow root, setting up <code>innerHTML</code>, and wiring event listeners in <code>connectedCallback</code>. This boilerplate adds up quickly. You can see examples of the API <a href="https://github.com/mdn/web-components-examples">here</a>.</p>
<p>Fortunately, web components make for a great compile target. <a href="https://svelte.dev/docs/svelte/custom-elements">Svelte</a> and <a href="https://vuejs.org/api/custom-elements.html#definecustomelement">Vue</a> directly support compiling to web components. <a href="https://blog.logrocket.com/working-custom-elements-react/">React</a> is a bit trickier, but totally doable as well. We used this approach for our chat UI, where the first iteration was built with React and the current one with Svelte.</p>
<h1>Weird Quirks</h1>
<p>Advanced web component features come with edge cases that no documentation warns you about. Even Svelte, which has excellent web component support, ships with a notable <a href="https://svelte.dev/docs/svelte/custom-elements#Caveats-and-limitations">list of caveats</a>.</p>
<p>We even hit an undocumented edge case with slots in Svelte: the bundle script must load after the component markup, or slotted content will not render. An ugly <a href="https://github.com/FalkZ/svelte-web-components-starter/blob/main/src/slot.svelte">wrapper for slots</a> fixes the problem, but quirks like this add up and slow you down.</p>
<h2>Font Loading - Not Working Inside Shadow DOM</h2>
<p>When authoring web components, you get into the habit of defining all stylesheet links etc. in the web component body. As you should, so they do not affect the parent page. But there is another annoying detail here: @font-face will not work when defined in the Shadow DOM. If your web component needs a custom font, you need to inject the font CSS into the parent page to make it work.</p>
<h1>Conclusion</h1>
<p>I do not want to end on this ugly note though. I really think there are cases where web components are the right choice, and in our case we would choose Svelte &amp; web components again.</p>
<p>To help you get started, here is a <a href="https://github.com/FalkZ/svelte-web-components-starter">Svelte starter template</a>.</p>]]></description>
    </item>
        <item>
      <title>City of Zurich&#039;s 900+ Open Data Sets Now Have an MCP Server</title>
      <link>https://www.liip.ch/en/blog/city-of-zurich-s-900-open-data-sets-now-have-an-mcp-server</link>
      <guid>https://www.liip.ch/en/blog/city-of-zurich-s-900-open-data-sets-now-have-an-mcp-server</guid>
      <pubDate>Thu, 26 Feb 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p><a href="https://www.linkedin.com/in/alexander-g%C3%BCntert-3379071b6/">Alexander Güntert</a> <a href="https://www.linkedin.com/posts/activity-7432101739589345280-0YcB">posted on LinkedIn</a> about a new open-source project his colleague <a href="https://www.linkedin.com/in/hayaloezkan/">Hayal Oezkan</a> had built: an <a href="https://github.com/malkreide/zurich-opendata-mcp">MCP server for Zurich's open data</a>. The post got quite some reactions and I liked the idea very much. But it still needed a local installation, not something non-developers easily know how to do. So I had it packaged and deployed on our servers, available for anyone to use as the "OGD City of Zurich" remote MCP server.</p>
<p>The City of Zurich publishes over 900 datasets as open data, spread across six different APIs. There's <a href="https://data.stadt-zuerich.ch">CKAN</a> for the main data catalog, a WFS Geoportal for geodata, the Paris API for parliamentary information from the Gemeinderat, a tourism API, SPARQL linked data, and ParkenDD for real-time parking data. All public, all freely available. But until now, making an AI assistant actually use these APIs meant writing custom integrations for each one.</p>
<p>The MCP server wraps all six APIs into 20 tools that any MCP-compatible AI assistant can call directly. Ask "How warm is it in Zurich right now?" and it queries the live weather stations. Ask about parking availability, and it pulls real-time data from 36 parking garages. It also covers parliamentary motions, tourism recommendations, SQL queries on the data store, and GeoJSON features for school locations, playgrounds, or climate data. All through a single, standardized <a href="https://modelcontextprotocol.io/">Model Context Protocol</a> interface.</p>
<p>Hayal Oezkan  built it in Python using FastMCP. One file for the server with all 20 tool handlers. The <a href="https://github.com/malkreide/zurich-opendata-mcp">repo</a> is on GitHub.</p>
<p>Deploying it on our side took very little effort. The server supports both stdio transport for local use (like in Claude Desktop or Claude Code) and SSE and HTTP Streaming for remote deployment. I packaged it with Docker, deployed it to our cluster, and now it's available as a remote MCP server that anyone can add to their AI tools without installing anything locally.</p>
<p>The natural next step was integrating this with <a href="https://zuericitygpt.ch/">ZüriCityGPT</a>. It happened, just not quite in the direction I originally had in mind.</p>
<p>ZüriCityGPT already had its own MCP server at zuericitygpt.ch/mcp, exposing tools for searching the city's website content, "Stadtratsbeschlüsse" and looking up waste collection schedules. Instead of wiring the open data tools into ZüriCityGPT, I went the other way: the open data MCP server now proxies tools from the ZüriCityGPT MCP server. A lightweight proxy client connects to the remote server via streamable-http and forwards calls. The whole thing is about 40 lines of Python.</p>
<p>So now, when you connect to the Zurich Open Data MCP server, you get 23 tools in one place. The 21 original open data tools across six APIs, plus <code>zurich_search</code> for querying the city's knowledge base and <code>zurich_waste_collection</code>  for waste pickup schedules (based on the <a href="https://openerz.metaodi.ch/documentation">OpenERZ API</a>). One MCP endpoint, many services behind it. </p>
<p>A city employee builds something useful in the open, publishes the code, and within a day it's deployed and available to a wider audience. Open data and open source working together, exactly as intended.</p>]]></description>
    </item>
        <item>
      <title>The Canton of Graub&#252;nden Launches a New Website</title>
      <link>https://www.liip.ch/en/blog/the-canton-of-graubunden-launches-a-new-website</link>
      <guid>https://www.liip.ch/en/blog/the-canton-of-graubunden-launches-a-new-website</guid>
      <pubDate>Wed, 25 Feb 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>The Canton of Graubünden is developing a new website. The current structure is based on the administrative organization and includes <strong>several tens of thousands of pages</strong>. The existing site no longer meets today’s requirements for usability or the changed user behavior.</p>
<p>At the heart of the relaunch is <strong>user centricity</strong>. Instead of an internal, administration-driven perspective, a topic-centered structure is being created—similar to what the Canton of Basel-Stadt has already successfully established (more about the project in <a href="https://www.liip.ch/en/work/projects/basel-stadt" rel="noreferrer" target="_blank">this blogpost</a>).</p>
<p><strong>From a content perspective, the initiative? A mammoth project.</strong></p>
<h1>What does a content project of this scale require?</h1>
<p>Organically grown content cannot simply be migrated. A relaunch is usually the moment when content is not only transferred, but rethought in terms of quality.</p>
<p>Because a large volume of content needs to be revised, a <strong>decentralized editorial model</strong> is required. Numerous employees—often without in-depth content experience—are involved. <strong>Good processes, clear roles, and targeted support</strong> are therefore essential:</p>
<ul>
<li>A decisive project lead</li>
<li>A solid inventory and assessment</li>
<li>Concept and strategy</li>
<li>Governance (processes &amp; responsibilities)</li>
<li>Set-up and training of the web editorial team</li>
<li>Operational content production</li>
<li>Quality assurance</li>
</ul>
<p>Liip is accompanying the Canton of Graubünden on this journey. Based on our experience from projects in Basel-Stadt, Solothurn, and various federal offices, a trusting and efficient collaboration has developed.</p>
<h1>How we work together</h1>
<p>Two <a href="https://www.liip.ch/en/services/content/strategic-storytelling" rel="noreferrer" target="_blank">content strategists</a> from Liip work closely with the canton’s project management. The project team contributes internal know-how and decision-making power, while Liip adds operational capacity and extensive experience from similar setups. This arrangement enables high speed—and a clean integration of internal knowledge and external input.</p>
<p>The collaboration spans already more than 1.5 years.</p>
<h2>What are Liip's deliverables?</h2>
<ul>
<li>Content audit</li>
<li>Content strategy &amp; tone of voice</li>
<li>Content governance &amp; editorial model (for the project phase and for ongoing operations)</li>
<li>Content lifecycle model</li>
<li>Content guidelines for the editorial teams</li>
<li>Page types &amp; sample texts (in collaboration with the design team)</li>
<li>Word templates for content creation</li>
<li>Training &amp; support for editorial members</li>
<li>AI-based writing tool “GR Editor” (based on <a href="https://liip-textmate.liipgpt.ch/" rel="noreferrer" target="_blank">TextMate</a>)</li>
<li>Quality assurance</li>
</ul>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/6120c9/gr-work-layers-01-en.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/6120c9/gr-work-layers-01-en.jpg 2x"></figure>
<h1>What does this mean in detail? From audit to quality assurance</h1>
<p>A relaunch involving more than 120 editors and thousands of pages of content needs one thing above all else: a <strong>clear implementation plan and thorough preparation</strong>. Each step builds on the previous one.</p>
<ol>
<li>
<p><strong><a href="https://www.liip.ch/en/services/content/content-audit" rel="noreferrer" target="_blank">Content audit</a> – creating clarity</strong><br />
We started the project by analyzing a representative selection of existing content. The audit helped identify optimization potential and formed the foundation for all subsequent decisions.</p>
</li>
<li>
<p><strong>The editorial model – roles, responsibilities, and processes</strong><br />
<a href="https://www.liip.ch/en/services/content/content-governance" rel="noreferrer" target="_blank">Content governance</a> and the editorial model we developed created the basic ideas for our work with the web editorial teams. We made a model for the project phase. We also added a plan for after the relaunch.</p>
</li>
<li>
<p><strong>Content strategy &amp; tone of voice – the guiding framework</strong><br />
Governance regulates collaboration and processes; the content strategy defines the editorial direction. It is aimed less at editors and more at providing the conceptual and content foundation for operational elements. The strategy was complemented by two distinct tones of voice for the new website. With the strategy completed, the operational phase began.</p>
</li>
<li>
<p><strong>Page types &amp; sample texts – making the concept tangible</strong><br />
Together with the design team, we developed the new page types. Our <a href="https://www.liip.ch/en/services/content/ux-writing" rel="noreferrer" target="_blank">UX writer</a> created initial realistic content that served as references for finalizing the page types. Working with real text is crucial: only this way can structure, content, and design be meaningfully aligned.</p>
</li>
<li>
<p><strong><a href="https://www.liip.ch/en/services/content/guidelines" rel="noreferrer" target="_blank">Content guidelines</a> – the handbook for editors</strong><br />
Derived from the content strategy, the content guidelines bundle rules, recommendations, examples, and concrete instructions—for clear, accessible, and consistent content.</p>
</li>
<li>
<p><strong>Word templates – because the CMS is being developed in parallel</strong><br />
Since CMS development and content creation run in parallel, texts need to be created outside the CMS. For this purpose, we created Word templates that: reflect the new page structure, integrate all relevant meta information, include guidance for editors. The Word templates were supplemented with visualizations of the individual page types. For each page type, a real text example was set up in Figma and shared with editors as a PDF.</p>
</li>
<li>
<p><strong>AI writing tool “GR Editor” – real-time support</strong><br />
A decentralized editorial model inevitably leads to varying text quality. The GR Editor (based on <a href="https://www.liip.ch/en/blog/textmate" target="blank">TextMate</a>) counteracts this by checking clarity, consistency, and tone of voice directly while writing.<br />
The canton had ideal prerequisites: existing cantonal writing guidelines, the new content guidelines, and the two defined tones of voice.</p>
</li>
<li>
<p><strong>Building the editorial teams &amp; <a href="https://www.liip.ch/en/services/content/trainings" rel="noreferrer" target="_blank">training</a></strong><br />
First, the editorial teams were assembled to match the new website structure: a topic section and an organizational section. For the topic section, topic-based editorial teams were created, each with a coordinator. They act as an interface between the teams and the project management. The support organization (project management + Liip) accompanies the teams. The training sessions were the first practical entry point. The new page types, guidelines, and the GR Editor were introduced.</p>
</li>
</ol>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/892272/gr-editorial-team-02-en.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/892272/gr-editorial-team-02-en.jpg 2x"></figure>
<ol start="9">
<li>
<p><strong>Editorial support – day-to-day guidance</strong><br />
Content work is currently ongoing. The support organization hosts regular editorial meetings, answers questions, and develops solutions. Each topic team receives individual sparring on their first texts.</p>
</li>
<li>
<p><strong>Quality assurance – before content goes live</strong><br />
With this volume of content, personal responsibility is key. Editors receive checklists to review their own texts. Strategically selected content receives specific feedback. The GR Editor also makes a significant contribution to quality assurance.</p>
</li>
</ol>
<h1>Initial learnings from the project</h1>
<p>A relaunch of this size stretches everyone involved. A decentralized editorial model is unavoidable. Understanding the situation of editorial team members, flexibility in implementing the plan, and pragmatic solutions along the way are essential ingredients for constructive collaboration.</p>
<p>What does this mean in concrete terms? A few examples:</p>
<ul>
<li>“Tone makes the music”: treating each other with respect, addressing open points, taking problems seriously, and allowing co-creation.</li>
<li>Giving editorial teams leeway in scheduling—within the overarching project timeline.</li>
<li>Using tools flexibly (e.g., Excel instead of Miro if it works better).</li>
</ul>
<p>Working with <strong>real text examples</strong> during the development of page types has clearly proven to be a crucial decision. With so many stakeholders, <strong>mature templates and distinct references</strong> are essential to ensure quality, consistency, and speed. The visualizations of the page types also play a central role. For people who think strongly in visual terms, writing in Word templates without a frontend preview is challenging. The visualizations at least serve as a general reference.</p>
<p>External experts can help smooth peak workloads and contribute an outside perspective to discussions. Beyond that, it is rewarding to see a <strong>shared team emerge across organizational boundaries</strong>—working together for the common goal.</p>
<p>…and in exactly this spirit, we are looking forward to the next, final process steps—and to an exciting year for the relaunch 🙂</p>]]></description>
    </item>
        <item>
      <title>Blind testing your chatbot, Arena style</title>
      <link>https://www.liip.ch/en/blog/blind-testing-your-chatbot-arena-style</link>
      <guid>https://www.liip.ch/en/blog/blind-testing-your-chatbot-arena-style</guid>
      <pubDate>Tue, 24 Feb 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>At Liip, we've been building and running AI-powered chatbots for organizations through our <a href="https://www.liip.ch/en/work/projects/liipgpt">LiipGPT</a> platform. Over time, we developed evaluation sets, automated scoring with LLM-as-a-Judge, and various ways to measure chatbot quality. Max wrote about this approach in <a href="https://www.liip.ch/en/blog/no-value-without-trust">No value without trust</a>. But when it comes to comparing two different configurations, like a new prompt versus the old one, or GPT-4o versus Claude Sonnet, automated metrics only get you so far. Sometimes you need actual humans reading actual answers and telling you which one is better.</p>
<h2>The Bias Problem</h2>
<p>The problem is bias. If you know that Response A comes from the expensive model and Response B from the cheaper one, you'll unconsciously read them differently. The idea behind our solution is borrowed from classic A/B testing: show two variants to evaluators without telling them which is which, and let the results speak. For RAG chatbots, the question isn't "which model is generally better?" It's "which configuration produces better answers for this specific knowledge base and these specific users?" That requires comparing pre-generated answers, not live model outputs.</p>
<h2>Side by Side, Let's Evaluate</h2>
<p>So we built Arena mode into the Admin UI. It works like this: you start by running your evaluation set against two different configurations. Maybe you're testing a new system prompt, or switching embedding models, or trying a different retrieval strategy. Each run produces answers for every question in the set. Then you create a comparison, selecting those two runs.</p>
<p>When an evaluator opens the comparison, they see one question at a time. Two answers, labeled A and B. No model names, no hints. The order is shuffled differently for each evaluator using a seeded randomization, so if Alice sees the Claude answer as "A" for question 3, Bob might see it as "B". This prevents evaluators from developing patterns like "A is always the better one."</p>
<p>For each answer, you rate it as Good, Neutral, or Bad. You can add a comment explaining why, which turns out to be incredibly valuable. The quantitative scores tell you which model wins, but the comments tell you why, and often reveal issues you wouldn't catch with automated evaluation.</p>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/3feb95/question-bad.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/3feb95/question-bad.jpg 2x"></figure>
<h2>More Feedback the Merrier</h2>
<p>Multiple evaluators can rate the same comparison independently. Getting people involved is easy: share links generate a temporary API key, so you can send a URL to a colleague or a client and they can evaluate in a protected area without needing an admin account. They just enter their name and start rating.</p>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/ade347/share.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/ade347/share.jpg 2x"></figure>
<h2>Understand the Results</h2>
<p>The results page shows a leaderboard: average score, win rate per model, and a breakdown per evaluator. That last part is where it gets interesting: it surfaces inter-annotator agreement, a standard measure of how much evaluators align in their ratings.</p>
<ul>
<li><strong>High agreement</strong>: the quality difference is clear and you can act on the result with confidence.</li>
<li><strong>Low agreement</strong>: the two configurations are close enough that the choice may come down to other factors like cost or latency.</li>
</ul>
<p>You can also drill into individual comments grouped by model, and export everything to Excel for reporting.</p>
<figure><img alt="" src="https://liip.rokka.io/www_inarticle_5/876023/arena-result.jpg" srcset="https://liip.rokka.io/www_inarticle_5/o-dpr-2/876023/arena-result.jpg 2x"></figure>
<h2>Calibrating AI with Human Truth</h2>
<p>Building this was a good reminder that evaluation tooling is never "done." Christian recently wrote about <a href="https://www.liip.ch/en/blog/using-claude-agent-sdk-to-analyse-chatbot-answers">using Claude Agent SDK to analyze chatbot answers</a> automatically, which is the other side of the coin: scaling evaluation with AI. Arena mode complements that by providing the human ground truth that automated evaluation needs to calibrate against.</p>
<p>We're using Arena now whenever we make significant changes to a chatbot's configuration. The signal-to-noise ratio is much better than staring at spreadsheets of automated scores. The feature is available in the Admin UI for any organization that uses our evaluation sets.</p>
<h2>The Next Steps of the Arena</h2>
<p>We're considering adding support for more than two models in a single comparison, and possibly integrating the Arena ratings as ground truth labels for training better LLM-as-a-Judge prompts. For now, blind human evaluation with a simple Good/Neutral/Bad rating scheme gives us exactly what we need: honest answers about which configuration actually works better.</p>]]></description>
    </item>
        <item>
      <title>WebMCP: Making LiipGPT Tools Discoverable by Browser AI Agents</title>
      <link>https://www.liip.ch/en/blog/webmcp-making-liipgpt-tools-discoverable-by-browser-ai-agents</link>
      <guid>https://www.liip.ch/en/blog/webmcp-making-liipgpt-tools-discoverable-by-browser-ai-agents</guid>
      <pubDate>Tue, 24 Feb 2026 00:00:00 +0100</pubDate>
      <description><![CDATA[<p>Two weeks ago, Google shipped <a href="https://developer.chrome.com/blog/webmcp-epp">WebMCP in Chrome 146</a> as an early preview. It's a W3C standard, jointly developed with Microsoft, that lets websites expose structured tools to AI agents running in the browser. Instead of agents scraping your DOM or clicking buttons, they can call well-defined functions with typed parameters and get structured results back. Think of it as MCP, but for the browser.</p>
<p>We already had an <a href="https://www.linkedin.com/posts/chregu_z%C3%BCricitygpt-and-in-extension-liipgpt-is-activity-7407037094528819200-oDdw">MCP server</a> running on <a href="https://zuericitygpt.ch/">ZüriCityGPT</a> and other <a href="https://www.liipgpt.ch/">LiipGPT</a> deployments. Tools like knowledge base search, public transport timetables, and waste collection schedules were available to any MCP client, Claude Desktop, Cursor, or custom integrations. The tool definitions, schemas, and execution logic were all there. WebMCP just needed a bridge to bring them into the browser.</p>
<p>The implementation turned out to be surprisingly straightforward. When the page loads, a small script checks if <code>navigator.modelContext</code> exists. If it does, it fetches the available tools from our existing MCP endpoint using a standard JSON-RPC <code>tools/list</code> request and registers each one with <code>navigator.modelContext.registerTool()</code>. When an AI agent calls a tool, the <code>execute</code> handler sends a <code>tools/call</code> request back to the same endpoint. The entire client-side code is about 30 lines of vanilla JavaScript.</p>
<pre><code class="language-javascript">navigator.modelContext.registerTool({
  name: tool.name,
  description: tool.description,
  inputSchema: tool.inputSchema,
  execute: function(params) {
    return fetch(mcpUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        jsonrpc: '2.0',
        method: 'tools/call',
        params: { name: tool.name, arguments: params }
      })
    }).then(r =&gt; r.json()).then(d =&gt; d.result);
  }
});</code></pre>
<p>The beauty is that the tool set is dynamic and per site setup. ZüriCityGPT exposes search with a special mode for city council resolutions (Stadtratsbeschlüsse, see also <a href="https://strb.zuericitygpt.ch/">StrbGPT</a>), waste collection lookups via the <a href="https://openerz.metaodi.ch/">OpenERZ API</a>, and Swiss public transport departures. A different LiipGPT deployment would expose its own set of tools. The backend decides what's available, the browser just registers whatever comes back.</p>
<p>One thing worth noting: <code>navigator.modelContext</code> is behind a flag in Chrome Canary (<code>chrome://flags/#enable-webmcp-testing</code>) and not available in any stable browser yet. The script uses feature detection, so nothing happens in unsupported browsers. </p>
<p>To actually test it, you need Chrome Canary and the <a href="https://chromewebstore.google.com/detail/model-context-tool-inspec/gbpdfapgefenggkahfehlcenpd">Model Context Tool Inspector</a> extension (which requires a Gemini API key to do anything useful with the tools). But the registration itself works, and seeing "search", "get_waste_collection", and "get_timetable_info" show up in the inspector when visiting a chatbot page is a nice confirmation that everything clicks together.</p>
<p>Now, ZüriCityGPT is a page with essentially one big input field. It's already a chatbot. An AI agent visiting the site could just type into that field. Exposing structured tools on a page that is itself a tool feels a little redundant. But it does open up genuinely new possibilities. A browser agent could search the knowledge base, check the next tram departure, and look up waste collection dates in parallel, without ever touching the chat UI, and combine the results with data from other sites. The chatbot answers questions one at a time. WebMCP lets agents compose capabilities.</p>
<p>Is this useful today? Not really. No stable browser supports it, and the user base of Chrome Canary with experimental flags enabled is, let's say, select. Google and Microsoft are pushing it through the W3C, and <a href="https://venturebeat.com/infrastructure/google-chrome-ships-webmcp-in-early-preview-turning-every-website-into-a">formal browser support is expected by mid-to-late 2026</a>. When that happens, any AI agent visiting ZüriCityGPT will automatically discover what it can do, no documentation reading and UI guessing required.</p>
<p>The interesting part is the convergence. We now have the same tool definitions serving three purposes: LangChain tools for the chatbot's own RAG pipeline, an MCP server for "traditional" AI clients, and WebMCP for browser agents. One set of schemas, three integration points. Adding a new tool to any LiipGPT site automatically makes it available everywhere.</p>
<p>If you want to try it yourself, visit <a href="https://zuericitygpt.ch/">zuericitygpt.ch</a> with Chrome Canary, enable the WebMCP flag, and install the <a href="https://chromewebstore.google.com/detail/model-context-tool-inspec/gbpdfapgefenggkahfehlcenpd">Model Context Tool Inspector</a> extension. You'll see the three registered tools and can ask Gemini when your next paper collection is. Or just type the question into the chatbot, that still works too.</p>]]></description>
    </item>
      </channel>
</rss>