Automatically generate multilingual alt attributes for your images with a LLM and rokka.io

The alt attribute in HTML img elements is crucial and not optional. However, creating them can be cumbersome for content editors and is often overlooked. A multimodal LLM can automatically generate them with sufficient accuracy.

Our image service rokka.io now supports this feature, thanks to OpenAI's Vision API.

In this blog post, we will also share the prompt we used, so you can recreate it for yourself. Alt attributes are vital for enhancing accessibility, and we are keen on promoting their wider use.

Why Alt Attributes?

Besides being mandatory for valid HTML, here's what ChatGPT says about them:

The alt attribute on img tags is used for accessibility, providing a textual description of the image for users who can't see it; as a fallback when the image can't be displayed; for SEO, helping search engines understand and index the image content; and to meet legal accessibility standards.

The Example

Man watering a hanging houseplant with a green watering can.

For the image shown above, rokka and the LLM produced the following accurate descriptions in English, German, and French:

en: Man watering a hanging houseplant with a green watering can.
de: Mann giesst hängende Zimmerpflanze mit grüner Gießkanne.
fr: Homme arrosant une plante suspendue avec un arrosoir vert.

How We Do It

First, we resize the image to a smaller resolution to save costs (width/height of 500 pixels works well). Then, we send that, along with the prompt below, to OpenAI's Vision API, parse the result, and store it in rokka's metadata of the image. That's all. And it works very well most of the time.

If it's implemented into a CMS, you maybe should provide an option for content editors to adjust the alt attribute in case of inaccuracies. We offer this for example in the Drupal rokka plugin, but it's seldom necessary.

The Prompt

You are an alt attribute creator for images. Please describe 
the image suitable for the alt tag in an HTML image element 
in as few words as possible.
Answer in the following format in the languages 
English (en), German (de), French (fr).

Example Output:
en: View from a train window at Aarau station platform with tracks.
de: Blick aus dem Zugfenster auf den Bahnsteig am Bahnhof Aarau mit Gleisen.
fr: Vue depuis la fenêtre du train sur le quai de la gare d'Aarau avec les voies.

Output:

As a response, you should now get a line-by-line description for each image.

Want to Try It Out?

The API calls are documented in the rokka documentation. Currently, paying rokka.io users can use this feature immediately. Contact us if you want to try it out on rokka.

And as mentioned, there's also a Drupal plugin for complete integration of rokka into your Drupal installation.

PS: We are aware that we don't use this feature on this site (not yet, we are working on it). We eat our own dogfood on some upcoming client sites, though.

Do you have a question, a comment, or just feeling inspired? Mention us or share this article on Mastodon or LinkedIn.

Subscribe to blog updates using the RSS Feed.

Related services

Custom Development

Topics

Technology