Localisation and Translation on the Web
Coming from the English-speaking world, it can be easy to stay inside the bubble that is the English-speaking World Wide Web. But in fact, more than half of web pages are written in languages other than English.
Since starting work at eyeo, I’ve had to think a lot more about localisation and translation, because most of our websites are translated into several languages, something I didn’t really have to consider before. Once you decide to translate a web page, there are many things to take into account, and I've found that a lot of them are useful even if your website is written in only one language.
What Language is the Website?
One of the first things we need to think about is what language the HTML document (or an element within it) is written in. Using the lang attribute, we can let browsers know what language the web page is written in.
Typically, we would add this attribute to the root element of the document, which is in most cases the HTML element.
<html lang="en">
Adding this attribute to the root element is really important, particularly for users whose main language on their machine is not the same as the web page’s language, for example a French-speaking user who visits this blog.
In the absence of the lang attribute, the browser will typically assume the web page is written in the user’s default language, which can lead to some strange results. For example, a screen reader may read an English web page in a French accent because the lang attribute is missing.
The lang attribute is one of the global HTML attributes, which means it can be applied to any HTML element. This means that we can specify different sections of our web page as being written in different languages. This can be really useful, for example, if you are writing an article that references a text in a different language, comme ça, par exemple.
<html lang="en">
<body>
<h1>Localisation and Translation on the Web</h1>
<p>
This can be really useful, for example, if you are writing an article that references a text in a different language, <strong lang="fr">comme ça, par exemple</strong>.
</p>
</body>
</html>
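To make the inheritance behaviour concrete, here is a minimal TypeScript sketch of how the effective language of an element could be resolved: the nearest element with a lang value wins. The LangNode shape and the effectiveLang helper are hypothetical stand-ins for real DOM elements, not a browser API.

```typescript
// Hypothetical, simplified stand-in for a DOM element (not a real DOM API).
interface LangNode {
  tag: string;
  lang?: string;
  parent?: LangNode;
}

// Walk up the tree and return the first lang value found.
// Returns undefined when nothing declares a language, in which case the
// user agent has to guess. An empty string is returned as-is, since in
// HTML lang="" marks the language as unknown.
function effectiveLang(node: LangNode): string | undefined {
  let current: LangNode | undefined = node;
  while (current !== undefined) {
    if (current.lang !== undefined) {
      return current.lang;
    }
    current = current.parent;
  }
  return undefined;
}

// Mirrors the markup above: <html lang="en"> ... <strong lang="fr">.
const htmlNode: LangNode = { tag: "html", lang: "en" };
const bodyNode: LangNode = { tag: "body", parent: htmlNode };
const paragraphNode: LangNode = { tag: "p", parent: bodyNode };
const strongNode: LangNode = { tag: "strong", lang: "fr", parent: paragraphNode };
```

With these nodes, effectiveLang(strongNode) resolves to the French value set on the element itself, while effectiveLang(paragraphNode) falls back to the value on the root element.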
Specifying Language for External Pages
With the lang attribute, we can tell user agents what language the content on the current web page is in, but what if we need to link to an external page or resource?
We can specify the language of an externally linked resource using the hreflang attribute. As its name implies, it sets the language of a resource linked to via the href attribute, and can thus only be applied to elements that have this attribute, i.e. the <a>, <link>, and <area> elements.
<a href="https://adblockplus.org/ar/" hreflang="ar">adblockplus.org (Arabic)</a>
Controlling Translations
In some cases, we may want a section of the web page to always be displayed in a certain language, never translated. This is the idea behind the new HTML5.1 translate attribute.
The translate attribute can accept one of two values:
- yes: The element’s contents should be translated
- no: The element’s contents should not be translated
<html lang="de">
<p>Übersetze mich!</p> <!-- Translate me! -->
<p translate="no">Übersetze mich nicht</p> <!-- Do not translate me -->
</html>
Not currently supported
Unfortunately, at the time of writing this attribute was not supported by any browser, and support may still vary, so it is worth checking current compatibility data before relying on it. However, its effect can be simulated by using the .notranslate class, which is respected by Google’s Web Page Translator. Take, for example, the following two paragraphs:
<html lang="de">
<p>Übersetze mich!</p> <!-- Translate me! -->
<p class="notranslate">Übersetze mich nicht</p> <!-- Do not translate me -->
</html>
If this page is translated to another language, only the first paragraph will be translated.
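Since the attribute and the class do not conflict, one option is to emit both on the same element. The following TypeScript sketch shows a small helper that wraps a piece of text, such as a brand name, in a span carrying both markers. The noTranslate and escapeHtml names are hypothetical, and the choice of a span element is an assumption.

```typescript
// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap text that should never be translated. Both the standard attribute
// and the class respected by Google's translator are set on the element.
function noTranslate(text: string): string {
  return `<span translate="no" class="notranslate">${escapeHtml(text)}</span>`;
}

const brand = noTranslate("Adblock Plus");
```

Here brand holds a span element whose content is left alone by any translation tool that respects either marker.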
Text Direction
In many languages, text is not written from left to right as it is in English. In languages such as Arabic, text is written (and read) from right to left.
To change the direction in which text is written, we can use the dir attribute, which accepts one of three values:
- ltr: Left to Right
- rtl: Right to Left
- auto: Allow the user agent to decide which direction based on the text content
<html lang="ar" dir="rtl">
Based on this root direction, most browsers will apply the corresponding CSS styles to switch the direction in which text is displayed, using the direction property.
The CSS direction property accepts one of two values: ltr or rtl.
html[dir="rtl"] {
direction: rtl;
}
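For text in a single script, the visible effect of this property is similar to that of the text-align property. It doesn’t reverse the characters within words; it aligns the text to the appropriate side. It also sets the base direction used to lay out mixed-direction text and the order in which table columns are laid out.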
Other relevant CSS properties for controlling text direction include:
- writing-mode: This determines whether text is laid out horizontally or vertically, and the direction in which lines progress (See MDN)
- text-orientation: This determines the orientation of each character in vertical text (See MDN)
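When pages are generated from templates, the dir value usually has to be derived from the page language. The TypeScript sketch below shows one way this could be done. The list of right-to-left languages is a deliberately short, non-exhaustive assumption, and the directionFor and htmlOpenTag helpers are hypothetical names.

```typescript
// Assumption: a small, non-exhaustive list of primary language subtags that
// are usually written right to left. A real project should rely on a
// maintained list instead.
const RTL_LANGUAGES: readonly string[] = ["ar", "fa", "he", "ur"];

type TextDirection = "ltr" | "rtl";

// Return the dir value for a language tag such as "ar" or "en-GB".
// Only the primary subtag (the part before the first hyphen) is inspected.
function directionFor(languageTag: string): TextDirection {
  const dash = languageTag.indexOf("-");
  const primary = (dash === -1 ? languageTag : languageTag.slice(0, dash)).toLowerCase();
  return RTL_LANGUAGES.indexOf(primary) !== -1 ? "rtl" : "ltr";
}

// Build the opening <html> tag with matching lang and dir attributes.
function htmlOpenTag(languageTag: string): string {
  return `<html lang="${languageTag}" dir="${directionFor(languageTag)}">`;
}
```

For example, htmlOpenTag("ar") produces the same root element as the Arabic example above, while any language not in the list falls back to ltr.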
Alternates
For most sites that are translated into different languages, there are separate pages for each language. For example, there might be several versions of the homepage:
- https://adblockplus.org/en/ for the English version
- https://adblockplus.org/ar/ for the Arabic version
- ...
In order for user agents to know of all these separate pages and classify them correctly as the same page, just translated into different languages, we can use the <link> element with the alternate relationship type. In the document <head> we can write out all the alternate versions of the page:
<html lang="en">
<head>
<link rel="alternate" href="https://adblockplus.org/ar" hreflang="ar">
...
</head>
</html>
Note that we use the hreflang attribute in combination with the alternate type to set the language each alternate page is in.
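Writing these tags by hand gets tedious as the number of languages grows, so they are often generated. Here is a minimal TypeScript sketch that builds one <link> tag per translation. The Alternate shape and the alternateLinks helper are hypothetical, and the URLs simply reuse the examples from this section.

```typescript
// Hypothetical description of one translated version of a page.
interface Alternate {
  lang: string;
  url: string;
}

// Escape characters that would break out of a double-quoted attribute value.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Build one <link rel="alternate"> tag per translated version.
function alternateLinks(alternates: Alternate[]): string[] {
  return alternates.map(
    (alt) =>
      `<link rel="alternate" href="${escapeAttribute(alt.url)}" hreflang="${escapeAttribute(alt.lang)}">`
  );
}

const alternateTags = alternateLinks([
  { lang: "en", url: "https://adblockplus.org/en/" },
  { lang: "ar", url: "https://adblockplus.org/ar/" },
]);
```

The resulting strings can then be joined and placed in the document <head> of every language version.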
Alternates for Social Media
When a link to a web page is shared, its language is typically determined from the og:locale meta tag.
<meta property="og:locale" content="en_US">
If the page is available in other locales, we can list them using the og:locale:alternate meta tag.
<meta property="og:locale:alternate" content="ar_AR">
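Note that these values use an underscore (language_TERRITORY) rather than the hyphen used in lang and hreflang values. The TypeScript sketch below shows one way to convert between the two and to emit the tags. The toOgLocale and ogLocaleTags helpers are hypothetical, and the conversion only handles tags made of a language and a two-letter region. A value such as ar_AR cannot be derived this way and has to be supplied directly.

```typescript
// Convert a language tag with a region, such as "en-US", to the
// language_TERRITORY form, such as "en_US". Returns undefined when the tag
// has no two-letter region subtag, because the territory cannot be guessed.
function toOgLocale(languageTag: string): string | undefined {
  const match = /^([A-Za-z]{2,3})-([A-Za-z]{2})$/.exec(languageTag);
  if (match === null) {
    return undefined;
  }
  return `${match[1].toLowerCase()}_${match[2].toUpperCase()}`;
}

// Build the og:locale tag for the current page, followed by one
// og:locale:alternate tag per additional locale.
function ogLocaleTags(current: string, alternates: string[]): string[] {
  const tags = [`<meta property="og:locale" content="${current}">`];
  for (const alternate of alternates) {
    tags.push(`<meta property="og:locale:alternate" content="${alternate}">`);
  }
  return tags;
}

const ogTags = ogLocaleTags("en_US", ["ar_AR"]);
```

Returning undefined for tags without a region is a deliberate choice here: it forces the caller to decide which territory to use instead of silently inventing one.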
Left, Right, Start, End
Because most of the web was originally written with only English in mind, a lot of CSS was written with the mindset that the start of a line is the left, and the end of the line is the right. But as the web becomes more internationally aware, things are changing.
For example, with Flexbox, what would by default be the “left” side of the box is called the “start”, because it can be on any of the four sides of the box itself. A lot of new CSS properties are starting to work this way, for example the new margin-inline-start property.
The margin-inline-start property corresponds to the inline “start” margin of an element, and can be equal to any of the four sides of the element depending on the direction of the document. For example, if the direction on an element is right-to-left, then the start margin will be equivalent to the right margin.
span {
direction: rtl;
margin-inline-start: 20px; /* Equivalent to margin-right */
}
Similarly, if the writing-mode of an element is set as vertical and left-to-right, then the start margin will be equivalent to the top margin.
span {
writing-mode: vertical-lr;
margin-inline-start: 20px; /* Equivalent to margin-top */
}
There are other properties that work in this way, for example the corresponding margin-inline-end, which works similarly to margin-inline-start, but applies to the end of the element. Taking the first example from above, if the direction on an element is right-to-left, then the end margin will be equivalent to the left margin.
span {
direction: rtl;
margin-inline-end: 20px; /* Equivalent to margin-left */
}
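The three examples above follow one rule, which can be written down as a small lookup. The TypeScript sketch below maps the inline start and end to a physical side for a given writing-mode and direction. The type and function names are hypothetical, and only the three writing-mode values listed in the WritingMode type are covered.

```typescript
type InlineDirection = "ltr" | "rtl";
type WritingMode = "horizontal-tb" | "vertical-rl" | "vertical-lr";
type PhysicalSide = "top" | "right" | "bottom" | "left";

// Physical side that margin-inline-start applies to.
function inlineStartSide(writingMode: WritingMode, direction: InlineDirection): PhysicalSide {
  if (writingMode === "horizontal-tb") {
    return direction === "ltr" ? "left" : "right";
  }
  // In both vertical modes, text runs top to bottom for ltr
  // and bottom to top for rtl.
  return direction === "ltr" ? "top" : "bottom";
}

// Physical side that margin-inline-end applies to: always the opposite side.
function inlineEndSide(writingMode: WritingMode, direction: InlineDirection): PhysicalSide {
  const opposite: Record<PhysicalSide, PhysicalSide> = {
    top: "bottom",
    bottom: "top",
    left: "right",
    right: "left",
  };
  return opposite[inlineStartSide(writingMode, direction)];
}
```

For instance, inlineStartSide("horizontal-tb", "rtl") gives the right side and inlineStartSide("vertical-lr", "ltr") gives the top, matching the comments in the CSS examples above.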
There are many more considerations involved in creating a localised website, particularly one that accepts user input. Feel free to share any tips you have come across in the comments below.