Difference between revisions of "Internationalization of Haskell programs using gettext"

From HaskellWiki
m
m (correct errors (thanks to Michael Thompson))
Line 1: Line 1:
In the GNU world, the most common approach to internationalization (i18n) is to use [http://www.gnu.org/software/gettext/ GNU gettext] utilities. In this tutorial we will create a simple "Hello world" program with multilingual support.
+
The approach I'll talk about is based on GNU [http://www.gnu.org/software/gettext/ gettext] utility. All my experience on this utility is taken from internationalizing Python applications. So I adapted this experience to the Haskell world.
   
==Prepare program to internationalization==
+
=== Prepare program for internationalization ===
   
Suppose we want to make the following program multilingual (file '''Main.hs'''):
+
Let's start with an example. Suppose that we want to make the following program multilingual:
  +
  +
<haskell>module Main where
   
<haskell>
 
module Main where
 
 
 
import IO
 
import IO
   
Line 15: Line 14:
 
putStrLn $ "Hello, " ++ name ++ ", how are you?"
 
putStrLn $ "Hello, " ++ name ++ ", how are you?"
 
</haskell>
 
</haskell>
  +
First of all, to wrap all the strings, you want some 'translation' function '__':
   
  +
<haskell>module Main where
First of all, wrap all strings you want to translate in function <hask>__</hask>:
 
   
<haskell>
 
module Main where
 
 
 
import IO
 
import IO
   
Line 30: Line 27:
 
putStrLn $ (__ "Hello, ") ++ name ++ (__ ", how are you?")
 
putStrLn $ (__ "Hello, ") ++ name ++ (__ ", how are you?")
 
</haskell>
 
</haskell>
  +
We will return to the definition of '__' a bit later; for now we will leave the function empty (<hask>id</hask>).
   
  +
=== Translate ===
   
  +
The next step is to generate a POT file (a template which contains all strings to needed to be translated). For Python, C, C++ and Scheme there is the xgettext utility, but it doesn't support Haskell. So I created simple utility, that does the same thing for haskell files --- '''hgettext'''. You could find it on Hackage.
We will return to the definition of <hask>__</hask> a bit later, for now leave this function as a no-op (<hask>id</hask>).
 
   
  +
Now, from the directory that contains your project, run this command:
==Translate==
 
   
  +
<pre>hgettext -k __ -o messages.pot Main.hs</pre>
The next step is to generate a POT file (a template which contain all strings needing translation). For Python, C, C++ and Scheme languages there is the xgettext utility, but it doesn't support Haskell. On [[Hackage]] you can download [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hgettext hgettext] library and utilities, which process haskell source files in the same way as xgettext does for C/C++ files:
 
  +
It will gather all strings containing the function '__' from the Main.hs and write everything to messages.pot.
 
<tt> cabal install --global hgettext </tt>
 
 
Now run this from the directory where your project is:
 
 
<tt>hgettext -k __ -o messages.pot Main.hs</tt>
 
 
This gathers all strings marked by the function <tt>__</tt> from the file <tt>Main.hs</tt> and writes everything to <tt>messages.pot</tt>.
 
   
 
Now look at the resulting pot file:
 
Now look at the resulting pot file:
   
  +
<pre># Translation file
<tt>
 
   
<pre>
 
# Translation file
 
 
 
msgid ""
 
msgid ""
 
msgstr ""
 
msgstr ""
  +
 
 
"Project-Id-Version: PACKAGE VERSION\n"
 
"Project-Id-Version: PACKAGE VERSION\n"
 
"Report-Msgid-Bugs-To: \n"
 
"Report-Msgid-Bugs-To: \n"
Line 76: Line 65:
 
#: Main.hs:0
 
#: Main.hs:0
 
msgid ", how are you?"
 
msgid ", how are you?"
msgstr ""
+
msgstr ""</pre>
  +
We are interested in the last part of this file -- the parts beginning with <tt>#: Main.hs:...</tt>. Each is followed by a pair of lines beginning with <tt>msgid</tt> and <tt>msgstr</tt>. <tt>msgid</tt> is the original text from the code, and <tt>msgstr</tt> is the translated string. Each language should have its own translation file. I will create two translations: German and English.
</pre>
 
</tt>
 
   
  +
To create a PO file for specific locale we should use the <tt>msginit</tt> utility.<br />
  +
To generate the German translation template run:
   
  +
<pre>msginit --input=messages.pot --locale=de.UTF-8</pre>
We are interested in the bottom part of this file (starting from <tt>'#: Main.hs:...'</tt>). Here we can see pairs of lines: <tt>msgid</tt> and <tt>msgstr</tt>: <tt>msgid</tt> is the original text from the code, and <tt>msgstr</tt> is the translated string. Each language should have its own translation file. I will create two translations, German and English.
 
  +
And for English translations run:
   
  +
<pre>msginit --input=messages.pot --locale=en.UTF-8</pre>
To create a PO file for specific locale we should use the <tt>msginit</tt> utility:
 
  +
If we look at the generated files (<tt>en.po</tt> and <tt>de.po</tt>), we will see that English translation is completely filled, only the German PO file needs to be edited. So we fill it with following strings:
   
To generate the German translation file template run:
 
 
<tt>msginit --input=messages.pot --locale=de.UTF-8</tt>
 
 
And for the English translation run:
 
 
<tt>msginit --input=messages.pot --locale=en.UTF-8</tt>
 
 
If we look at the generated files (<tt>en.po</tt> and <tt>de.po</tt>), we will see that the English translation is completely filled, we have only to edit the German PO file. So fill it with the following strings:
 
 
<tt>
 
 
<pre>
 
<pre>
 
#: Main.hs:0
 
#: Main.hs:0
Line 107: Line 88:
 
#: Main.hs:0
 
#: Main.hs:0
 
msgid ", how are you?"
 
msgid ", how are you?"
msgstr ", wie geht es Ihnen?"
+
msgstr ", wie geht es Ihnen?"</pre>
  +
=== Install translation files ===
</pre>
 
</tt>
 
   
  +
Now we have to create directories where these translations should be placed. Originally all translation files are placed in the folder <tt>/usr/share/locale/</tt> , but you are free to select a different place. Run:
==Install translation files==
 
   
  +
<pre>mkdir -p {de,en}/LC_MESSAGES</pre>
Now we have to create the directories where these translations should be placed. By default all translation files are placed on <tt>/usr/share/locale/</tt> folder, but we are free to select different places. Run:
 
  +
This will create two sub-directories 'de' and 'en', each containing <tt>LC_MESSAGES</tt>, in the current directory. Now we use the <tt>msgfmt</tt> tool to encode our po files to mo files (binary translation files):
   
<tt>mkdir -p {de,en}/LC_MESSAGES</tt>
+
<pre>msgfmt --output-file=en/LC_MESSAGES/hello.mo en.po
  +
msgfmt --output-file=de/LC_MESSAGES/hello.mo de.po</pre>
  +
=== Turn on internationalization in the code ===
   
  +
Ok, now the preparatory tasks are done. The final step is to modify the code to support the internationalization:
It will create two directories in the current directory, <tt>de</tt> and <tt>en</tt>, that contain <tt>LC_MESSAGES</tt>. Now use <tt>msgfmt</tt> tool to encode our <tt>po</tt> files to <tt>mo</tt> files (binary translation files):
 
   
  +
<haskell>module Main where
<tt>
 
<pre>
 
msgfmt --output-file=en/LC_MESSAGES/hello.mo en.po
 
msgfmt --output-file=de/LC_MESSAGES/hello.mo de.po
 
</pre>
 
</tt>
 
 
==Enable internationalization in the code==
 
 
As the final step we have to modify the code to support the internationalization:
 
   
<haskell>
 
module Main where
 
 
 
import IO
 
import IO
 
import Text.I18N.GetText
 
import Text.I18N.GetText
Line 150: Line 121:
 
putStrLn $ (__ "Hello, ") ++ name ++ (__ ", how are you?")
 
putStrLn $ (__ "Hello, ") ++ name ++ (__ ", how are you?")
 
</haskell>
 
</haskell>
  +
Here we added three initialization strings:
   
  +
<haskell>setLocale LC_ALL (Just "")
  +
bindTextDomain "hello" "."
  +
textDomain "hello" </haskell>
  +
You'll have to download the <tt>setlocale</tt> package to enable the first function: it sets the current locale to the default value. The next two functions tell <tt>gettext</tt> to take the "hello.mo" message file from the locale directory (I set it to ".", but in general case, this directory should be passed from the package configuration).
   
  +
The final step is to define the function '__'. It simply calls <hask>getText</hask> from the module <hask>Text.I18N.GetText</hask>. Its type is <hask>String -> IO String</hask> so I used <hask>unsafePerformIO</hask> to make it simpler the.
Here we added three initialization commands:
 
   
  +
=== Run and test the program ===
<haskell>
 
setLocale LC_ALL (Just "")
 
bindTextDomain "hello" "."
 
textDomain "hello"
 
</haskell>
 
   
  +
Now you can build and try the program in different locales:
   
  +
<pre>user> ghc --make Main.hs
The first one (you'll have to download [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/setlocale setlocale] package to enable this function) sets the current locale to a default value. The next two functions tell <tt>gettext</tt> to take the '''"hello.mo"''' message file from the locale directory (I set it to ".", but, in the general case, this directory should be passed from the package configuration).
 
  +
[1 of 1] Compiling Main ( Main.hs, Main.o )
 
The final step — define function <hask>__</hask>. It simply calls <hask>getText</hask> from the module <hask>Text.I18N.GetText</hask>, but its type is <hask>String -> IO String</hask> so here we used <hask>unsafePerformIO</hask> to make it simpler to call.
 
 
==Run the program==
 
 
Now you can build and try this program in different locales:
 
 
<tt>
 
<pre>
 
user> ghc --make Main.hs
 
[1 of 1] Compiling Main ( Main.hs, Main.o )
 
 
Linking Main ...
 
Linking Main ...
   
Line 185: Line 148:
 
Hallo, Bond, wie geht es Ihnen?
 
Hallo, Bond, wie geht es Ihnen?
   
user>
+
user></pre>
  +
=== Distribute internationalized cabal package ===
</pre>
 
</tt>
 
 
==Distribute internationalized cabal package==
 
   
 
TBD
 
TBD

Revision as of 08:09, 29 March 2009

The approach described here is based on the GNU gettext utilities. My experience with them comes from internationalizing Python applications, and I have adapted it to the Haskell world.

Prepare program for internationalization

Let's start with an example. Suppose that we want to make the following program multilingual:

module Main where

import System.IO

main = do
  putStrLn "Please enter your name:"
  name <- getLine
  putStrLn $ "Hello, " ++ name ++ ", how are you?"

First of all, wrap every string you want to translate in a 'translation' function '__':

module Main where

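import System.IO

-- Placeholder: returns its argument unchanged until gettext is wired in.
__ :: String -> String
__ = id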

main = do
  putStrLn (__ "Please enter your name:")
  name <- getLine
  putStrLn $ (__ "Hello, ") ++ name ++ (__ ", how are you?")

We will return to the definition of '__' a bit later; for now we leave the function as a no-op (id).

Translate

The next step is to generate a POT file (a template containing all strings that need to be translated). For Python, C, C++ and Scheme there is the xgettext utility, but it doesn't support Haskell. So I created a simple utility, hgettext, that does the same thing for Haskell files. You can find it on Hackage.

Now, from the directory that contains your project, run this command:

hgettext -k __ -o messages.pot Main.hs

This gathers all strings wrapped in the function '__' in Main.hs and writes them to messages.pot.

Now look at the resulting pot file:

# Translation file

msgid ""
msgstr ""

"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-13 06:05-0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: Main.hs:0
msgid "Please enter your name:"
msgstr ""

#: Main.hs:0
msgid "Hello, "
msgstr ""

#: Main.hs:0
msgid ", how are you?"
msgstr ""

We are interested in the last part of this file: the entries beginning with #: Main.hs:.... Each such comment is followed by a pair of lines beginning with msgid and msgstr. msgid is the original text from the code, and msgstr is the translated string (still empty in the template). Each language should have its own translation file. I will create two translations: German and English.
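To make the msgid/msgstr pairing concrete, here is a minimal sketch that pulls the pairs out of PO text. It is only an illustration, not how gettext or hgettext read these files: the function name parsePairs is hypothetical, and the parser assumes single-line entries and ignores escapes, comments, and multi-line strings.

```haskell
module Main where

import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Extract (msgid, msgstr) pairs from PO text.
-- Simplified on purpose: single-line entries only, no escape handling.
parsePairs :: String -> [(String, String)]
parsePairs = go . mapMaybe field . lines
  where
    -- Classify a line as a msgid (Left) or msgstr (Right); skip anything else.
    field l
      | Just r <- stripPrefix "msgid " l  = Just (Left (unquote r))
      | Just r <- stripPrefix "msgstr " l = Just (Right (unquote r))
      | otherwise                         = Nothing

    -- A msgid immediately followed by a msgstr forms one entry.
    go (Left i : Right s : rest) = (i, s) : go rest
    go (_ : rest)                = go rest
    go []                        = []

    -- Take the text between the first pair of double quotes.
    unquote = takeWhile (/= '"') . drop 1 . dropWhile (/= '"')

main :: IO ()
main = mapM_ print (parsePairs sample)
  where
    sample = unlines
      [ "#: Main.hs:0"
      , "msgid \"Hello, \""
      , "msgstr \"Hallo, \""
      , ""
      , "#: Main.hs:0"
      , "msgid \", how are you?\""
      , "msgstr \"\""
      ]
```

An entry whose msgstr is still empty, like the second one in the sample, is untranslated; at run time gettext returns the original msgid for it.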

To create a PO file for a specific locale, we use the msginit utility.
To generate the German translation template, run:

msginit --input=messages.pot --locale=de.UTF-8

And for the English translation, run:

msginit --input=messages.pot --locale=en.UTF-8

If we look at the generated files (en.po and de.po), we will see that the English translation is already completely filled in (for English locales msginit copies each msgid into its msgstr), so only the German PO file needs to be edited. Fill it with the following strings:

#: Main.hs:0
msgid "Please enter your name:"
msgstr "Wie heißen Sie?"

#: Main.hs:0
msgid "Hello, "
msgstr "Hallo, "

#: Main.hs:0
msgid ", how are you?"
msgstr ", wie geht es Ihnen?"

Install translation files

Now we have to create the directories where these translations will be placed. By default, translation files live under /usr/share/locale/, but you are free to choose a different location. Run:

mkdir -p {de,en}/LC_MESSAGES

This creates two sub-directories, 'de' and 'en', in the current directory, each containing an LC_MESSAGES directory. Now we use the msgfmt tool to compile our PO files into MO files (binary translation files):

msgfmt --output-file=en/LC_MESSAGES/hello.mo en.po
msgfmt --output-file=de/LC_MESSAGES/hello.mo de.po
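The output paths above follow the layout gettext searches at run time: locale directory, then language, then LC_MESSAGES, then the text domain with a .mo extension. As a rough sketch of that naming rule (the helper catalogPath is hypothetical and not part of any library, and it assumes POSIX-style '/' separators):

```haskell
module Main where

import Data.List (intercalate)

-- | Build the path of a compiled catalog from the locale directory,
-- the language code, and the text domain.
catalogPath :: FilePath -> String -> String -> FilePath
catalogPath localeDir lang domain =
  intercalate "/" [localeDir, lang, "LC_MESSAGES", domain ++ ".mo"]

main :: IO ()
main = do
  -- The two catalogs produced by the msgfmt commands above,
  -- relative to "." as the locale directory.
  putStrLn (catalogPath "." "en" "hello")
  putStrLn (catalogPath "." "de" "hello")
```

This is why the files are named hello.mo: the base name has to match the text domain that the program binds later.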

Turn on internationalization in the code

Now the preparatory tasks are done. The final step is to modify the code to support internationalization:

module Main where

import System.IO
import Text.I18N.GetText
import System.Locale.SetLocale
import System.IO.Unsafe

__ :: String -> String
__ = unsafePerformIO . getText

main = do
  setLocale LC_ALL (Just "") 
  bindTextDomain "hello" "." 
  textDomain "hello" 

  putStrLn (__ "Please enter your name:")
  name <- getLine
  putStrLn $ (__ "Hello, ") ++ name ++ (__ ", how are you?")

Here we added three initialization calls:

setLocale LC_ALL (Just "")
bindTextDomain "hello" "."
textDomain "hello"

You'll have to install the setlocale package to get the first function: called with an empty string, it sets the current locale from the environment. The next two functions tell gettext to load the "hello.mo" message file from the locale directory (I set it to ".", but in the general case this directory should be passed in from the package configuration).

The last piece is the definition of the function '__'. It simply calls getText from the module Text.I18N.GetText. Its type is String -> IO String, so I used unsafePerformIO to make it simpler to call.
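If you would rather not rely on unsafePerformIO, one alternative is to keep the lookups in IO. The sketch below is my own illustration, not part of hgettext: the names Translate, greeting and fakeGerman are hypothetical, and the lookup function is passed in as a parameter so the example runs without any extra packages. In the real program you would pass getText, whose type is String -> IO String as noted above.

```haskell
module Main where

import Data.Maybe (fromMaybe)

-- | The shape of a message lookup; getText has this type.
type Translate = String -> IO String

-- | Build the greeting by translating each fragment inside IO.
greeting :: Translate -> String -> IO String
greeting tr name = do
  hello <- tr "Hello, "
  rest  <- tr ", how are you?"
  return (hello ++ name ++ rest)

-- | Stand-in catalog so the sketch runs without hgettext (an assumption
-- for illustration only). Unknown strings are returned unchanged, which
-- is also how gettext treats missing translations.
fakeGerman :: Translate
fakeGerman s = return (fromMaybe s (lookup s table))
  where
    table =
      [ ("Hello, ", "Hallo, ")
      , (", how are you?", ", wie geht es Ihnen?")
      ]

main :: IO ()
main = greeting fakeGerman "Bond" >>= putStrLn
```

The trade-off is that every translated string now has to be produced inside a do block, which is exactly the inconvenience the unsafePerformIO version avoids.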

Run and test the program

Now you can build and try the program in different locales:

user> ghc --make Main.hs
[1 of 1] Compiling Main         ( Main.hs, Main.o )
Linking Main ...

user> LANG=en_US.UTF-8 ./Main
Please enter your name:
Bond
Hello, Bond, how are you?

user> LANG=de_DE.UTF-8 ./Main
Wie heißen Sie?
Bond
Hallo, Bond, wie geht es Ihnen?

user>

Distribute internationalized cabal package

TBD