Sunday, June 14, 2009

Java Internationalization and Localization (I18N, L10N)

Preface

This document summarizes some important Java classes and techniques for I18N development. I have kept most examples short so that they show the point directly. Many Java-based web frameworks (JSF, Struts, Grails, etc.) have made I18N/L10N very simple, but the fundamentals behind the frameworks are still the same, so I hope this post is still relevant.

Many of the samples are adapted from the resources listed in the Reference section, especially the book "Java Internationalization" by Andrew Deitsch & David Czarnecki (O'Reilly). This post is based on notes I took many years ago while reading that book and various I18N websites.

Internationalization Tips

  1. Don't assume all letters of the alphabet fall between A and Z. Use the Character class.

  2. Don't hardcode strings. Use resource bundles.

  3. Don't build messages by concatenating strings. Use MessageFormat/ChoiceFormat.

  4. Use InputStreamReader/OutputStreamWriter with an explicit encoding for reading/writing text.

  5. Sort using Collator/RuleBasedCollator.

  6. Use ComponentOrientation for GUI layout.

  7. Don't hardcode fonts.

  8. Use NumberFormat/DecimalFormat to format numbers.

  9. Use DateFormat/SimpleDateFormat to display dates.

Culturally Dependent Data

When developing an internationalized application, you first need to identify its culturally dependent data. Examples of culturally dependent data include:

  • Messages

  • Labels on GUI components

  • Online help

  • Sounds

  • Colors

  • Graphics

  • Icons

  • Dates

  • Times

  • Numbers

  • Currencies

  • Measurements

  • Phone numbers

  • Honorifics and personal titles

  • Postal addresses

  • Page layouts

This document introduces some of the key Java classes for developing international applications.

Locale

A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user.


Here are the Locale constructors:

Locale(String language, String country)

Locale(String language, String country, String variant)


For example:


Locale ja = new Locale("ja", "JP");
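A small sketch of what a Locale carries, using nothing beyond java.util.Locale: the language and country codes come back from getLanguage and getCountry, and predefined constants such as Locale.JAPAN are equal to the constructed object. The class and method names are illustrative. On Java 19 and later these constructors are deprecated in favor of Locale.of, but they still work.

```java
import java.util.Locale;

class LocaleDemo {
    // Returns "language/country -> toString()" for a quick look at a Locale's parts.
    static String describe(Locale locale) {
        return locale.getLanguage() + "/" + locale.getCountry() + " -> " + locale.toString();
    }

    public static void main(String[] args) {
        Locale ja = new Locale("ja", "JP");
        System.out.println(describe(ja));            // ja/JP -> ja_JP
        System.out.println(ja.equals(Locale.JAPAN)); // true: same language and country
        // Display names are localized too; the exact text depends on the JDK's locale data.
        System.out.println(ja.getDisplayName(Locale.US));
    }
}
```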

ResourceBundle

ResourceBundle objects contain locale-specific objects. When you need a locale-specific object, you fetch it from a ResourceBundle.


For example, suppose you have the following resource bundles:

Image

Image_ja_JP

Image_fr_FR

Image_zh


These resource bundles form a family whose members share the base name Image. To select the appropriate ResourceBundle, invoke the ResourceBundle.getBundle method. For example:

Locale currentLocale = new Locale("fr", "FR");
ResourceBundle myimages = ResourceBundle.getBundle("Image", currentLocale);


To get a specific object from the bundle, call one of the ResourceBundle getters, such as getString or getObject, with its key. For example:

String menubar = myimages.getString("MenuBar");


A ResourceBundle may be backed by properties files or by ListResourceBundle classes.
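Because getBundle also accepts class-based bundles, the lookup and fallback can be tried without any files on disk. In this sketch the Image family is declared as nested ListResourceBundle classes; the keys and French text are made up for illustration. The key that Image_fr_FR does not define is inherited from the base bundle Image through the parent chain.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class ImageBundleDemo {

    // Hypothetical base bundle of the "Image" family: the fallback for every locale.
    public static class Image extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "MenuBar", "Menu bar" },
                { "Help", "Help" },
            };
        }
    }

    // Hypothetical French (France) member: overrides only MenuBar.
    public static class Image_fr_FR extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "MenuBar", "Barre de menus" },
            };
        }
    }

    static String lookup(Locale locale, String key) {
        // The base name is the nested class's binary name (e.g. "ImageBundleDemo$Image"),
        // so for fr_FR getBundle finds the nested class Image_fr_FR.
        ResourceBundle bundle = ResourceBundle.getBundle(Image.class.getName(), locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        Locale fr = new Locale("fr", "FR");
        System.out.println(lookup(fr, "MenuBar")); // Barre de menus
        System.out.println(lookup(fr, "Help"));    // Help, inherited from the base bundle
    }
}
```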

Property Files

A properties file contains properties in the form key=value. You should always create a default properties file. Its name begins with the base name of your ResourceBundle and ends with the .properties suffix. For example:


Image.properties

Image_ja_JP.properties

Image_fr_FR.properties

Image_zh.properties


This example shows how the resource bundle is loaded:

ResourceBundle labels = ResourceBundle.getBundle("Image", currentLocale);

The getBundle method first looks for a class file that matches the base name and the Locale. If it can't find a class file, it then checks for properties files.
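To see the key=value format in action without creating files, the sketch below parses properties text directly with PropertyResourceBundle, which reads the same syntax getBundle uses for .properties files. The file contents are hypothetical. Note that getBundle reads .properties files as ISO-8859-1 before Java 9 (so non-Latin text needs \uXXXX escapes there) and as UTF-8 by default from Java 9 on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class PropertiesBundleDemo {
    // Stand-in for the text of a hypothetical Image_fr_FR.properties file.
    static final String IMAGE_FR_FR =
        "# Comment lines start with #\n" +
        "MenuBar=Barre de menus\n" +
        "Help = Aide\n";   // whitespace around '=' is ignored

    static ResourceBundle load(String propertiesText) {
        try {
            // Parses key=value lines with the same rules as java.util.Properties.
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(IMAGE_FR_FR);
        System.out.println(bundle.getString("MenuBar")); // Barre de menus
        System.out.println(bundle.getString("Help"));    // Aide
    }
}
```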

ListResourceBundle

When a resource bundle must hold non-String objects, use a ListResourceBundle. Here is an example of a ListResourceBundle class:

import java.util.ListResourceBundle;

public class StatsBundle_ja_JP extends ListResourceBundle {

    @Override
    protected Object[][] getContents() {
        return contents;
    }

    private final Object[][] contents = {
        { "GDP", Integer.valueOf(21300) },
        { "Population", Integer.valueOf(125449703) },
        { "Literacy", Double.valueOf(0.99) },
    };
}

To load the ListResourceBundle, invoke the getBundle method as usual (for example, ResourceBundle.getBundle("StatsBundle", currentLocale)), then retrieve the values with getObject and cast them to their stored types.
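A sketch of loading StatsBundle_ja_JP through getBundle and reading the non-String values back. The classes are nested so the example is self-contained, and the base name is derived from the nested class name; the empty base bundle StatsBundle is an assumption added for illustration. getObject returns Object, so each value is cast to the type that was stored.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class StatsBundleDemo {
    // Hypothetical base bundle of the "StatsBundle" family (no data by default).
    public static class StatsBundle extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { };
        }
    }

    // Same data as the StatsBundle_ja_JP class in this section, declared as a nested class.
    public static class StatsBundle_ja_JP extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "GDP", Integer.valueOf(21300) },
                { "Population", Integer.valueOf(125449703) },
                { "Literacy", Double.valueOf(0.99) },
            };
        }
    }

    static ResourceBundle load(Locale locale) {
        return ResourceBundle.getBundle(StatsBundle.class.getName(), locale);
    }

    public static void main(String[] args) {
        ResourceBundle stats = load(new Locale("ja", "JP"));
        Integer population = (Integer) stats.getObject("Population"); // cast to the stored type
        Double literacy = (Double) stats.getObject("Literacy");
        System.out.println(population + " " + literacy); // 125449703 0.99
    }
}
```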

NumberFormat

NumberFormat is the abstract base class for all number formats.

Here are some examples:

Format a number:

String numberOut = NumberFormat.getNumberInstance(currentLocale).format(myNumber);

Format a currency amount:

String currencyOut = NumberFormat.getCurrencyInstance(currentLocale).format(currency);

Format a percentage:

String percentOut = NumberFormat.getPercentInstance(currentLocale).format(percent);

Examples for NumberFormat:

Locale   Number (integer)   Number (decimal)   Currency          Percentage
fr_FR    123 456            345 987,246        9 876 543,21 F    75%
de_DE    123.456            345.987,246        9.876.543,21 DM   75%
en_US    123,456            345,987.246        $9,876,543.21     75%
ja_JP    123,456            345,987.246        9,876,543         75%
zh_TW    123,456            345,987.246        NT$9,876,543.21   75%
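A runnable sketch of the three factory methods, with input values inferred from the table (123456, 345987.246, 9876543.21 and 0.75). Output depends on the JDK's locale data: since Java 9 the default data comes from CLDR, so current JDKs print euros rather than F or DM for France and Germany, and some separators (for example, the French grouping space) may differ from the table. Class and method names are illustrative.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatDemo {
    static String number(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    static String currency(Locale locale, double value) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    static String percent(Locale locale, double value) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.FRANCE, Locale.JAPAN };
        for (Locale loc : locales) {
            System.out.println(loc + ": " + number(loc, 345987.246) + " | "
                + currency(loc, 9876543.21) + " | " + percent(loc, 0.75));
        }
    }
}
```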


DecimalFormat

You can use the DecimalFormat class to format decimal numbers into locale-specific strings with patterns.

You can design your own number format patterns by following the rules in this BNF grammar:

pattern    := subpattern{;subpattern}
subpattern := {prefix}integer{.fraction}{suffix}
prefix     := '\\u0000'..'\\uFFFD' - specialCharacters
suffix     := '\\u0000'..'\\uFFFD' - specialCharacters
integer    := '#'* '0'* '0'
fraction   := '0'* '#'*


For example:

double value = 123456.789;
String pattern = "###.##";
DecimalFormat myFormatter = new DecimalFormat(pattern);
String output = myFormatter.format(value);


Locale-sensitive formatting with a pattern:

NumberFormat nf = NumberFormat.getNumberInstance(currentLocale);
DecimalFormat df = (DecimalFormat) nf;
df.applyPattern(pattern);
String localizedOutput = df.format(value);

DecimalFormat examples:

Pattern       Output of 123456.789   Locale
###,###.###   123,456.789            en_US
###,###.###   123.456,789            de_DE
###,###.###   123 456,789            fr_FR
###,###.###   123,456.789            ja_JP
###,###.###   123,456.789            zh_TW
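A sketch of both approaches from this section. The instanceof check guards the cast, since the NumberFormat factory methods are not guaranteed to return a DecimalFormat; the pattern-only formatter is pinned to US symbols so its output does not depend on the JVM's default locale. Class and method names are illustrative.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

class DecimalFormatDemo {
    // Locale-sensitive formatting with a custom pattern.
    static String format(String pattern, Locale locale, double value) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        if (!(nf instanceof DecimalFormat)) {
            throw new IllegalStateException("No DecimalFormat for " + locale);
        }
        DecimalFormat df = (DecimalFormat) nf;
        df.applyPattern(pattern); // pattern uses '.' and ','; output uses the locale's symbols
        return df.format(value);
    }

    // Pattern-only formatting with explicit US symbols.
    static String formatUs(String pattern, double value) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US)).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("###,###.###", Locale.US, 123456.789));      // 123,456.789
        System.out.println(format("###,###.###", Locale.GERMANY, 123456.789)); // 123.456,789
        System.out.println(formatUs("###.##", 123456.789));                    // 123456.79
    }
}
```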

DecimalFormatSymbols

A DecimalFormatSymbols object lets you change the symbols a DecimalFormat uses, such as the decimal and grouping separators. In this example the grouping size is also set to 4:

DecimalFormatSymbols unusualSymbols = new DecimalFormatSymbols(currentLocale);
unusualSymbols.setDecimalSeparator('|');
unusualSymbols.setGroupingSeparator('^');

String strange = "#,##0.###";
DecimalFormat weirdFormatter = new DecimalFormat(strange, unusualSymbols);
weirdFormatter.setGroupingSize(4);

String bizarre = weirdFormatter.format(12345.678);
System.out.println(bizarre);

Output: 1^2345|678

DateFormat

Format a date:

Date today = new Date();
DateFormat dateFormatter = DateFormat.getDateInstance(DateFormat.DEFAULT, currentLocale);
String dateOut = dateFormatter.format(today);

Format a time:

DateFormat timeFormatter = DateFormat.getTimeInstance(DateFormat.DEFAULT, currentLocale);

Format both date and time:

DateFormat formatter = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.LONG, currentLocale);
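A sketch that formats one fixed instant (January 9, 2003, 15:07:28 at GMT-05:00, matching the date-style table below) in each date style. Fixing both the date and the time zone keeps the output repeatable. Exact strings depend on the JDK's locale data, and time styles in particular have changed between JDK versions, so treat the tables as illustrations rather than guaranteed output.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    // Fixed offset, like the GMT-05:00 shown in the tables.
    static final TimeZone ZONE = TimeZone.getTimeZone("GMT-05:00");

    // January 9, 2003, 15:07:28 in ZONE.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE);
        cal.clear();
        cal.set(2003, Calendar.JANUARY, 9, 15, 7, 28);
        return cal.getTime();
    }

    static String formatDate(int style, Locale locale) {
        DateFormat df = DateFormat.getDateInstance(style, locale);
        df.setTimeZone(ZONE); // format in the same zone the date was built in
        return df.format(sampleDate());
    }

    public static void main(String[] args) {
        int[] styles = { DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL };
        for (Locale loc : new Locale[] { Locale.US, Locale.GERMANY, Locale.JAPAN }) {
            StringBuilder line = new StringBuilder(loc.toString());
            for (int style : styles) {
                line.append(" | ").append(formatDate(style, loc));
            }
            System.out.println(line);
        }
    }
}
```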

Example: date styles

Locale   DEFAULT       SHORT      MEDIUM        LONG              FULL
en_US    Jan 9, 2003   1/9/03     Jan 9, 2003   January 9, 2003   Thursday, January 9, 2003
fr_FR    9 janv. 03    09/01/03   9 janv. 03    9 janvier 2003    jeudi 9 janvier 2003
de_DE    09.01.2003    09.01.03   09.01.2003    9. Januar 2003    Donnerstag, 9. Januar 2003
ja_JP    2003/01/09    03/01/09   2003/01/09    2003/01/09        200319
zh_TW    2003/1/9      2003/1/9   2003/1/9      200319            200319

Example: time styles

Locale   DEFAULT        SHORT       MEDIUM         LONG                   FULL
en_US    3:07:28 PM     3:07 PM     3:07:28 PM     3:07:28 PM GMT-05:00   3:07:28 PM GMT-05:00
fr_FR    15:07:28       15:07       15:07:28       15:07:28 GMT-05:00     15 h 07 GMT-05:00
de_DE    15:07:28       15:07       15:07:28       15:07:28 GMT-05:00     15.07 Uhr GMT-05:00
ja_JP    15:07:28       15:07       15:07:28       15:07:28:GMT-05:00     150728GMT-05:00
zh_TW    下午 03:07:28   下午 3:07    下午 03:07:28   下午030728              下午030728GMT-05:00



Example: date and time styles

en_US
  DEFAULT  Dec 12, 2002 3:07:28 PM
  SHORT    12/12/02 3:07 PM
  MEDIUM   Dec 12, 2002 3:07:28 PM
  LONG     December 12, 2002 3:07:28 PM GMT-05:00
  FULL     Thursday, December 12, 2002 3:07:28 PM GMT-05:00

fr_FR
  DEFAULT  12 déc. 02 15:07:28
  SHORT    12/12/02 15:07
  MEDIUM   12 déc. 02 15:07:28
  LONG     12 décembre 2002 15:07:28 GMT-05:00
  FULL     jeudi 12 décembre 2002 15 h 07 GMT-05:00

de_DE
  DEFAULT  12.12.2002 15:07:28
  SHORT    12.12.02 15:07
  MEDIUM   12.12.2002 15:07:28
  LONG     12. Dezember 2002 15:07:28 GMT-05:00
  FULL     Donnerstag, 12. Dezember 2002 15.07 Uhr GMT-05:00

ja_JP
  DEFAULT  2002/12/12 15:07:28
  SHORT    02/12/12 15:07
  MEDIUM   2002/12/12 15:07:28
  LONG     2002/12/12 15:07:28:GMT-05:00
  FULL     20021212150728GMT-05:00

zh_TW
  DEFAULT  2002/12/12 下午 03:07:28
  SHORT    2002/12/12 下午 3:07
  MEDIUM   2002/12/12 下午 03:07:28
  LONG     20021212日 下午030728
  FULL     20021212日 下午030728GMT-05:00


SimpleDateFormat

If the predefined DateFormat styles do not meet your needs, you can define your own pattern with SimpleDateFormat:

Date today = new Date();
String pattern = "yyyy.MM.dd G 'at' hh:mm:ss z";

SimpleDateFormat formatter = new SimpleDateFormat(pattern, currentLocale);
String output = formatter.format(today);
System.out.println(pattern + " " + output);

formatter = new SimpleDateFormat("EEE d MMM yy", currentLocale);
output = formatter.format(today);

SimpleDateFormat examples for the pattern yyyy.MM.dd G 'at' hh:mm:ss z:

Locale   Output
en_US    2002.12.12 AD at 03:29:36 GMT-05:00
fr_FR    2002.12.12 ap. J.-C. at 03:29:36 GMT-05:00
de_DE    2002.12.12 n. Chr. at 03:29:36 GMT-05:00
ja_JP    2002.12.12 西暦 at 03:29:36 GMT-05:00
zh_TW    2002.12.12 民國 at 03:29:36 GMT-05:00
zh_CN    2002.12.12 公元 at 03:40:52 GMT-05:00
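A sketch with a fixed instant (December 12, 2002, 03:29:36 at GMT-05:00) so the results can be compared with the table. Quoted text such as 'at' is copied literally, while G (era), EEE (day of week) and MMM (month) are localized. The z field is left out of the runnable pattern because time-zone names can vary between JDKs; class and method names are illustrative.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class SimpleDateFormatDemo {
    static final TimeZone ZONE = TimeZone.getTimeZone("GMT-05:00");

    // December 12, 2002, 03:29:36 (AM) in ZONE.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(ZONE);
        cal.clear();
        cal.set(2002, Calendar.DECEMBER, 12, 3, 29, 36);
        return cal.getTime();
    }

    static String format(String pattern, Locale locale) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, locale);
        formatter.setTimeZone(ZONE);
        return formatter.format(sampleDate());
    }

    public static void main(String[] args) {
        System.out.println(format("yyyy.MM.dd G 'at' hh:mm:ss", Locale.US)); // 2002.12.12 AD at 03:29:36
        System.out.println(format("EEE d MMM yy", Locale.US));               // Thu 12 Dec 02
        System.out.println(format("EEE d MMM yy", Locale.FRANCE));
    }
}
```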


MessageFormat

A compound message may contain several kinds of variables: dates, times, strings, numbers, currencies, and percentages. To format a compound message in a locale-independent manner, you construct a pattern that you apply to a MessageFormat object, and store this pattern in a ResourceBundle.

1. Identify the Variables in the Message

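Suppose that you want to internationalize the following message:

At 4:16 PM on December 12, 2002, we detected 7 spaceships on the planet Mars.

The variable data in this message are the time and the date (represented by a Date object), the number of spaceships (a Number), and the name of the planet (a String).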

2. Isolate the Message Pattern in a ResourceBundle

Store the message in a ResourceBundle named MessageBundle, as follows:

ResourceBundle messages = ResourceBundle.getBundle("MessageBundle", currentLocale);

This ResourceBundle is backed by a properties file for each Locale. The properties file for U.S. English is named MessageBundle_en_US.properties. The contents of this file are as follows:

template = At {2,time,short} on {2,date,long}, we detected \

{1,number,integer} spaceships on the planet {0}.

planet = Mars

The first line of the properties file contains the message pattern. If you compare this pattern with the message text shown in step 1, you'll see that an argument enclosed in braces replaces each variable in the message text. Each argument starts with a digit called the argument number, which matches the index of an element in an Object array that holds the argument values. Note that in the pattern the argument numbers are not in any particular order. You can place the arguments anywhere in the pattern. The only requirement is that the argument number has a matching element in the array of argument values.

The next step discusses the argument value array, but first let's look at each of the arguments in the pattern. The following table provides some details about the arguments:

Arguments for template in MessageBundle_en_US.properties:

Argument             Description
{2,time,short}       The time portion of a Date object. The short style specifies the DateFormat.SHORT formatting style.
{2,date,long}        The date portion of a Date object. The same Date object is used for both the date and time variables. In the Object array of arguments, the index of the element holding the Date object is 2. (This is described in the next step.)
{1,number,integer}   A Number object, further qualified with the integer number style.
{0}                  The String in the ResourceBundle that corresponds to the planet key.



3. Set the Message Arguments

The following lines of code assign values to each argument in the pattern. The indexes of the elements in the messageArguments array match the argument numbers in the pattern. For example, the Integer element at index 1 corresponds to the {1,number,integer} argument in the pattern. Because it must be translated, the String object at element 0 will be fetched from the ResourceBundle with the getString method. Here is the code that defines the array of message arguments:

Object[] messageArguments = {
    messages.getString("planet"),
    Integer.valueOf(7),
    new Date()
};

4. Create the Formatter

Next, create a MessageFormat object. You set the Locale because the message contains Date and Number objects, which should be formatted in a locale-sensitive manner.

MessageFormat formatter = new MessageFormat("");
formatter.setLocale(currentLocale);

5. Format the Message Using the Pattern and the Arguments

This step shows how the pattern, message arguments, and formatter all work together. First, fetch the pattern String from the ResourceBundle with the getString method. The key to the pattern is template. Pass the pattern String to the formatter with the applyPattern method. Then format the message using the array of message arguments, by invoking the format method. The String returned by the format method is ready to be displayed. All of this is accomplished with just two lines of code:

formatter.applyPattern(messages.getString("template"));
String output = formatter.format(messageArguments);
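A self-contained sketch of steps 3 to 5, with the en_US and de_DE templates from the examples below inlined instead of loaded from MessageBundle. setLocale is called before applyPattern because the date, time and number sub-formats are created, using the current locale, when the pattern is applied. One caveat when translating: MessageFormat treats a single quote as a quoting character, so a literal apostrophe in a pattern must be written as two ('').

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class MessageFormatDemo {
    static final String TEMPLATE_EN =
        "At {2,time,short} on {2,date,long}, we detected {1,number,integer} spaceships on the planet {0}.";
    static final String TEMPLATE_DE =
        "Um {2,time,short} Uhr am {2,date,long}, haben wir {1,number,integer} Raumschiffe auf dem Planeten {0} entdeckt.";

    static String format(String template, Locale locale, String planet, int count, Date when) {
        MessageFormat formatter = new MessageFormat("");
        formatter.setLocale(locale);      // before applyPattern, so sub-formats are localized
        formatter.applyPattern(template);
        Object[] messageArguments = { planet, Integer.valueOf(count), when };
        return formatter.format(messageArguments);
    }

    public static void main(String[] args) {
        // Built in the default time zone, which MessageFormat also uses for formatting.
        Date when = new GregorianCalendar(2002, Calendar.DECEMBER, 12, 16, 16).getTime();
        System.out.println(format(TEMPLATE_EN, Locale.US, "Mars", 7, when));
        System.out.println(format(TEMPLATE_DE, Locale.GERMANY, "Mars", 7, when));
    }
}
```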

MessageFormat examples:

en_US
  Properties:
    template = At {2,time,short} on {2,date,long}, we detected \
        {1,number,integer} spaceships on the planet {0}.
    planet = Mars
  Output:
    At 4:16 PM on December 12, 2002, we detected 7 spaceships on the planet Mars.

de_DE
  Properties:
    template = Um {2,time,short} Uhr am {2,date,long}, haben wir \
        {1,number,integer} Raumschiffe auf dem Planeten {0} entdeckt.
    planet = Mars
  Output:
    Um 16:16 Uhr am 12. Dezember 2002, haben wir 7 Raumschiffe auf dem Planeten Mars entdeckt.

zh
  Properties:
    template = 我们在{2,date,long}{2,time,short} {0}上探测到{1,number,integer}个太空船。
    planet = 火星
  Output:
    我们在20021212日下午4:16 在 火星上探测到7个太空船。


ChoiceFormat

The words in a message may vary if both plural and singular word forms are possible. With the ChoiceFormat class, you can map a number to a word or a phrase, allowing you to construct grammatically correct messages.

1. Define the Message Pattern

First, identify the variables in a message such as "There are 3 files on XDisk.": the number of files and the name of the disk.

Next, replace the variables in the message with arguments, creating a pattern that can be applied to a MessageFormat object:

There {0} on {1}.

The argument for the disk name, which is represented by {1}, is easy enough to deal with. You just treat it like any other String variable in a MessageFormat pattern. This argument matches the element at index 1 in the array of argument values.

Dealing with argument {0} is more complex, for a couple of reasons:

  • The phrase that this argument replaces varies with the number of files. To construct this phrase at run time, you need to map the number of files to a particular String. For example, the number 1 will map to the String containing the phrase "is one file". The ChoiceFormat class allows you to perform the necessary mapping.

  • If the disk contains multiple files, the phrase includes an integer. The MessageFormat class lets you insert a number into a phrase.

2. Create a ResourceBundle

Because the message text must be translated, isolate it in a ResourceBundle:

ResourceBundle bundle = ResourceBundle.getBundle("ChoiceBundle", currentLocale);

The sample program backs the ResourceBundle with properties files. The ChoiceBundle_en_US.properties file contains the following lines:

pattern = There {0} on {1}.

noFiles = are no files

oneFile = is one file

multipleFiles = are {2} files

The contents of this properties file show how the message will be constructed and formatted. The first line contains the pattern for MessageFormat. The other lines contain phrases that will replace argument {0} in the pattern. The phrase for the multipleFiles key contains the argument {2}, which will be replaced by a number.

Check out the following table for more examples.

3. Create a Message Formatter

In this step you instantiate MessageFormat and set its Locale:

MessageFormat messageForm = new MessageFormat("");
messageForm.setLocale(currentLocale);

4. Create a Choice Formatter

The ChoiceFormat object allows you to choose, based on a double number, a particular String. The range of double numbers, and the String objects to which they map, are specified in arrays:

double[] fileLimits = {0, 1, 2};

String[] fileStrings = {
    bundle.getString("noFiles"),
    bundle.getString("oneFile"),
    bundle.getString("multipleFiles")
};

ChoiceFormat maps each element in the double array to the element in the String array that has the same index. In the sample code the 0 maps to the String returned by calling bundle.getString("noFiles"). By coincidence the index is the same as the value in the fileLimits array. If the code had set fileLimits[0] to seven, ChoiceFormat would map the number 7 to fileStrings[0].

You specify the double and String arrays when instantiating ChoiceFormat:

ChoiceFormat choiceForm = new ChoiceFormat(fileLimits, fileStrings);

5. Apply the Pattern

Remember the pattern you constructed in step 1? It's time to retrieve the pattern from the ResourceBundle and apply it to the MessageFormat object:

String pattern = bundle.getString("pattern");
messageForm.applyPattern(pattern);

6. Assign the Formats

In this step you assign to the MessageFormat object the ChoiceFormat object created in step 4:

Format[] formats = {choiceForm, null, NumberFormat.getInstance()};
messageForm.setFormatsByArgumentIndex(formats); // available since Java 1.4

or

// setFormats assigns formats in the order the format elements appear in the pattern,
// not by argument number, so it breaks when a translation reorders the arguments.
messageForm.setFormats(formats);

The setFormatsByArgumentIndex method assigns Format objects to the arguments in the message pattern. You must invoke the applyPattern method before you call setFormatsByArgumentIndex (or setFormats). The following table shows how the elements of the Format array correspond to the arguments in the message pattern:

The Format array of the ChoiceFormatDemo program:

Array element                Pattern argument
choiceForm                   {0}
null                         {1}
NumberFormat.getInstance()   {2}

7. Set the Arguments and Format the Message

At run time the program assigns the variables to the array of arguments it passes to the MessageFormat object. The elements in the array correspond to the arguments in the pattern. For example, messageArguments[1] maps to pattern argument {1}, which is a String containing the name of the disk. In the previous step the program assigned a ChoiceFormat object to argument {0} of the pattern. Therefore the number assigned to messageArguments[0] determines which String the ChoiceFormat object selects. If messageArguments[0] is greater than or equal to 2, the String containing the phrase "are {2} files" replaces argument {0} in the pattern. The number assigned to messageArguments[2] will be substituted in place of pattern argument {2}. Here's the code that tries this out:

Object[] messageArguments = {null, "XDisk", null};

for (int numFiles = 0; numFiles < 4; numFiles++) {
    messageArguments[0] = Integer.valueOf(numFiles);
    messageArguments[2] = Integer.valueOf(numFiles);
    String result = messageForm.format(messageArguments);
    System.out.println(result);
}
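A self-contained sketch of the whole procedure for en_US, with the strings from ChoiceBundle_en_US.properties inlined instead of loaded from a bundle. When the ChoiceFormat selects "are {2} files", MessageFormat notices the brace in the result and formats that phrase again as a pattern with the same arguments, which is how {2} gets filled with the number. Class and method names are illustrative.

```java
import java.text.ChoiceFormat;
import java.text.Format;
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Locale;

class ChoiceFormatDemo {
    static final String PATTERN = "There {0} on {1}.";
    static final double[] FILE_LIMITS = { 0, 1, 2 };
    static final String[] FILE_STRINGS = { "are no files", "is one file", "are {2} files" };

    static String describe(int numFiles, String disk) {
        MessageFormat messageForm = new MessageFormat("");
        messageForm.setLocale(Locale.US);
        messageForm.applyPattern(PATTERN); // apply the pattern before assigning formats

        ChoiceFormat choiceForm = new ChoiceFormat(FILE_LIMITS, FILE_STRINGS);
        Format[] formats = { choiceForm, null, NumberFormat.getInstance(Locale.US) };
        messageForm.setFormatsByArgumentIndex(formats);

        // {0} selects the phrase; {2} supplies the number inside "are {2} files".
        Object[] messageArguments = { Integer.valueOf(numFiles), disk, Integer.valueOf(numFiles) };
        return messageForm.format(messageArguments);
    }

    public static void main(String[] args) {
        for (int numFiles = 0; numFiles < 4; numFiles++) {
            System.out.println(describe(numFiles, "XDisk"));
        }
    }
}
```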

ChoiceFormat examples:

en_US
  Properties:
    noFiles = are no files
    oneFile = is one file
    multipleFiles = are {2} files
    pattern = There {0} on {1}.
    disk = XDISK
  Output:
    There are no files on XDISK.
    There is one file on XDISK.
    There are 2 files on XDISK.
    There are 3 files on XDISK.

fr_FR
  Properties:
    noFiles = n'y a pas de fichiers
    oneFile = y a un fichier
    multipleFiles = y a {2} fichiers
    pattern = Il {0} sur {1}.
    disk = frdisk
  Output:
    Il n'y a pas de fichiers sur frdisk.
    Il y a un fichier sur frdisk.
    Il y a 2 fichiers sur frdisk.
    Il y a 3 fichiers sur frdisk.

zh
  Properties:
    noFiles = 没有文件
    oneFile = 有一个文件
    multipleFiles = {2} 个文件
    pattern = {0}{1}.
    disk = 某磁碟上
  Output:
    没有文件在某磁碟上.
    有一个文件在某磁碟上.
    2 个文件在某磁碟上.
    3 个文件在某磁碟上.



Character

To check characters, use the static methods of the Character class, for example:

  • isDigit

  • isLetter

  • isLetterOrDigit

  • isLowerCase

  • isUpperCase

  • isSpaceChar

  • isDefined
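A sketch contrasting the A-to-Z test that the first tip warns about with the Character methods listed above. The sample characters (e with acute accent, capital N with tilde, Cyrillic capital ZHE, and the Arabic-Indic digit three) are arbitrary choices for illustration.

```java
class CharacterDemo {
    // The naive ASCII-only letter test.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // a, e-acute, N-tilde, Cyrillic capital ZHE, Arabic-Indic digit three
        char[] samples = { 'a', '\u00e9', '\u00d1', '\u0416', '\u0663' };
        for (char c : samples) {
            System.out.println(Integer.toHexString(c)
                + " ascii=" + isAsciiLetter(c)
                + " isLetter=" + Character.isLetter(c)
                + " isUpperCase=" + Character.isUpperCase(c)
                + " isDigit=" + Character.isDigit(c));
        }
    }
}
```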

BreakIterator

The BreakIterator class implements methods for finding the location of boundaries in text. Instances of BreakIterator maintain a current position and scan over text returning the index of characters where boundaries occur.

You can analyze four kinds of boundaries with the BreakIterator class: character, word, sentence, and potential line break. When instantiating a BreakIterator, you invoke the appropriate factory method:

  • getCharacterInstance

  • getWordInstance

  • getSentenceInstance

  • getLineInstance

Each instance of BreakIterator can detect just one type of boundary.

Locale currentLocale = new Locale("en", "US");
BreakIterator wordIterator = BreakIterator.getWordInstance(currentLocale);

String someText = "She stopped. " +
    "She said, \"Hello there,\" and then went on.";

markBoundaries(someText, wordIterator);

// Prints the text, then a second line with '^' under each boundary position.
static void markBoundaries(String target, BreakIterator iterator) {
    StringBuffer markers = new StringBuffer();
    markers.setLength(target.length() + 1);
    for (int k = 0; k < markers.length(); k++) {
        markers.setCharAt(k, ' ');
    }

    iterator.setText(target);
    int boundary = iterator.first();
    while (boundary != BreakIterator.DONE) {
        markers.setCharAt(boundary, '^');
        boundary = iterator.next();
    }

    System.out.println(target);
    System.out.println(markers);
}
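Beyond marking boundaries, a common use of the word iterator is extracting words. The sketch below keeps only the segments between consecutive boundaries that contain a letter or digit, because the word iterator also reports boundaries around spaces and punctuation. The class and method names are illustrative.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordListDemo {
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Skip segments that are only spaces or punctuation.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String someText = "She stopped. She said, \"Hello there,\" and then went on.";
        System.out.println(words(someText, Locale.US));
        // [She, stopped, She, said, Hello, there, and, then, went, on]
    }
}
```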

String

When you convert between a String and a byte array, always specify the character encoding; otherwise the platform's default encoding is used, and it may not match your data.

String original = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C";
byte[] utf8Bytes = original.getBytes("UTF8");       // may throw UnsupportedEncodingException
String roundTrip = new String(utf8Bytes, "UTF8");
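A sketch of why the encoding matters, using the same sample string and java.nio.charset.StandardCharsets (Java 7+), which avoids the checked UnsupportedEncodingException of the overloads that take a charset name. Decoding UTF-8 bytes as ISO-8859-1 does not throw; it silently produces different characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // "A", e-circumflex, n-tilde, u-umlaut, "C": the sample string from this section.
    static final String ORIGINAL = "A\u00ea\u00f1\u00fcC";

    // Encodes with one charset and decodes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        System.out.println(ORIGINAL.getBytes(StandardCharsets.UTF_8).length);      // 8: three chars take 2 bytes each
        System.out.println(ORIGINAL.getBytes(StandardCharsets.ISO_8859_1).length); // 5: one byte per char
        System.out.println(roundTrip(ORIGINAL, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(ORIGINAL));      // true
        System.out.println(roundTrip(ORIGINAL, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(ORIGINAL)); // false
    }
}
```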

Collator

The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.

Collator fr_FRCollator = Collator.getInstance(new Locale("fr", "FR"));

// Simple exchange sort that compares words with the given Collator.
public static void sortStrings(Collator collator, String[] words) {
    String tmp;
    for (int i = 0; i < words.length; i++) {
        for (int j = i + 1; j < words.length; j++) {
            if (collator.compare(words[i], words[j]) > 0) {
                tmp = words[i];
                words[i] = words[j];
                words[j] = tmp;
            }
        }
    }
}

Locale   Original                Sorted
fr_FR    Péché pêche peach sin   Peach pêche péché sin
en_US    Peach pêche péché sin   Peach péché pêche sin
zh       胡益光你好               光好胡你益
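A sketch comparing plain String sorting with a Collator. Because Collator implements Comparator<Object>, it can be passed straight to Arrays.sort instead of a hand-written sorting loop. The word list is made up for illustration. setStrength(Collator.PRIMARY) makes the collator ignore accent and case differences, which can be useful for searching.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorDemo {
    static String[] collatorSort(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale)); // locale-sensitive order
        return copy;
    }

    static String[] codeUnitSort(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy); // String.compareTo: compares UTF-16 code units, not linguistic order
        return copy;
    }

    static boolean primaryEqual(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // ignore accent and case differences
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String[] words = { "zebra", "\u00e9clair", "apple" }; // middle word starts with e-acute
        System.out.println(Arrays.toString(codeUnitSort(words)));            // accented word sorts last
        System.out.println(Arrays.toString(collatorSort(words, Locale.US))); // accented word between apple and zebra
        System.out.println(primaryEqual("peche", "P\u00eache", Locale.FRANCE)); // true
    }
}
```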


RuleBasedCollator

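String smallnTilde = new String("\u00F1");    // ñ
String capitalNTilde = new String("\u00D1");  // Ñ

String traditionalSpanishRules =
    ("< a,A < b,B < c,C " +
     "< ch, cH, Ch, CH " +
     "< d,D < e,E < f,F " +
     "< g,G < h,H < i,I < j,J < k,K < l,L " +
     "< ll, lL, Ll, LL " +
     "< m,M < n,N " +
     "< " + smallnTilde + "," + capitalNTilde + " " +
     "< o,O < p,P < q,Q < r,R " +
     "< s,S < t,T < u,U < v,V < w,W < x,X " +
     "< y,Y < z,Z");

// The constructor throws ParseException if the rules are malformed.
RuleBasedCollator spCollator = new RuleBasedCollator(traditionalSpanishRules);

sortStrings(spCollator, words);   // words: the String[] to sort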

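A self-contained sketch that spells out a complete traditional Spanish rule string, in which "ch" sorts after "c", "ll" after "l", and ñ after "n". The sample words are illustrative, and the ParseException from the RuleBasedCollator constructor is converted to an unchecked exception for brevity.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class TraditionalSpanishDemo {
    static final String SMALL_N_TILDE = "\u00F1";
    static final String CAPITAL_N_TILDE = "\u00D1";

    static final String TRADITIONAL_SPANISH_RULES =
        "< a,A < b,B < c,C " +
        "< ch, cH, Ch, CH " +
        "< d,D < e,E < f,F " +
        "< g,G < h,H < i,I < j,J < k,K < l,L " +
        "< ll, lL, Ll, LL " +
        "< m,M < n,N " +
        "< " + SMALL_N_TILDE + "," + CAPITAL_N_TILDE + " " +
        "< o,O < p,P < q,Q < r,R " +
        "< s,S < t,T < u,U < v,V < w,W < x,X " +
        "< y,Y < z,Z";

    static String[] sortTraditional(String[] words) {
        try {
            RuleBasedCollator spCollator = new RuleBasedCollator(TRADITIONAL_SPANISH_RULES);
            String[] copy = words.clone();
            Arrays.sort(copy, spCollator);
            return copy;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Malformed collation rules", e);
        }
    }

    public static void main(String[] args) {
        String[] words = { "luz", "curioso", "llama", "chalina" };
        System.out.println(Arrays.toString(sortTraditional(words)));
        // [curioso, chalina, luz, llama]
    }
}
```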

InputStreamReader and OutputStreamWriter

InputStreamReader and OutputStreamWriter bridge byte streams and character streams. An InputStreamReader reads bytes from a byte stream and decodes them into characters using a specified encoding. An OutputStreamWriter does the reverse: it encodes the characters written to it into bytes using a specified encoding. Using the wrong encoding may corrupt your data.


Encoded bytes -> InputStreamReader -> Unicode characters

Unicode characters -> OutputStreamWriter -> encoded bytes


static String readInput() {
    StringBuffer buffer = new StringBuffer();
    // try-with-resources closes the stream even if reading fails.
    try (Reader in = new BufferedReader(
            new InputStreamReader(new FileInputStream("test.txt"), "UTF8"))) {
        int ch;
        while ((ch = in.read()) > -1) {
            buffer.append((char) ch);
        }
        return buffer.toString();
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}

static void writeOutput(String str) {
    try (Writer out = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF8")) {
        out.write(str);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
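A sketch of the same idea with a temporary file instead of test.txt: text written through an OutputStreamWriter in UTF-8 comes back intact only when it is read with the same encoding. It uses the Charset-based constructors of both classes; the temporary-file handling and names are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamEncodingDemo {
    // Writes text to a temporary file in one encoding and reads it back in another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        Path file = null;
        try {
            file = Files.createTempFile("i18n-demo", ".txt");
            try (Writer out = new OutputStreamWriter(new FileOutputStream(file.toFile()), writeAs)) {
                out.write(text); // characters -> bytes
            }
            StringBuilder buffer = new StringBuilder();
            try (Reader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file.toFile()), readAs))) {
                int ch;
                while ((ch = in.read()) != -1) { // bytes -> characters
                    buffer.append((char) ch);
                }
            }
            return buffer.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temporary file
                }
            }
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u65e5\u672c\u8a9e"; // "cafe" with e-acute, then three Japanese characters
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```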


Socket communication

To pass text correctly through a socket, wrap the socket's streams in an InputStreamReader and an OutputStreamWriter with an explicit encoding, and read and write data the same way as above.

BufferedWriter out = new BufferedWriter( new OutputStreamWriter(
clientSocket.getOutputStream(), "UTF8"));
BufferedReader in = new BufferedReader(new InputStreamReader(
clientSocket.getInputStream(),"UTF8"));


ComponentOrientation

The ComponentOrientation class encapsulates the language-sensitive orientation that is to be used to order the elements of a component or of text.

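For each locale, set up a ComponentOrientation and apply it to every component, such as menus and menu items. Container.applyComponentOrientation sets the orientation on a container and, recursively, on all of its children, so you do not need to walk the component tree yourself.

// chinese and hebrew are placeholders for however the application picks its locale;
// frame is the application's top-level Frame.
ComponentOrientation co = ComponentOrientation.LEFT_TO_RIGHT;

if (chinese) {
    Locale.setDefault(Locale.CHINA);
    co = ComponentOrientation.LEFT_TO_RIGHT;
}

if (hebrew) {
    Locale.setDefault(new Locale("iw", "IL"));
    co = ComponentOrientation.RIGHT_TO_LEFT;
}

// Sets the orientation on the frame and on every component it contains.
frame.applyComponentOrientation(co);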
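A headless-friendly sketch showing that applyComponentOrientation on a container reaches nested children. Plain java.awt.Container objects stand in for a frame, a menu bar and a menu item; that substitution is an assumption made so the example runs without a display, and a real Frame behaves the same way.

```java
import java.awt.ComponentOrientation;
import java.awt.Container;

class OrientationDemo {
    // frame -> menuBar -> menuItem, built from lightweight containers.
    static Container buildTree() {
        Container frame = new Container();
        Container menuBar = new Container();
        Container menuItem = new Container();
        menuBar.add(menuItem);
        frame.add(menuBar);
        return frame;
    }

    public static void main(String[] args) {
        Container frame = buildTree();
        frame.applyComponentOrientation(ComponentOrientation.RIGHT_TO_LEFT);
        Container menuBar = (Container) frame.getComponent(0);
        Container menuItem = (Container) menuBar.getComponent(0);
        System.out.println(menuItem.getComponentOrientation().isLeftToRight()); // false
    }
}
```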

Internationalized web applications

In an internationalized web application, the same server needs to respond properly to requests from different locales. This section describes how the client and server communicate locale information.


1. The client sends its preferred locales in the Accept-Language header.


2. ServletRequest provides two methods to parse the Accept-Language header: getLocale() and getLocales().


3. ServletResponse has two methods, getLocale() and setLocale(Locale loc), to get and set the locale of the response to the client.


4. There are two ways to set the character set of a servlet response. The browser uses the charset information to decode the page and render it with the proper font.


4.1 One way is to call response.setContentType("text/html; charset=ISO-8859-1");


4.2 The second way is to call response.setLocale(Locale loc).


The second method takes a Locale object and sets the appropriate headers in the response to the client. It is preferred because you may not know the exact charset name for a given locale. Even if you do, with the first method you would still need to maintain the mapping between locales and charset names yourself.


The character set must be set with either setContentType or setLocale before you obtain the PrintWriter by calling ServletResponse.getWriter(), because getWriter() uses the charset to construct the PrintWriter.
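The servlet container does the Accept-Language parsing for you. As an illustration of the idea only (this uses java.util.Locale from Java 8, not the servlet API), the sketch below parses a header value and filters a hypothetical list of supported locales in the client's order of preference.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AcceptLanguageDemo {
    // Returns the supported locales that match the header, most preferred first.
    static List<Locale> match(String acceptLanguage, List<Locale> supported) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        return Locale.filter(ranges, supported);
    }

    public static void main(String[] args) {
        List<Locale> supported = Arrays.asList(Locale.US, Locale.FRANCE);
        System.out.println(match("fr-CH, fr;q=0.9, en;q=0.8", supported)); // [fr_FR, en_US]
    }
}
```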

JSP Custom Tags

JSP custom tags are a very powerful mechanism for performing complex processing: they remove Java code from JSP pages and encapsulate nontrivial programming logic in reusable components, which can be packaged in a tag library. Many open source tag libraries are available on the Internet, including an I18N tag library that you can take advantage of. To use a tag library, configure it and then refer to its tags in JSP pages much like normal HTML tags. You can also develop your own tags and package them into a tag library.

FAQ

  1. Why do I see square blocks on my screen where I expect to see MBCS characters?

    1. You may be using the wrong font. Install the necessary fonts on the client machine. If you see square blocks in an applet, make sure the Java VM uses the right font.properties file.

  2. Why do I see '?' marks on my screen or in my output file where I expect to see MBCS characters?

    1. When you convert characters from one encoding (EC1) to another (EC2), and EC2 has no character, or more than one character, that corresponds to a character in EC1, the result is an unknown character, shown as '?'. Some possible scenarios:

      1. When you read the file, you didn't use an InputStreamReader with the proper encoding.

      2. When you wrote the file, you didn't use an OutputStreamWriter with the proper encoding.

      3. When you converted a String to a byte array, you didn't use the proper encoding, or you didn't specify an encoding at all and the default encoding doesn't match your String.

      4. When you converted a byte array to a String, you didn't use the proper encoding, or you didn't specify an encoding at all and the default encoding doesn't match the original encoding of the bytes.

      5. You converted characters from EC1 to EC2 where EC2 has more than one character that maps to the character in EC1. For example, one Simplified Chinese character may map to more than one Traditional Chinese character.
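A sketch of where the '?' comes from in scenario 3: String.getBytes replaces every character that the target charset cannot represent with that charset's replacement byte, which for ISO-8859-1 is '?'. The sample text (two Japanese characters) and the names are arbitrary choices for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes and decodes with the same charset; unmappable characters do not survive.
    static String lossyRoundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become the replacement byte
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c"; // two Japanese (kanji) characters
        System.out.println(lossyRoundTrip(japanese, StandardCharsets.ISO_8859_1));             // ??
        System.out.println(lossyRoundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true
    }
}
```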


Reference

  1. Java Internationalization by Andrew Deitsch & David Czarnecki from O’Reilly

  2. http://java.oreilly.com/news/javainternational_0301.html

  3. http://java.oreilly.com/news/javaintl_0701.html

  4. http://java.sun.com/docs/books/tutorial/i18n/TOC.html

  5. http://www.apache.org