Re: GNU gettext



On Thu, 04 Dec 2008 19:42:42 -0800
"Mihai N." <nmihai_year_2000@xxxxxxxxx> wrote:

It's very easy to use a non-printing Unicode
character to "tag" such differences, and then mark in the comments
which instance is which.

Really?
Can you point me to something in the gettext documentation that
explains this technique?

The gettext documentation explains how the keys work. Furthermore, the
translation files give you a pointer to the origin of the message so
that you can see where it's used to gain a sense of context within the
program.

Given how the keys work, and given that the utilities that work with
the files are easily able to handle Unicode (obviously) and that modern
user interfaces are able to handle Unicode, it follows that you can use
some technique such as this. It wouldn't mess with screen readers since
they don't care about zero-width, non-printing whitespace, and such
whitespace doesn't get displayed unless you ask a tool to show it to
you. Now, that means that you might need to take a closer look at the
translation files, or that you'd need to write a small utility to help
you to manage the files, but it's certainly doable. There are plenty
of other ways to mark things like that, too. Heck, you could use a
padding of NUL bytes between the first " and the beginning of the
string or the end of the string and the last ", assuming that the
program that reads them doesn't assume that there are no NUL bytes. I
don't know how you'd do this on Windows applications, but Emacs (as an
example) very easily lets you input arbitrary Unicode characters that
are not on your keyboard. Most software on Linux supports doing that
easily, too; I'd imagine that there is some way of doing this on
Windows, as well.
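
The tagging idea above can be sketched in a few lines. This is only an
illustration, assuming a zero-width space (U+200B) as the tag and a
plain dictionary standing in for a compiled message catalog. The names
TAG, CATALOG_ES, and translate are hypothetical and are not part of
gettext, and the Spanish strings are illustrative only.

```python
# Hypothetical sketch: disambiguating identical English keys with an
# invisible tag character. A dict stands in for a real message catalog.

TAG = "\u200b"  # ZERO WIDTH SPACE; assumed here as the "tag" character

# Two keys that look identical on screen but differ by the trailing tag.
MENU_PRINT = "Print"
TITLE_PRINT = "Print" + TAG

# Stand-in Spanish catalog (translations are illustrative only).
CATALOG_ES = {
    MENU_PRINT: "Imprimir",
    TITLE_PRINT: "Impresión",
}


def translate(msgid, catalog):
    """Look up msgid; like gettext, fall back to the key itself.

    On fallback the tag is stripped so that the untranslated UI never
    carries the marker, even though it would be invisible anyway.
    """
    return catalog.get(msgid, msgid.replace(TAG, ""))
```

The comment attached to each key in the translation file would then
say which instance is the menu item and which is the dialog title.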

One could also, if they _really_ wanted, actually modify gettext
slightly to use a surrogate key in the source code. However, that'd
really put a damper on its usability, due to the way strings are
extracted from source code. But that's doable, too. (As an aside, I
didn't need the documentation to come up with that idea.)

This is what you do in your code, for every localizable string?
Remember: gettext was designed for plain C, the keys are C strings
(char*), so they are not Unicode. Same as the typical C source.

No, but that's the great thing about sensible modern systems: They use
UTF-8. Lucky for us, one can use UTF-8 in C source code without an
issue, since C treats a char * as a sequence of bytes. The user
interface will put the bytes together and see them for what they are.
If you need to do things like read multiple transformations of Unicode,
then just use a C library that can work and convert between them.
Furthermore, since UTF-8 is immensely more efficient than any other
transformation of Unicode for _most_ use cases where storage needs are
mostly ASCII characters, it's very natural to use.
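
To make the byte-sequence point concrete, here is a small sketch that
compares the encoded size of two example keys. It is written in Python
rather than C purely for brevity; the keys are only examples, and
encoded_sizes is a hypothetical helper.

```python
# Sketch: the same key encoded in three Unicode transformation formats.

def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16, and UTF-32.

    The "-le" codecs are used so that no byte order mark is counted.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


ascii_key = "Print"            # pure ASCII: one byte per character in UTF-8
ellipsis_key = "Print\u2026"   # U+2026 HORIZONTAL ELLIPSIS: three bytes in UTF-8

print(encoded_sizes(ascii_key))
print(encoded_sizes(ellipsis_key))
```

Neither key contains a zero byte once encoded as UTF-8, which is why
such keys pass through NUL-terminated C strings untouched. The UTF-16
and UTF-32 forms of the same keys do contain zero bytes.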

Now, if the system were written in Python, I'd be a bit dubious about
using any of those tricks---Python does things really weirdly with
strings and interferes with you all over the place. Also, for the
record, while I have worked with projects that actively use gettext,
there is little point in my using it: I work with a very niche audience, and
don't have a need for i18n other than ensuring that it'd be easy to do
if the client changes its mind later. What _I_ do with my software is
I make the strings absolutely clear. I don't transliterate words and
break their meaning by mutilating them, I don't use action verbs
outside of menus, and I try to make sure that the UI is not written in
what you'd see in everyday colloquial English. I am only a fluent
speaker in English, but I've learned enough of two other languages to
understand the sorts of situations that are awkward to translate, and
so I avoid them.

In fact, that's probably the _best_ way to use gettext. Use formal,
well-written, unambiguous English. Then, when adopting it in a
previously non-internationalized source base, spend the time that one
might spend creating surrogate keys instead cleaning up English
language strings.

Now, if that doesn't suit you, that's fine; freedom is for everybody,
and freedom includes choice. That much is obvious, since proprietary
software still thrives today. But when it comes to programming, there are a
few things that are important in today's strongly heterogeneous
computing world: (1) simple is better until it's not, and (2) portable
knowledge and portable software systems are among the best tools one
has in today's world, especially when it is open and maintained by
masses of people so that the longevity of the software is ensured as
long as there is someone interested in it.

And can you give me a list of such Unicode characters?

Take a look at the Unicode standard. It's freely available on the
Internet.

How many of them can you give me?
Are they enough to eliminate duplicated keys in a big piece of software?
(let's say 400.000 words)

I'm not sure why you'd count in words, when strings are the important
thing. There are 6,992 strings in GCC (1 string in every approx. 840
lines of code, roughly).

There are other ways you can accomplish the
task, as well---it just takes a little bit of imagination.

This is a kludge around gettext's bad design.
Exactly my point.


The primary goal of the design was to make it easy to adopt and use in
existing software, since that was (and largely still is) the primary
use case scenario. Most software isn't hooked up with i18n from the
beginning, and some never get hooked up with it at all. The barrier to
entry is pretty light if you have only the need for a translator to run
a program, extract the strings, and make a few modifications to the
source code to gain the use of the translation catalogs.

The *other* solution would be to use more free-standing text in
things like window titles and menu actions. IMHO, using "Print"
for a dialog title and a menu item is rather silly.

And if you don't like either, then you can use "Print…" as the key
for the menu item (since it ought to have an ellipsis anyway) and
"Print" for the dialog title.

Thing is, as a developer you have no clue what every language
requires. Titles/buttons was just an example of what can go wrong.

Scan (Scan disk vs Scan paper) is usually translated differently,
because they have different meanings, not because of the context.

Yes; and again, this depends on the language. "Search" is a more
proper verb to use when looking for something on a disk, be that bad
blocks or the file that you think you might've deleted last week by
accident. Scan is appropriate for use with a scanning device, be that
a bar code reader or an optical document scanner. This is a prime
example, really, because that's one thing that many people who speak
and write English on a regular basis fail to contemplate: word choice.

English:
New
Spanish:
Nuevo (masculine singular)
Nuevos (masculine plural)
Nueva (feminine singular)
Nuevas (feminine plural)

Every language has its own characteristics.

Indeed it does. I also think that if you're going to have a new
something, you should know what that new something is; most
applications have "New" as an option in the "File" menu, but "new file"
doesn't make any sense from a usability standpoint---not when you're an
end user and you're using an office suite and what you really want is a
new file containing a new text document or new spreadsheet document.
Menu items in most computer software aren't as clear as they ought to
be for native English speakers, let alone translators. You seem to
want to pinpoint that as a gettext problem: no, it's a developer
problem. Developers have, for years, completely overestimated the
ability of a regular end-user to grasp a user interface. To this
_day_ I get calls from people who are confused about their
software---and there is no language barrier. Every developer should be
well-practiced in applying language elegantly, not just putting it
there. This is a flaw of developers, and unless those developers are somehow
prompted to think about it and trained to deal with that issue, it'll
never go away. Sounds like something someone ought to do, if you ask
me.

If you translate your application into 30 languages, you don't want to
"fix" your English keys every time you receive a bug report.
Just think about it: translate into 30 languages, you get a bug report
for language 31, you change a key and update all 30 language catalogs.

Which is very easy to do on any reasonably equipped operating
system---it's a very simple search/replace operation. Surely Windows
has something like sed or awk, doesn't it? Updating even 100 language
catalogs can be done in well under a minute using them unless your
catalogs are on a very slow network drive.
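
As a rough sketch of that search/replace, assuming each catalog is a
.po file and the key fits on a single msgid line (multi-line msgids, or
keys containing quotes or backslashes, would need a real PO parser).
rename_msgid is a hypothetical helper, not a gettext tool, and the
sample entries are made up.

```python
import re


def rename_msgid(po_text, old, new):
    """Replace the line `msgid "old"` with `msgid "new"` in PO file text.

    Only exact, single-line msgid entries are touched; msgstr lines and
    msgids that merely contain `old` are left alone.
    """
    pattern = re.compile(r'^msgid "' + re.escape(old) + r'"$', re.MULTILINE)
    # A function is used as the replacement so that `new` is inserted
    # literally, with no backslash interpretation by the re module.
    return pattern.sub(lambda _match: 'msgid "' + new + '"', po_text)


sample = (
    'msgid "Scan"\n'
    'msgstr "Buscar"\n'
    '\n'
    'msgid "Scan paper"\n'
    'msgstr "Escanear papel"\n'
)
print(rename_msgid(sample, "Scan", "Search"))
```

Running the same function over every catalog is then a short loop over
the .po files in whatever directory the project keeps them in.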

There are three quick solutions, and I am sure that there are many,
many more to choose from given a little bit more thought on the
issue.

If all are as bad as the 3 ones, don't bother.

Ok, I will stop here.

I have worked in localization and internationalization for more than 11
years, and I have seen hundreds of projects from tens of companies.
Stuff translated in tens of languages, on a lot of the platforms
out there (from Win and Mac to Palm OS), using every standard
solution, and quite a few non standard "quick solutions".

There are lots of g11n, i18n, and l10n experts out there. Many of them
have broad experience in a good number of systems, and most of them
that I have met are native speakers of more than one language (a
variant of English and another language is the most common I've run
into). You're the first that I've seen actively complain about gettext,
to be honest. That's fine---everyone has their own opinions about
things.

Personally, I'll take a single, capable, and portable system and use
that over any single-environment system, unless I have extremely strong
reason to do the opposite. In more than 20 years, I've only once had a
really good reason to pick a single-environment system, and even then,
eventually portability was required and that choice came back to take a
chunk out of my ass. As the saying goes, if one hasn't the time to do
it right, they'd best have the time to do it over---and I'll spend a
bit of extra time up front automating things in a portable fashion to
save the great expense later of having to suddenly become portable.
I've never had a problem doing it that way.

If you listen, fine; if you don't, fine again.
I have nothing to lose.

Agreed; it's time to end the thread. You seem to be frustrated.
Please accept my apologies if I've somehow caused that.

--- Mike

--
My sigfile ran away and is on hiatus.
http://www.trausch.us/
