Re: National Language - question on alphabet/sort order



On our engineering drawings there are often several sectional views and
these are labelled A-A, B-B etc.
So in our code which automatically generates these sectional views there is
a starting dialog which has a box for the section label ("A" in this case)
and the code increments the next character until "Z" and wraps round to "A"
again. (Highly unlikely that there will be more than 26 sections on one
drawing but the start letter _could_ be near the end of the alphabet and a
wrap is needed)
and Yes this particular increment function assumes the English ASCII code
sequence! :-(

Have you guessed my question yet?

Is there a method of incrementing and wrapping non-English alphabets?
Or : of identifying first and last characters in a set? (Sort sequence?)
European and Cyrillic alphabets might be straightforward, but obviously
Asian sets are a different kettle of fish ("whole new ballgame" in modern
vernacular)!


Well, there are several problems here.

The first one is that in many places the labelling rules might be different.
So if A-A, B-B is the normal thing in the US, in other places it might be
A1, B2, or 1-1, 2-2, etc.
This has less to do with the alphabet than with local engineering/design
rules.

So I would start by doing a bit of research to see how things are done
in the potential markets that you care about.

Then, moving to languages, each might have its own challenges.
I doubt that for Japanese they use radical-stroke counts for what you need,
most likely they use Gojuon (http://en.wikipedia.org/wiki/Goj%C5%ABon) or
Iroha (http://en.wikipedia.org/wiki/Iroha), but they can also use numbers
(http://en.wikipedia.org/wiki/Japanese_numerals), or something else.

You might have to use locale-specific digits.

Some conventions are also used for numbering lists
(http://www.w3.org/TR/CSS21/generate.html#lists)
that might also affect this.

I also doubt that accented characters are used for this even when the
language uses them.
But in some languages you might skip non-accented characters that are not
really used in the language (except for borrowed words).
For instance in Romanian you will probably not use Q, W, Y (but check this!)
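
To make the "increment and wrap" part concrete, here is a minimal sketch in
standard C++ (std::wstring instead of MFC). It assumes the set of usable
label characters is supplied as one string in the desired order; the function
name nextLabel and the fallback behaviour are my own choices for illustration,
and the Romanian example string in the usage note simply follows the unverified
guess above (no Q, W, Y).

```cpp
#include <string>

// Returns the character that follows `current` in `alphabet`, wrapping from
// the last entry back to the first.
// Hypothetical fallbacks: an empty alphabet returns `current` unchanged, and
// a character that is not in the alphabet restarts at the first entry.
wchar_t nextLabel(const std::wstring& alphabet, wchar_t current)
{
    if (alphabet.empty())
        return current;

    const std::wstring::size_type pos = alphabet.find(current);
    if (pos == std::wstring::npos)
        return alphabet.front();

    // The modulo does the wrap-around without assuming any code point order.
    return alphabet[(pos + 1) % alphabet.size()];
}
```

With the alphabet L"ABCDEFGHIJKLMNOPRSTUVXZ", nextLabel(alphabet, L'P')
returns L'R' and nextLabel(alphabet, L'Z') wraps to L'A'. Nothing here depends
on ASCII ordering, so the same call works for a Cyrillic or kana string.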


Anyway, long story short, you need something very flexible.
An idea is this: use two different strings, both of them easy
to change from UI or config file. Then iterate thru each character.
I would also have a pattern that allows reordering the parameters (like
FormatMessage)

CString label;
WCHAR *str1 = ???; // load from config/ui/resources
WCHAR *str2 = ???; // load from config/ui/resources
WCHAR *pattern = ???; // load from config/ui/resources
for( int i = 0; str1[i]; ++i )
    for( int j = 0; str2[j]; ++j ) // inner loop walks str2, not str1
        label.FormatMessage( pattern, str1[i], str2[j] ); // one label per (i, j) pair; use it here

Examples:
str1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
str2 = "0123456789";
pattern = "%1!c!-%2!c!"; // !c! because single characters are passed, not strings
will give you "A-0" "A-1" "A-2" ... "B-0" ...

str1 = "0123456789";
str2 = "0123456789";
pattern = "[%1!c! - %2!c!]";
will give you "[0 - 0]" "[0 - 1]" ... "[1 - 0]"
and so on.
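
For anyone not on MFC, the same idea can be sketched in standard C++. This is
only an illustration: formatLabel is a simplified stand-in for FormatMessage
that understands just the placeholders %1 and %2 (no !c! or !s! type
specifiers), and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for FormatMessage: copies `pattern`, replacing "%1"
// with `a` and "%2" with `b` in a single pass. Anything else is copied as-is.
std::wstring formatLabel(const std::wstring& pattern,
                         const std::wstring& a,
                         const std::wstring& b)
{
    std::wstring out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == L'%' && i + 1 < pattern.size()) {
            if (pattern[i + 1] == L'1') { out += a; ++i; continue; }
            if (pattern[i + 1] == L'2') { out += b; ++i; continue; }
        }
        out += pattern[i];
    }
    return out;
}

// Builds one label for every (first[i], second[j]) pair, outer loop first.
std::vector<std::wstring> makeLabels(const std::wstring& first,
                                     const std::wstring& second,
                                     const std::wstring& pattern)
{
    std::vector<std::wstring> labels;
    for (wchar_t a : first)
        for (wchar_t b : second)
            labels.push_back(formatLabel(pattern,
                                         std::wstring(1, a),
                                         std::wstring(1, b)));
    return labels;
}
```

makeLabels(L"AB", L"01", L"%1-%2") yields "A-0", "A-1", "B-0", "B-1". Because
the placeholders are numbered, a pattern such as "[%2 - %1]" swaps the two
parts without touching the code.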

The initial market study might show you that you need more flexibility,
so you might end up with more than one character as primary element.
In this case a string will not be enough, you might need an array of strings.

const TCHAR *str1[] = {   // array of strings, not a single string
    _T("AA"),
    _T("AB"),
    _T("AC"),
    ...
    _T("ZY"),
    _T("ZZ")
};
const TCHAR *str2[] = {
    _T("00"),
    _T("01"),
    ...
    _T("98"),
    _T("99")
};
So the labels might look like this: "AA-00" "AA-01" ... "BA-00" ... "ZZ-99"
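
If the multi-character elements are regular (every combination of a fixed
alphabet), the array does not have to be typed by hand. The sketch below is one
possible way to generate it in standard C++; makeElements is a hypothetical
helper, and a hand-written list is still the better choice when the local
convention skips or reorders entries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every string of `width` symbols drawn from `alphabet`, in odometer
// order: for alphabet "AB" and width 2 that is AA, AB, BA, BB.
std::vector<std::wstring> makeElements(const std::wstring& alphabet,
                                       std::size_t width)
{
    std::vector<std::wstring> result;
    if (alphabet.empty() || width == 0)
        return result;

    // digits[k] is the index into `alphabet` of the symbol at position k.
    std::vector<std::size_t> digits(width, 0);
    for (;;) {
        std::wstring element;
        for (std::size_t d : digits)
            element += alphabet[d];
        result.push_back(element);

        // Increment the rightmost position and carry to the left.
        std::size_t pos = width;
        while (pos > 0) {
            --pos;
            if (++digits[pos] < alphabet.size())
                break;              // no carry needed
            digits[pos] = 0;        // overflow: reset and carry
            if (pos == 0)
                return result;      // carried out of the leftmost position
        }
    }
}
```

makeElements(L"ABCDEFGHIJKLMNOPQRSTUVWXYZ", 2) produces the 676 entries from
"AA" to "ZZ", and makeElements(L"0123456789", 2) produces "00" to "99".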

And in fact I would not store this in resources, but in a config file.
The convention used should not be tied to the language of the UI
(a French application should not prevent you from using Letter paper :-)
And you might even allow the user to change it thru the UI
(same as Word allows you to print to custom paper).

Now, this is not a 100% safe solution, but it is quite flexible.
A bit of quick market research should tell you if it is flexible enough or not.


--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email