RTF Associated Character Properties



Hello there, I spent nearly 2 days trying to make sense of the
"Associated Character Properties" as documented in "Rich Text Format
(RTF) Specification Version 1.9.1".

The documentation is too vague, so I'm trying to understand it by
altering test .rtf files, opening them in Word 2007, and then looking
at the font properties.
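
For anyone who wants to reproduce this kind of experiment, here is a
minimal sketch of how such test files could be generated. The function
name make_test_rtf, the two-entry font table, and the ASCII-only text
are my own assumptions for illustration; nothing here comes from the
spec beyond the control words themselves.

```python
def make_test_rtf(assoc_selector, assoc_font, main_selector, main_font, text):
    """Build a one-run RTF document (hypothetical helper, ASCII text only).

    assoc_selector / main_selector are control words without the
    backslash, e.g. "rtlch" and "ltrch". Font numbers index the
    two-entry font table below, which is an arbitrary choice.
    """
    fonttbl = r"{\fonttbl{\f0 Times New Roman;}{\f1 Arial;}}"
    run = (
        "\\" + assoc_selector + "\\af" + str(assoc_font) + " "
        + "\\" + main_selector + "\\f" + str(main_font) + " " + text
    )
    return r"{\rtf1\ansi\deff0" + fonttbl + run + "}"


if __name__ == "__main__":
    # Writes an ltrrun-shaped test file to open in Word.
    with open("ltrrun_test.rtf", "w", encoding="ascii") as handle:
        handle.write(make_test_rtf("rtlch", 1, "ltrch", 0, "Hello"))
```

Swapping the two selectors gives the rtlrun-shaped variant of the same
file, which makes it easy to compare the font dialog side by side.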

I believe that "associated character properties" are a means of
applying a "composite font" to a selection of text so that, for
example, Arabic text within that selection uses one font and Latin
text another, which saves you from changing the font all the time as
you type multilingual text.

If I understand correctly, it affects both new text as you type it and
existing text within a text run.

But I can't quite make sense of my observations. I'm trying to
determine how \ltrch, \rtlch, \fcsN, \loch, \hich, and \dbch relate
to the font dialog's complex font, Latin font and Asian font
properties.

In the RTF spec 1.9.1, the syntax for the associated properties is
given as:
<atext>        <ltrrun> | <rtlrun> | <sarun> | <nonsarun> | <saltrrun> |
               <nonsaltrrun> | <nonsartlrun> | <losbrun> | <hisbrun> | <dbrun>

<ltrrun>       \rtlch \afN & <aprops>* \ltrch <ptext>
<rtlrun>       \ltrch \afN & <aprops>* \rtlch <ptext>
<sarun>        \fcs0 \afN & <aprops>* \fcs1 <ptext>
<nonsarun>     \fcs1 \afN & <aprops>* \fcs0 <ptext>
<saltrrun>     \rtlch \fcs0 \af & <aprops>* \ltrch \fcs1 <ptext>
<nonsaltrrun>  \rtlch \fcs1 \af & <aprops>* \ltrch \fcs0 <ptext>
<nonsartlrun>  \ltrch \fcs1 \af & <aprops>* \rtlch \fcs0 <ptext>
<losbrun>      \hich \afN & <aprops> \dbch \afN & <aprops> \loch <ptext>
<hisbrun>      \loch \afN & <aprops> \dbch \afN & <aprops> \hich <ptext>
<dbrun>        \loch \afN & <aprops> \hich \afN & <aprops> \dbch <ptext>
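
Read literally, every production above is a sequence of "selector"
keywords, each followed by its properties, with the last selector group
sitting directly before <ptext>. A minimal sketch of splitting a run
that way is below. The names SELECTORS, CONTROL_WORD and split_run are
hypothetical, the tokenizer ignores braces, escapes and destinations,
and none of this claims to be how Word itself reads the file.

```python
import re

# Keywords that, per the quoted grammar, switch which property set is being filled.
SELECTORS = {"rtlch", "ltrch", "fcs0", "fcs1", "loch", "hich", "dbch"}

# A control word: backslash, letters, optional numeric parameter, optional space delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")


def split_run(rtf):
    """Split a run prefix into selector groups and the trailing text.

    Returns (groups, ptext); each group is (selectors, props). Consecutive
    selectors such as "\\rtlch \\fcs0" are kept together in one group.
    """
    groups = []  # each entry: [selectors, props]
    pos = 0
    while True:
        match = CONTROL_WORD.match(rtf, pos)
        if match is None:
            break  # first non-control-word character starts the text
        word = match.group(1) + (match.group(2) or "")
        if word in SELECTORS:
            # A selector after some properties opens a new group.
            if not groups or groups[-1][1]:
                groups.append([[], []])
            groups[-1][0].append(word)
        elif groups:
            groups[-1][1].append(word)
        else:
            break  # property before any selector: leave it in the text
        pos = match.end()
    return [(tuple(s), tuple(p)) for s, p in groups], rtf[pos:]
```

For example, split_run(r"\rtlch \fcs0 \af1 \ltrch \fcs1 \f0 text")
yields two groups, ("rtlch", "fcs0") with ("af1",) and ("ltrch", "fcs1")
with ("f0",), followed by the text. All groups but the last are the
associated ones in the grammar's terms.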

From my observations, I see the following results:

ltrrun: \rtlch \a* defines the "complex script" properties and what
comes after \ltrch controls the "Latin" properties. Therefore RTL
characters in ptext will use the \a* properties and the non-FE (Far
East) characters will use the "Latin" properties. That makes sense.

rtlrun: \ltrch \a* defines the "Latin" properties and what comes after
\rtlch controls the "complex script" properties. Therefore RTL
characters in ptext will use the "complex script" properties and the
non-FE (Far East) characters will use the "Latin" properties, which
were set by the \a*. That makes sense.

sarun: (South Asian complex script, e.g. Hindi). In this case, \fcs0
\a* appears to define both the Latin and the normal Asian properties,
while the \fcs1 properties affect the "complex script" properties, as
\rtlch did. So it seems that \rtlch and \fcs1 control the same set of
properties. In ptext, complex script characters would use the "complex
script" properties while the others would use "Latin" or "Asian"
depending on the character's value. That makes sense.

nonsarun: the opposite of sarun. Still makes sense.

saltrrun: Now it's getting complicated. It seems that the pair
"\rtlch \fcs0" \a* is used to specify the Latin-Asian properties. One
might think that \fcsN and \rtlch/\ltrch select the same "font
properties destination". "\ltrch \fcs1" controls the "complex script"
properties. So far the idea holds up.

nonsaltrrun: The opposite of saltrrun. It works as I expected.
"\rtlch \fcs1" \a* specifies the "complex script" props and the
"\ltrch \fcs0" properties specify Latin-Asian.

nonsartlrun: This is where it makes no sense. "\ltrch \fcs1" \a*
should specify the "complex script" props since we have \fcs1, and
"\rtlch \fcs0" should modify Latin-Asian. They don't. "\ltrch \fcs1"
now controls "Latin-Asian" and "\rtlch \fcs0" goes into "complex
script". This means that these tags are not synonyms.

losbrun: As if it were not complicated enough, this is thrown into the
picture. \hich seems to control the Latin properties, but just for the
hi-characters. \loch does the same for the lo-characters and \dbch for
the double-byte ones. These three, taken on their own, make sense
together. \loch and \hich are both represented by the "Latin" font in
the UI, which is why that field is blank when they differ in the RTF
file. \dbch is represented by the "Asian" font in the UI. These three
alone can't specify any "complex script" properties.

I was trying to determine which font properties are affected by which
tag, thinking of them as "font properties buckets":
\rtlch -> complex font
\ltrch -> latin font
\fcs0 -> latin and asian font
\fcs1 -> complex font
\loch -> latin font (for lo-character)
\hich -> latin font (for hi-character)
\dbch -> asian font

This idea is wrong according to my findings above.
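
To make the contradiction concrete, here is a small sketch that encodes
the bucket table above and intersects the buckets of keywords that
appear together. BUCKETS and predicted are hypothetical names, and the
table is only my guess from the list above, not anything stated in the
spec.

```python
# The hypothesised keyword -> "font properties bucket" table from the list above.
BUCKETS = {
    "rtlch": {"complex"},
    "ltrch": {"latin"},
    "fcs0": {"latin", "asian"},
    "fcs1": {"complex"},
    "loch": {"latin"},
    "hich": {"latin"},
    "dbch": {"asian"},
}


def predicted(*keywords):
    """Intersect the buckets of the given keywords.

    An empty set means the hypothesis gives conflicting answers for
    that combination of keywords.
    """
    result = None
    for keyword in keywords:
        bucket = BUCKETS[keyword]
        result = set(bucket) if result is None else result & bucket
    return result or set()


if __name__ == "__main__":
    for pair in [("rtlch", "fcs1"), ("ltrch", "fcs0"),
                 ("rtlch", "fcs0"), ("ltrch", "fcs1")]:
        print(pair, "->", predicted(*pair) or "conflict")
```

Under this table the pairs "\rtlch \fcs1" and "\ltrch \fcs0" agree
with themselves, but "\rtlch \fcs0" and "\ltrch \fcs1" come out empty,
which are exactly the pairs where my observations for saltrrun and
nonsartlrun disagree.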

Can someone explain how all this works together? As you can tell, I am
really confused.
Also, how is the <ptext> processed to determine which characters in it
use which font?
Do you have to seek ahead as you parse your RTF file when reading the
associated properties to determine which "font properties bucket" they
control?
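
To illustrate what I mean by processing the <ptext>, here is a rough
sketch of a per-character classifier. It is purely my assumption of
what such a step might look like, based on Unicode character
properties; I do not know what rule Word really applies. font_bucket
and the Devanagari range check are hypothetical and incomplete (other
Indic scripts, Thai, and so on are not covered).

```python
import unicodedata


def font_bucket(ch):
    """Guess which font bucket a single character would fall into.

    Illustrative heuristic only:
      - Devanagari block or right-to-left characters -> "complex"
      - wide / fullwidth East Asian characters       -> "asian"
      - everything else                              -> "latin"
    """
    if 0x0900 <= ord(ch) <= 0x097F:  # Devanagari block (e.g. Hindi)
        return "complex"
    if unicodedata.bidirectional(ch) in ("R", "AL"):  # Hebrew, Arabic letters
        return "complex"
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return "asian"
    return "latin"


def split_by_bucket(text):
    """Group consecutive characters that share a bucket into (bucket, text) runs."""
    runs = []
    for ch in text:
        bucket = font_bucket(ch)
        if runs and runs[-1][0] == bucket:
            runs[-1] = (bucket, runs[-1][1] + ch)
        else:
            runs.append((bucket, ch))
    return runs
```

Neutral characters such as spaces fall into "latin" here, which is
probably too naive, but it shows the kind of rule I am asking about.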

Thanks!