RTF Associated Character Properties



Hello there, I spent nearly 2 days trying to make sense of the
"Associated Character Properties" as documented in "Rich Text Format
(RTF) Specification Version 1.9.1".

The documentation is too vague, so I'm trying to understand it by
altering test .rtf files, opening them in Word 2007, and then looking
at the font properties.

I believe I understand that "associated character properties" are a
means of assigning a "composite font" to a selection of text so that,
for example, Arabic text within that selection uses one font and Latin
text another. This saves you from having to change the font all the
time as you type multilingual text.
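To make the "composite font" idea concrete, here is a minimal Python sketch of what I mean: each character is assigned to a Latin, Asian, or complex-script slot and is displayed with that slot's font. The Unicode ranges, the slot names, and the font names are my own simplified assumptions for illustration. They are not Word's actual classification rules.

```python
from itertools import groupby

# Simplified, hypothetical script ranges (assumption: Word's real rules are more detailed).
COMPLEX_RANGES = [
    (0x0590, 0x08FF),  # Hebrew, Arabic, Syriac, Thaana, ...
    (0x0900, 0x0DFF),  # Devanagari through Sinhala (South Asian scripts)
]
ASIAN_RANGES = [
    (0x3000, 0x30FF),  # CJK punctuation, Hiragana, Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
    (0xFF00, 0xFFEF),  # Halfwidth and fullwidth forms
]


def slot_for(ch: str) -> str:
    """Return which slot of the composite font a character would use."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in COMPLEX_RANGES):
        return "complex"
    if any(lo <= cp <= hi for lo, hi in ASIAN_RANGES):
        return "asian"
    return "latin"


def font_runs(text: str, fonts: dict) -> list:
    """Split text into (font name, substring) pieces, one per slot change."""
    return [(fonts[slot], "".join(chars)) for slot, chars in groupby(text, key=slot_for)]


# Hypothetical composite font: one font per slot.
COMPOSITE = {"latin": "Calibri", "asian": "MS Mincho", "complex": "Arial"}
print(font_runs("Hi مرحبا 日本", COMPOSITE))
```

The point is only that one "selection" carries three fonts at once and the character itself picks which one applies.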

If I understand correctly, it affects both new text as you type it and
existing text within a text run.

But I can't quite make sense of my observations. I'm trying to
determine how \rtlch, \ltrch, \fcs, \loch, \hich, and \dbch relate
to the complex script, Latin, and Asian font properties in the Font
dialog.

In the RTF 1.9 spec, the syntax for the associated properties is given
as:
<atext> <ltrrun> | <rtlrun> | <sarun> | <nonsarun> | <saltrrun> |
<nonsaltrrun> | <nonsartlrun> | <losbrun> | <hisbrun> | <dbrun>

<ltrrun> \rtlch \afN & <aprops>* \ltrch <ptext>
<rtlrun> \ltrch \afN & <aprops>* \rtlch <ptext>
<sarun> \fcs0 \afN & <aprops>* \fcs1 <ptext>
<nonsarun> \fcs1 \afN & <aprops>* \fcs0 <ptext>
<saltrrun> \rtlch \fcs0 \af & <aprops>* \ltrch \fcs1 <ptext>
<nonsaltrrun> \rtlch \fcs1 \af & <aprops>* \ltrch \fcs0 <ptext>
<nonsartlrun> \ltrch \fcs1 \af & <aprops>* \rtlch \fcs0 <ptext>
<losbrun> \hich \afN & <aprops> \dbch \afN & <aprops> \loch <ptext>
<hisbrun> \loch \afN & <aprops> \dbch \afN & <aprops> \hich <ptext>
<dbrun> \loch \afN & <aprops> \hich \afN & <aprops> \dbch <ptext>
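To check which production a given run matches, here is a small Python sketch of how I'm reading the grammar: the group of "switch" control words immediately before <ptext> identifies the production. It assumes a single flat run in real RTF syntax (no "&", no braces, no escapes, all control words before the text). The names CONTROL_WORD, SWITCHES, PRODUCTIONS, and classify_run are mine, not from the spec.

```python
import re
from typing import Optional, Tuple

# An RTF control word: backslash, lowercase ASCII letters, an optional signed
# integer parameter, and an optional single space delimiter (consumed).
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")

# Control words that act as "switches" in the <atext> grammar.
SWITCHES = {"ltrch", "rtlch", "fcs", "loch", "hich", "dbch"}

# The switch group immediately before <ptext> in each production listed above.
PRODUCTIONS = {
    ("ltrch",): "ltrrun",
    ("rtlch",): "rtlrun",
    ("fcs1",): "sarun",
    ("fcs0",): "nonsarun",
    ("ltrch", "fcs1"): "saltrrun",
    ("ltrch", "fcs0"): "nonsaltrrun",
    ("rtlch", "fcs0"): "nonsartlrun",
    ("loch",): "losbrun",
    ("hich",): "hisbrun",
    ("dbch",): "dbrun",
}


def classify_run(run: str) -> Tuple[Optional[str], str]:
    """Return (production name or None, ptext) for one flat run."""
    words = []
    end = 0
    for m in CONTROL_WORD.finditer(run):
        name, param = m.group(1), m.group(2)
        # Keep the parameter only for \fcs, since \fcs0 and \fcs1 differ.
        words.append(name + param if name == "fcs" and param else name)
        end = m.end()
    # Collect the trailing run of consecutive switch words, in original order.
    tail = []
    for word in reversed(words):
        if word.rstrip("0123456789") not in SWITCHES:
            break
        tail.insert(0, word)
    return PRODUCTIONS.get(tuple(tail)), run[end:]


print(classify_run(r"\rtlch\fcs0\af1\ltrch\fcs1 abc"))
print(classify_run(r"\hich\af2\dbch\af3\loch text"))
```

Since the production is only known once the last switch group has been read, this sketch buffers the control words until the text starts instead of seeking ahead. I don't know whether that holds up for real files with nested groups and interleaved formatting.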

From my observations, I see the following results:

ltrrun: \rtlch \a* defines the "complex script" properties, and what
comes after \ltrch controls the "latin" properties. Therefore RTL
characters in ptext will use the \a* properties, and non-FE (Far East)
characters will use the "latin" properties. That makes sense.

rtlrun: \ltrch \a* defines the "latin" properties, and what comes after
\rtlch controls the "complex script" properties. Therefore RTL
characters in ptext will use the "complex script" properties, and
non-FE (Far East) characters will use the "latin" properties set by
the \a*. That makes sense.

sarun (South Asian complex script, e.g. Hindi): In this case, \fcs0 \a*
appears to define both the latin and the regular asian properties,
while the properties after \fcs1 affect the "complex script"
properties. So it seems like \rtlch and \fcs1 control the same set of
properties. In ptext, complex script characters would use the "complex
script" properties, while the others would use "latin" or "asian"
depending on the character's value. Makes sense.

nonsarun: the opposite of sarun. Still makes sense.

saltrrun: Now it's getting complicated. It seems like the pair
"\rtlch \fcs0" \a* is used to specify the latin/asian properties. One
might think that \fcsN takes precedence over \rtlch/\ltrch in choosing
the "font properties destination". "\ltrch \fcs1" controls the
"complex script" properties. So far the idea holds up.

nonsaltrrun: The opposite of saltrrun. It works as I expected:
"\rtlch \fcs1" \a* specifies the "complex script" props, and the
properties after "\ltrch \fcs0" specify latin/asian.

nonsartlrun: This is where it stops making sense. "\ltrch \fcs1" \a*
should specify the "complex script" props since we have \fcs1, and
"\rtlch \fcs0" should modify latin/asian. They don't. "\ltrch \fcs1"
now controls latin/asian, and "\rtlch \fcs0" goes to "complex script".
This means that these tags are not synonyms.

losbrun: As if it were not complicated enough, this is thrown into the
picture. \hich seems to control the latin properties, but only for
high characters. \loch does the same for low characters and \dbch for
double-byte characters. Taken together on their own, these three make
sense. \loch and \hich are both represented by the "latin" font in the
UI, which is why that field is blank when they differ in the RTF file.
\dbch is represented by the "asian" font in the UI. These three alone
can't specify any "complex script" properties.
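If I read the spec's control word descriptions correctly, \loch marks single-byte low-ANSI characters (0x00 to 0x7F), \hich single-byte high-ANSI characters (0x80 to 0xFF), and \dbch double-byte characters. Here is a small Python sketch of that split, assuming the code page is known and using Python's built-in cp1252 and cp932 codecs to encode each character. The function names are mine. Note that nothing here involves complex script properties, which matches what I observed.

```python
def byte_class(ch: str, codepage: str) -> str:
    """Classify one character as loch, hich, or dbch by its encoded bytes."""
    encoded = ch.encode(codepage)  # raises UnicodeEncodeError if unmappable
    if len(encoded) == 2:
        return "dbch"  # double-byte character (assumes a DBCS code page)
    if encoded[0] < 0x80:
        return "loch"  # single-byte low-ANSI, 0x00 to 0x7F
    return "hich"  # single-byte high-ANSI, 0x80 to 0xFF


def classify(text: str, codepage: str) -> list:
    """Classify every character of text for the given code page."""
    return [byte_class(ch, codepage) for ch in text]


print(classify("Aé", "cp1252"))
print(classify("A日ｱ", "cp932"))  # Latin letter, kanji, half-width katakana
```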

I was trying to determine which font properties are affected by which
tag, thinking of them as "font properties buckets":
\rtlch -> complex font
\ltrch -> latin font
\fcs0 -> latin and asian font
\fcs1 -> complex font
\loch -> latin font (for lo-character)
\hich -> latin font (for hi-character)
\dbch -> asian font

According to my findings above, this idea is wrong.
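To pin down the contradiction, here is a small Python sketch that encodes my observations (collapsing latin and asian into one non-complex bucket for simplicity) and checks them against two candidate rules: "the direction switch wins" and "the \fcs switch wins". The rule names and this encoding are my own; the observed buckets come only from the tests described above.

```python
# Observed bucket for the \a* properties, keyed by the switches that precede
# them: (direction switch or None, fcs value or None).
OBSERVED = {
    "ltrrun": (("rtlch", None), "complex"),
    "rtlrun": (("ltrch", None), "latin_asian"),
    "sarun": ((None, 0), "latin_asian"),
    "nonsarun": ((None, 1), "complex"),
    "saltrrun": (("rtlch", 0), "latin_asian"),
    "nonsaltrrun": (("rtlch", 1), "complex"),
    "nonsartlrun": (("ltrch", 1), "latin_asian"),
}


def bucket_from_direction(direction):
    return "complex" if direction == "rtlch" else "latin_asian"


def bucket_from_fcs(fcs):
    return "complex" if fcs == 1 else "latin_asian"


def direction_wins(direction, fcs):
    """Hypothesis A: rtlch/ltrch decides whenever it is present."""
    if direction is not None:
        return bucket_from_direction(direction)
    return bucket_from_fcs(fcs)


def fcs_wins(direction, fcs):
    """Hypothesis B: fcsN decides whenever it is present."""
    if fcs is not None:
        return bucket_from_fcs(fcs)
    return bucket_from_direction(direction)


def failures(rule):
    """Names of the runs whose observed bucket the rule fails to predict."""
    return [name for name, (switches, seen) in OBSERVED.items() if rule(*switches) != seen]


print("direction wins fails on:", failures(direction_wins))
print("fcs wins fails on:", failures(fcs_wins))
```

Each rule explains six of the seven runs, but not the same six, which is why neither reading of the tags as simple synonyms works.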

Can someone explain how all of this works together? As you can tell,
I am really confused.
Also, how is the <ptext> processed to determine which characters in it
use which font?
Do you have to seek ahead while parsing the RTF file when reading the
associated properties, to determine which "font properties bucket"
they control?

Thanks!