Re: text to bibliography?



On Aug 18, 5:23 am, p0 <yves.dho...@xxxxxxxxx> wrote:
convincing any user of reading even 25 pages before he can start using
a program, let alone 500. People just aren't patient enough anymore
for looking up things in a help file (see also my comment below on
creating new formatting styles).

No, that's not it at all. You cannot use a "Help" file unless you
happen to know the exact name that the writers of the "Help" have
assigned to a feature/bug. Look how many times a day it is asked here
how to get rid of the dots between words, or why there's suddenly no
vertical space between the pages.

Help files are perfectly searchable nowadays. You write about the
'dots between words' question. It is indeed a frequently asked
question.

So I just started Word, pressed F1, and entered the following in the
help box: "dot words" (without the quotes). And guess what, the 8th
entry in the list of results is titled: "I see dots and arrows in my
document". I click the link, and yes, it tells me exactly how to get
rid of those dots. It is there in the help, seconds away for people to
find it. Yet they still go to newsgroups to get an answer to their
question which they could easily find themselves. And they do get the
answer in here. What is more, the question and the answer in the
newsgroup are both indexed by Google. A 2 second Google search would
prevent the next person from asking the exact same question.

Note that this is not a complaint about the (quality of) posts in a
newsgroup. I merely wish to point out that people don't read help
pages or look for answers already out there. And remarks like 'but it
is easier to just ask the question' don't count. Newsgroups are slow.
They sometimes have to wait 24 hours for an answer they could have
found in 2 seconds.

It sure looks like a complaint about the quality of posts.

When I try to use Help it's generally for something a tad more
complicated, like why I can't type Tibetan even though I have a
Tibetan font and a Tibetan keyboard installed.

Another frequent question is about those brackety things that appear
instead of a ToC or an entire Index. If you don't know the name "field
code," how do you find it in "Help"?

And actually, your "I do?" remarks are plain examples of you not
being willing to look for help on the subject. I don't blame you for not
wanting to look things up, but at least don't say the answers aren't
available.

I noted that I did not know that there is a 500-page manual for
Papyrus, because I never needed a manual.

In a "Help" system, the answers are only available if you happen to
hit on the exact name used for the problem.

That aside, the bibliographic tools of Word 2007 lack almost all
documentation; the promised SDK is almost a year overdue now (I doubt
it will ever be released); and none of the people originally working on
the academic features seem to be still doing that job nowadays.

What's an SDK? You use their jargon, you must be one of them!

An SDK is a Software Development Kit, and it is a term widely used in
the software business. It is most certainly not coined by Microsoft.

You're one of them -- the people who talk about SDKs.

It doesn't seem too much to ask that "Text to Table" could come up
with a tabular presentation, which some other module could then
convert to the "format" used by the bibliographic database: if it
knows that col. 1 is the author, col. 2 is the date, col. 3 is the
title, col. 4 is the place, and col. 5 is the publisher (that's a
basic Book entry), why can't it simply do that?
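
As a rough illustration of the column idea above, here is a minimal Python sketch. It assumes one entry per line, with tab-separated fields in the order author, date, title, place, publisher. The function name and the sample entry are made up for illustration; this is not anything Word itself provides.

```python
# Assumed column order for a basic Book entry, as described in the post.
BOOK_COLUMNS = ["author", "date", "title", "place", "publisher"]

def parse_book_line(line):
    """Split one tab-delimited line into a dict keyed by column name."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(BOOK_COLUMNS):
        raise ValueError(
            f"expected {len(BOOK_COLUMNS)} columns, got {len(cells)}"
        )
    return dict(zip(BOOK_COLUMNS, (cell.strip() for cell in cells)))

# Invented sample entry.
entry = parse_book_line(
    "Doe, Jane\t1999\tAn Example Title\tChicago\tExample Press"
)
```

A line with too few or too many tabs is rejected instead of being guessed at, which is the part a fully automatic converter would have to get right for every entry.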

Now you are no longer talking static text, you are talking (poorly)

"Poorly"? CMS has been around since 1906 and is by far the leading
style guide in the US.

If I asked the medical doctors what the leading US style guide is,
they would say AMA.
If I asked the psychologists what the leading US style is, they would
say APA.
If I asked the legal people what the leading US style is, they would
say Bluebook.
If I asked the average school kid writing a little science paper what
the leading US style is, they would say Turabian.

Gotcha. Turabian is based on Chicago (which she was the editor of for
50 years or so).

Gotcha? Does it matter which one is derived from which one? They are
different. Hence, any static text parser would have to work
differently on both styles.

The other three probably do not predate 1906.

And if you asked the rest of the world what the most commonly used
style is, they would probably say Harvard (which is also the
oldest one, if I'm not mistaken).

I'm not aware that a style called "Harvard" is used in the US. In what
publication is it codified?

I personally don't use it. But it is used in almost all science fields
in Western Europe and the British Commonwealth (so that's including
Australia and New Zealand). So I think it is probably the most widely
used system.

Use "Help" to find out why it's called "Harvard" in the non-Harvard-
country-English-speaking world.

I am not trying to say that CMS isn't important or widely used, it is
just that everybody feels that his or her style is the most important
and commonly used one while it isn't.

On a side note, of the above list, only APA and Turabian are supported
in Word 2007.

I've noticed that. Kinda leaves the humanists, who tend to use MLA, up
the creek.

Not really, Word 2007 supports MLA out of the box.



formatted text. And you would have to have a tool to map columns to
fields, since in my case, year should be the last entry (except maybe
for pages) in my bibliography and most certainly not the second.

So you don't do author-date references in the text? Fine. That would
be Chicago's "Humanities" style.

The style I use is not supported by Word 2007 at all. I did write the
transformation stylesheet for it from scratch.

And in your case, how is your book displayed if it is an anonymous
work? I would guess col. 1 is the title, col. 2 is the date, col. 3 is
the place, and col. 4 is the publisher. So even between 2 entries of
the same type, the ordering of data would be different.

No, col. 1 would be empty. (Though there are circumstances in which
the author is entered as "Anonymous"; see CMS.)

And how do you expect your static text parser to guess that column one
is empty? Once you start adding delimiters, you can just as well use
the delimiters Microsoft defined. Those delimiters being xml tags. They
might not be what you prefer as delimiters, but they are delimiters.

I don't know what a "static text parser" is. Did you again forget that
I've put tabs between the fields, in order to do Text to Table? (The
punctuation between each pair of fields differs through each
paragraph, so it can't go by comma or period or colon.)
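
The tab point is easy to check. In this small Python sketch (sample data invented), splitting on tabs keeps an empty first cell, so an anonymous work still has five columns and the title stays in column 3.

```python
# An anonymous work: the author cell is empty, so the line starts with a tab.
line = "\t1850\tAn Example Pamphlet\tLondon\tExample Press"

# str.split with an explicit separator keeps empty strings, so the empty
# author column is preserved. (A bare line.split() would split on any
# whitespace and lose both the empty cell and the multi-word title.)
cells = line.split("\t")
author = cells[0] or None  # None marks "no author given"
```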

Static text is text without any kind of markup or delimiters
indicating clearly where fields start and/or end. It is what you would
call text without any codes.

Once again, you want to use your tabs, Microsoft wants you to use
their tabs, their tabs being the xml tags I showed earlier. It is not
because their tabs are longer than yours (and a lot more descriptive),
that they are worse. It is all a matter of taste.

Maybe you don't have anonymous works, but it doesn't matter. What you
require is so specific that you will probably be the only one using the
'import filter' anyway. The point is, Microsoft provides a set of
generic tools which works for 80% of their customers. There is no

You are again losing sight of the point. They provide _no_ tool for
going from an existing bibliography to the bibliography database.

The tools are there, they are just not obvious in use for the average
Word user.

You just recently told me that it's _not_ possible.

It is possible, I showed you which tags to use in the simple example I
gave in one of the previous posts.

I said, a tool for going from text to database. As you pointed out, I
can do it by myself without the assistance of any tool in Word.

However, to automate the entire process, the input format has to be
perfectly known. That is, no single exception can be left aside
(anonymous works, corporate authors, ...). Once you have fully defined
your format, all you have to do is provide a mapping between your
fields and the fields defined by Microsoft.
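
A minimal Python sketch of such a mapping follows. The left-hand names are hypothetical names for your own columns. The right-hand names are element names as I understand them from Word 2007's bibliography XML, so treat them as assumptions and check them against the schema. The author is left out because in that format it is a nested structure, not a single field.

```python
# Assumed mapping from your own column names to Word's element names.
FIELD_MAP = {
    "title": "b:Title",
    "date": "b:Year",
    "place": "b:City",
    "publisher": "b:Publisher",
}

def map_fields(entry):
    """Rename known fields; fail loudly on anything the mapping misses."""
    unknown = set(entry) - set(FIELD_MAP) - {"author"}
    if unknown:
        raise KeyError(f"no mapping defined for: {sorted(unknown)}")
    # Empty cells are dropped instead of being written out as empty fields.
    return {FIELD_MAP[k]: v for k, v in entry.items() if k in FIELD_MAP and v}

# Invented sample entry (an anonymous work, so the author is empty).
mapped = map_fields({
    "author": "",
    "date": "1999",
    "title": "An Example Title",
    "place": "Chicago",
    "publisher": "Example Press",
})
```

Raising an error on an unmapped field is deliberate: it is the "no single exception can be left aside" requirement made explicit.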

So is it possible to do it in an automated way? Yes. Is it doable? No.
There are so many versions of every style format that your
'translator' would be either just working in your specific case, or be
a huge monster which takes years to make and would even then not cover
some exceptions. Microsoft decided not to create the monster (and I
can't blame them). Instead, they decided to give you the tools to
create your translator for your specific case. But for someone without
any programming skills, those tools are too hard to use.

The format of a b:Source element is entirely defined by an xml schema.
All you have to do is write a (simple) XSLT which transforms your
format into the format described by that schema. Of course, if your
format happens to be an incomprehensible static text, your XSLT will
be very complicated. But you cannot blame Microsoft for that.
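
For readers who don't know XSLT, the same kind of transformation can be sketched in Python instead. This is not XSLT and not a Microsoft tool: it builds one b:Source element with the standard library. The namespace URI and the child element names (Tag, SourceType, Author, NameList, Person, Title, Year, City, Publisher) are given as I understand Word 2007's format and should be verified against the schema. The function name and sample data are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2007's bibliography XML, to the best of my knowledge.
B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
ET.register_namespace("b", B_NS)

def b(name):
    """Return the namespace-qualified tag for a b: element."""
    return "{%s}%s" % (B_NS, name)

def book_to_source(tag, author_last, author_first, title, year, city,
                   publisher):
    """Build one b:Source element for a book (element names are assumed)."""
    source = ET.Element(b("Source"))
    ET.SubElement(source, b("Tag")).text = tag
    ET.SubElement(source, b("SourceType")).text = "Book"
    if author_last:  # anonymous works simply get no author block
        outer = ET.SubElement(source, b("Author"))
        inner = ET.SubElement(outer, b("Author"))
        name_list = ET.SubElement(inner, b("NameList"))
        person = ET.SubElement(name_list, b("Person"))
        ET.SubElement(person, b("Last")).text = author_last
        ET.SubElement(person, b("First")).text = author_first
    for name, value in (("Title", title), ("Year", year),
                        ("City", city), ("Publisher", publisher)):
        if value:
            ET.SubElement(source, b(name)).text = value
    return source

source = book_to_source("Doe99", "Doe", "Jane", "An Example Title",
                        "1999", "Chicago", "Example Press")
xml_text = ET.tostring(source, encoding="unicode")
```

The hard part is still the input: the function only works once author, title, year and so on have already been separated, which is exactly what static text does not give you.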

I have no idea what a "b:Source element," an "xml schema," an "XSLT
schema," however simple or complex, or an "incomprehensible static
text" may be.

And that is the main problem. I do not blame you for not knowing them.
But they are available to you. If you want to learn how to use them,
you can. It is all about reading the documentation on those
technologies.

I never used XSLT before I started using Word 2007. It took me a
couple of hours to figure out how it worked and I could start creating
my own stuff. I agree that coming from a computer science background
gave me an advantage, but still ... I had to start from zero.

point for them in developing a tool which will work in a very specific
case (yours) and therefore will target 1% or less of their customer
base. If you want one, you will have to write it yourself. They
provide a specification of the bibliography format and even provide a
programming interface (I have no experience with it). They try to help
you a long way, but the last few steps you will have to take yourself.

On the occasions when programming new reference styles has been
mentioned here, the MVPs have stated it appears to be impossibly
complicated to do so.

It is not. It is pretty basic XSLT, nothing fancy about it.

Like you said above, all people have to do is read the available
help:
* you have the

...
