Re: Most efficient ebook format and fastest reader?
- From: "xTenn" <xTennREmoveThisPart@xxxxxxx>
- Date: Thu, 12 May 2005 14:19:55 -0400
"casioculture" <casioculture@xxxxxxxxx> wrote in message
news:1115918540.059193.174890@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> Hello, thanks a lot for the reply. I did use another tool which is the
> open source pdftohtml, http://pdftohtml.sourceforge.net/. The problems
> I had were 1) file import in the tomeraiders 3 will only list files
> with the *.txt or *.tr2 extensions, and 2) even when I manually changed
> the .html file extension to .txt, the file import in tomeraider 3
> objected to characters and tags such as ! and doctype and gave me
> errors. What is it that I'm not doing right?
>
Well, you are not doing anything wrong. The HTML tags supported by the
import are basically a subset of true HTML, and the pdf-to-html tool must
be writing full, proper HTML (doctype and all) that the importer does not
understand. And yes, you need to rename the file to .txt to have it load
correctly. There is a doc in the installation that has more detail, but
that is the gist of it.
At this point I would probably suggest using something like OpenOffice to
convert the HTML to text and trying it again. But if you have a lot of
formatting in the HTML (from the PDF) you may be a bit disappointed in the
end result.
The second option is to write a quick script to rip out the offending tags.
This is attractive because you keep all the image tags in place, if there
are any. Once you have the script you can quickly fix all your files in a
three-step process (pdf -> html -> stripped html -> tr3), but it will take
some trial and error to get them working correctly. On top of that, if the
PDFs are small you would need to combine them into one large file in order
to have searching optimized. If you do use images, be sure you are using at
least version 3.12.
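To give an idea of what such a script could look like, here is a rough
Python sketch. It is not anything that ships with TomeRaider. The
ALLOWED_TAGS list is purely a guess on my part (the real list is in the doc
that comes with the installation), and the function names are made up for
illustration. It drops the doctype, comments, and the head/style/script
blocks, then removes every tag that is not on the allow-list while keeping
the text between the tags.

```python
import re

# Hypothetical allow-list: the tags the TomeRaider importer really accepts
# are not listed in this thread, so adjust this after reading the bundled doc.
ALLOWED_TAGS = {"b", "i", "u", "p", "br", "img", "a", "h1", "h2", "h3"}

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
DECLARATION = re.compile(r"<![^>]*>")  # e.g. <!DOCTYPE ...>
# Whole blocks whose contents are not page text.
HIDDEN_BLOCK = re.compile(
    r"<(head|style|script)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def strip_unsupported(html, allowed=ALLOWED_TAGS):
    """Remove declarations, comments, and tags that are not in `allowed`."""
    html = COMMENT.sub("", html)
    html = DECLARATION.sub("", html)
    html = HIDDEN_BLOCK.sub("", html)

    def keep_or_drop(match):
        # Keep the tag untouched if its name is allowed, else delete it.
        return match.group(0) if match.group(1).lower() in allowed else ""

    return TAG.sub(keep_or_drop, html)


def strip_and_combine(pages, allowed=ALLOWED_TAGS):
    """Strip each page and join the results into one large document."""
    return "\n".join(strip_unsupported(page, allowed) for page in pages)
```

Read each .html file that pdftohtml produced into a string, pass the list
to strip_and_combine(), and write the result out with a .txt extension for
the import. The regular expressions are simple-minded (a ">" inside an
attribute value would confuse them), so expect the same trial and error
mentioned above.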
The last thing to note is that TR's strength is in massive amounts of data,
not so much in pretty pages, at least until they improve the import process
and add better image handling. For general-purpose reading a simpler reader
may be the best alternative (mobibook, for example). But once you get the
process working you might be surprised at the tons of data you can have at
your fingertips.
Hope that helps, but it looks like you have some issues to work out, sorry
to say.
BTW, if you are curious about Wikipedia and TR, check these out:
http://en.wikipedia.org/wiki/Wikipedia:TomeRaider
http://download.wikimedia.org/tomeraider/current_tr3/
(The 517 meg file is pretty sweet, if you have the room. I am running a
different (newer) dump, but this one is still darn current for a
general-purpose encyclopedia.)
Good Luck