Re: HTML to XML conversion

Subject: Re: HTML to XML conversion
From: Sandy Harris <sandy -at- storm -dot- ca>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Tue, 15 Jan 2002 11:16:04 -0500

"Sandrine Touzé" wrote:
>
> I want to import our existing HTML documentation in order to generate online
> helps and printed manuals.

You can do that without XML. I do, for my docs at www.freeswan.org/doc.html

The tool I use to go from a collection of HTML source files to:

HTML with prev/contents/next tags + separate linked HTML TOC
HTML as one big file with TOC
PDF
Postscript

is htmldoc, free from www.easysw.com, though they do charge for support.
I use it on both Linux and Windows. As I recall, it runs in some other
environments too.
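
For anyone who wants to script those outputs, here is a minimal sketch in
Python that builds the htmldoc command lines. It is not the setup used for
the FreeS/WAN docs. The file names are hypothetical, and the options
(--book, --webpage, -t, -f) are the ones described in the htmldoc
documentation, so check them against `htmldoc --help` for your version.

```python
import subprocess


def htmldoc_command(sources, output, fmt="pdf", book=True):
    """Build an htmldoc command line as an argument list.

    sources: HTML input files, in reading order (hypothetical names below)
    output:  file to write
    fmt:     output type passed to -t, e.g. "pdf", "ps" or "html"
    book:    True for --book (TOC built from headings), False for --webpage
    """
    layout = "--book" if book else "--webpage"
    return ["htmldoc", layout, "-t", fmt, "-f", output, *sources]


def run_htmldoc(sources, output, fmt="pdf", book=True):
    """Run htmldoc; assumes the htmldoc binary is on the PATH."""
    subprocess.run(htmldoc_command(sources, output, fmt, book), check=True)


if __name__ == "__main__":
    # Hypothetical source files; this only prints the commands.
    pages = ["intro.html", "install.html", "config.html"]
    for fmt, out in [("pdf", "manual.pdf"), ("ps", "manual.ps"),
                     ("html", "manual.html")]:
        print(" ".join(htmldoc_command(pages, out, fmt)))
```

Keeping the command as a list makes it easy to inspect before anything is
actually run.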

That said, using XML is probably a better way to do this. I often think
about switching myself, but so far haven't gotten around to it.

> I guess my question should actually rather be: what are the pre-requisites
> to switch from an HTML to an XML documentation? Shall I first create a DTD

I wouldn't suggest creating your own DTD unless you /both/ have enough XML
experience to be confident you'll do it well /and/ have some unusual
requirements that rule out using a standard DTD.

I'd look at the DocBook DTD first (www.docbook.org). The O'Reilly book
on it would likely be worth your while.

Look at www.linuxdoc.org for many examples. Most of the Linux Documentation
Project docs use DocBook. Their authors' guide may also be helpful.

> and an XSL stylesheet and then import the files? What is the procedure?
> Has anyone ever gone through these kind of steps?

As someone mentioned, the HTML Tidy program (download from w3c.org) will
do batch conversions. The same site has a free browser/editor called
Amaya that lets you do "save as XHTML". Of course, that only gets you to
XHTML, not to whatever other DTD you might want.
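
A batch run of Tidy can be sketched as below, in Python. This is only an
illustration under assumptions: the file and directory names are made up,
and the options (-q for quiet, -asxhtml for XHTML output, -o for the output
file) are those of current Tidy releases, so older builds may differ.

```python
from pathlib import Path


def tidy_commands(sources, out_dir):
    """Return one tidy command (as an argument list) per HTML source file.

    sources: iterable of HTML file names (hypothetical in the example)
    out_dir: directory where the .xhtml results should be written
    """
    commands = []
    for name in sources:
        src = Path(name)
        dest = Path(out_dir) / (src.stem + ".xhtml")
        commands.append(["tidy", "-q", "-asxhtml", "-o", str(dest), str(src)])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs; this only prints the commands it would run.
    for cmd in tidy_commands(["intro.html", "config.html"], "xhtml"):
        print(" ".join(cmd))
```

If you run these through subprocess, note that Tidy, as far as I know,
returns a non-zero exit status for warnings as well as for errors, so a
non-zero status does not always mean the conversion failed.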

I have some sed scripts that do parts of an XHTML-DocBook conversion.
They are not finished. Mail me off list if you want them.
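
To give a feel for what such a partial conversion involves, here is a small
Python sketch in the same spirit as sed substitutions. It is not those
scripts. The tag map is a hypothetical, deliberately tiny one, and the
function name is invented for the example. It renames a handful of inline
and list tags and drops their attributes; anything that needs real
structural work (sections, links, tables) is left untouched.

```python
import re

# Hypothetical, deliberately small mapping from XHTML tags to DocBook tags.
OPEN_TAGS = {
    "p": "<para>",
    "em": "<emphasis>",
    "code": "<literal>",
    "pre": "<programlisting>",
    "ul": "<itemizedlist>",
    "ol": "<orderedlist>",
    "li": "<listitem><para>",
}
CLOSE_TAGS = {
    "p": "</para>",
    "em": "</emphasis>",
    "code": "</literal>",
    "pre": "</programlisting>",
    "ul": "</itemizedlist>",
    "ol": "</orderedlist>",
    "li": "</para></listitem>",
}

# Matches an opening or closing tag, with optional attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)(?:\s[^<>]*)?>")


def xhtml_to_docbook_fragment(text):
    """Rename mapped tags (dropping attributes); leave unknown tags as-is."""
    def swap(match):
        closing, name = match.group(1), match.group(2).lower()
        table = CLOSE_TAGS if closing else OPEN_TAGS
        return table.get(name, match.group(0))
    return TAG_RE.sub(swap, text)


if __name__ == "__main__":
    print(xhtml_to_docbook_fragment('<p class="note">Use <code>ipsec</code>.</p>'))
```

Tag-by-tag substitution like this only covers the easy cases, which is
presumably why a conversion of this kind tends to stay "parts of" a full
one.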

A product called CommandParse from http://www.commandprompt.com/
does complete HTML to DocBook XML conversion.


