Re: Trapping your content in HTML

Subject: Re: Trapping your content in HTML
From: Bill Burns <BillDB -at- ILE -dot- COM>
Date: Thu, 26 Mar 1998 11:48:17 -0700

I'll take these points one by one:

> Think about it: it's 2003, and you have to update an old file that the
> last guy left behind and publish it on the HyperZwim. The extension is
> .DOC. Is it a Word 95 file? Word 97? Word 98? Word 3.0? Word 00 Service
> Pack Q?
>
Word 97 currently supports filters back to Word 2.0 for Windows. My guess is
filters will still be available. I'm not going to argue that proprietary
formats are safest, but I don't think this particular dog will hunt. I can
open Frame 3 and 4 files in FM 5. Heck, I can take the Word 3.0 files from my
thesis (which I wrote on a Mac Classic 6 years ago) and open them in Frame 5
on my PC. (Of course, I do have those old Norton Textra files...a-and those
old things in WordStar--guess they're toast.)

> Oh, darn - the file's a little bit corrupted - or is it the
> wrong version of Word, or a bad DLL? Doesn't matter, it's all gone.
> Maybe it's only a byte or two out of the 50 Meg, but it won't open.
>
> .FM file, pretty much the same, maybe a few less versions. Interleaf
> file? hahahhahahahahah........
>
Can't argue with that. But why would you archive a corrupt version of the
file? The file itself might not be immutable, but a clean file on CD is
pretty safe, yes?

> Now, same situation but it's .HTML. Gee, nobody uses that anymore. Worst
> case - you have to read the tags and figure out that <b> means start
> bold text, <H1> means heading 1. The <TD> tag might take a little
> research, and you might never figure out what &nbsp; means, but there is
> probably some old-timer around that you can ask. Even if there is no
> HTML software in existence anymore, you can cobble together a macro in
> the HyperZwim editor that handles %95 of the conversion.
> File's a little corrupted? OK, use the good part.
>
Here's the rub. Most of the HTML that's coded today isn't valid, much less
well formed. Yeah, you can get your content out of it, but you have to
reformat the entire thing. I haven't seen any tools for taking kludgy
markup that uses tables for columns and clear GIFs for creative layouts and
converting it reliably into a form that can be used in another
application. And forget about using JavaScript or ActiveX to liven up the
page. Text in graphics? Specialized typefaces for technical detail or
localized terms? All of it would have to be converted--some of it by hand.
Then you have context issues to worry about. If you've broken your files up
for web delivery, are you going to recompile them into chapters for a
printed book? Is the organization conducive to this process, or do you have
to start from scratch?
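
To make the "you can get your content out, but not the formatting" point
concrete, here is a rough sketch, assuming Python and its standard
html.parser module. The names ContentSalvager, salvage, BLOCK_TAGS and
SAMPLE are invented for illustration, and the tag-to-label mapping is an
arbitrary choice, not any standard. It recovers the text from unclosed,
table-laid-out markup, but all it can do with the spacer GIF is note that it
dropped it, and the two-column layout is gone.

```python
from html.parser import HTMLParser

# Tags treated as structure worth keeping; everything else is flattened.
# This mapping is an illustrative assumption, not a standard.
BLOCK_TAGS = {"h1": "HEADING 1", "h2": "HEADING 2", "p": "PARA", "td": "CELL"}

# Tags whose content has no plain-text equivalent in this sketch.
SKIPPED_CONTENT = ("script", "style")


class ContentSalvager(HTMLParser):
    """Pull text out of sloppy HTML, labelled by the most recent block tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self.blocks = []     # (label, text) pairs, in document order
        self.dropped = []    # tags whose content or layout role is lost
        self._label = "PARA"
        self._buffer = []
        self._skip = False

    def _flush(self):
        # Collapse runs of whitespace (including non-breaking spaces).
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append((self._label, text))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()    # an unclosed block ends where the next one starts
            self._label = BLOCK_TAGS[tag]
        elif tag in SKIPPED_CONTENT:
            self._skip = True
            self.dropped.append(tag)
        elif tag == "img":
            self.dropped.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
            self._label = "PARA"
        elif tag in SKIPPED_CONTENT:
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self._buffer.append(data)

    def close(self):
        super().close()      # lets the parser hand over any pending text
        self._flush()


def salvage(html_text):
    """Return (blocks, dropped) for one HTML string."""
    parser = ContentSalvager()
    parser.feed(html_text)
    parser.close()
    return parser.blocks, parser.dropped


# Typical hand-coded page: unclosed tags, a table used for columns,
# and a clear GIF used as a spacer.
SAMPLE = """<H1>Install<table><tr><td><img src="clear.gif" width=40>
<td><b>Step 1</b> Unpack&nbsp;it.
<td>Step 2: run setup</table><p>Done"""

if __name__ == "__main__":
    blocks, dropped = salvage(SAMPLE)
    for label, text in blocks:
        print(label + ": " + text)
    print("dropped:", dropped)
```

The words survive, and so do the headings if the author actually used heading
tags. What the table and the GIF were doing to the page does not, and that is
the part that has to be rebuilt by hand in the next application.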

I've also seen a few non-WYSIWYG HTML editors do some pretty horrible things
to some very simple HTML files. Seems they sometimes don't like white space.
The data is there, but it no longer has a recognizable structural form to
extract from.

> PDF is probably a good choice too, since it is widely used as an
> archival format and (at least so far) is very compatible between
> versions. And there are probably other very good choices that I can't
> think of off the top of my head. But the bottom line is that I would not
> necessarily argue against HTML as a storage format.
>
I think PDF and HTML are fine for delivery, but you can't edit PDF to a
large degree, so it, too, would have to undergo significant processing to
convert it into another format. Same goes for HTML. For development (not
output), PDF is out of the question for obvious reasons. HTML limits you to
web-based delivery, so multiple output forms have to start from a different
format, unless you plan and budget for the conversion.

Bill Burns
Senior Technical Writer/Technology Consultant
ILE Communications
billdb -at- ile -dot- com





