Readability indices, take II

Subject: Readability indices, take II
From: Geoffrey Hart <Geoff-h -at- MTL -dot- FERIC -dot- CA>
Date: Thu, 25 Mar 1999 08:53:25 -0500

Steven Jong responded to my original posting on readability:

<<Hmmm... So if I write the sentence "The dog is white" and
change it to "The dog is black," and the tool says both
sentences are equally readable (as they will), I should demand
a refund? Gee, what do you want? Readability indexes
purport to measure readability, not meaning.>>

Picking examples always comes down to choosing your
favorite straw men, doesn't it? I should have illustrated my
point rather than simply stating it. First, I suggested
randomizing the words: "the dog is white" becomes "white
the is dog", which is hardly readable despite having an
identical readability index. Second, I suggested changing word
order to change the meaning: "the dog is not white" becomes
"is not the dog white[?]". I don't know of any language in
which readability is independent of meaning; in fact, I define
readability as the ease with which the text communicates the
author's intended meaning. What do I want? An index that
does more than count words, syllables, and sentences. Counting
alone is useless, as I hope my two examples have just shown.
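To make the point concrete, here is a minimal Python sketch of the commonly cited Flesch Reading Ease formula (206.835 - 1.015 x words per sentence - 84.6 x syllables per word). The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed purely for illustration; real tools may count syllables differently. Because the formula sees only counts, any reordering of the same words in the same number of sentences must produce exactly the same score.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not Flesch's own procedure):
    count runs of consecutive vowels, with a minimum of one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*ASL - 84.6*ASW."""
    # Sentences: split on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / max(1, len(sentences))  # average sentence length
    asw = syllables / len(words)               # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


# The two examples from the text: same words, different order.
pairs = [
    ("The dog is white.", "White the is dog."),
    ("The dog is not white.", "Is not the dog white?"),
]
for original, scrambled in pairs:
    a, b = flesch_reading_ease(original), flesch_reading_ease(scrambled)
    print(f"{a:.1f}  {b:.1f}  identical={a == b}")
```

Both pairs print identical=True: the scrambled sentence and the question score exactly the same as the originals, because nothing in the formula depends on word order or meaning.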

I added <<There's almost no correlation between the main
readability indexes and actual readability, and there won't be
for a good long time to come until someone develops a tool
that can parse the content of text in the specific context of a
well-defined audience.>>

To which Steve replied: <<This is a sweeping indictment, but
it is not true.>>

In fact, Steve is correct, because I omitted a key word in my
original posting: there is almost no _causal_ correlation.

<<Readability formulas were originally developed to predict
the ability of schoolchildren to comprehend written text. The
parameters of the original formulas (starting with the Flesch
index) were adjusted heuristically until their predictions
matched actual reading-test scores.>>

And there's a near-100% correlation between watching
television and dying prematurely (i.e., before reaching the
age of 200), and between working as a techwhirler and being
a human being. The causal correlation, however, remains
somewhat suspect in each case. As for Flesch's metrics, I've
seen some 1920s-era textbooks, and few, if any, would pass
muster today... not because they use long sentences and
long words, but because they're so ornately written that
the style becomes more important than (and actively
interferes with) the substance. Flesch wasn't measuring what
we'd be measuring today.

<<Actually, there is a good correlation between readability-
index scores and actual readability in certain domains.>>

It's true that well-written short sentences _can_ be more
effective than poorly written, convoluted, long sentences.
Nobody disputes that, because it's not a fair comparison.
What I dispute is the assertion that well-written, well-
organized long sentences are inherently less useful than
shorter, simpler sentences. In fact, relying on overly short
sentences can compromise readability by making the text too
choppy and hindering the efficient flow of thought. I'm
unaware of any readability index that addresses these issues.

<<However, readability does have some validity, and could
reasonably be considered as part of a larger set of

And here, I'll conclude by reluctantly agreeing with you. As a
red flag for text that is childishly simple or horridly complex,
an index can work well enough. But for the vast majority of text,
which falls somewhere in between, I consider the indices of
so little use that I'd rather pay a good editor to have a read
through the manuscript and tell me if it's appropriate for my

--Geoff Hart @8^{)} Pointe-Claire, Quebec
geoff-h -at- mtl -dot- feric -dot- ca

"Patience comes to those who wait."--Anon.

