A first look at phrase length distribution

Here's a graph of the sentence length distribution (length vs. frequency) for Chesterton, Poe, and Swift, plus Time of Punishment.

[Figure: Phrase length distribution]

A few observations:

  • Take everything with a grain of salt. There are features here that might be artifacts of parsing and so on.
  • That said, it's interesting that Poe seems to fancy short interjections more than Chesterton does (not as much as I do, though).
  • Swift seems to have a more heterogeneous style in terms of phrase lengths, compared with Chesterton's more marked preference for relatively short phrases.
  • Swift's average sentence length is about 31 words, almost twice Chesterton's 18 (Poe's is 21, and mine is 14.5). I'm not sure how reasonable that looks.
  • Time of Punishment's choppy distribution is just an artifact of the low number of samples.
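
For anyone who wants to reproduce this kind of graph, here is a minimal sketch of how the lengths and the averages could be computed. It is not the parser used for the graph above: the splitting rule (cut at runs of `.`, `!` or `?`, then count whitespace-separated words) and the function names `sentence_lengths`, `length_distribution` and `average_length` are assumptions made for illustration, and a rule this naive will trip over abbreviations and dialogue, which is exactly the sort of parsing artifact mentioned above.

```python
import re
from collections import Counter


def sentence_lengths(text):
    """Return the word count of each sentence, using a naive splitter.

    Assumption: a sentence ends at any run of '.', '!' or '?'.
    """
    sentences = re.split(r"[.!?]+", text)
    lengths = [len(sentence.split()) for sentence in sentences]
    # Drop the empty fragments left over after the final punctuation mark.
    return [n for n in lengths if n > 0]


def length_distribution(text):
    """Map each sentence length to its relative frequency."""
    lengths = sentence_lengths(text)
    counts = Counter(lengths)
    total = len(lengths)
    return {n: counts[n] / total for n in sorted(counts)}


def average_length(text):
    """Mean number of words per sentence."""
    lengths = sentence_lengths(text)
    return sum(lengths) / len(lengths)


sample = "Nevermore! The raven sat there. It said nothing more at all."
print(sentence_lengths(sample))
print(length_distribution(sample))
print(average_length(sample))
```

With only a handful of sentences the distribution is as choppy as the one for Time of Punishment; it only starts to look like a curve with a few thousand samples.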

Chesterton's magic word squares

Here are the magic word squares for a few of Chesterton's books. Whether and how they reflect characteristics that differentiate them from each other is left as an exercise to the reader.

Orthodoxy

the same way of this
world was to it has
and not think would always
i have been indeed believed
am no one thing which

The Man Who Was Thursday

the man of this agreement
professor was his own you
had the great president are
been marquis started up as
broken is not to be

The Innocence of Father Brown

the other side lay like
priest in that it one
of his is all right
this head not have you
agreement into an been are

The Wisdom of Father Brown

the priest in this time
other was an agreement for
side not be seen him
explained to say you and
father brown he had then

Magic Squares of (probabilistically chosen) Words

Thinking about magic squares, I had the idea of doing something roughly similar with words, but using usage patterns rather than arithmetic equations. I'm pasting below an example, using statistical data from Poe's texts:

Poe

the same manner as if
most moment in this we
intense and his head were
excitement which i have no
greatly he could not one

The word on the top-left cell in the grid is the most frequently used in Poe's writing, "the" — unsurprisingly so, as it's the most frequently used word in the English language. Now, the word immediately to its right, "same," is there because "same" is one of the words that follows "the" most often in the texts we're looking at. The word below "the" is "most" because it also follows "the" very often. "Moment" is set to the right of "most" and below "same" because it's the word that most frequently follows both.

The same pattern is used to fill the entire 5-by-5 square. If you start at the top-left cell and then move down and/or to the right, you won't necessarily be constructing syntactically correct phrases, but the consecutive word pairs will be frequent ones in Poe's writing.
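
The filling rule described above can be sketched in a few lines of Python. This is a sketch under assumptions rather than the code that produced these squares: it always picks the top-scoring word instead of choosing probabilistically, it scores an inner cell by multiplying the two bigram counts (the one from the word on its left and the one from the word above it), it never reuses a word, and it breaks ties alphabetically. The names `bigram_counts` and `word_square` are made up for the example.

```python
import re
from collections import Counter, defaultdict


def bigram_counts(text):
    """Return (word frequencies, follows) where follows[a][b] counts 'a b'."""
    words = re.findall(r"[a-z']+", text.lower())
    follows = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        follows[first][second] += 1
    return Counter(words), follows


def word_square(text, size=5):
    """Fill a size-by-size grid so neighbouring words form frequent pairs."""
    unigrams, follows = bigram_counts(text)
    grid = [[None] * size for _ in range(size)]
    used = set()

    def best(scores):
        # Highest score first, ties broken alphabetically (an assumption).
        candidates = [(-n, w) for w, n in scores.items()
                      if n > 0 and w not in used]
        return min(candidates)[1] if candidates else None

    for r in range(size):
        for c in range(size):
            if r == 0 and c == 0:
                scores = unigrams                  # most frequent word overall
            elif r == 0:
                scores = follows[grid[r][c - 1]]   # only a left neighbour
            elif c == 0:
                scores = follows[grid[r - 1][c]]   # only a neighbour above
            else:
                left = follows[grid[r][c - 1]]
                up = follows[grid[r - 1][c]]
                # Assumption: combine both neighbours by multiplying counts.
                scores = {w: left[w] * up[w] for w in left}
            # If no word follows both neighbours, fall back to raw frequency.
            word = best(scores) or best(unigrams)
            if word is None:
                raise ValueError("not enough distinct words to fill the grid")
            grid[r][c] = word
            used.add(word)
    return grid


for row in word_square("a b d a b d a c d a c a b", size=2):
    print(" ".join(row))
```

On a real corpus you would pass the full text of a book and leave `size` at 5. Swapping the `min` in `best` for a weighted random draw would give a probabilistic variant closer to what the title suggests.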

Although the matrix contains neither ravens nor barely sublimated necrophilia, its texture is rather appropriate, if not to Poe, at least to Romanticism. To convince you of that, here are the equivalent 5-by-5 matrices for Swift and Chesterton.

Swift

the world and then he
same in his majesty would
manner a little that it
of certain to have is
their own make no more

Chesterton

the man who had been
other with that no one
and his it said syme
then own is i could
there are only think be

At least compared against each other, it wouldn't be too far-fetched to say that Poe's matrix is more Poe's than Chesterton's, and vice versa!

PS: Because I had a sudden attack of curiosity, here's the 5-by-5 matrix for my newest collection of short stories, Time of Punishment (pdf link).

Time of Punishment

the school whole and even
first dance both then four
charge rants resistance they think
of a hundred found leads
punishment new astronauts month sleep