Wednesday, October 12, 2016

Lexical Pound Cake

Have you ever heard of lexical density? I hadn't until just recently. In basic terms, lexical density is a measure of how difficult a particular piece of text is to read. It is calculated by dividing the number of unique words in the text by the total number of words, which gives a ratio usually expressed as a percentage. Lower values indicate text that is easier to read, while higher values indicate text that is more difficult, or "lexically dense."
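To make that calculation concrete, here's a minimal Python sketch of "unique words divided by total words." I don't know exactly how the website splits text into words, so the tokenization here (lowercase everything, treat runs of letters and apostrophes as words) is an assumption, and `lexical_density` is just a hypothetical helper name.

```python
import re

def lexical_density(text: str) -> float:
    """Return unique words / total words, as a percentage.

    Assumption: a "word" is a run of letters or apostrophes, compared
    case-insensitively. Real text-analysis tools may tokenize differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 100 * len(set(words)) / len(words)

sample = "The cat sat on the mat. The dog sat too."
print(f"{lexical_density(sample):.1f}%")  # 7 unique / 10 total -> 70.0%
```

Swapping in a different tokenizer (for example, one that splits contractions) would change the numbers slightly, which is one reason two tools can disagree on the same text.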

I was strangely enamored with this concept, so I decided to do some more research. I found that, in general, works of fiction tend to have lexical densities between 49% and 51%. If that sounds like a very narrow range, just know that I thought so, too, and it warranted an experiment.

Now, I have been known to spend a lot of time doing calculations on relatively stupid topics. (If you don't believe me, read this post from my other blog. You will never see gears the same way again.) I also happen to be sitting on the complete manuscript for a sci-fi novel I spent most of high school writing, so I figured, why not have a little fun?

Knowing that fiction is supposed to fall between 49% and 51%, I wanted to see how my own work of fiction stacks up. I found a text analysis website that calculates lexical density and went to work. I had gleaned from my research that larger samples of text give lower values because you repeat words more often (my book uses the word "the" about 6,800 times), but I had no idea how different the results would be. Putting on my mad scientist hat for a moment, I analyzed the entire book, which crashed the website a couple of times before it finally worked. Pro Tip: Do NOT try to copy/paste an entire novel. Some websites just can't handle it.

The result? 18%. At first I was utterly shocked. Compared to the roughly 50% benchmark, 18% made my novel look like a Dr. Seuss book, right? I was highly skeptical, and remembering what I'd read about sample size, I wondered what sample sizes were used to obtain the 49-51% figure. Cue more math.
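Here's a toy sketch of why sample size matters so much. It uses the same assumed tokenization as a simple word counter, and the helper and sample sentence are made up for illustration. Repeating a passage adds no new unique words but multiplies the total word count, so the ratio falls. A real book doesn't repeat itself word for word, but common words like "the" pile up in a similar way as the text gets longer.

```python
import re

def lexical_density(text: str) -> float:
    # Hypothetical helper: unique words / total words, as a percentage.
    # Assumes a word is a run of letters/apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    return 100 * len(set(words)) / len(words) if words else 0.0

# 12 words, 10 of them unique ("the" appears three times).
passage = "The ship drifted past the moon and the crew watched in silence."

for copies in (1, 2, 4, 8):
    text = " ".join([passage] * copies)
    # Unique words stay at 10 while the total grows, so density drops.
    print(copies, f"{lexical_density(text):.1f}%")
```

Each doubling of the text cuts the value in half here (83.3%, 41.7%, 20.8%, 10.4%), an exaggerated version of what happens when you go from a single chapter to a whole novel.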

I did another analysis on each chapter of the book individually, and the results were astonishingly different!

Chapter     Lexical Density
Prologue    62%
1           44%
2           50%
3           45%
4           58%
5           50%
6           53%
7           57%
8           49%
9           48%
10          56%
11          53%
12          51%
13          52%
14          53%
15          69%
Average     53%
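As a quick check on the table, this sketch averages the sixteen per-chapter figures. Keep in mind that it's a simple, unweighted mean of percentages, so a short chapter counts as much as a long one. That's not the same thing as the density of the whole book, which came out far lower at 18%.

```python
# Per-chapter lexical density (%) copied from the table above.
chapter_density = {
    "Prologue": 62, "1": 44, "2": 50, "3": 45, "4": 58, "5": 50,
    "6": 53, "7": 57, "8": 49, "9": 48, "10": 56, "11": 53,
    "12": 51, "13": 52, "14": 53, "15": 69,
}

values = list(chapter_density.values())
average = sum(values) / len(values)  # 850 / 16

print(f"Average: {average:.1f}%")                  # Average: 53.1%
print(f"Range: {min(values)}% to {max(values)}%")  # Range: 44% to 69%
```

The mean rounds to the 53% in the table, and the spread from 44% to 69% shows how much individual chapters vary around it.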

Suddenly my novel went from a picture book to the Oxford English Dictionary! What happened? I figured a chapter mixes enough description and dialogue to be a good cross-section of the work, but my average is 53%, which is clearly above the 49-51% range. And just look at the last chapter: 69% is the kind of number you'd expect from some stuffy academic dissertation, not YA fiction.

I have a few different writing styles, each meant for a different purpose. I thought lexical density might rise with the level of formality, so I ran the same analysis on one of my blog posts, where I'm definitely not formal in any way. (It was the organization post from last month, if you want to know.) The result? 74%. Not what I expected at all.

So what does any of this mean? Frankly, I'm not even sure. According to the math, I use a greater variety of words than most writers, yet a reading difficulty index based on a different formula (the website gave me both) puts my writing on the easy-to-read side. I didn't think those two things could go together, but I figure that clear writing with above-average word variety has to be a good thing.

I don't know what I'll take away from this exploration of useless stats and figures, and I bet you'll get even less from it, but at least we both know more about lexical density than we did yesterday, right? Plus, I think this has all been rather fun, even if the math says my writing is more like pound cake than meringue. But I think I'll let you be the judge.

Hic Manebimus Optime!

4 comments:

  1. This is a perfect blog post to let your readers see (although perhaps not comprehend) the inner workings of your mind! Great post. Please don't analyze the lexical density of my comment though. However, I find that both meringue and pound cake are delicious...just sayin'.

    1. It's 94%. (Sorry, I had to!) But see what I mean about sample size? Thanks for reading!

  2. HAHA! I always wondered why when you explain stuff, I can somehow marginally understand things but when other people drone on about something technical, my eyes glaze over. Must be that your lexical pound cake recipe is more palatable than your average smart guy!

    1. Thanks! I'm glad you don't find me too difficult to understand.
