Data Overload: Reader behavior data lacking in crucial context

I just read this piece on NPR about whether the data collected on reader behavior from ereaders is useful to writers. My gut reaction is, “nope,” but on reflection, I can see some circumstances where data along those lines could be of use. It’s not a simple, black-or-white question, however. It all depends on who has the data, who’s using it, and how they overcome the problem of missing context.

I can easily envision a circumstance where a publisher says to a writer, “Our ereader data suggests 63% of your readers were more engaged in the portions of your last book where the hero fought werewolves. We’d like your next book to include more werewolves.” That’s not appreciably different from how things work now, only with more data that appears to reinforce existing beliefs. Publishing has always been an industry that, when success strikes, beats every ounce of that success into the ground. Fifty shades of erotic romance, anyone? If werewolves are showing signs of being the hot new thing, bring on the werewolves!

But is that interpretation of the data correct? Were those readers more engaged because of the werewolves, or because it was a high-tension, exciting sequence that just happened to involve werewolves? That’s a pretty important distinction. The problem is, we can’t say without additional data to put this data in context.

Here’s a point made by author Scott Turow that raises a similar concern:

“I would love to know if 35 percent of my readers were quitting after the first two chapters because that frankly strikes me as, sometimes, a problem I could fix.”

Possibly. But what if that 35% is industry standard for readers dropping books after the first few chapters? How do we know? My own reading habits often have me starting books, putting them down for other books, sometimes coming back later, sometimes not. There’s no rhyme or reason related to quality, either. Some of my favorite books were started three or four times before I finally followed through, and I’ve read some total tripe cover to cover.

We need a whole lot more information before making any creative decisions based on this. What if we come to discover that 35% is actually better than average? What if 40-45% turns out to be the norm? Would Turow no longer have a problem to fix? He’d still have a third of his readers not getting past chapter two, but he’d also be outperforming the industry. What if this turns out to be like baseball, where failing 7 times out of 10 makes you an All-Star? We lack the frame of reference to make useful decisions based on this data. Drawing answers from data without adequate context is like reading tea leaves or interpreting ancient religious texts: anybody can do it and find a justification to point to as evidence, even if another person can credibly interpret the same evidence the exact opposite way.
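To make the base-rate point concrete, here’s a toy sketch. All of the numbers are hypothetical (nobody has published an industry baseline for early-chapter drop-off, which is precisely the problem): the same 35% figure reads as good news or bad news depending entirely on the baseline you assume.

```python
def drop_off_verdict(book_rate: float, industry_baseline: float) -> str:
    """Compare a book's early-chapter drop-off rate to an assumed
    industry baseline. Without a baseline, the raw rate means nothing."""
    if book_rate < industry_baseline:
        return "outperforming the industry"
    if book_rate > industry_baseline:
        return "underperforming the industry"
    return "exactly average"

# Turow's hypothetical 35% drop-off, judged against two made-up baselines:
print(drop_off_verdict(0.35, 0.45))  # if the baseline were 45%: good news
print(drop_off_verdict(0.35, 0.25))  # if the baseline were 25%: bad news
```

The same measurement yields opposite verdicts under different assumed baselines, which is the whole objection: the ereader data gives us the first number but not the second.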

Turow also said this:

“Would I love to hitch the equivalent of a polygraph to my readers and know how they are responding word by word? That would be quite interesting.”

Frightening might be another word for it. Hell, I sense a dystopian novel where corporations have hitched everyone to a giant monitoring device to record their every impulse and feed them back only products that serve their immediate desires, a sort of permanent cultural feedback loop. I don’t see how that much data is even useful. Writers, generally speaking, have varying degrees of OCD. I can easily see the hypochondriac impulse taking over, with some writers getting obsessively lost trying to make sense of this mass of often conflicting information.

He does make a cogent point here, from a publisher’s point of view:

“Why should we publish this book if 11 readers out of 12 can’t make it past page 36?”

It’s hard to argue with that. Publishers need to make money to survive. So do writers, but on a different scale. If the data suggests a book isn’t attracting an audience sizable enough to support publisher overhead, then why should they publish it? From the other side, if a book isn’t showing scale that befits a relationship with a publisher, maybe that’s a way for writers to determine whether a work is better served as an independent release. After all, the term “hybrid author” is all the rage these days. You have to choose your publishing approach somehow.

But again, this only works if the data means what we think it means. There’s also the paradox that a book has to be released before we can collect reader data on it. So, at best, unless we’re talking about turning books into software and releasing beta versions we fix after customer feedback, this ereader data is only useful in a predictive sense for future work. Which means all we’ve done is pile more data onto a decision we already make based on an existing pile of considerations. Will it improve end results? Maybe, maybe not. What it will do is provide justifications that make the initial decision more defensible, regardless of outcome. I’m not certain that’s a good thing, because it has the distinct potential to provide pseudo-evidentiary cover for bad calls on whether or not to publish.

Books will still succeed despite data that suggested beforehand that they wouldn’t, and books will still fail despite having all the indicators of a sure thing. This data is nice, but there are numerous factors at work in a successful novel, and reader behavior while reading is a small part whose significance I can’t definitively gauge. I can’t say it’s insignificant, either. We just don’t have enough data. In the future, I’m sure we’ll fix that and be awash in all the facts, figures and statistics on reader behavior we can stand. But we’ll still be lacking the context. Without that, I’m not convinced we’ll ever be able to interpret this information properly. Short of Turow’s all-encompassing polygraph or some piece of future tech that reads minds, that context isn’t readily accessible and likely never will be.

More, and more accurate, data is always a good thing, but who wields it and how is crucial. I have a feeling this will turn out to be little more than echo chamber material. Anyone making an argument will be able to find numbers somewhere in the increasingly vast data pool to support it, no matter how outlandish.

Will I use this data for something, if it’s available? Absolutely. I can totally see its value from a marketing standpoint. Will I change a character or story, or rewrite portions of a work, based on this information? Absolutely not. I have little confidence that any of this data means what I think it means, and even less that it means what other people think it means. If it only serves to reinforce existing opinions, it brings little of value to the table. Maybe I can glean a way to sell more books with this data, and that’s worth a shot, but changing the actual work in response to it is a bridge too far.
