Books I read in 2017

Long-term readers (hi!) may recall my failure to achieve the target I had of reading 50 books in 2016. I had joined the 2016 Goodreads reading challenge, logged my reading activity, and hence had access to the data needed to track my progress at the end of the year. It turns out that 41 books is less than 50.

Being a glutton for punishment, I signed up again in 2017, with the same cognitively terrifying 50 book target – basically one a week, although I cannot allow myself to think that way. It is now 2018, so time to review how I did.

Goodreads allows you to log which books you are reading and when you finished them. The finish date is what counts for the challenge. Nefarious readers may spot a few potential exploits here, especially if competing for only 1 year. However, I tried to play the game in good faith (but did I actually do so?  Perhaps the data will reveal!).

As you go through the year, Goodreads will update you on how you are doing with your challenge. Or for us nerd types, you can download a much more detailed and useful CSV. There’s also the Goodreads API to explore, if that floats your boat.

Similarly to last year, I went with the CSV. I did have to hand-edit the CSV a little, both to fill in a little data that appears to be absent from the Goodreads dataset, and also to add a couple of extra data fields that I wanted to track that Goodreads doesn’t natively support. I then popped the CSV into a Tableau dashboard, which you can explore interactively by clicking here.
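For anyone who would rather poke at the export in code than in Tableau, here is a minimal sketch of the kind of summary involved. The rows are hypothetical placeholders, the "Number of Pages" and "Date Read" column names and the date format are assumed to match a Goodreads export, and "Format" stands in for one of the hand-added fields.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; column names are assumed, not taken from the real export.
SAMPLE_CSV = """Title,Number of Pages,Date Read,Format
Example Book A,320,2017/01/14,eBook
Example Book B,150,2017/03/02,Audio
Example Book C,410,2016/12/30,Paper
"""

def summarise(csv_text, year):
    """Count books, pages and formats for books finished in the given year."""
    books = 0
    pages = 0
    formats = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["Date Read"].startswith(str(year)):
            continue  # only the finish date counts for the challenge
        books += 1
        pages += int(row["Number of Pages"] or 0)  # treat a missing page count as 0
        formats[row["Format"]] += 1
    return books, pages, dict(formats)

print(summarise(SAMPLE_CSV, 2017))
```

With a real export you would read the file from disk instead of the inline string, and fill in any missing page counts by hand first.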

Results time!

How much did I read

Joyful times! In 2017 I got to, and even exceeded, my target! 55 books read.

In comparison to my 2016 results, I got ahead right from the start of the year, and widened the gap notably in Q2. You can see a similar boost to that witnessed in 2016 around the time of the summer holidays, weeks 33-35ish. Not working is clearly good for one’s reading obligations.

What were the characteristics of the books I read?

Although page count is a pretty vague and manipulable measure – different books have different physical sizes, font sizes, spacing, editions – it is one of the few measures where data is easily available so we’ll go with that. In the case of eBooks or audio books (more on this later) without set “pages” I used the page count of the respective paper version. I fully acknowledge the rigour of this analysis as falling under “fun” rather than “science”.

So the first revelation is that this year’s average pages per read book was 300, a roughly 10% decrease from last year’s average book. Hmm. Obviously, if everything else remains the same,  the target of 50 books is easier to meet if you read shorter books! Size doesn’t always reflect complexity or any other influence around time to complete of course.

I hadn’t deliberately picked short books – in fact, being aware of this incentive I had tried to be conscious of avoiding doing this, and concentrate on reading what I wanted to read, not just what boosts the stats. However, even outside of this challenge, I (most likely?) only have a certain number of years to live, and hence do feel a natural bias towards selecting shorter books if everything else about them was to be perfectly equal. Why plough through 500 pages if you can get the same level of insight about a topic in 150?

The reassuring news is that, despite the shorter average length of book, I did read 20% more pages in total. This suggests I probably have upped the abstract “quantity” of reading, rather than just inflated the book count by picking short books. There was also a little less variation in page count between books this year than last by some measures.
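As a rough sanity check, the headline percentages can be rebuilt from the rounded figures quoted in these posts. This is only a sketch: the 2016 page total is described as "nearly 14k", so the 14,000 used here is a rounded assumption, and the results land in the same ballpark as the dashboard numbers rather than matching them exactly.

```python
# Figures quoted in the posts; 14_000 is a rounded stand-in for "nearly 14k".
books_2016, pages_2016 = 41, 14_000
books_2017, avg_pages_2017 = 55, 300

avg_pages_2016 = pages_2016 / books_2016
pages_2017 = books_2017 * avg_pages_2017

avg_change = avg_pages_2017 / avg_pages_2016 - 1   # change in average book length
total_change = pages_2017 / pages_2016 - 1         # change in total pages read

print(f"2016 average: {avg_pages_2016:.0f} pages per book")
print(f"Average length change: {avg_change:+.0%}")
print(f"Total pages change: {total_change:+.0%}")
```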

In the distribution charts, you can see a spike of books at around 150 pages long this year which didn’t show up last year. I didn’t note a common theme in these books, but a relatively high proportion of them were audio books.

Although I am an avid podcast listener, I am not a huge fan of audio books as a rule. I love the idea as a method to acquire knowledge whilst doing endless chores or other semi-mindless activities. I would encourage anyone else with an interest in entering book contents into their brain to give them a whirl. But, for me, in practice I struggle to focus on them in any multi-tasking scenario, so end up hitting rewind a whole lot. And if I am in a situation where I can dedicate full concentration to informational intake, I’d rather use my eyes than my ears. For one, it’s so much faster, which is an important consideration when one has a book target! With all that, the fact that audio books are over-represented in the lower page-counts for me is perhaps therefore not surprising. I know my limits.

I have heard tell that some people may consider audio books as invalid for the book challenge. In defence, I offer up that Goodreads doesn’t seem to feel this way in their blog post on the 2018 challenge. Besides, this isn’t the Olympics – at least no-one has sent me a gold medal yet – so everyone can make their own personal choice. For me, if it’s a method to get a book’s contents into my brain, I’ll happily take it. I just know I have to be very discriminating with regards to selecting audio books I can be sure I will be able to focus on. Even I would personally regard it as cheating to log a book that happened to be audio-streaming in the background when I was asleep. If you don’t know what the book was about, you can’t count it.

So, what did I read about?

What did I read

Book topics are not always easy to categorise. The categories I used here are mainly the same as last year, based entirely on my 2-second opinion rather than any comprehensive Dewey Decimal-like system. This means some sort of subjectivity was necessary. Is a book on political philosophy regarded as politics or philosophy? Rather than spend too much time fretting about classification, I just made a call one way or the other. Refer to above comment re fun vs science.

The main changes I noted were indeed a move away from pure philosophical entries towards those of a political tone. Likewise, a new category entrant was seen this year in “health”. I developed an interest in improving one’s mental well-being via mindfulness and meditation type subjects, which led me to read a couple of books on this, as well as sleep, which I have classified as health.

Despite me continuing to subjectively feel that I read the large majority of books in eBook form, I actually moved even further away from that being true this year. Slightly under half were in that form. That decrease has largely been taken up by the afore-mentioned audio books, of which I apparently read (listened?) 10 this year. Similarly to last year, 2 of the audio entries were actually “Great Courses“, which are more like a sequence of university-style lectures, with an accompanying book containing notes and summaries.

My books have also been slightly less popular with the general Goodreads-rating audience this year, although not dramatically so.

Now, back to the subject of reading shorter books in order to make it easier to hit my target: the sheer sense of relief I felt when I finished book #50 and hence could go wild with relaxed, long and slow reading, made me concerned as to whether I had managed to beat that bias or not. I wondered whether as I got nearer to my target, the length of the books I selected might have risen, even though this was not my intention.

Below, the top chart shows the average page count per book completed on a monthly basis, year on year.

Book length over time

 

The 2016 data risks producing somewhat invalid conclusions, especially if interpreted without reference to the bottom “count of books” chart, mainly because of September 2016, a month where I read a single book that happened to be over 1,000 pages long.

I also hadn’t actually decided to participate in the book challenge at the start of 2016. I was logging my books, but just for fun (imagine that!). I don’t remember quite when it was suggested I should explicitly join the challenge, but before then it’s less likely I felt pressure to read faster or shorter.

Let’s look then only at 2017:

Book length over time 2

Sidenote: What happened in July?! I only read one book, and it wasn’t especially long. I can only assume Sally Scholz’s intro to feminism must have been particularly thought-provoking.

For reference, I hit book #50 in November this year. There does seem to be some suggestion in the data that I did indeed read longer books as time went on, despite my mental disavowal of doing such.

Stats geeks might like to know that the line of best fit shown in the top chart above could be argued to explain around 30% of the variation in book length over time, with each month cumulatively adding an estimated extra 14 pages to a base of 211 pages. It should be stated that I didn’t spend too long considering the best model or checking the relevant assumptions for this dataset. Instead I just pressed “insert trend line” in Tableau and let it decide :).
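For the curious, a straight-line trend of this kind is typically an ordinary least squares fit. The sketch below is an assumption about what such a fit involves, not a record of what Tableau did internally. It fits a line by hand and evaluates the quoted coefficients, assuming month 1 is January. The points fed in are synthetic, generated from the quoted line itself, not the actual reading data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

def predicted_pages(month_index, base=211, per_month=14):
    """Trend-line estimate using the coefficients quoted in the text."""
    return base + per_month * month_index

# Synthetic check only: these points lie exactly on the quoted line.
months = list(range(1, 13))
pages = [predicted_pages(m) for m in months]
intercept, slope, r_squared = fit_line(months, pages)
print(round(intercept), round(slope), round(r_squared, 2))
```

On real monthly averages the points would scatter around the line, which is where an R squared of roughly 0.3 rather than 1 comes from.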

I’m afraid the regression should not be considered as being traditionally statistically significant at the 0.05 level though, having a p-value of – wait for it – 0.06. Fortunately, for my intention to publish the above in Nature :), I think people are increasingly aware of the silliness of uncontextual hardline p-value criteria and/or publication bias.

Nonetheless, as I participate in the 2018 challenge – now at 52 books, properly one a week – I shall be conscious of this trend and redouble my efforts to keep reading based on quality rather than length. Of course, I remain very open – some might say hopeful! – to the idea that one sign of a quality author is that they can convey their material concisely. You generous readers of my ramblings may detect some hypocrisy here.

For any really interested readers out there, you can once more see the full list of the books I read, plus links to the relevant Goodreads description pages, on the last tab of the interactive viz.


Books I read in 2016

Reading is one of the favoured hobbies in the DabblingWithData household. In 2016 my beloved fiance invited me to participate in the Goodreads Reading Challenge. It’s simple enough – you set a target and then see if you can read that many books.

The challenge does have its detractors; you can see that an obsession with it will perversely incentivise reading “Spot the Dog” over “Lord of the Rings“. But if you participate in good spirits, then you end up building a fun log of your reading which, if nothing else, gives you enough data that you’ll remember at least the titles of what you read in years hence.

I don’t quite recall where the figure came from, but I had my 2016 challenge set at 50 books. Fifty, you might say, that’s nearly one a week! Surely not possible – or so I thought. (I note however that my chief competitor, following a successful year, has set this year’s target to 100, so apparently it’s very possible for some people.)

Anyway, Goodreads has both a CSV export feature of the books you log as having read in the competition, and also an API.  I therefore thought I’d have a little explore of what I managed to read. Who knows, perhaps it’ll help improve my 2017 score!

Please click through for slightly more interactive versions of any chart, or follow this link directly. Most data is taken directly from Goodreads, with a little editing by hand.

How much did I read

Oh no, I missed my target 😦 Yes, fifty books proved too challenging for me in 2016 – although I got 80% of the way there, which I don’t think is too terrible. My 2017 target remains at fifty.

The cumulative chart shows a nice boost towards the end of August, which was summer holiday time for me. This has led me to conclude the following actionable step: have more holidays.

I was happy to see that I hadn’t subconsciously tried to cheat too much by reading only short books. From the nearly 14k page-equivalents I ploughed through, the single most voluminous book was Anathem. Anathem is a mix of sci-fi and philosophy, full of slightly made-up words just to slow you down further – an actual human:alien glossary is generously included in the back of the book.

The shortest was the Ladybird Book of the Meeting. This was essential reading for work purposes of course, and re-taught me eternal truths such as “Meetings are important because they give everyone a chance to talk about work. Which is easier than doing it”.

Most of my books were in the 200-400 page range – although of course different books make very different usages of a “page”.

So what did I read about?

What did I read

Science fiction is #1 by book volume. I have an affinity for most things that have been deemed geeky through history (and perhaps you do too, if you got this far in!), so this isn’t all that surprising.

Philosophy at #2 is a relatively new habit, at least as a concerted effort. I felt that I’d got into the habit of concentrating too much on data (heresy I know), technology and related subjects in previous years’ reading habits – so thought I’d broaden my horizons a bit by looking into, well, what Google tells me is merely the study of “the fundamental nature of knowledge, reality, and existence”. It’s very interesting, I promise. Although it can be pretty slow to read as every other sentence one does risk ending up staring at the ceiling wondering whether the universe exists, and other such critical issues. Joking aside, the study of epistemology, reality and so on might not be a bad idea for analysty types.

Lower down we’ve got the cheap thriller and detective novels that are somewhat more relaxing, not requiring either a glossary or a headache tablet.

I was a little surprised at what a low proportion of my books were read in eBook format. For most – not all – books, I think eReaders give a much superior reading experience to ye olde paper. This I’m aware is a controversial minority  opinion but I’ll stick to it and point you towards a recent rant on the Hello Internet podcast to explain why.

 

So I’d have guessed an 80-90% eBook rate – but a fair number of paper books actually slipped in. Typically I suspect these are ones I borrowed, or ones that aren’t available in eBook formats. Some of Asimov’s books, of which I read a few this year, for instance are usually not available on Kindle.

On which subject, authors. Most included authors only fed my book habit once last year, although the afore-mentioned Asimov got his hooks into me. This was somewhat aided by the discovery of a cluster of his less well-known books fortuitously being available for 50p each at a charity sale. But if any readers are interested in predictive analytics and haven’t read the Foundation Trilogy, I’d fully recommend even a full price copy for an insight into what the world might have to cope with if your confusion matrix ever showed perfection in all domains.

Sam Harris was the second most read. That fits in with the philosophy theme. He’s also one of the rare people who can at times express opinions that intuitively I do not agree with at all, but does it in a way such that the train of thought that led him to his conclusions is apparent and often quite reasonable. He is, I’m aware, a controversial character on most sides of any political spectrum for one reason or another.

Back to format – I started dabbling with audio books, although at first did not get on so well with them; there’s a certain amount of concentration needed which comes easier to me when visual-reading than audio-reading. But I’m trying again this year, and it’s going better – practice makes perfect?

The “eBook /Audio” category refers to a couple of lecture series from the Great Courses  which give you  a set of half hour lectures to listen to, and an accompanying book to follow along with. These are not free but they cover a much wider range of topics than the average online MOOC seems to (plus you don’t feel bad about not doing assignments – there are none).

Lastly, the GoodReads rating. Do I read books that other people think are great choices? Well, without knowing the background distribution of ratings, and taking into account the number of reviews and from whom, it’s hard to do much except assume a relative ranking when the sample gets large enough.

It does look like my books are on the positive side of the 5-point scale, although definitely not amongst GoodReads’ most popular. Right now, that list starts with The Hunger Games, which I have read and enjoyed, but not in 2016. Looking down the global popularity list, I do see quite a few I’ve had a go at in the past, but, at first sight, almost none that I regret passing over this year in favour of one of my actual choices!

For the really interested readers out there, you can see the full list of my books and links to the relevant Goodreads pages on the last tab of the viz.

When is it safe to stop watching the match?

Despite the Harvard Business Review‘s insistence that data scientist is the sexiest job of the 21st century, ask a non-quant about popular references to data analysis and you are quite likely to hear some reference to Moneyball (be that book or film). Spoiler alert: “sabermetric” data analysis enabled a baseball team with less money to beat another one that had a lot more money.

Very cool, except – in possibly the most inflammatory statement likely to make it onto this blog – in general watching team sport matches at length is pretty pointless.

Evidence? Clauset et al. have contributed to the field in their recent paper “Safe leads and lead changes in competitive team sports”, published in the journal Physical Review E.

Within it, they attempt to use data to model and validate how the lead changes between teams playing certain sports. For instance, team A might score the first point in a match, but – specific-sport-allowing – team B might well then score 2 points and seize the lead. The usual rule of course is whoever happens to have the lead after a set amount of time is deemed the winner.

Although they dabble quite successfully in others, the sport they model most accurately is basketball. Their rationale for starting here is that basketball has a high rate of points scoring, with NBA statistics showing an average of 93.6 baskets with an average value of 2.07 points per basket.

Modelling frequent events accurately is almost always easier than modelling infrequent events, so it’s clear why they picked basketball over UK football for instance, where FiveThirtyEight reports that the most common score found in almost 200,000 English football games was a thrilling 1:0. This occurred in about 16% of the matches. In fact not far off 10% of games ended with no-one scoring and no-one winning at all, just to make it sound even more exciting.

Anyway, that aside, how did Clauset’s team model the changes in lead of basketball so accurately that it significantly beat previous heuristics? Advanced logistic neural network forest tree linear super-regressions? Nope, they used a random walk.

For those unfamiliar with random walk models, they are quite easy to understand, at least at the simplest level.

You can imagine a random walk in physical terms. Consider a situation where you’re standing on a platform and can walk either forwards or backwards. Flip a coin – heads you walk forwards, tails you walk backwards. Repeat until 48 minutes have elapsed and consider that your result.
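The coin-flip walk can be sketched in a few lines. The one-flip-per-minute pacing and the fixed seed are assumptions made purely for illustration. The paper's actual model is more involved.

```python
import random

def random_walk(steps, rng):
    """Return the position after each coin flip: +1 for heads, -1 for tails."""
    position = 0
    path = []
    for _ in range(steps):
        position += 1 if rng.random() < 0.5 else -1
        path.append(position)
    return path

rng = random.Random(42)       # fixed seed so the walk is reproducible
path = random_walk(48, rng)   # assumption: one flip per minute of a 48 minute game
print("Final position:", path[-1])
print("Furthest from start:", max(abs(p) for p in path))
```

Read the position as the score difference between two evenly matched teams, and the final position as the winning margin.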

Sounds fantastically trivial, right? What in the uber-complexities of reality could really be modelled by anything derived from such a basic process? Oh, nothing much, just simple things like the stock market and molecular movements amongst others.

And sports, apparently.

The team concludes:

A model based on random walks provides a remarkably good description for the dynamics of scoring in competitive team sports.

In fact the same set of laws can determine many aspects of having the lead in a game.

…we found that the celebrated arcsine law of Eq. (1) closely describes the distribution of times for: (i) one team is leading …,
(ii) the last lead change in a game …
and (iii) when the maximal lead in the game occurs…

The model even covers the empirical fact that if something exciting is going to happen (an “extremal value”) then it tends to be near the very start or the very end of the game.
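That clustering at the start and end can be seen in a toy simulation. This is a hedged sketch rather than the authors' method: it assumes two evenly matched teams scoring one point at a time, and it uses the last moment the scores were level as a stand-in for the last lead change.

```python
import random

def last_lead_change(steps, rng):
    """Fraction of the game elapsed when the score difference was last level."""
    diff = 0
    last_tie = 0
    for step in range(1, steps + 1):
        diff += 1 if rng.random() < 0.5 else -1
        if diff == 0:
            last_tie = step
    return last_tie / steps

rng = random.Random(1)  # fixed seed for reproducibility
times = [last_lead_change(200, rng) for _ in range(5000)]

# Compare two windows of equal total width: the two ends versus the middle.
edges = sum(t < 0.1 or t >= 0.9 for t in times) / len(times)
middle = sum(0.4 <= t < 0.6 for t in times) / len(times)
print(f"first or last 10% of the game: {edges:.2f}")
print(f"middle 20% of the game: {middle:.2f}")
```

Both windows cover a fifth of the game, yet the last tie falls in the two ends far more often than in the middle, which is the U-shape the arcsine law describes.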

Lest it be said that I am unfairly representing the model due to my personal views of the merits of long-term sport viewing, towards the end of the article the authors similarly comment:

Cynically, our results suggest that one should watch only the first few and last few minutes of a professional basketball game; the rest of the game is as predictable as watching repeated coin tossings.

And I don’t think they mean that in a positive way!

For the full formulae, validation and so on, see the original paper.

But being in the middle of an arena-crowd watching said sport is probably not an ideal time to whip out the scientific calculator to determine if the lead will change and when – so there is a handy rule of thumb one can use to determine if the match is effectively over, as Slate reports.

it can be expressed as a rule of thumb for determining what the lead and remaining time have to be for a team to have a 90 percent chance at maintaining that lead:

L = 0.4602√t

where L is the lead and t is the number of seconds remaining.

As even the most ardent fan is unlikely to think in terms of seconds remaining, the below chart will tell you when it’s safe to make your excuses and leave the NBA stadium, assuming a 90% confidence level is within your tolerance.

Lead needed to predict win

Assuming a standard 48 minute basketball match, locate the number of minutes that have elapsed already on the x axis, and if the current winning team is leading by at least the y-axis number of points then they are at least 90% sure to win overall. For instance, if you’ve watched 40 minutes of play, and your team is ahead by around 10 points then there’s really not much point in watching it play out – go flip some real coins at the bartender whilst there’s not a queue.
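The chart can be reproduced directly from the rule of thumb. This is a small sketch in which the function name and the default 48 minute game length are illustrative choices. The constant 0.4602 and the use of seconds remaining come from the quoted formula.

```python
import math

def safe_lead(minutes_elapsed, game_minutes=48):
    """Lead needed for a roughly 90% chance of holding on: L = 0.4602 * sqrt(t)."""
    seconds_remaining = (game_minutes - minutes_elapsed) * 60
    return 0.4602 * math.sqrt(seconds_remaining)

for minutes in (12, 24, 36, 40, 46):
    print(f"{minutes} min elapsed: lead of {safe_lead(minutes):.1f} points")
```

At 40 minutes elapsed there are 480 seconds left, and 0.4602 × √480 comes to just over 10 points, matching the example above.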

(Journal reference: Phys. Rev. E 91, 062815 (2015))

Every death in the Game of Thrones – a visualisation

Bronn is the #1 killer

The Washington Post published a nice visualisation yesterday concerning the many, many deaths in Game of Thrones – apparently there have been 456 such violent extravaganzas.

Coded by season, allegiance, importance of character, method of death and other such metadata it gives a nice refresh of the important parts of the storyline. Find out which location was deadliest, which character has the most kills and other such fascinating and vital facts.

One has to love the understated nature of the associated data. They record the death of Oberyn as being “Method category: Hands” which, whilst undoubtedly accurate, does not entirely set the scene as to the horror-fest that is more elucidated by Time magazine’s description of it as “his head popped like a grape”.

It certainly made me pull a face not dissimilar to the expression of the unfortunate bystander below.

Reaction to Oberyn's death

Of course the scene is on YouTube if you really must re-view.