Non-linear relationships: The importance of examining distributions

Recently I was analyzing some data to help answer the question “what are the demographic differences between program graduates and program drop outs?” I did some modelling and found a few predictors, one of which was age.

I compared the average age between the groups and saw that the drop outs had a lower average age (42 years) than graduates (44 years). Simple enough. But this simplistic explanation didn’t jibe with anecdotal information the program staff had given me. I wondered if the relationship between age and program completion was linear (i.e., does a given change in age always produce the same change in the likelihood of graduating?).

As I mentioned in my last post, I’ve been playing around with R. I recently came across something called a violin plot and I wanted to try it out. A violin plot is kind of like a box plot, except that instead of a plain old box it shows you the distribution of your data.

Here is an example of a box plot:

boxplot

The main thing that I immediately see from this chart is that on average, the drop outs were younger than the graduates.

Here is an example of a violin plot:

violin

I get a different takeaway from this plot. You can see from the violin plot that the distribution of age for the drop outs looks a lot different than the distribution of age for the graduates. The bottom of the drop out violin is wider, indicating that the drop outs skew a lot younger than the graduates. This indicates that we should be exploring the relationship between age and graduation more closely.

But what if you don’t use R and can’t create a violin plot? Histograms are standard tools to show distributions and are much more common. A histogram is essentially a column chart that shows the frequency of values in your distribution (so for this example, it would show how many participants were 20 years old, 21 years old, 22 years old, you get the idea). Excel actually has a built-in feature to create histograms (click here for instructions). The tool bugs me a lot and it isn’t super intuitive to use, but it gets the job done.
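If you have neither R nor Excel handy, the counting behind a histogram is simple enough to sketch in a few lines of Python. This is only a rough illustration: the ages are made up (they are not the program's data), the function names are my own, and the 10-year bin width is an assumption you would tune to your own data.

```python
from collections import Counter


def histogram_counts(ages, bin_width=10):
    """Count how many ages fall into each bin of width `bin_width`.

    Returns a dict mapping the lower edge of each bin to its count,
    ordered from the youngest bin to the oldest.
    """
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(counts.items()))


def print_histogram(counts, bin_width=10):
    """Print a crude text histogram: one '#' per participant in each bin."""
    for lower, n in counts.items():
        upper = lower + bin_width - 1
        print(f"{lower}-{upper}: {'#' * n} ({n})")


# Hypothetical ages, invented purely for illustration.
example_ages = [22, 24, 25, 27, 31, 33, 38, 41, 45, 52, 58, 63]
example_counts = histogram_counts(example_ages)
print_histogram(example_counts)
```

Running this once per group (drop outs and graduates) gives you two sets of counts whose shapes you can compare, which is the same thing the charts are doing visually.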

Here is the distribution for age for both the drop outs and graduates. Yes, yes, I know that my x-axes aren’t labelled and that my y-axes use different scales but these choices were intentional because I want you to focus on the shape of the distributions, not the content.

histograms

Again, you can see that the ages of the drop outs are bunched up on the left side of the chart (meaning that there is a higher proportion of younger participants than older). The histogram for the graduated group looks quite different.

All of this evidence points to a non-linear relationship, meaning that the effect of age on whether or not a participant graduates is not the same across age groups.

To take a closer look at this relationship, I calculated the drop out rate for different age groupings and put them on a line chart. Aha! If the relationship between age and program completion was linear, we would expect this line to be straight. But it’s not. You can see that the drop-out rate declines with age until we hit age 40 or so. After that it’s more or less flat until age 70, and then goes down again.
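For anyone who wants to try the same calculation, here is a minimal sketch of how a drop-out rate per age grouping could be computed in Python. The records are invented for illustration, the function name is my own, and the 10-year groupings are an assumption; this is not the exact calculation behind the chart.

```python
from collections import defaultdict


def dropout_rate_by_age_group(records, group_width=10):
    """Return {lower edge of age group: drop-out rate}.

    `records` is an iterable of (age, dropped_out) pairs, where
    `dropped_out` is True for a drop out and False for a graduate.
    """
    totals = defaultdict(int)
    dropouts = defaultdict(int)
    for age, dropped_out in records:
        group = (age // group_width) * group_width
        totals[group] += 1
        if dropped_out:
            dropouts[group] += 1
    return {group: dropouts[group] / totals[group] for group in sorted(totals)}


# Hypothetical participants, invented purely for illustration.
example_records = [
    (23, True), (26, True), (28, False), (29, True),    # 20s: 3 of 4 dropped out
    (34, True), (37, False),                            # 30s: 1 of 2
    (42, False), (45, True), (47, False), (48, False),  # 40s: 1 of 4
    (51, False), (55, False), (58, True), (59, False),  # 50s: 1 of 4
]
rates = dropout_rate_by_age_group(example_records)
for group, rate in rates.items():
    print(f"{group}s: {rate:.0%} dropped out")
```

Plotting these rates against the age groups gives the kind of line chart described above: if the line bends or flattens, the relationship is not linear.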

dropouts.PNG

This is an important piece of knowledge for program staff to target retention efforts and something that we wouldn’t have uncovered if we simply had stopped at comparing the average age between the drop-outs and the graduates.

Showing two main points on one chart

It’s (usually) fairly straightforward to choose a chart type when you know what the main point you are trying to get across is. Is your message that there has been a change over time? Do you want to show a difference between groups? There are all kinds of online chart choosers to help you do this (here is one of my favourites). But what about when you have two main points to make?

I was recently working on a chart where I wanted to make the following two points:

  1. 2016 was the only year that participants had a statistically significant increase in health ratings; and
  2. participants had lower health ratings pre-program in 2016 vs. other years

I started with the chart below. Here the different color used in 2016 really highlights that something different happened that year (half of point #2), but it is difficult to see the change over time (point #1, half of point #2):

chart1

Alright then, let’s change to a line graph. It is much easier to see the change over time. However, the statistically significant change in pre- and post-test scores was important to the program and they wanted to highlight that. That piece of information isn’t easy to see here.

chart2

I added a transparent rectangle to highlight the difference between pre- and post-test scores and this is the result:

chart3

I think that this chart nicely conveys the two main points that I wanted to make and is a vast improvement over the first chart. It also goes to show that it’s worthwhile to play around with different chart types while working on reporting!

Note: I have changed the results to fictional data to keep things anonymous

The Importance of Context

Recently I was looking at some data and I noticed a trend in a neighbourhood surrounding a community centre that was evaluating the effectiveness of their poverty reduction work. The number of families classified as having a low income had decreased recently (Neighbourhood A). Several nearby neighbourhoods (Neighbourhoods B and C) had definitely not seen this decrease.

neighbourhoods

(Shout out to Stephanie Evergreen for forever changing my life with small multiples)

At first glance this looked promising – had the poverty reduction campaign contributed to this? People were excited but I had my reservations about claiming success so quickly.

If you’ve recently visited Toronto you know that there are building cranes everywhere. Neighbourhoods are changing (read: gentrifying) very, very quickly as luxury condos go up and lower income families are driven further and further out of the core. It was possible that the income level of residents hadn’t changed – perhaps the low income residents had moved out and more affluent residents had moved in. First piece of evidence: Neighbourhood A had four condominium projects completed in that time frame whereas Neighbourhood B had one and Neighbourhood C had zero.

Next we looked at demographics. Canada completes a census every five years. We could compare 2006 and 2011 data, as the 2016 data are not yet available. Second piece of evidence: Neighbourhood A had decreases in children, youth, and seniors (and families overall) but an increase in working age adults. The change wasn’t nearly as drastic in Neighbourhoods B and C.

Fortunately we had a lot of other data to look at in order to evaluate the program but I thought that this was a nice illustration of why it’s really important to look at the context behind the data and examine other possible explanations before claiming success.

 

Yet another reason why theory should guide evaluation

The “replication crisis” has been a hot button issue in science for a while now. Simply put, many experiments are difficult or impossible to replicate. I’m a social psychologist and so that is where I have been following the discourse. For example, many of the “classic” social psychology experiments that you may have learned about in Psychology 101 have failed to be replicated. This study suggests that perhaps we should discount two thirds of published findings in social psychology! This is especially disheartening when I think about how many studies I read in the course of 9 years studying social psychology in university.

Roger Peng (who teaches great courses over on Coursera by the way, which is how I found his blog) recently wrote a super interesting post about this topic. Peng talks about how in fields with a strong background theory (as well as in fields that do not rely on experimental design) there isn’t a crisis.

This led me to think about evaluation and the importance of having a solid theory of change guide your work. If we evaluate a program and we don’t have a theory of change we call this a “black box evaluation.” Our results can tell us whether or not a program had an effect…but we have no idea why. Was it due to a particular component of the program? Effective staff? Something about the participants? And if we can’t answer why a program did or did not have an effect we certainly can’t replicate the program in other places.

Before today I had mostly thought of the replication crisis as a research problem (and one I think about when I wear my “researcher hat”) but I found it super interesting to see how it can also be an evaluation problem (and I will certainly incorporate it into my “evaluator hat” thinking!).

Upcoming free course on Bayesian analysis

The bulk of my statistical training is based on null hypothesis significance testing (NHST – for the non-stats geeks out there I’m talking about the tests that return p values, among other things). This knowledge has served me well in the past decade; however, more and more organizations and publications are moving beyond NHST (here is a statement from the American Statistical Association and here is an example of a publication banning p values).

Bayesian analysis is an alternative to NHST that updates the probability of a hypothesis as you collect more information. I’m not well versed in it so I’m going to steer you to a definition from stata.com. I’ve been curious about how Bayesian analysis can be applied to evaluation and more and more examples of it being used are popping up every day (here is one such example that I have recently read).
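To make the "updating as you collect more information" idea a bit more concrete, here is a small sketch using one of the simplest textbook cases: a Beta prior on a rate, updated with counts of successes and failures. I'm assuming the quantity of interest is a program's graduation rate; the numbers are invented, the function names are my own, and this is only one elementary flavour of Bayesian analysis, not a summary of the course or of the Stata definition.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Beta-binomial update: fold new counts into the prior.

    With a Beta(alpha, beta) prior on a rate, observing `successes` and
    `failures` gives a Beta(alpha + successes, beta + failures) posterior.
    """
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from a flat prior: every graduation rate from 0 to 1 is equally plausible.
alpha, beta = 1, 1

# Hypothetical cohorts as (graduates, drop outs), invented for illustration.
cohorts = [(3, 1), (6, 4)]
for graduates, drop_outs in cohorts:
    alpha, beta = update_beta(alpha, beta, graduates, drop_outs)
    print(f"After this cohort: Beta({alpha}, {beta}), "
          f"estimated graduation rate {beta_mean(alpha, beta):.3f}")
```

Each new cohort shifts the estimate, and the posterior from one round becomes the prior for the next, which is the core idea in miniature.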

I’ve been wanting to learn about Bayesian methods and apply them to a current project but just haven’t had the time to delve into a textbook on my own. I was quite happy to see that Coursera will be offering a course starting Aug 29th and wanted to share it with others who may be interested. You can use either R or Excel for the coursework.

In the meantime, please share any examples of Bayesian analysis in evaluations below! I would love to check them out!