Yet another reason why theory should guide evaluation

The “replication crisis” has been a hot-button issue in science for a while now. Simply put, many experiments are difficult or impossible to replicate. I’m a social psychologist, so that is where I have been following the discourse. For example, many of the “classic” social psychology experiments that you may have learned about in Psychology 101 have failed to replicate. This study suggests that perhaps we should discount two-thirds of published findings in social psychology! This is especially disheartening when I think about how many studies I read in the course of 9 years studying social psychology in university.

Roger Peng (who teaches great courses over on Coursera by the way, which is how I found his blog) recently wrote a super interesting post about this topic. Peng talks about how in fields with a strong background theory (as well as in fields that do not rely on experimental design) there isn’t a crisis.

This led me to think about evaluation and the importance of having a solid theory of change guide your work. If we evaluate a program without a theory of change, we call this a “black box evaluation.” Our results can tell us whether or not a program had an effect, but we have no idea why. Was it due to a particular component of the program? Effective staff? Something about the participants? And if we can’t answer why a program did or did not have an effect, we certainly can’t replicate the program in other places.

Before today I had mostly thought of the replication crisis as a research problem (and one I think about when I wear my “researcher hat”), but I found it super interesting to see how it can also be an evaluation problem (and I will certainly incorporate it into my “evaluator hat” thinking!).

Upcoming free course on Bayesian analysis

The bulk of my statistical training is based on null hypothesis significance testing (NHST – for the non-stats geeks out there, I’m talking about the tests that return p values, among other things). This knowledge has served me well over the past decade; however, more and more organizations and publications are moving beyond NHST (here is a statement from the American Statistical Association and here is an example of a publication banning p values).

Bayesian analysis is an alternative to NHST that updates the probability of a hypothesis as you collect more information. I’m not well versed in it, so I’m going to steer you to an outside definition rather than write my own. I’ve been curious about how Bayesian analysis can be applied to evaluation, and more and more examples of it being used are popping up every day (here is one such example that I have recently read).
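To make the “updating” idea a bit more concrete, here is a minimal sketch in Python. It is a toy illustration only, not taken from the course or from any of the linked examples. It assumes a hypothetical program where each participant either meets an outcome or doesn’t, and it uses the standard Beta-Binomial setup: start with a Beta prior on the success rate, then add the observed successes and failures to its two parameters. The function names and the batch numbers are made up for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing new data.

    With a Beta(alpha, beta) prior on a success rate and binomial data,
    the posterior is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from a flat Beta(1, 1) prior: no strong belief about the success rate.
alpha, beta = 1, 1

# Hypothetical data: (successes, failures) from three waves of participants.
batches = [(3, 7), (6, 4), (8, 2)]

for successes, failures in batches:
    alpha, beta = update_beta(alpha, beta, successes, failures)
    print(f"Beta({alpha}, {beta}) -> estimated success rate {beta_mean(alpha, beta):.3f}")
```

Each pass through the loop treats the previous posterior as the new prior, so the estimate shifts as more data come in. A real evaluation would need a model suited to its own outcomes and a carefully chosen prior.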

I’ve been wanting to learn about Bayesian methods and apply them to a current project but just haven’t had the time to delve into a textbook on my own. I was quite happy to see that Coursera will be offering a course starting Aug 29th and wanted to share it with others who may be interested. You can use either R or Excel for the coursework.

In the meantime, please share any examples of Bayesian analysis in evaluations below! I would love to check them out!