Upcoming free course on Bayesian analysis

The bulk of my statistical training is based on null hypothesis significance testing (NHST – for the non-stats geeks out there, I’m talking about the tests that return p values, among other things). This knowledge has served me well over the past decade; however, more and more organizations and publications are moving beyond NHST (here is a statement from the American Statistical Association and here is an example of a publication banning p values).

Bayesian analysis is an alternative to NHST that updates the probability of a hypothesis as you collect more information. I’m not well versed in it so I’m going to steer you to a definition from stata.com. I’ve been curious about how Bayesian analysis can be applied to evaluation and more and more examples of it being used are popping up every day (here is one such example that I have recently read).
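To make the idea of “updating the probability of a hypothesis as you collect more information” a little more concrete, here is a minimal sketch in Python of one of the simplest Bayesian models, a Beta-Binomial update. The scenario (a program’s “improvement rate”) and the numbers are entirely hypothetical, just to show how the estimate shifts as each wave of data arrives.

```python
# Beta-Binomial updating: a hypothetical example of revising a belief
# about a program's success rate as new data arrive.
# Beta(1, 1) is a uniform prior, i.e., no initial opinion.

def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta prior with binomial data."""
    return alpha + successes, beta + failures

# Prior belief about the proportion of participants who improve.
alpha, beta = 1, 1

# First wave of data: 8 of 10 participants improved.
alpha, beta = update_beta(alpha, beta, successes=8, failures=2)
print(alpha / (alpha + beta))  # posterior mean after wave 1: 0.75

# Second wave: 12 of 20 improved; the estimate is revised again.
alpha, beta = update_beta(alpha, beta, successes=12, failures=8)
print(alpha / (alpha + beta))  # posterior mean after wave 2: 0.65625
```

The key contrast with NHST: instead of a single yes/no test, each new batch of data simply refines the running estimate.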

I’ve been wanting to learn about Bayesian methods and apply them to a current project but just haven’t had the time to delve into a textbook on my own. I was quite happy to see that Coursera will be offering a course starting Aug 29th and wanted to share it with others who may be interested. You can use either R or Excel for the coursework.

In the meantime, please share any examples of Bayesian analysis in evaluations below! I would love to check them out!

data viz tools

A while ago I posted about the Data Viz Catalogue. It’s a neat resource that helps you choose a visualization that best tells the story of your data. The creator has recently posted a roundup of the 20 best tools for data visualization. It includes tools that require no coding as well as tools for developers. There were definitely a couple that were new to me, and I look forward to checking them out.

On my 2016 to-do list: learn enough coding that I can play around with the dev tools.

Recap (and downloads) from the Recreation Connections Manitoba conference

I had a lot of fun yesterday presenting at the Recreation Connections Manitoba conference. I presented a two hour workshop designed to give a “crash course” in developing a program theory and measuring program impact. If you attended the workshop and are looking for the handout, you can download it here.

It was so interesting to hear the wide variety of programs that the attendees were working on…everything from composting to an after-school program with children. I also really enjoyed talking about the different challenges that were faced when it came to measurement, such as response rates, adapting measurement tools for children, juggling limited resources, and survey bias. Rest assured that these are issues that most (all?) evaluators face! We talked about some ideas in the workshop, but I want to expand on these in future blog posts.

Thanks for the great time, Winnipeg!

Developing valid self-report measures

Self-reported measures (i.e., respondents read the question and select a response by themselves) are pretty common in evaluation. They are relatively cheap and easy to administer to a large group of people. It’s a lot easier to email a survey link than it is to hire and train a team of research assistants to follow and observe your participants and record their observations.

Some purists are quick to dismiss self-reported data. Studies have shown that people are not very honest when it comes to self-reporting their college grades, height and weight, or seat belt usage, among other things. Some problems with self-report data include:

Social desirability bias: Self-report measures rely on the honesty of your participants. “Social desirability bias” is a fancy way of saying, generally speaking, people want to present themselves in the best light possible. If your survey is asking about a sensitive topic, such as exercise frequency, eating habits, or alcohol consumption, participants might not be truthful in their responses. One way to combat this is to make questionnaires anonymous.

Understanding and interpretation: Self-report measures also rely on participants understanding your questions and the available response options. If your survey item is being misunderstood, your resulting data isn’t going to tell you much.

Memory: Even if participants are being honest and perfectly understand your survey questions, the quality of your data also depends on participants accurately remembering pertinent details. Human memory is a lot worse than people generally realize.

Response bias: Several other factors can influence how a participant responds to a question. If you are in a good mood, you may be more likely to answer the question positively. The reverse is true as well – a bad mood can predispose you to answer a question negatively. Even your personality can influence how you answer a question!

Yikes, those are some serious problems. So what does this mean for evaluators? Given their many advantages, self-report measures are not going anywhere anytime soon. Thankfully there are some steps we can take to increase the validity of our surveys:

1. Pilot test your measures: Before you “go live” with your survey, you should pilot test your questionnaire with a small number of people (in a perfect world, this small group would be similar to your actual participants – so if your survey is designed for youth, you should pilot test it with youth, and so on). As part of your pilot test, conduct interviews to ensure your items and response options are being interpreted correctly.

2. Make your survey anonymous: Anonymity can encourage participants to be honest. It can also help if the evaluator leaves the room and participants are given privacy while completing the survey. Of course sometimes we need a way to be able to track surveys, as is the case if you are doing a traditional “pre-post” design (you will be matching a participant’s survey from before the program with a survey completed after the program). In this case, a random ID number can be used, although this can add a layer of complexity in your data management.
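If you go the random-ID route for a pre-post design, the matching step is simple in principle: keep only the participants who show up in both waves. Here is a minimal sketch in Python with made-up IDs and a single made-up survey item; a real survey would of course have many items and a more careful data pipeline.

```python
# Hypothetical pre- and post-program surveys keyed by random ID.
# Each value is one (very small) survey response.
pre_surveys = {
    "A1B2": {"confidence": 2},
    "C3D4": {"confidence": 3},
    "E5F6": {"confidence": 4},
}
post_surveys = {
    "A1B2": {"confidence": 4},
    "C3D4": {"confidence": 5},
    # "E5F6" never returned a post-survey, so they drop out of matching.
}

# Keep only participants who completed both the pre and post survey.
matched = {
    pid: (pre_surveys[pid], post_surveys[pid])
    for pid in pre_surveys.keys() & post_surveys.keys()
}

# Change score per matched participant.
for pid, (pre, post) in sorted(matched.items()):
    print(pid, post["confidence"] - pre["confidence"])
```

This also illustrates the “layer of complexity” mentioned above: any participant who loses their ID, or mistypes it, silently falls out of the matched set.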

3. Counterbalance your measures: Counterbalancing means randomizing the order in which survey questions appear. It could mean the order of every single section is randomized (in which case you would have a lot of different versions of the survey), or it could be as simple as splitting the survey in half and reversing the order, with some participants randomly receiving the first version and the others receiving the second. You might use the two-version method with a paper survey, but if you are surveying online, many of the main online survey providers offer ways to randomize question order, making it easy to have many different versions.
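Both flavours of counterbalancing are easy to prototype. This Python sketch (section names are hypothetical) shows a per-participant random order, as an online tool might generate, alongside the simple two-version split for a paper survey.

```python
import random

# Hypothetical survey sections to be counterbalanced.
sections = ["demographics", "attitudes", "behaviours", "satisfaction"]

# (1) Online-style: an independent random order per participant.
def randomized_order(participant_seed):
    rng = random.Random(participant_seed)  # seeded so the order is reproducible
    order = sections[:]                    # copy so the master list isn't mutated
    rng.shuffle(order)
    return order

# (2) Paper-style: split the survey in half and swap the halves for version B.
half = len(sections) // 2
version_a = sections
version_b = sections[half:] + sections[:half]

print(randomized_order(42))
print(version_b)
```

Seeding by participant also means you can later reconstruct exactly which order any given respondent saw.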

How about you – do you ever worry about the validity of the self-reported data? If so, what are some techniques you use to increase the quality of your measures?

As a side note, I wonder if self-report measures will be less common in the future, particularly in the realm of health and exercise. ‘Wearable tech’ devices are becoming quite common (check out how these devices are used by Disney) and keep coming down in price. The devices can capture a tremendous amount of data and how exactly they can be used in evaluation will be fascinating. If anyone has an example of an evaluation that used wearable tech data I’d love to see it!

Measuring attitudes that predict behaviours

It’s pretty typical to come across surveys asking about attitudes in evaluations. These survey results are often (not always) used to make inferences about participants’ behaviours. How valid is this approach and are there ways to structure attitudinal questions that are more likely to predict behaviour?

In a lot of circles, it is accepted wisdom that attitudes don’t predict behaviours. The classic study here is LaPiere (1934). LaPiere, a sociology professor at Stanford University, spent two years traveling in the U.S. with a Chinese couple. Over the two years, they visited 251 hotels and restaurants and were treated hospitably at all but one. LaPiere found this surprising, and when he returned home he mailed a survey to all of the businesses visited asking: “Will you accept members of the Chinese race in your establishment?” Of the 128 businesses that responded, 92% answered no. This study was seminal in establishing that attitudes don’t always match behaviours and is still discussed in undergraduate social psychology and sociology classes.

Over the years it has been debated if LaPiere’s study truly shows a discrepancy between attitudes and behaviours or if it simply shows that often surveys only measure general attitudes (e.g., in general, would you allow members of the Chinese race in your business?) rather than specific attitudes (e.g., would you allow this specific Chinese couple in your business?), with specific attitudes being more likely to predict actual behaviour.

This notion is related to what is known as the Theory of Compatibility (Ajzen & Fishbein, 2005). Simply put, this theory states that attitudes are more likely to predict behaviour when they are measured at the same level of specificity. For example, general attitudes toward organ donation are quite positive, but the actual number of people who register as donors is low – a discrepancy that has frustrated and confused researchers. But when Seigel et al. (2014) asked about attitudes specific to registering as a donor, they found that they could explain over 70% more of the variance in actual registration rates. A meta-analysis of 88 studies provides further evidence: when the theory of compatibility was adhered to, the average correlation between attitudes and behaviours was r = 0.50; when it wasn’t, the correlation was only r = 0.14.

So what does this mean for measurement in evaluation? First, as with most measurement questions, I would suggest looking at the theory of change. What is the program actually trying to accomplish – a change in attitudes or a change in behaviour or both? Often, there is the assumption that providing participants with knowledge on a topic (e.g., what are healthy eating habits) will result in attitude change (e.g., “I should eat more healthy foods”) which will then result in a behaviour change (e.g., the participants increase their intake of healthy food) – this is known as a results chain.

Keeping with the above example, let’s say you are measuring the impact of a healthy eating workshop and you will be delivering a survey immediately following the workshop. This means that you can’t assess the impact on behaviours – your only options are knowledge and attitudes. How can we use the theory of compatibility to increase the chance that our attitude questions will actually predict behaviour? Rather than asking about general attitudes toward healthy eating (e.g., “How important do you think it is to eat healthy foods?”), we should be asking about specific attitudes (e.g., “How important do you think it is for you to eat 7-8 servings* of fruits and/or vegetables per day?”).

I’m curious about how others approach this in evaluations. Do you generally measure attitudes, behaviours, or both?

*For the sake of this example, I used the guidelines from Canada’s Food Guide for an adult female, although that resource is certainly not without its controversy.

Data Viz Catalogue

I just came across this great data visualization resource through BetterEvaluation – the Data Visualization Catalogue. Choosing a chart or other visualization type that best tells the story of your findings is the most fundamental part of data visualization. This site helps you by allowing you to search data visualizations by function.

Once you choose a function, it will give you some suggested visualization types:

Very neat!

The site was created by Severino Ribecca and will be added to over time.