Love this. I've never had an opportunity to use MTurk in industry, but I've always wanted to run these types of tests.
Two things I love about this:
First, behavioural, (human) performance-based testing is severely underrepresented in user research literature. Human factors forms one of the historical bases for UX, yet the most common discussions I see are about ethnographic methods or user interviews (and Prismatic frames the testing through the lens of 'usability testers'). These are great, but can be subject to interviewer effects. In fact, the way some of the tasks were worded really wouldn't have flown if I were running a direct interview with a user.
Second, the post really shows how MTurk is a great way to get a lot of participants for behavioural tasks. When you can isolate something very specific you want to test, MTurk is a great platform to get participants. It will fall apart if you're looking for substantial survey-type data or 'long form' responses. That being said, I've seen some interesting surveys posted there with very particular eligibility criteria. It could easily work as a supplement to any primary user pool you have.
Some questions/feedback:
- If this were a scientific study, there would probably be a few methodology problems. But I am rusty, so I'm not sure they would necessarily apply. I have no qualms with the end result; the main point here is that I think some basic experimental framework could easily apply here.
- Continuing with the above point, I'm fairly certain there's a better way to interpret the data. Similar to how typical A/B testing uses statistical significance to determine a winner, there are ways to analyze the data here.
- The N/N recommendation for number of usability testing participants probably doesn't apply here. That recommendation was based on using a usability test to discover problems. The experiment here had some very specific conditions and hypotheses.
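To make the statistical significance point concrete, here is a minimal sketch of one common option, a two-sided two-proportion z-test on task success rates. It assumes each Turker saw exactly one of two conditions and that the outcome is a simple success/failure; the function name and the counts are made up for illustration and are not Prismatic's data or method.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two success rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("Success rate is 0% or 100% in both groups; test is undefined.")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 Turkers per condition.
z, p = two_proportion_z_test(62, 100, 45, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With small samples or success rates near 0% or 100%, an exact test (Fisher's, for example) would probably be a safer choice than the normal approximation used here.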
I really wouldn't consider this a usability test. It's user research, but to me it was very clearly an experiment. I'm being pedantic.
Was anything done to ensure a Turker doesn't complete the task twice?
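If nothing was done up front, one fallback is to filter repeats out afterwards. A rough sketch, assuming the results are exported as rows in submission order and each row carries the worker's ID (the `worker_id`, `condition` and `success` field names are hypothetical):

```python
def first_submission_per_worker(submissions):
    """Keep only each worker's first submission, preserving input order."""
    seen = set()
    kept = []
    for row in submissions:
        worker = row["worker_id"]
        if worker in seen:
            # Repeat participant: drop every later attempt.
            continue
        seen.add(worker)
        kept.append(row)
    return kept


# Made-up rows for illustration.
submissions = [
    {"worker_id": "W1", "condition": "A", "success": True},
    {"worker_id": "W2", "condition": "B", "success": False},
    {"worker_id": "W1", "condition": "B", "success": True},
]
unique = first_submission_per_worker(submissions)
print(len(unique), "unique participants")
```

Filtering after the fact still pays for the duplicate attempts, so blocking repeat workers before they start would be preferable where the platform allows it.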
I'd love to hear how much Prismatic paid each Turker and if they experimented with compensation.
Great points! Appreciate the feedback. Would love to talk more about it. Shoot me an email at email@example.com.
Definitely will reach out soon!
Has anyone else used Mechanical Turk for usability testing?
I've used it for some academic research. I re-ran an existing study in order to test MTurk's validity, and the results were very close to those of existing studies.
(this was a cognitive psychology, behavioural study)