Wednesday, January 2, 2013

January 2013 - appropriately reported by Summer 2012 Research Journal

My New Year's resolution: only work with unlimited quantities of perfect data. So not off to an excellent start, but I am doing my part by organizing the data I have. Since I cleaned up the giant mess of name-that-country that I was working with before, I reproduced some of the graphs from earlier and reposted the data I am using to generate them. I am soon to embark on studies of the quality-control side of this data, so now seems as good a time as any to summarize the state of affairs so far.

We posted HITs in a ton of different languages


And they were picked up mostly by Turkers from India and the US, most of whom report English as their native language.

India accounts for a large proportion of Turkers translating across the Indian languages (Gujarati, Telugu, Tamil, Newar, Bengali, Punjabi, Hindi, Malayalam, Marathi, Kannada) as well as a few surprise languages (Norwegian, Kapampangan, Sicilian, and Asturian). Pakistan, Macedonia, and the Philippines took the reins on translating their respective languages.
Some Turkers get really excited about our HITs and decide to try translating some languages that they may not exactly speak. So some of our HITs have a decent number of assignments submitted by Turkers claiming not to be in the country that JavaScript says they are...
...and instead to be in the country that conveniently speaks our HIT's language.



Luckily, these misreported assignments are attributable to just a handful of Turkers, suggesting that good quality control should be able to weed them out.
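
For anyone who wants to run the same check on their own data, here is a minimal sketch of counting misreported assignments per Turker. The record layout (worker ID, self-reported country, JavaScript-detected country) and the sample rows are made up for illustration; they are not the actual data or the script behind the graphs.

```python
from collections import Counter

# Hypothetical records: (worker_id, reported_country, detected_country).
# These rows are invented for illustration only.
assignments = [
    ("W1", "India", "India"),
    ("W2", "Spain", "India"),
    ("W2", "Italy", "India"),
    ("W2", "Norway", "India"),
    ("W3", "United States", "United States"),
    ("W4", "Philippines", "India"),
]


def mismatches_per_worker(records):
    """Count assignments where the self-reported country differs from the detected one."""
    counts = Counter()
    for worker_id, reported, detected in records:
        if reported != detected:
            counts[worker_id] += 1
    return counts


def share_from_top_workers(counts, k):
    """Fraction of all mismatched assignments that come from the k worst offenders."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


counts = mismatches_per_worker(assignments)
print(counts.most_common())               # workers ranked by number of mismatches
print(share_from_top_workers(counts, 1))  # share attributable to the single worst offender
```

If a small `k` already accounts for most of the mismatches, that is the "handful of Turkers" pattern described above.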

