Tuesday, January 15, 2013

Babel of towers

Here's a breakdown of the quality of some of our high-traffic HIT languages, in terms of the locations of the turkers. Not surprisingly, the higher-quality submissions tend to come from countries where the HIT language is likely to be spoken. One exception is that Russians are apparently not that good at Russian. Shown are the 6 languages that had the highest number of assignments submitted...

But first! Some whining and excuses:
- I admit, a bar graph is not an ideal visualization of this; a table would probably be nicer... but laziness kicks in. I'll make it into a nice table tomorrow.
- These show any country from which at least 5 assignments were submitted, hence the large error bars on some. 
- N/A means that we do not have country information for that assignment. These should (and will eventually) be trimmed out...
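
For the curious, the per-language breakdown can be sketched roughly like this. This is a minimal sketch, not the actual analysis script: the record fields (`language`, `country`, `quality`), the function name, and the toy rows are all made up for illustration, and I am assuming the error bars are the standard error of the mean, which may not match what the plots use.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

MIN_ASSIGNMENTS = 5  # the "at least 5 assignments" cutoff mentioned above


def quality_by_country(assignments, language, min_n=MIN_ASSIGNMENTS):
    """Return {country: (mean_quality, std_error, n)} for one HIT language.

    `assignments` is a list of dicts with hypothetical keys
    'language', 'country' and 'quality' (a numeric score).
    """
    scores = defaultdict(list)
    for a in assignments:
        if a["language"] != language:
            continue
        # Assignments with no country information are grouped under "N/A".
        country = a.get("country") or "N/A"
        scores[country].append(a["quality"])

    result = {}
    for country, vals in scores.items():
        if len(vals) < min_n:
            continue  # too few assignments to show
        # Assumed error bar: standard error of the mean.
        sem = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else 0.0
        result[country] = (mean(vals), sem, len(vals))
    return result


# Toy rows, invented purely to exercise the function.
toy = (
    [{"language": "Urdu", "country": "PK", "quality": q}
     for q in (0.8, 0.9, 0.7, 0.8, 0.8)]
    + [{"language": "Urdu", "country": "US", "quality": 0.5}] * 2
    + [{"language": "Urdu", "country": None, "quality": q}
       for q in (0.4, 0.6, 0.5, 0.5, 0.5)]
)

for country, (avg, sem, n) in sorted(quality_by_country(toy, "Urdu").items()):
    print(f"{country}: mean={avg:.2f} +/- {sem:.2f} (n={n})")
```

A country sitting right at the cutoff has only 5 scores behind its bar, which is why some of the error bars come out so large.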

Urdu


Macedonian
Telugu
Malayalam

Russian

Spanish
The same idea, but a breakdown of quality for the 6 most represented countries (by assignments submitted) in terms of HIT language. Again, no big surprises: turkers in India do better on Indian languages than on European languages. Turkers in the US also do better on Indian languages, surprisingly, but this is likely because those assignments are being submitted by turkers who were not born in the US. I will rerun this analysis with reported native language (although we sadly have less data for reported native languages. I know, life is tough. We persevere.)
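
The flipped view can be sketched the same way: pick the most represented countries by assignment count, then average quality per HIT language within each. Again a rough sketch under assumptions: the field names and both helper functions are hypothetical, and I am assuming assignments with no country information are left out of the ranking.

```python
from collections import Counter, defaultdict
from statistics import mean


def top_countries(assignments, n=6):
    """Return the n countries with the most submitted assignments."""
    # Assumption: assignments with missing country info are skipped.
    counts = Counter(a["country"] for a in assignments if a.get("country"))
    return [country for country, _ in counts.most_common(n)]


def quality_by_language(assignments, country):
    """Return {language: mean_quality} for assignments from one country."""
    scores = defaultdict(list)
    for a in assignments:
        if a.get("country") == country:
            scores[a["language"]].append(a["quality"])
    return {lang: mean(vals) for lang, vals in scores.items()}


# Toy rows, invented for illustration only.
rows = [
    {"country": "IN", "language": "Telugu", "quality": 0.9},
    {"country": "IN", "language": "Telugu", "quality": 0.7},
    {"country": "IN", "language": "Spanish", "quality": 0.4},
    {"country": "US", "language": "Spanish", "quality": 0.6},
    {"country": "US", "language": "Telugu", "quality": 0.8},
    {"country": "MK", "language": "Macedonian", "quality": 0.9},
    {"country": None, "language": "Urdu", "quality": 0.5},
]

for country in top_countries(rows, n=2):
    print(country, quality_by_language(rows, country))
```

Swapping the country field for reported native language would give the rerun mentioned above, data permitting.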

India

US

Macedonia

Philippines

Moldova

Malaysia


