Happy Christmas, holidays, New Year's, Decembers, Fridays, fewer-spam-emails-this-morning, or whatever you might be celebrating.
Picking up where I left off earlier in the semester, I am looking again at turker demographics, specifically their language abilities. Unlike the analysis I began earlier, this time I am looking at things on a per-turker basis.
I generated a list of tuples of (turker_id, geo-location, self-reported language, self-reported country), which can be downloaded here. I kept only turkers who reported exactly one native language, dropping those who reported none and those who reported more than one. This left 2652 turkers to study (2652 of 6036, or about 44%), less than half of all the turkers in the full dataset.
| Native languages reported | Number of turkers |
|---|---|
| None | 2556 |
| One | 2652 |
| More than one | 828 |
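The filtering described above amounts to bucketing turkers by how many native languages they listed and keeping only the single-language bucket. Here is a minimal Python sketch of that step, assuming each turker is stored as a tuple `(turker_id, geo_location, languages, country)` in which `languages` is a possibly empty list of self-reported native languages. The record layout and all values are made up for illustration; they are not the actual dataset.

```python
from collections import Counter

# Hypothetical toy records: (turker_id, geo_location, languages, country).
# The real dataset's field layout may differ.
turkers = [
    ("T1", "IN", ["Tamil"], "India"),
    ("T2", "US", [], "United States"),
    ("T3", "US", ["English", "Spanish"], "United States"),
    ("T4", "IN", ["English"], "India"),
]

def bucket(languages):
    """Classify a turker by how many native languages they reported."""
    if not languages:
        return "none"
    return "one" if len(languages) == 1 else "more"

# Size of each bucket (the rows of the table above).
counts = Counter(bucket(langs) for _, _, langs, _ in turkers)
# Keep only turkers who reported exactly one native language.
single = [t for t in turkers if len(t[2]) == 1]

print(counts["none"], counts["one"], counts["more"])  # 1 2 1
print([t[0] for t in single])  # ['T1', 'T4']
```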
Among turkers who reported more than one native language, most claimed a modest two, but some were bold and claimed more than five. One turker went to town and claimed a full 15 native languages, giving the EU a run for its money. English appears in nearly every one of these lists, and most of the double-native-languagers pair English with one other language.
| Number of native languages reported | Number of turkers |
|---|---|
| 2 | 684 |
| 3 | 94 |
| 4 | 23 |
| 5 | 7 |
| 6 | 8 |
| 7 | 4 |
| 8 | 1 |
| 9 | 2 |
| 10 | 3 |
| 15 | 1 |
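Both the length table and the observation about English can be computed directly from the language lists. A minimal sketch, using hypothetical language lists for multi-language turkers and assuming language names are already normalized (for example, always spelled `"English"`):

```python
from collections import Counter

# Hypothetical native-language lists for turkers who reported more than one.
multi = [
    ["English", "Hindi"],
    ["English", "Tamil"],
    ["Spanish", "Catalan"],
    ["English", "French", "German"],
]

# How many turkers listed each number of languages.
length_freq = Counter(len(langs) for langs in multi)
# Share of multi-language turkers whose list includes English.
english_share = sum("English" in langs for langs in multi) / len(multi)
# Share of two-language turkers whose pair includes English.
pairs = [langs for langs in multi if len(langs) == 2]
english_pair_share = sum("English" in p for p in pairs) / len(pairs)

print(sorted(length_freq.items()))  # [(2, 3), (3, 1)]
print(english_share)  # 0.75
print(round(english_pair_share, 3))  # 0.667
```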
Here are the distributions of the 15 most represented countries and languages, measured by number of turkers.
[Figure: 15 most represented countries]

[Figure: 15 most represented languages]
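Because each turker contributes exactly one record after the filtering step, ranking countries or languages by number of turkers is a plain frequency count. A sketch with `collections.Counter` on hypothetical records of the form `(turker_id, language, country)`. Which country field (geo-location or self-reported) fed the charts is not stated, so treat `country` here as an assumption.

```python
from collections import Counter

# Hypothetical single-language turkers: (turker_id, language, country).
turkers = [
    ("T1", "English", "India"),
    ("T2", "Tamil", "India"),
    ("T3", "English", "United States"),
    ("T4", "Spanish", "Spain"),
    ("T5", "English", "India"),
]

TOP_N = 15  # number of bars per chart
top_countries = Counter(country for _, _, country in turkers).most_common(TOP_N)
top_languages = Counter(lang for _, lang, _ in turkers).most_common(TOP_N)

print(top_countries)  # [('India', 3), ('United States', 1), ('Spain', 1)]
print(top_languages)  # [('English', 3), ('Tamil', 1), ('Spanish', 1)]
```

`most_common` orders tied counts by first appearance, so the order of the tail of a top-15 list can depend on the input order.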
For comparison with my work from earlier this semester, here is the by-country distribution measured by number of HITs submitted rather than by number of turkers.
[Figure: Most represented countries by number of HITs submitted]
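The difference between the two views comes down to what gets counted. A sketch, assuming hypothetical per-HIT rows of `(turker_id, country)`: counting rows weights a country by its HIT volume, while deduplicating turkers first gives each turker one vote. It also assumes each turker reports a single country; a turker listed under two countries would be counted once in each.

```python
from collections import Counter

# Hypothetical HIT submissions: one (turker_id, country) row per HIT.
hits = [
    ("T1", "India"), ("T1", "India"), ("T1", "India"),
    ("T2", "United States"),
    ("T3", "United States"),
]

# Counting rows weights each country by how many HITs its turkers submitted.
by_hits = Counter(country for _, country in hits)
# Deduplicating (turker, country) pairs first weights each turker equally.
by_turkers = Counter(country for _, country in set(hits))

print(by_hits.most_common())     # [('India', 3), ('United States', 2)]
print(by_turkers.most_common())  # [('United States', 2), ('India', 1)]
```

A single prolific turker can push a country up the HIT-based ranking without changing its turker-based rank.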
Out of curiosity, since English is the most common language while India is the most represented country, I also checked which countries are most represented among self-reported English speakers alone:
[Figure: 15 most represented countries among self-reported English speakers]
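Restricting to English speakers is just a filter applied before the same frequency count. A minimal sketch on hypothetical `(turker_id, language, country)` records, assuming the language field is spelled exactly `"English"`:

```python
from collections import Counter

# Hypothetical single-language turkers: (turker_id, language, country).
turkers = [
    ("T1", "English", "India"),
    ("T2", "Tamil", "India"),
    ("T3", "English", "United States"),
    ("T4", "English", "United States"),
    ("T5", "English", "India"),
]

# Count countries only over turkers whose native language is English.
english_countries = Counter(
    country for _, lang, country in turkers if lang == "English"
)
print(english_countries.most_common(15))  # [('India', 2), ('United States', 2)]
```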