Some initial results on quality. Not to undersell, but I am not really sure what these charts mean, so take them with a grain of salt.
I am using a 'quality' measure which is available for most of the translations submitted for the vocabulary HIT. This is a score (either 0, 0.5, or 1) which was assigned in a translation-rating HIT. From what I recall from Dmitry, this was 0 for poor quality and 1 for good quality. I believe there was also some adjustment made for translations that looked like machine translation. I am going to follow up with Dmitry to find out exactly what these scores mean. My main concern is that the data looks suspiciously clean (very, very high agreement across raters), so I am really not sure what to make of it, or whether it's worth using to draw any conclusions.
All that aside, I decided to make some graphs anyway, because what the hell. So assuming these scores mean something, I have some figures for average translation quality across a few cuts of the data. Among the sea of data, it is worth highlighting that people who misreport their location do appear to produce weaker translations: their average score is about 0.04 lower than that of people who report correctly, and the two 99% confidence intervals do not overlap.
| Group | Avg. | 99% Conf. Int. | n |
| Overall | 0.823 | (0.821, 0.825) | 124063 |
| Misreport | 0.785 | (0.775, 0.794) | 8449 |
| Correct report | 0.825 | (0.823, 0.828) | 115614 |
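
For anyone wanting to reproduce intervals like the ones in the table, here is a minimal sketch of one common way to get a 99% confidence interval for a mean score. It assumes a normal approximation, which may or may not be how the numbers above were actually computed. The function name `mean_ci` and the example scores are made up for illustration and are not the HIT data.

```python
import math
import statistics

# Two-sided 99% critical value of the standard normal distribution.
Z_99 = 2.5758


def mean_ci(scores, z=Z_99):
    """Return (mean, lower, upper) for the mean of `scores`.

    Assumption: normal approximation with the sample standard deviation.
    This is reasonable for large n, and only a rough guide for small n.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    # Standard error of the mean.
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se


# Hypothetical scores on the same 0 / 0.5 / 1 scale (not real data).
example = [1, 1, 0.5, 1, 0, 1, 1, 0.5, 1, 1]
mean, lo, hi = mean_ci(example)
print(f"{mean:.3f} ({lo:.3f}, {hi:.3f})")
```

With group sizes in the thousands, as in the table, the normal approximation should be fine. On a tiny sample like the example it is only illustrative, and the upper bound can even exceed the maximum possible score of 1.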

