Summer 2012 Research Journal: May 2012

Thursday, May 31, 2012

May 31 - Lit Review

After reading through multiple CS papers and finding few comprehensive lists of common ESL errors, I followed Chris's suggestion and have been reading through linguistics papers. I read through some journal articles, courtesy of Google Scholar, but ran into the same problem I had with the CS papers: the focus tends to be on a specific error type and an academic analysis of it, rather than on a survey of many common error types. Interestingly, I found the best lists of errors by doing just a basic search and reading through universities' tutoring or TOEFL prep websites.*

I read through 5 to 10 different sites with lists of error types, and kept a list of the errors they mentioned as common. Errors in articles, prepositions, and verb tense are mentioned most consistently. More stress was on incorrect articles (using 'a/an' in place of 'the' or vice versa) than on missing articles, although missing articles were common as well. No differentiation was made between different kinds of preposition errors. Another common error was in "verb form," which usually meant that an infinitive was used in place of a gerund or a gerund in place of an infinitive (e.g. "I want succeeding in school" instead of "I want to succeed in school").

I thought it was interesting to compare the CS articles to the linguistics/education articles, and am thinking we will need some compromise between the two approaches for our MTurk study. The CS articles tend to focus on fewer and broader error types, while the websites aimed at education focus on very specific errors. In a highly unscientific study, I turned my lists of errors into tallies, so that each time an error type was mentioned in a paper, it got a point. Then I made graphs of the number of times each type was mentioned out of the total mentions - mostly just because I really like graphs, and also to practice with Google Plots.

What I see as the most difficult part of translating the many error types listed on the education sites is the inconsistency. These sites aren't attempting to partition all errors, so using these error types directly could lead to a corpus with a few errors tagged as specifically as "effect vs. affect" or "which vs. that" and then a very large "other" category. This type of schema seems far from ideal from an ML perspective. On the other hand, the insert/delete/substitute error types that have been the focus of many of the CS studies aren't highly descriptive and don't offer much linguistic insight or fine-tuning. I suppose this is something we will have to wait and hash out after we have run some tests on MTurk.

As mentioned, these are very unscientific. The scale has very little meaning since the CS articles tended to focus on one or two errors per paper whereas the linguistics ones usually gave lists of 8 to 12 errors. In other words, 10% of mentions in CS could be literally one mention while 10% in linguistics is probably around 5 mentions. Like I said, these are just a way of listing the error types and an excuse to make pretty graphs, not meant to be a deep analysis.
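
To make the scale problem concrete, here is a minimal sketch of the tally-to-share calculation. The counts below are made up purely for illustration (they are not my actual tallies), and `mention_shares` is just a name chosen for this example.

```python
from collections import Counter

def mention_shares(mentions):
    """Return each error type's share of all mentions, as a percentage."""
    tally = Counter(mentions)
    total = sum(tally.values())
    return {error: 100.0 * count / total for error, count in tally.items()}

# Hypothetical tallies, NOT the real data: 10 total mentions on the CS side
# and 50 total mentions on the linguistics side.
cs_mentions = ["preposition"] * 5 + ["article"] * 4 + ["verb tense"] * 1
ling_mentions = ["preposition"] * 20 + ["article"] * 25 + ["verb tense"] * 5

cs_shares = mention_shares(cs_mentions)
ling_shares = mention_shares(ling_mentions)

# Both come out to the same share even though the raw counts are 1 and 5.
print(cs_shares["verb tense"], ling_shares["verb tense"])
```

With these invented numbers, one mention on the CS side and five on the linguistics side land on the same bar height, which is exactly why the two graphs should not be compared directly.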

*I have a list of citations for the papers I read, but left it on my other computer, so I will post it this evening when I get home.
**I feel like a commercial for Google, using Google blogs, Google plots, Google docs, Google scholar...at least no Android phone...yet...

Sunday, May 27, 2012

May 27th - Joshua

I forgot to update on Joshua progress:

Per Matt's suggestions, I added the HADOOP environment variables and added the hadoop flag when running the pipeline, and the hadoop problem went away. But, as I have come to expect, there is another error now. The pipeline is exiting with the message:

* FATAL: couldn't fine tuning source file 'data/tune/tune.tok.lc.en'

I followed the path, and there is a tune.tok.lc.ur file, but not the English version. I am not sure what would prevent the English file from being produced. Matt - any suggestions? I'll try tinkering more, and maybe running just certain pieces of the pipeline, instead of the whole thing beginning to end. Or possibly my many failed runs have left something in an inconsistent state, so I might try just clearing out the directory and starting from scratch.

May 27 - Video Archiving and Lit Review

I got Final Cut Pro on my Mac and began today by trying to start the video archiving. I made a spreadsheet to keep track of which videos have been archived already and which ones are in progress, since it's not immediately obvious. Then I spent a little time getting familiar with Luke's Python script, had to reinstall ffmbc, and started decompressing and compiling the May 5th seminar. Unfortunately, after running for quite a while, it became obvious that my poor Mac can't handle the 70+ gigs required for two videos in FCP's ProRes format. So I'll have to wait until I get home and get my external drive so I can try again. Sigh.

In the meantime (while waiting for my hard drive to get annexed by ProRes files) I got some good reading done. The couple of papers I read were not as focused on ESL error types, so they didn't add much to the list I'm trying to compile, but I'm enjoying reading about the more theoretical side of NLP, so I can't complain. This paper compared different parsing methods and their varying strengths for the ESL domain. They focused on the benefits of the Yamada-Knight model, which doesn't assume any proper structure on the source-language side, only on the target. (I think I will go back and read the original Yamada-Knight paper now, to get a better sense of how it works.)

In terms of error types, since they deal with manipulating parse trees rather than analyzing sentences explicitly, they use a broad three-bucket characterization: insertion, substitution, and reordering. From the standpoint of building a richly annotated ESL corpus, this is probably not a good categorization schema for us to follow. But I have to say that, from a programmatic standpoint, I like that it's simple and elegant, and that it aligns neatly with the layout of the error-generation tool I'd been working with earlier this semester.
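
For a feel of how coarse these broad buckets are, here is a minimal token-level sketch using Python's standard `difflib`. This is not the paper's tree-based method: it only labels insertions, deletions, and substitutions between an erroneous sentence and its correction, and it has no reordering bucket (a moved word typically shows up as a separate deletion and insertion). The function name `token_edits` is made up for this example, and the sentences are the ESL examples from the May 18 entry.

```python
from difflib import SequenceMatcher

def token_edits(original, corrected):
    """List the non-matching token-level edits between two sentences."""
    src = original.split()
    tgt = corrected.split()
    labels = {"insert": "insertion", "delete": "deletion", "replace": "substitution"}
    edits = []
    matcher = SequenceMatcher(None, src, tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # Record the bucket plus the tokens involved on each side.
            edits.append((labels[op], src[i1:i2], tgt[j1:j2]))
    return edits

print(token_edits("I am teacher", "I am a teacher"))
print(token_edits("In the other hand", "On the other hand"))
```

Both the missing determiner and the wrong preposition reduce to a single label each, which is tidy for code but says nothing about what kind of word was involved.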

Tuesday, May 22, 2012

May 22 - Joshua

I have continued fiddling with Joshua. I still have not gotten a good run of it, and part of the problem is that I haven't been getting very consistent errors. I have gotten the pipeline to run most of the way, and it has produced some output (alignments and some of the grammar files), but it has not run all the way through and I haven't gotten any BLEU scores to graph yet. I am now crashing on something Hadoop-related:

cmd=hadoop/bin/hadoop jar /home/hltcoe/epavlick/joshua/thrax/bin/thrax.jar thrax-hiero.conf thrax > thrax.log 2>&1; rm -f grammar grammar.gz; hadoop/bin/hadoop fs -getmerge thrax/final/ grammar; gzip -9nf grammar

JOB FAILED (return code 1)

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName

I'm not sure where to go from here. I'll keep looking.

On a positive note, I was able to get my examples to run by adding a few symlinks and commenting out a few lines in the config file that contained "unknown parameters." Yes, it was kind of a hacky solution, but it is at least reassuring to have seen something run through to completion.

I am enjoying fiddling with Joshua even though I've had very little success. (I think I have low standards for enjoyment, but I think that is an asset sometimes.) Unlike a lot of my peers, I did not grow up in the Linux terminal, so I am learning a lot just from having to use it more. For example, I now know that echo export VAR=/some/path > .bashrc turns out to be VERY different from echo export VAR=/some/path >> .bashrc. I had to steal a copy of Matt's .bashrc file twice before I figured this one out...
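
The same overwrite-versus-append distinction exists in Python's file modes, so here is a small sketch of it that runs against a scratch file rather than a real .bashrc. Opening with "w" truncates the file first, like the shell's >, while "a" appends, like >>. The file name and the EXISTING variable are invented for the demonstration.

```python
import os
import tempfile

def write_line(path, line, mode):
    """Write one line using the given mode: 'w' truncates first, 'a' appends."""
    with open(path, mode) as handle:
        handle.write(line + "\n")

with tempfile.TemporaryDirectory() as workdir:
    # A stand-in for .bashrc with one pre-existing setting (hypothetical).
    rc_path = os.path.join(workdir, "fake_bashrc")
    write_line(rc_path, "export EXISTING=/already/here", "w")

    # Equivalent of >> : the old line survives.
    write_line(rc_path, "export VAR=/some/path", "a")
    with open(rc_path) as handle:
        appended = handle.read().splitlines()

    # Equivalent of > : everything that was there is gone.
    write_line(rc_path, "export VAR=/some/path", "w")
    with open(rc_path) as handle:
        overwritten = handle.read().splitlines()

print(appended)
print(overwritten)
```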

Friday, May 18, 2012

May 18 - Lit Review

I have also begun my ESL literature review. I've read through a couple of papers associated with Microsoft Research. One focused on mass/count noun errors and the other on preposition and determiner errors. I am keeping notes and will post a more thorough writeup once I have done more reading, but here are the main points of each paper (focusing especially on the error types, since that is particularly relevant to our MTurk application).

Correcting ESL Errors Using Phrasal SMT Techniques

- MS Word and other error-detection tools are designed for native speakers, so they tend to overlook errors that are common for ESL speakers but rare for native speakers (such as forgotten determiners: "I am teacher" instead of "I am a teacher").

- A common mistake is the misuse of mass-count/uncountable nouns; the study was restricted to 14 words that were misused often: knowledge, food, homework, fruit, news, color, nutrition, equipment, paper, advice, haste, information, lunch, and tea

- Were able to automatically correct many errors, but depended on artificially generated errors for training

- Generated errors to reflect actual errors by doing the following:

• much -> many: much advice -> many advice

• some -> a/an: some advice -> an advice

• conversions to plurals: much good advice -> many good advices

• deletion of counters: piece(s)/ item(s)/sheet(s) of

• insertion of determiners

- Emphasized need for a large, annotated corpus of pre- and post- examples
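
As a rough illustration of the first two generation rules listed above, here is a minimal sketch that injects mass-noun errors into correct sentences. It is not the paper's implementation: the regex approach, the function names, and the naive vowel-based a/an choice are all assumptions made for this example, and the remaining rules (pluralization, counter deletion, determiner insertion) are not covered.

```python
import re

# The 14 mass nouns the study was restricted to.
MASS_NOUNS = [
    "knowledge", "food", "homework", "fruit", "news", "color", "nutrition",
    "equipment", "paper", "advice", "haste", "information", "lunch", "tea",
]
NOUN_PATTERN = "|".join(MASS_NOUNS)

def indefinite_article(word):
    """Naive a/an choice based on the first letter only (an assumption)."""
    return "an" if word[0].lower() in "aeiou" else "a"

def much_to_many(sentence):
    """Rule: much -> many, e.g. 'much advice' -> 'many advice'."""
    return re.sub(r"\bmuch (" + NOUN_PATTERN + r")\b", r"many \1", sentence)

def some_to_article(sentence):
    """Rule: some -> a/an, e.g. 'some advice' -> 'an advice'."""
    def swap(match):
        noun = match.group(1)
        return f"{indefinite_article(noun)} {noun}"
    return re.sub(r"\bsome (" + NOUN_PATTERN + r")\b", swap, sentence)

print(much_to_many("She gave me much advice"))
print(some_to_article("He offered some advice"))
print(some_to_article("We drank some tea"))
```

Each function turns a correct sentence into a plausible ESL-style error, so a correct/incorrect pair comes for free, which is the appeal of generating training data this way.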

Using Contextual Speller Techniques and Language Modeling for ESL Error Correction

- Identified 8 different kinds of errors:

1. Preposition presence and choice: In the other hand, ... (On the other hand ...)

2. Definite and indefinite determiner presence and choice: I am teacher... (am a teacher)

3. Gerund/infinitive confusion: I am interesting in this book. (interested in)

4. Auxiliary verb presence and choice: My teacher does is a good teacher (my teacher is...)

5. Over-regularized verb inflection: I writed a letter (wrote)

6. Adjective/noun confusion: This is a China book (Chinese book)

7. Word order (adjective sequences and nominal compounds): I am a student of university (university student)

8. Noun pluralization: They have many knowledges (much knowledge)

- Noted that preposition errors and determiner errors were the most prevalent (most often, determiners were missing and needed to be inserted, whereas prepositions were present but the wrong preposition was used)

- Emphasized their use of a language model component, so that, intuitively, a change would only be suggested if the proposed sentence had a higher LM score than the existing sentence
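
The language-model gate in the last point can be sketched in a few lines. Everything here is an assumption for illustration: the bigram table is a hand-made toy with invented probabilities, and `toy_lm_score` and `gated_suggestion` are hypothetical names, not anything from the paper's system. The only idea taken from the paper is the gate itself: keep the original sentence unless a candidate scores strictly higher.

```python
import math

# Toy bigram log-probabilities (invented numbers, for illustration only).
UNSEEN = math.log(1e-6)
TOY_BIGRAMS = {
    ("i", "am"): math.log(0.2),
    ("am", "a"): math.log(0.1),
    ("a", "teacher"): math.log(0.01),
}

def toy_lm_score(sentence):
    """Sum of bigram log-probabilities, with a floor for unseen bigrams."""
    tokens = sentence.lower().split()
    return sum(TOY_BIGRAMS.get(pair, UNSEEN) for pair in zip(tokens, tokens[1:]))

def gated_suggestion(original, candidates, score=toy_lm_score):
    """Return the best candidate only if it outscores the original sentence."""
    best = max(candidates, key=score, default=original)
    return best if score(best) > score(original) else original

print(gated_suggestion("I am teacher", ["I am a teacher"]))
print(gated_suggestion("I am a teacher", ["I teacher am"]))
```

The gate trades recall for precision: a correction that the scorer cannot confirm is simply never shown.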

I also began reading this paper about automated correction of standardized essay tests. It gave little information in the way of error types, but is interesting to read (partially just because I like the idea of a machine grading an essay - I always apologized in my head to the poor graders while I was cranking out my generic SAT essays...). The authors discuss a linking parser that focuses on identifying distinct relationships between pairs of words, rather than traditional parsing, and suggest that the model is more flexible for dealing with erroneous grammar and poor word choice, which could confuse traditional parsers. Their automatic grader tended to agree relatively well with human graders (about 66%), but did worse on the tail (the very good or very poor) essays.

May 18 - Joshua

I started working with the Joshua pipeline this afternoon. I am trying to get comfortable running the pipeline from beginning to end and to make a learning curve of performance against amount of training data. I got a chance to run the pipeline in January once or twice, so it is not unfamiliar, and luckily all the initial setup is done and the dependencies are installed. I got the newest version from git. (I'm still not completely comfortable with git and am always a little afraid I will somehow accidentally push some detrimental change to the master and ruin years of other people's work. People have assured me repeatedly that this isn't possible, but it's a monster-under-the-bed situation. :) )

I was able to build successfully and the unit tests didn't complain, but I ran into some bumps when I tried running the tests. I followed the directions on GitHub for running examples, but I kept getting errors. I tweaked a few things (for example, the path to the Z-MERT example files was listed as ZMERT-example/ on git but was in examples/example/ZMERT in my version of Joshua) but still couldn't get it to run all the way through. I decided to ignore the examples for a little, in case they were not current, and try running the pipeline as normal. I tried using the Urdu language data, and followed the notes I'd had from January, which said to run:
$JOSHUA/scripts/training/pipeline.pl --corpus input/train --source ur --target en --tune input/dev --test input/test --lm berkeleylm
At first things looked good, and it started running. (Then I panicked for a second because I realized I had forgotten to qlogin, and the documents I'd gotten when I got my hltcoe account had said no less than three times to NEVER run tests on the master node.) But...luckily...it crashed after only a few minutes. So I didn't accidentally take over the master node's resources for too long, but I also don't have results for my Urdu test. Tradeoffs.

I checked the mert.log file, which is where the pipeline was logging when it crashed, and it gave me the following complaint:
--- Starting Z-MERT iteration #1 @ Friday May 18 16:28:23 EDT 2012 ---
Running external decoder...
Call to decoder returned 1; was expecting 0.
Z-MERT exiting prematurely (MertCore returned 30)...
So I am not sure what to do next. I was thinking of revisiting the examples in the morning and seeing if I can find the issue from that angle, since both issues were Z-Mert related. I'm not sure if they are actually part of the same problem or not, but it is somewhere to start.

Wednesday, May 16, 2012

May 16

I began playing around with the Chart Tools API. (It is beautifully easy to use...Google knows what they are doing.) I have only worked with some toy data, but beginning tomorrow, I will try working with Joshua performance data. In order to get familiar with using the Joshua pipeline, I'll try to build a learning curve of Joshua's performance against the size of the training data. More on that tomorrow.
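
One way to set up the training sets for such a learning curve is to take nested prefixes of the parallel corpus at doubling sizes, so that each larger run contains all the data of the smaller ones. The sketch below is only an assumption about how this could be done: the function names, the doubling schedule, and the starting size are all hypothetical, and nothing here calls Joshua itself.

```python
def nested_subset_sizes(total, smallest=1000):
    """Doubling sizes from `smallest` up to, and always including, `total`."""
    sizes = []
    n = smallest
    while n < total:
        sizes.append(n)
        n *= 2
    sizes.append(total)
    return sizes

def take_parallel_prefix(source_lines, target_lines, n):
    """First n sentence pairs, keeping source and target lines aligned."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    return source_lines[:n], target_lines[:n]

# Toy stand-ins for the two sides of a parallel corpus.
source = [f"src sentence {i}" for i in range(5)]
target = [f"tgt sentence {i}" for i in range(5)]

print(nested_subset_sizes(10000))
print(take_parallel_prefix(source, target, 2))
```

Each subset would then be used for one full pipeline run, and the resulting BLEU score plotted against the subset size.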

I spent a good chunk of time yesterday and today working with Luke Orland on the video archiving. Since I will be using Final Cut Pro to edit the videos, and the Hopkins media center will be closing tomorrow for the summer, we decided to get my Mac set up with the necessary tools so I can work on them from home. This ended up being a lot more involved than I originally thought. Luke has been using a nice little Python script to get the videos into a format that will make FCP happy, but my poor Mac had none of the development tools in place to allow the script to work (I've never used my Mac for my CS work). But it is now equipped with ffmbc and mencoder, as well as an Ubuntu virtual machine, and is ready to go. The whole process gave me a good brief intro to Homebrew and using a Mac as a development environment: good to know in case I decide I need a change of scene from my Ubuntu...but probably not too likely... :)

May 16 - First Post

I will be working this summer in the CLSP at Johns Hopkins with Chris Callison-Burch and Matt Post. My projects are focused generally on ESL errors in machine translation; specifically, I'll be working with Amazon Mechanical Turk to correct English language errors in non-professional translations of sentences from various Indian languages.

By the end of the summer, we hope to use the results to:
- draw conclusions about the ability of turkers to reliably and repeatably correct ESL errors and the benefit of using those corrections in training an MT system
- create a richly annotated corpus of ESL errors using Mechanical Turk
- provide constructive feedback to Mturk translators regarding their most frequent mistakes, and measure the effect of that feedback on translation quality

I worked last semester on a web interface that would allow turkers to correct errors in translations. The finalization of that HIT is currently in progress, so I am beginning with a few smaller tasks:
- Creating a Google Chart Tools graph of Joshua's performance, to serve as a benchmark
- Doing a general literature review of common ESL errors
- Helping to archive videos of CLSP seminars (a good excuse to chain watch seminars that I didn't get to attend in person...)

I will keep track of my daily progress here.