AT&T Labs Fellowship Award Winner
Each year, AT&T Research through the AT&T Labs Fellowship Program (ALFP) offers three-year fellowships to outstanding under-represented minority and women students pursuing PhD studies in computing and communications-related fields. This year three students received fellowships, including Katie Kuksenok.
An interactive translation system
Katie Kuksenok lives at just the right time for someone with a love for language and literature and a double-major in computer science and math.
Machine translation, an obvious choice for someone with her combination of interests, is a bustling research area advancing rapidly. Ten and even five years ago, machine translation was seen as a nearly insurmountable problem. Computational linguists were still grappling with the technical problem of how best to model human language (a prerequisite for machine translation), coming at it from two directions: a linguistics-centered approach that attempted to encode the rules of grammar and syntax, and a strictly data-driven, computer-science approach that relied on statistical methods to detect patterns in real-world data (see sidebar for an overview).
But in just the past few years, progress has been such that machines are now actually pretty good at capturing the literal meanings of words. Type almost any sentence into Google Translate and instantly get a serviceable translation. The wording may be a little off (especially when the two languages are dissimilar), but the meaning is usually clear.
It’s data that’s made the difference. The Internet has unleashed a tremendous amount of data in the form of high-quality, parallel translations scraped from websites or crowdsourced from Mechanical Turk and other forums. Feeding this data into the statistical models has enabled machine translation systems to achieve a relatively high level of literal accuracy.
The rapid advance of data-driven statistical methods has eclipsed the purely linguistic, rule-based methods. But it's also becoming increasingly apparent that statistical models can go only so far.
Go back to Google Translate or a similar site and type in a paragraph from a novel, a few lines of poetry or lyrics from a song, and you instantly run into the limits of literal accuracy. Fluency, tone, consistency of voice, metaphor, cultural references, humor, word-play—everything that makes writing and language compelling and readable is still out of reach for machine translation.
The big problem for machines is that language has significance far beyond the literal meanings of the words alone, and these implicit secondary meanings are incredibly entrenched in human culture.
Take idioms, for example. They often say one thing but mean another. “Break a leg” conveys exactly the opposite sentiment of its literal meaning.
Or take the phrase “shock and awe.” In the US, it can’t be used without conjuring up a specific military strategy from a specific war. The phrase still retains its literal meaning but is inextricably linked to an unnamed event that is nonetheless well understood, at least to those in the US. Phrases such as “shock and awe” serve as a sort of touchstone or shortcut that immediately conveys a lot of information using few words. Using such phrases in another context is almost impossible without implying a certain irony (“my presentation before the board will begin with a 'shock and awe' of sales and revenue projections”).
But irony doesn’t usually come across in a literal translation. Nor does metaphor, innuendo, ambiguity.
Context also determines meaning. In a fairy tale, “His mother is a dog” is an innocent statement of a fantasy world; in adult dialog, provocative hostility. A human instantly interprets the correct meaning from the context. Machine translation cannot because the models are built to look at a single sentence, ignoring the entire discourse structure that gives context to a sentence and influences its interpretation. The lack of context also makes it impossible to maintain themes, consistent terminology, or a uniform tone or voice from sentence to sentence.
(Machine translation does consider context, but only within the sentence; machine translation will thus correctly identify “honey” in “Honey, can you pass the mustard,” as a term of endearment, not the product of bees.)
In ignoring context and the whole hidden language of non-literal communication, machine translations often lead to serious communication gaps. Closing these gaps will require models that consider the problems of language, specifically how language is used in implicit and explicit ways to communicate meaning. For Katie, it means she gets to fully engage her two main interests: language and math. (“It’s gorgeous” she says, that she can do both.)
Incorporating real-world knowledge and context sensitivity into the models will require great leaps in current technology, and a complete re-examination of the models and their limitations. Has the data-driven method reached an impasse? Should researchers now concentrate on improving rule-based methods? Or is it that the statistical models need new and different types of data? If so, what should this data be, and how should it be fed back into the models?
It’s a huge problem (though she says “huge” the same way she says “gorgeous”: the problem’s size is part of the opportunity).
The quest for new translation data is the starting point for Katie’s summer project at AT&T Research, an interactive translation system that is being designed specifically to collect knowledge data for speech models.
The idea is to rapidly machine-translate a rough draft that humans will then edit, using their real-world knowledge and contextual awareness to decide on an appropriate translation. Katie’s system will capture these edits, which represent exactly the kind of data needed to improve the models.
The system, which she’s named ChoiceWords, has two inseparable components. The back end generates initial translations using an engine created at AT&T Research by her mentor, Srini Bangalore; it also manages and stores the edits, and extracts and evaluates possible edits for a given translation. The front end, which Katie is designing, not only enables the capture of edits but supports a whole infrastructure that makes it easy for users to decide on the best possible translation edits to make.
The project rides the current wave of combining machine and human translation, both to create tools that speed the work of professional translators, and to provide machine translation researchers with data to feed back into the models. Katie wants to extend this paradigm to capture both more data and finer-grained, richer data. Currently researchers isolate the translator-added information by comparing the starting point (the first translation) with the end point (the corrected draft done by a human translator).
ChoiceWords will instead capture individual edits. Every single edit—each word substitution, deletion or insertion, each punctuation change, or each time a word or phrase is moved within a sentence—will be logged in the order made and by which user (users log in to access the system). Katie wants to see if this richer data will improve the models more than current methods.
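The kind of fine-grained edit log described above can be sketched in a few lines of Python. This is an illustration only; the class and field names (Edit, EditLog, and so on) are invented for this sketch, not taken from ChoiceWords itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Edit:
    user: str                # who made the edit (users log in)
    sentence_id: int         # which sentence in the document
    action: str              # "substitute", "delete", "insert", or "move"
    position: int            # word index within the sentence
    old_text: Optional[str]  # None for insertions
    new_text: Optional[str]  # None for deletions
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class EditLog:
    edits: list = field(default_factory=list)

    def record(self, edit: Edit) -> None:
        self.edits.append(edit)  # insertion order preserves edit order

    def by_user(self, user: str) -> list:
        return [e for e in self.edits if e.user == user]

# Example: a user replaces a literal translation with an idiomatic one.
log = EditLog()
log.record(Edit(user="editor1", sentence_id=3, action="substitute",
                position=0, old_text="break a leg", new_text="good luck"))
```

Because each record carries the user, the action type, and the order it was made in, the log retains exactly the structure that a before/after comparison of whole drafts throws away.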
At the same time she wants to vastly increase the amount of knowledge data collected. Because models learn from recurring patterns, it’s necessary to have an extremely large corpus of real-world sentences. (See sidebar.)
To get data at the scale needed, she needs to draw as many people to the system as possible. To do so, she’s aiming to build a better translation system that will provide capabilities not currently available in other systems.
One such capability is to make it possible for anyone to translate, even without knowing the source language. Translation until now has been the almost exclusive domain of those fluent in both the source and target languages, a relatively small group of people (for less-spoken languages such as Icelandic or Basque, the number of translators dwindles precipitously). If the essential meaning contained in the source language is there, as provided by the machine translation, people can, as studies have shown, figure out the rest, especially if they have a little help in the form of suggestions.
ChoiceWords will provide that help in a number of ways. One way, besides always displaying the original sentence for users, will be offering alternative translations for each phrase. Translation engines consider a whole range of possible translations, ranking them by probability based on other words in the sentence. Normally the alternative translations are kept hidden, but ChoiceWords will bring this information forward, by listing other possible translations on a dropdown menu for each phrase, making it easy for users to spot a more appropriate phrase, and perhaps also gain insight into possible secondary meanings.
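Surfacing the ranked alternatives a translation engine normally keeps hidden might look something like the sketch below. The phrase table and its probabilities are made-up illustrations (not real model output), and the function name is hypothetical.

```python
# A toy phrase table: each source phrase maps to candidate translations
# with (invented) probabilities, as a statistical engine would score them.
phrase_table = {
    "clinica": [
        ("clinic", 0.45),
        ("a clinic", 0.25),
        ("the clinic", 0.20),
        ("hospital", 0.10),
    ],
}

def alternatives(source_phrase, k=3):
    """Return the top-k candidate translations, best first,
    as a dropdown menu would list them."""
    candidates = phrase_table.get(source_phrase, [])
    ranked = sorted(candidates, key=lambda c: -c[1])
    return [translation for translation, _ in ranked[:k]]

print(alternatives("clinica"))  # -> ['clinic', 'a clinic', 'the clinic']
```

Listing the candidates in probability order lets a user scan quickly and pick a more appropriate phrase, which is the point of bringing this normally hidden information forward.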
How ChoiceWords works: A machine translation opens in a text editor. Highlighting a sentence shows the original (untranslated) sentence, and each phrase can be selected for a dropdown of alternate translations. Here, the Spanish word “clinica” might be better translated as “clinic,” “a clinic,” “the clinic,” or “hospital.”
These edit suggestions, which may be provided by the system or from other sources, may help to enable even users with limited knowledge of the source language to contribute their world knowledge to the editing process.
Another way to draw more users is to ensure the final translation is the best that it can be. For a task as difficult as translation—requiring as it does language and writing skills, knowledge of the culture and history of two languages, and often specialized or technical vocabulary—it only makes sense to enlist the help of others. Crowdsourcing translation sites already enable people to work together to improve a translation.
Moving forward, ChoiceWords will do the same, and by incorporating collaboration, ChoiceWords will not only give users the benefit of crowdsourcing, but will collect more data for the models, especially data contextualized in the relevant discussion and opinions. Users often disagree on translations, and there is tremendous knowledge embedded in their arguments, knowledge not captured in a single-person translation or in a wholesale comparison between a before and after translation. One naïve way of using this kind of data, for example, would be to flag ambiguous language or text that has been historically controversial. As ChoiceWords becomes more social and collaborative, Katie may add in annotation, threaded discussions, and other functionality to expose more information to users, helping them not only to make better translation decisions, but contribute information.
In the case of “break a leg” being translated into Russian, if a few users changed the literal translation to “neither fluff nor feathers” (“Ни пуха ни пера!”), the system could offer an accompanying discussion that would help non-English-speaking users use the phrase effectively. Eventually there would be enough data that training a new statistical machine translation model would make “break a leg” one possible translation for “Ни пуха ни пера.” Thus even those who don’t know the source language contribute data that can be incorporated into the system, and it will only get better over time.
The system can incorporate inserted knowledge in other ways. For phrases for which there is a lack of clear-cut agreement on an appropriate translation, the system might, instead of simply inserting a phrase, require the user to choose between two or three options, perhaps accompanied with statistics showing the breakdown of how others translated the phrase. And by flagging controversial translations, perhaps with annotations, the system will enable users to make a more informed decision if they were previously unaware of the associated negative or positive connotations.
It’s in building a collaborative platform that her mentor, Srini Bangalore, sees one of the more difficult aspects. How do you make a large collaborative editor work? How do you handle disagreements among translators, many of whom can be passionate about word choice, syntax, or grammar? How do you manage all the edits flowing in, or weed out the ones that aren’t relevant?
For the moment, Katie is focused on making the interface effectively enable the underlying functionality, which she considers a big challenge. The interface not only has to be easy to use and provide users with a clear benefit (in this case, a good translation), it has to be designed to capture the types of edits that will best improve the models.
The problem for her is making one interface serve two purposes. That’s a lot to ask of an interface, and she should know: her computer-science focus is the intersection of human-computer interaction and machine learning systems. She’s more aware than most of the importance of streamlined, intuitive interfaces in the face of difficult design challenges. And in ChoiceWords there are a few.
Making the right tradeoffs among competing demands is one. What makes something easy for the user might interfere with the type of data collected. Enabling and logging small-scale edits, like deleting, inserting, moving, and replacing individual words, is necessary for collecting rich data, but supporting it means sprinkling a great many controls into the interface; the result is an interface a user must get used to, rather than one grasped intuitively, as with the classic approach of retyping a sentence into a simple text box. She might also identify ways to feed previously collected data back through the system as additional suggestions or flags at the interface level, adding still more complexity. Many actions can be enabled through such an interface, and much information made visible; the key is to judiciously balance the benefits and trade-offs of each possible element.
Nevertheless, rich data remains a key goal, even if it means a more cluttered interface. And if Katie’s system can deliver on the promise of a better translation experience, users may find that getting used to another application is a small price to pay for a translation of a quality they couldn’t get (or pay for) otherwise.
She will soon begin to find out. This summer she completed a single-user version of the interface that will be user-tested starting this fall. Test participants will be asked to translate a series of recipes written in Russian (a language they do not know) using ChoiceWords. Their comprehension of what they translated will be evaluated by how well they answer questions she poses to them (for example, which recipe is not appropriate to serve to a vegetarian) and by how confident they are in their answers, or how easily they can be argued out of them. If the tests prove the concept works (that test subjects can successfully translate a text from a language they do not speak), she will continue to the next step, making ChoiceWords a platform for collaborative translation.
In the meantime she will be pursuing PhD studies with a heavy emphasis on human-computer interaction (HCI), particularly interactive natural language processing (NLP) systems. Moving forward in her graduate-level studies, she will study the technology behind speech models and algorithms in order to understand how ChoiceWords data can be incorporated into the models, and how best to improve the models so they consider context and real-world knowledge. As an HCI researcher, she will continue to put new technologies to the test by building applications that enable real users to engage with intelligent systems in real contexts.
A little background on statistical machine translation
Some translations are statistically more likely than others.
Machines learn which ones by exploiting a corpus of parallel sentences that have already been translated between two languages. A corpus may contain millions of sentences; by analyzing them, the machine learns how a certain phrase is usually translated, for example that “casa blanca” usually translates to “white house.”
Sometimes a better translation might actually be “a cream-colored house” or “home of white” or “White House,” and these alternatives are stored as well, creating a type of phrase dictionary that associates a phrase in one language with possible phrases in another language.
When the machine translates a phrase, it does a lookup in its phrase dictionary to know what phrase to insert. Normally the most common translation is inserted, unless the context of the sentence demands a different one.
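That default lookup, inserting the most common translation, can be illustrated in a few lines. The phrase dictionary and probabilities below are invented for the sketch.

```python
# Toy phrase dictionary: source phrase -> {translation: probability}.
# Probabilities are illustrative, not learned from a real corpus.
phrase_dict = {
    "casa blanca": {
        "white house": 0.80,
        "White House": 0.12,
        "a cream-colored house": 0.05,
        "home of white": 0.03,
    },
}

def translate_phrase(phrase):
    """Look up a phrase and insert its most probable translation."""
    options = phrase_dict.get(phrase)
    if not options:
        return phrase  # unknown phrase: pass through untranslated
    return max(options, key=options.get)

print(translate_phrase("casa blanca"))  # -> white house
```

A real engine additionally weighs the context of the surrounding sentence before choosing, which is what lets it prefer a lower-probability entry when the default doesn't fit.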
Sometimes the phrase selected by the machine is not the correct one.
By making all the phrases easily visible to the user, Katie’s system makes it easy for anyone, whether they know the source language or not, to make an appropriate selection.
If not computer science or machine learning, what else?
Anthropology.
Heroes from history.
Academia, the business world, or somewhere else?
Academia, but not necessarily.
What motivates you?
What single course helped decide your future study?
Winter Term at Oberlin, when I went to CMU HCII, and discovered HCI as a field.
Most fun course?
Course you most regret not taking?
More studio art classes.
One where I can have creative freedom, the capacity to mentor and teach others, and continue to be surrounded by brilliant people.