TechTalks from event: ACL-IJCNLP 2015
session 2A Machine Translation
-
syntax-based simultaneous translation through prediction of unseen syntactic constituents
Simultaneous translation is a method to reduce the latency of communication through machine translation (MT) by dividing the input into short segments before performing translation. However, short segments pose problems for syntax-based translation methods, as it is difficult to generate accurate parse trees for sub-sentential segments. In this paper, we perform the first experiments applying syntax-based SMT to simultaneous translation, and propose two methods to prevent degradations in accuracy: a method to predict unseen syntactic constituents that help form a complete parse tree, and a method that waits for more input when the current utterance is not enough to generate a fluent translation. Experiments on English-Japanese translation show that the proposed methods allow for improvements in accuracy, particularly with regard to word order of the target sentences.
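To make the segment-or-wait decision concrete, here is a minimal sketch of the kind of incremental loop the abstract describes. The helpers below are hypothetical stand-ins: the real system uses a syntactic parser augmented with predicted unseen constituents and a full syntax-based SMT decoder, and the confidence threshold is an assumed tuning knob, not a value from the paper.

```python
def parse_confidence(segment):
    # Hypothetical stand-in: pretend longer segments yield more complete
    # parse trees; the paper instead predicts unseen constituents.
    return min(1.0, len(segment) / 5.0)

def translate(segment):
    # Stand-in for the syntax-based SMT decoder.
    return " ".join(segment).upper()

def simultaneous_translate(word_stream, threshold=0.8):
    """Consume source words one at a time; translate the buffered
    segment once the parse looks reliable, otherwise wait for more
    input (the paper's second proposed method)."""
    buffer = []
    for word in word_stream:
        buffer.append(word)
        if parse_confidence(buffer) >= threshold:
            yield translate(buffer)
            buffer = []
    if buffer:  # flush whatever remains at the end of the utterance
        yield translate(buffer)

print(list(simultaneous_translate("this is a simple test sentence".split())))
```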
-
efficient top-down btg parsing for machine translation preordering
We present an efficient incremental top-down parsing method for preordering based on Bracketing Transduction Grammar (BTG). The BTG-based preordering framework (Neubig et al., 2012) can be applied to any language using only parallel text, but is computationally expensive. Our top-down parsing algorithm makes it easy to apply the early update technique to the latent variable structured perceptron algorithm with beam search, and solves this efficiency problem. Experimental results showed that the top-down method is more than 10 times faster than a method using the CYK algorithm. A phrase-based machine translation system with the top-down method had statistically significantly higher BLEU scores for 7 language pairs without relying on supervised syntactic parsers, compared to baseline systems using existing preordering methods.
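The early update technique the abstract relies on is illustrated below for a generic beam-search structured perceptron. States, actions, and the (position, action) features are toy stand-ins, not the paper's BTG derivations; only the control flow — stop and update as soon as the gold derivation falls off the beam — mirrors the described training.

```python
def beam_step(beam, actions, weights, beam_size):
    """Expand every action prefix in the beam with every action, keep top k."""
    candidates = [(prefix + [a], score + weights.get((len(prefix), a), 0.0))
                  for prefix, score in beam for a in actions]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]

def early_update(weights, gold, actions, beam_size=4, lr=1.0):
    """One perceptron training step with early update: as soon as the
    gold action prefix is pruned from the beam, update against the
    current best prefix and stop, instead of decoding to the end."""
    beam = [([], 0.0)]
    for step in range(len(gold)):
        beam = beam_step(beam, actions, weights, beam_size)
        gold_prefix = gold[:step + 1]
        if not any(prefix == gold_prefix for prefix, _ in beam):
            best_prefix = beam[0][0]
            for i, a in enumerate(gold_prefix):   # promote gold features
                weights[(i, a)] = weights.get((i, a), 0.0) + lr
            for i, a in enumerate(best_prefix):   # demote predicted features
                weights[(i, a)] = weights.get((i, a), 0.0) - lr
            return
    # gold survived every beam: a full pass would compare the final
    # best derivation to gold here and update only if they differ
```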
-
online multitask learning for machine translation quality estimation
We present a method for predicting machine translation output quality geared to the needs of computer-assisted translation. These include the capability to: i) continuously learn and self-adapt to a stream of data coming from multiple translation jobs, ii) react to data diversity by exploiting human feedback, and iii) leverage data similarity by learning and transferring knowledge across domains. To achieve these goals, we combine two supervised machine learning paradigms, online and multitask learning, adapting and unifying them in a single framework. We show the effectiveness of our approach in a regression task (HTER prediction), in which online multitask learning outperforms the competitive online single-task and pooling methods used for comparison. This indicates the feasibility of integrating into a CAT tool a single QE component capable of simultaneously serving (and continuously learning from) multiple translation jobs involving different domains and users.
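A minimal sketch of the online multitask idea follows: each translation job (task) gets its own weight vector on top of a shared one, and both are updated from a stream of post-edited examples with HTER labels. The plain squared-loss SGD update and the learning rate are assumptions for illustration, not the paper's exact algorithm.

```python
from collections import defaultdict

class OnlineMultitaskQE:
    def __init__(self, dim, lr=0.01):
        self.shared = [0.0] * dim                         # shared across all jobs
        self.per_task = defaultdict(lambda: [0.0] * dim)  # job-specific adaptation
        self.lr = lr

    def predict(self, task, x):
        w = self.per_task[task]
        return sum((self.shared[i] + w[i]) * xi for i, xi in enumerate(x))

    def update(self, task, x, hter):
        """One online step on human feedback: both the shared and the
        task-specific weights move, so knowledge transfers across jobs
        while each job can still specialize."""
        err = self.predict(task, x) - hter
        for i, xi in enumerate(x):
            g = self.lr * err * xi
            self.shared[i] -= g
            self.per_task[task][i] -= g
```

Pooling all jobs into one model or training one model per job are the two baselines this shared-plus-specific decomposition sits between.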
-
a context-aware topic model for statistical machine translation
Lexical selection is crucial for statistical machine translation. Previous studies exploit sentence-level contexts and document-level topics separately for lexical selection, neglecting their correlations. In this paper, we propose a context-aware topic model for lexical selection, which not only models local contexts and global topics but also captures their correlations. The model uses target-side translations as hidden variables to connect document topics and source-side local contextual words. In order to learn the hidden variables and distributions from data, we introduce a Gibbs sampling algorithm for statistical estimation and inference. A new translation probability based on the distributions learned by the model is integrated into a translation system for lexical selection. Experimental results on NIST Chinese-English test sets demonstrate that 1) our model significantly outperforms previous lexical selection methods and 2) modeling the correlations between local words and global topics can further improve translation quality.
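A schematic collapsed-Gibbs step in the spirit of this model is sketched below: resample the hidden target-side translation e for a source word so that it agrees with both the document topic and the local context words. The count tables, hyperparameters, and the dropped count normalizers are simplifications for illustration, not the paper's exact conditional distribution.

```python
import random

def resample_translation(candidates, topic, context_words,
                         topic_trans, trans_context, beta=0.01, gamma=0.01):
    """Pick translation e with probability proportional to
    P(e | topic) * prod over context words w of P(w | e),
    using smoothed co-occurrence counts (normalizers omitted)."""
    scores = []
    for e in candidates:
        s = topic_trans[topic].get(e, 0) + beta        # topic -> translation
        for w in context_words:
            s *= trans_context.get(e, {}).get(w, 0) + gamma  # translation -> context
        scores.append(s)
    r = random.uniform(0.0, sum(scores))               # sample from the weights
    acc = 0.0
    for e, s in zip(candidates, scores):
        acc += s
        if r <= acc:
            return e
    return candidates[-1]
```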
- All Sessions
- tutorials T1
- tutorials T5
- tutorials T2
- tutorials T6
- tutorials T3
- tutorials T7
- tutorials T4
- tutorials T8
- session 1B Language and Vision/NLP Applications
- session 2A Machine Translation
- session 3A Language Resources
- session 4A Semantics
- session 2B Question Answering
- session 3B Sentiment Analysis: Cross-/Multi Lingual
- session 4B Sentiment Analysis
- session 1C Semantics: Embeddings
- session 2C Semantics: Distributional Approaches
- session 3C Natural Language Generation
- session 4C Summarization and Generation
- session 1D Machine Learning
- session 2D Parsing: Neural Networks
- session 3D Spoken Language Processing and Understanding
- session 4D Discourse, Coreference
- session 1E Information Extraction 1
- session 2E Information Extraction 2
- session 3E Information Extraction 3/Information Retrieval
- session 4E Language and Vision
- session 1A Machine Translation: Neural Networks
- presidential address
- session 5B Machine Learning and Topic Modeling
- session 6A Discourse, Pragmatics
- session 7A Discourse, Coreference
- student research workshop
- session 5C Semantics, Linguistic and Psycholinguistic Aspects of CL
- session 6C Semantics: Semantic Parsing
- session 7C Semantics: Semantic Parsing
- session 6B Machine Learning: Embeddings
- session 7B Topic Modeling
- session 7D Lexical Semantics
- session 6D Sentiment Analysis: Learning
- session 5D Parsing, Tagging
- session 5E Information Extraction
- session 6E Grammar Induction and Annotation
- session 7E Parsing
- invited talk
- session 5A Machine Translation
- session 8B Automatic Summarization
- session 9B Word Segmentation
- session 8C Linguistic and Psycholinguistic Aspects of NLP
- session 9C Morphology, Phonology
- session 8D NLP for the Web: Social Media
- session 9D NLP for the Web: Twitter
- session 8E Text Categorization/Information Retrieval
- session 9E POS Tagging
- session 8A Machine Learning: Neural Networks
- session 9A Multilinguality
- session BP Best Paper Session