A Novel Chinese-English Neural Machine Translation Model Based on BERT
Keywords:
Transformer, Chinese-English Neural Machine Translation, BERT, Multi-granularity word segmentation

Abstract
In recent years, neural machine translation has developed rapidly, replacing traditional machine translation and becoming the mainstream paradigm in the field. Machine translation reduces translation costs and improves translation efficiency, benefiting cultural exchange, international cooperation, and national development. However, neural machine translation depends heavily on large-scale, high-quality parallel corpora, which suffer from problems such as uneven quality and data sparsity, so further research on neural machine translation remains essential. The purpose of this paper is to construct pseudo-parallel corpora using data augmentation techniques, increase the diversity of Chinese-English training material, and then optimize the translation model to improve its translation quality. Building on BERT pre-training, this paper first analyzes the limitations of the standard Transformer model and then proposes two directions for model optimization. On the one hand, in the data preprocessing stage, multi-granularity word segmentation is applied so that the Chinese-English neural machine translation model can better understand the text. On the other hand, in the pre-training stage, this paper adopts a strategy of deeply integrating BERT dynamic word embeddings with the original word embeddings. A fusion module is added to the original Transformer: the original word embeddings and the BERT dynamic word embeddings are first combined by simple linear splicing and then fed into the encoder, where an attention mechanism performs deeper integration to obtain better word vector representations, enabling the Transformer to fully exploit the external semantic information introduced by BERT. Finally, the feasibility and effectiveness of the Transformer architecture adopted in this paper are verified through comparison experiments between RNN and Transformer models. Ablation experiments on different word vector representations and on applying BERT pre-training at different stages confirm the effectiveness of BERT dynamic word embeddings with deep embedding fusion, as well as the rationality of using pre-training technology only at the encoder stage.
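To make the fusion strategy described above more concrete, the following is a minimal illustrative sketch in PyTorch, not the paper's actual implementation: the module names, dimensions (d_model, d_bert), and the residual/normalization details are assumptions. It shows one plausible reading of the abstract: the original token embeddings and the BERT dynamic embeddings are combined by a simple linear splicing (concatenation plus projection), and an attention step over the BERT representations then performs the deeper integration before the result is passed to the Transformer encoder.

import torch
import torch.nn as nn

class EmbeddingFusion(nn.Module):
    """Illustrative fusion of original token embeddings with BERT dynamic embeddings.

    Hypothetical sketch: concatenate the two embeddings, project back to the
    model dimension ("linear splicing"), then let a multi-head attention layer
    attend over the BERT representations to refine the fused vectors before
    they are fed into the Transformer encoder.
    """

    def __init__(self, d_model: int = 512, d_bert: int = 768, n_heads: int = 8):
        super().__init__()
        # Linear splicing: concatenation followed by a projection to d_model.
        self.splice = nn.Linear(d_model + d_bert, d_model)
        # Projection so BERT embeddings can serve as attention keys/values.
        self.bert_proj = nn.Linear(d_bert, d_model)
        # Attention used for the deeper integration step.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, tok_emb: torch.Tensor, bert_emb: torch.Tensor) -> torch.Tensor:
        # tok_emb: (batch, seq, d_model); bert_emb: (batch, seq, d_bert)
        fused = self.splice(torch.cat([tok_emb, bert_emb], dim=-1))
        bert_kv = self.bert_proj(bert_emb)
        attended, _ = self.attn(query=fused, key=bert_kv, value=bert_kv)
        # Residual connection and layer norm are assumptions for stability;
        # the output would then be passed to the encoder layers.
        return self.norm(fused + attended)

In this sketch the fused vectors query the BERT representations, which is one way an attention mechanism could let the model selectively draw on the external semantic information BERT provides; the paper's own fusion module may differ in these details.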