Is it possible to use BERT word embeddings along with this NMT implementation? The goal is to use a pre-trained BERT language model so the contextualized embedding could be leveraged. I am wondering whether anyone implemented or was able to run this model with any other contextualized embedding like ELMO or BERT.
Is it possible to use BERT word embeddings along with this NMT implementation?
The goal is to use a pre-trained BERT language model so the contextualized embedding could be leveraged.
I am wondering whether anyone implemented or was able to run this model with any other contextualized embedding like ELMO or BERT.