Development of the Estonian-Latvian Neural Machine Translation Toolkit
Aivars.Berzins@tilde.com
The objective of the project was to develop an artificial intelligence (AI)-based Estonian-Latvian machine translation toolkit, EstLat Translator, by combining Tilde’s (LV) state-of-the-art machine translation technologies with Avatar’s (EE) extensive data pool and translation expertise.
The new toolkit addresses the market need for an efficient cross-border communication solution and creates new cross-border business opportunities. Both public and private entities benefit from the Estonian-Latvian-Estonian machine translation service. It provides both project partners with a unique product to expand their businesses and seize new market opportunities.
Main activities were:
- Analysis and specification that resulted in user requirement analysis, definition and prioritization of user stories, functional design and technical specifications.
- Industry research for data modelling – during this activity, the partners carried out the required data development work. Significant resources were invested in modelling data criteria for the data categories to be used as part of complex systems, identifying suitable data sources for parallel and monolingual corpora, developing automatic and unsupervised data harvesting and preprocessing workflows, human evaluation and quality assurance, and processing the data for the development of neural machine translation engines.
- Neural Machine Translation (NMT) training: in this activity, the partners developed the Estonian-Latvian and Latvian-Estonian neural machine translation systems. The main objective was to build the machine translation systems from the linguistic resources collected and processed during the data acquisition activity. To address data sparsity, novel methods for handling rare and unknown phenomena (e.g. named entities and terminology) were studied and applied to hybridize the neural machine translation models. Furthermore, the machine translation models were deployed in a way that enables dynamic (on-the-fly) adaptation, allowing the use of domain-specific resources such as bilingual terminology and named entity dictionaries. Language-specific natural language processing methods were applied to improve and fine-tune the neural machine translation models (e.g. by using linguistic input features and by improving the detection and processing of rare named entities and terms).
- Development and deployment of the EstLat Translator. The main focus of this activity was to develop and deploy the EstLat Translator and all related services – a one-stop productivity tool for companies and the public sector with the functionality needed for effective automated translation (between Estonian and Latvian) and computer-assisted translation via a management system for post-editing and direct translation orders.
- Marketing and sales planning. In this activity, the partners prepared a value proposition for the EstLat Translator based on its set of functions and the usage scenarios for each function. Identifying the target groups and developing the marketing strategy and the channels for reaching these groups were an integral part of all activities. For marketing purposes, the partners employed various methods and tools: digital brochures and presentations, a common visual identity, digital marketing campaigns (Google Ads, LinkedIn, Facebook and others), e-mail marketing, press releases and the official launch event with prominent speakers such as Ieva Ilves (Advisor on Digital Policy to the President of Latvia).
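As an illustration of the kind of automatic preprocessing workflow mentioned under the data modelling activity above, the following is a minimal Python sketch of heuristic filtering for a parallel corpus. It is not the project's actual pipeline: the function names (normalize, filter_parallel), the thresholds (max_ratio, max_tokens) and the sample sentence pairs are assumptions chosen for illustration, and the summary does not state which filters the partners used.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def filter_parallel(pairs, max_ratio=3.0, max_tokens=200):
    """Yield cleaned (source, target) pairs that pass simple heuristic checks.

    Hypothetical thresholds: max_ratio bounds the token-count ratio between
    the two sides, max_tokens bounds the length of either side.
    """
    seen = set()
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # identical sides: probably left untranslated
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_tokens or n_tgt > max_tokens:
            continue  # overly long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # implausible length ratio: probably misaligned
        key = (src.casefold(), tgt.casefold())
        if key in seen:
            continue  # duplicate pair
        seen.add(key)
        yield src, tgt


# Illustrative Estonian-Latvian sample data (not taken from the project).
raw_pairs = [
    ("Tere  hommikust!", "Labrīt!"),  # extra whitespace, kept after cleaning
    ("Tere hommikust!", "Labrīt!"),   # duplicate once normalized
    ("Aitäh", "Paldies"),             # kept
    ("Tallinn", "Tallinn"),           # identical on both sides
    ("", "Paldies"),                  # empty source
    ("Tere", "Paldies par jūsu ļoti laipno palīdzību"),  # length ratio 1:6
]

clean_pairs = list(filter_parallel(raw_pairs))
print(clean_pairs)
```

In a sketch like this, the filters are cheap and language-independent, so they could run before any more expensive step such as human evaluation; the thresholds would need tuning on real data.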
The outcome of the project is the EstLat Translator toolkit. The new toolkit addresses the market need for an efficient cross-border communication solution and creates new cross-border business opportunities.
Main achievements of the project:
- Successful and fruitful cooperation between Avatar and Tilde, who combined their effort, experience and skills to develop a new service that offers more added value to current customers and extends the services provided by both companies in Estonia and Latvia.
- Joint development of a public AI-based Estonian-Latvian machine translation toolkit. The toolkit gives access to simple text translation and to translation of entire documents that preserves the original formatting, and any user can request professional translation services. It also includes a sophisticated platform for professional users, who can process simple translations, use a professional online translation environment with integrated CAT (Computer-Assisted Translation) tools, build and manage translation memories, and manage translation orders.
- Successful marketing campaigns to reach out to a large audience and new customers in both partner countries.