One of the most common questions I get from colleagues and clients is: “Should we use ChatGPT/Claude for translation, or stick with dedicated MT engines?”
The answer, as with most things in technology, is: it depends.
The Case for Dedicated NMT Engines
Dedicated neural machine translation (NMT) engines such as Google Translate and DeepL, along with custom-trained models, have several advantages:
- **Speed:** NMT engines are optimized for translation. They process thousands of words per second, making them ideal for high-volume workflows.
- **Cost:** Per-word costs are typically much lower than LLM API calls, especially at scale.
- **Consistency:** Given the same input, an NMT engine produces the same output. This predictability is valuable in production environments.
- **Language coverage:** Dedicated MT systems often support more language pairs, including lower-resource languages.
The Case for LLMs
Large Language Models bring different strengths to the table:
- **Context understanding:** LLMs excel at capturing nuance, tone, and context, and they can maintain consistency across longer documents.
- **Flexibility:** You can guide LLMs with specific instructions: “translate formally,” “maintain the playful tone,” “use Latin American Spanish” (see the sketch after this list).
- **Multi-task capability:** Beyond translation, LLMs can handle terminology extraction, quality assessment, or content summarization in the same pass.
- **Domain adaptation:** With well-crafted prompts, LLMs can adapt to specialized domains without requiring training data.
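To make the flexibility point concrete, here’s a minimal sketch of instruction-guided translation using the Anthropic Python SDK. The model name, prompt wording, and the `translate` helper are my own illustrative choices, not settings from any particular project:

```python
# Minimal sketch: steering an LLM translation with explicit instructions.
# Assumes the `anthropic` package and an ANTHROPIC_API_KEY in the environment;
# the model name and prompt wording are illustrative, not prescriptive.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def translate(text: str, target: str, style_notes: str) -> str:
    """Translate `text` into `target`, steered by free-form style notes."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model; use whatever is current
        max_tokens=1024,
        system=(
            f"You are a professional translator. Translate the user's text into {target}. "
            f"Style requirements: {style_notes}. Output only the translation."
        ),
        messages=[{"role": "user", "content": text}],
    )
    return response.content[0].text

# translate(marketing_copy, "Latin American Spanish", "maintain the playful tone")
```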
My Decision Framework
Here’s how I typically approach the choice:
| Factor | Use NMT | Use LLM |
|---|---|---|
| Volume | High (>100k words/day) | Low to medium |
| Budget | Tight | Flexible |
| Content type | Repetitive, technical | Creative, nuanced |
| Turnaround | Real-time needed | Batch processing OK |
| Customization | Have training data | No training data |
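If it helps to see the table as logic, here’s a toy encoding of it. The parameter names, the majority-vote rule, and the cutoff of three votes are simplifications I made up for illustration; in practice these factors get weighed, not counted:

```python
# Toy encoding of the decision table above. The 100k-words/day cutoff comes
# from the table; the majority-vote rule is my own simplification.
def choose_engine(
    words_per_day: int,
    budget_tight: bool,
    content_creative: bool,
    needs_realtime: bool,
    has_training_data: bool,
) -> str:
    nmt_votes = sum([
        words_per_day > 100_000,  # Volume: high favors NMT
        budget_tight,             # Budget: tight favors NMT
        not content_creative,     # Content type: repetitive/technical favors NMT
        needs_realtime,           # Turnaround: real-time favors NMT
        has_training_data,        # Customization: training data favors NMT
    ])
    return "NMT" if nmt_votes >= 3 else "LLM"
```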
The Hybrid Approach
In practice, I often recommend hybrid workflows:
- NMT for first pass on high-volume content
- LLM for quality estimation (QE) to flag problematic segments
- LLM for post-editing assistance on flagged content
- Human review for final quality assurance
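Here’s what that workflow looks like as code. This is a skeleton, not a finished pipeline: `nmt_translate` is a stub for whatever MT engine you use, and the QE prompt, model name, and threshold are assumptions you’d tune per project:

```python
# Skeleton of the hybrid workflow: NMT first pass, LLM quality estimation,
# and flagging for post-editing/human review. Model, prompt, and threshold
# are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()
QE_THRESHOLD = 80  # assumed cutoff: segments scoring below this get reviewed

def nmt_translate(source: str, target_lang: str) -> str:
    raise NotImplementedError("call your NMT engine (DeepL, Google, custom) here")

def llm_quality_score(source: str, draft: str, target_lang: str) -> int:
    """Ask the LLM for a single 0-100 adequacy/fluency score for one segment."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model
        max_tokens=10,
        system=(
            "You are a translation quality estimator. Given a source segment and "
            f"its {target_lang} translation, rate adequacy and fluency on a 0-100 "
            "scale. Reply with the integer only."
        ),
        messages=[{"role": "user", "content": f"Source: {source}\nTranslation: {draft}"}],
    )
    return int(response.content[0].text.strip())

def hybrid_translate(segments: list[str], target_lang: str) -> list[dict]:
    results = []
    for source in segments:
        draft = nmt_translate(source, target_lang)             # 1. NMT first pass
        score = llm_quality_score(source, draft, target_lang)  # 2. LLM quality estimation
        results.append({
            "source": source,
            "draft": draft,
            "score": score,
            "needs_review": score < QE_THRESHOLD,  # 3-4. flag for post-editing and human QA
        })
    return results
```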
This approach balances cost, speed, and quality effectively.
Real-World Example
Recently, I worked on a project involving technical documentation across 12 languages. We used:
- DeepL for the initial translation (fast, cost-effective)
- Claude for QE scoring with custom criteria
- Human reviewers for segments scoring below our threshold
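As a rough illustration of the first-pass step, here’s what a minimal call looks like with the official `deepl` Python package. It corresponds to the `nmt_translate` stub in the earlier skeleton; the helper name and key handling are mine, and the project’s actual glue code was more involved:

```python
# Minimal first-pass translation with the official `deepl` package.
# Assumes DEEPL_AUTH_KEY is set; helper name and structure are illustrative.
import os
import deepl

translator = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])

def first_pass(segments: list[str], target_lang: str) -> list[str]:
    """Batch-translate a list of segments; translate_text accepts lists."""
    results = translator.translate_text(segments, target_lang=target_lang)
    return [r.text for r in results]

# first_pass(doc_segments, "DE")  # repeated per target language
```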
The result? A 40% reduction in review time while maintaining our quality standards.
Looking Forward
The line between NMT and LLM translation is blurring. Google’s latest models incorporate LLM capabilities, and purpose-built translation LLMs are emerging.
My advice: don’t commit to one approach. Build flexible workflows that can leverage the best of both worlds as the technology evolves.
Working on a translation workflow and not sure which approach fits? Let’s discuss—I enjoy these architecture conversations.