LLMs vs Neural MT: Choosing the Right Tool

When should you use a dedicated NMT engine versus a large language model for translation? A practical comparison based on real-world experience.

One of the most common questions I get from colleagues and clients is: “Should we use ChatGPT/Claude for translation, or stick with dedicated MT engines?”

The answer, as with most things in technology, is: it depends.

The Case for Dedicated NMT Engines

Neural Machine Translation engines like Google Translate, DeepL, or custom-trained models have several advantages:

Speed: NMT engines are optimized for translation. They process thousands of words per second, making them ideal for high-volume workflows.

Cost: Per-word costs are typically much lower than LLM API calls, especially at scale.

Consistency: Given the same input, NMT engines produce consistent output. This predictability is valuable in production environments.

Language Coverage: Dedicated MT systems often support more language pairs, including lower-resource languages.

The Case for LLMs

Large Language Models bring different strengths to the table:

Context Understanding: LLMs excel at understanding nuance, tone, and context. They can maintain consistency across longer documents.

Flexibility: You can guide LLMs with specific instructions—“translate formally,” “maintain the playful tone,” “use Latin American Spanish.”

Multi-task Capability: Beyond translation, LLMs can simultaneously handle terminology extraction, quality assessment, or content summarization.

Domain Adaptation: With well-crafted prompts, LLMs can adapt to specialized domains without requiring training data.
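That prompt-level flexibility is easy to operationalize. Here is a minimal sketch of a prompt builder that layers style instructions onto a translation request; the function name and structure are my own illustration, not any particular API:

```python
def build_translation_prompt(text, target_lang, instructions=None):
    """Assemble a translation prompt for an LLM.

    `instructions` carries style guidance such as "translate formally"
    or "use Latin American Spanish" -- the kind of steering a dedicated
    NMT engine cannot easily accept.
    """
    lines = [f"Translate the following text into {target_lang}."]
    if instructions:
        lines.extend(f"- {rule}" for rule in instructions)
    lines.append("")  # blank line separating instructions from payload
    lines.append(text)
    return "\n".join(lines)

prompt = build_translation_prompt(
    "Welcome to our store!",
    "Spanish",
    instructions=["Use Latin American Spanish", "Maintain the playful tone"],
)
```

The same builder serves every domain: swap in terminology rules or register constraints and the model adapts without any retraining.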

My Decision Framework

Here’s how I typically approach the choice:

| Factor        | Use NMT                 | Use LLM             |
|---------------|-------------------------|---------------------|
| Volume        | High (>100k words/day)  | Low to medium       |
| Budget        | Tight                   | Flexible            |
| Content type  | Repetitive, technical   | Creative, nuanced   |
| Turnaround    | Real-time needed        | Batch processing OK |
| Customization | Have training data      | No training data    |
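If you want this framework in code, a toy scorer works: let each factor vote for NMT, and pick NMT when a majority agrees. The threshold of three votes is my simplification, not a hard rule:

```python
def choose_engine(words_per_day, budget_tight, content_repetitive,
                  needs_realtime, has_training_data):
    """Encode the decision table: each factor casts one vote for NMT.

    A majority (3 of 5) of NMT-leaning factors selects NMT;
    otherwise an LLM is the better fit.
    """
    nmt_votes = sum([
        words_per_day > 100_000,   # high volume favors NMT
        budget_tight,              # lower per-word cost
        content_repetitive,        # predictable, technical content
        needs_realtime,            # NMT latency is far lower
        has_training_data,         # can custom-train an engine
    ])
    return "NMT" if nmt_votes >= 3 else "LLM"
```

In practice the factors are rarely equal-weighted; treat this as a starting heuristic to be tuned per project.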

The Hybrid Approach

In practice, I often recommend hybrid workflows:

  1. NMT for first pass on high-volume content
  2. LLM for quality estimation to flag problematic segments
  3. LLM for post-editing assistance on flagged content
  4. Human review for final quality assurance

This approach balances cost, speed, and quality effectively.
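The four steps above can be sketched as a small routing function. The three callables stand in for real engine and API integrations, and the 0.8 threshold is an illustrative placeholder; human review of the output happens downstream:

```python
QE_THRESHOLD = 0.8  # assumed cutoff: segments scoring below this get post-editing

def hybrid_translate(segments, nmt_translate, llm_score, llm_post_edit):
    """Hybrid workflow: NMT first pass, LLM quality estimation,
    and LLM post-editing assistance only on flagged segments."""
    results = []
    for seg in segments:
        draft = nmt_translate(seg)            # 1. NMT first pass
        score = llm_score(seg, draft)         # 2. LLM quality estimation
        if score < QE_THRESHOLD:
            draft = llm_post_edit(seg, draft) # 3. post-edit flagged segments
        results.append({"source": seg, "target": draft, "qe": score})
    return results                            # 4. human review downstream

# Usage with stub engines in place of real integrations:
out = hybrid_translate(
    ["hello world", "ok"],
    nmt_translate=str.upper,
    llm_score=lambda src, tgt: 0.9 if len(src) > 3 else 0.5,
    llm_post_edit=lambda src, tgt: tgt + " [edited]",
)
```

Only the low-scoring segment pays for the extra LLM call, which is where the cost savings come from.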

Real-World Example

Recently, I worked on a project involving technical documentation across 12 languages. We used:

  • DeepL for the initial translation (fast, cost-effective)
  • Claude for QE scoring with custom criteria
  • Human reviewers for segments scoring below our threshold

The result? A 40% reduction in review time while maintaining quality standards.

Looking Forward

The line between NMT and LLM translation is blurring. Google’s latest models incorporate LLM capabilities, and purpose-built translation LLMs are emerging.

My advice: don’t commit to one approach. Build flexible workflows that can leverage the best of both worlds as the technology evolves.


Working on a translation workflow and not sure which approach fits? Let’s discuss—I enjoy these architecture conversations.