Introducing multilingual capabilities with Palmyra LLMs

As enterprises look to accelerate growth and diversify revenue streams, expanding their global presence has become increasingly important. So it’s no surprise that multilingual capabilities are some of the most commonly requested features from our customers. Today we’re thrilled to share that Palmyra, the Writer-built family of LLMs, supports highly accurate text generation and translation in over 30 languages, including Spanish, French, Chinese, Hindi, Arabic, and Russian.

Exceptional performance and accuracy

Palmyra has undergone rigorous benchmarking tests to measure its performance in translations and text generation. In fact, in Stanford HELM’s most recent update, Palmyra scored the highest of all models on WMT 2014 – BLEU-4, a leading benchmark on translation performance, outperforming PaLM by Google, Claude by Anthropic, and GPT-4 by OpenAI.

To further evaluate translation accuracy by language, the Writer team measured Palmyra’s BLEU score, a commonly used metric of translation quality, in each supported language. A BLEU score above 60 is generally interpreted as quality that often exceeds that of human translation. We’re excited to share that Palmyra’s score by language ranges between 52.5 and 79.3.
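For readers unfamiliar with how a BLEU score is produced, here is a minimal sketch of the idea: it compares the n-grams (up to 4 words long) in a candidate translation with those in a reference translation, then applies a penalty if the candidate is too short. This is a simplified, sentence-level illustration with a single reference and no smoothing, not the evaluation code used by the Writer team or Stanford HELM. The function names `ngrams` and `bleu` are illustrative assumptions, and published benchmark scores are typically computed over a whole test corpus with standardized tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (illustrative only)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as many times
        # as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are scaled down.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the 1- to 4-gram precisions, scaled to 0-100.
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the quick brown fox jumped over the lazy dog"
print(bleu(reference, reference))  # identical text scores 100.0
print(bleu("the quick brown fox jumps over the lazy dog", reference))  # about 59.7
```

A single changed word in a nine-word sentence already drops this toy score to roughly 60, which gives a sense of how demanding the metric is at the sentence level. Corpus-level scores aggregate n-gram counts over many sentences and are not directly comparable to this sketch.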

To assess the quality of Palmyra’s multilingual text generation capabilities, the Writer team evaluated output by language against the MMLU and MLMM benchmarks. These tests cover 57 tasks, including elementary mathematics, US history, computer science, law, and more. To achieve a high score, models must possess extensive world knowledge and problem-solving skills.

For comparison, in Stanford HELM’s most recent update, Palmyra scored 70.2 on MMLU in English, taking the top score of all production-ready models evaluated. We’re thrilled to share that Palmyra’s MMLU and MLMM scores for non-English languages range between 63.3 and 77.9.

Multilingual benchmark results for most common languages

Text generation benchmark (MMLU/MLMM)
Spanish: 72.5
French: 69.1
Chinese (simplified): 71.7
Hindi: 77.9
Arabic: 68.9
Russian: 75.1

Translation benchmark (BLEU)
Spanish: 79.3
French: 63.1
Chinese (simplified): 63.8
Hindi: 68.4
Arabic: 61.2
Russian: 65.2

While these benchmarking results are strong, as with any generative AI output, generated text should only be viewed as an exceptional first draft. We recommend that human experts review all outputs to ensure accuracy. Detailed benchmarking results by language are available here.

Endless real-world applications

Multilingual capabilities are now available in the chat interface, Ask Writer, on desktop experiences, and in AI apps. Here are just a few ways that these new capabilities can help your entire organization move faster:

  • Shorten sales cycles by creating personalized outbound emails in your prospect’s language
  • Accelerate time to market by quickly translating product descriptions into multiple languages
  • Improve customer satisfaction by enabling support teams with digital assistants that answer questions in local languages

A powerful family of LLMs

The accuracy of its multilingual capabilities is just one of the reasons Palmyra LLMs stand out in their field. Palmyra is trained on 1 trillion tokens of formal writing and is completely auditable, with the ability to inspect code, data, and model weights. We keep customer data private and do not use or share it for model training. In addition to being top-ranked by Stanford HELM, Palmyra is fine-tuned for specific industries like healthcare.

Enterprises choose the Writer full-stack generative AI platform not just for our powerful models, but also for our graph-based RAG, Knowledge Graph, powerful AI guardrails, and a flexible application layer. Writer makes it easy for enterprises to transform work with generative AI.

To learn more about our multilingual capabilities and the capabilities of Palmyra LLMs, schedule a demo with our sales team today.