Innovation

– 6 min read

Introducing Palmyra X5

The 1M-token enterprise LLM, now on AWS Bedrock

Writer Team | April 28, 2025

Introducing Palmyra X5: The 1 M-token LLM built for enterprise-scale agents

Generative AI is past the “interesting demo” phase: enterprise teams now need models that can ingest large amounts of domain data, invoke dozens of tools and agents, and still deliver minimal latency. Palmyra X5 is the engine for enterprise agents. Its one-million-token window, adaptive reasoning, and industry-leading cost efficiency makes it the perfect fit for building scalable agents.

With our recent announcement of AI HQ — an end-to-end platform or building, activating, and supervising agents — enterprises now have the ability to move from experimental AI pilots to reliable, auditable agents that transform your business.

Palmyra X5 is available on WRITER and Amazon Bedrock starting today.

Meet your enterprise LLM

Our model strategy is straight forward: we ship precise, fully owned models that never undergo post-training quantization or distillation, so the behavior you validate today is the behavior you’ll see tomorrow. Palmyra X5 builds on that promise with strict backward compatibility to spare teams the pain of re-tuning agents, a published enterprise roadmap that customers can influence, and latency-optimized inference that makes LLM interactions and retrieval-augmented generation (RAG) feel instantaneous even at million-token scale.

Palmyra X5 stats. 300ms time-to complete multi-turn function call. 22s to process 1 million tokens

Palmyra X5 can process a full million-token prompt in ~22 seconds and fire off multi-turn function-calls in ~300 milliseconds, while costing 3–4× less per token than GPT-4.1. Those two numbers—speed and price—unlock agent behaviors that were previously cost- or time-prohibitive.

On OpenAI’s MRCR 8-needle test—a long-context benchmark that hides eight identical requests in a massive conversation and challenges the model to find the correct one—Palmyra X5 scores 19.1%, compared to 20.25% for GPT-4.1 and 17.63% for GPT-4o. With near top-tier retrieval performance at a dramatically lower cost, Palmyra X5 gives enterprises the ideal balance for scaling production agents without breaking budgets.

Palmyra X5 is also one of the top ranked models on the BigCodeBench (Full, Instruct) evaluation with a score of 48.7, showcasing its ability tackle practical and challenging programming tasks.

When used with our end-to-end platform for building, activating and supervise enterprise agents, enterprises on WRITER can power:

Real-time orchestration – Sub-second round-trips keep multi-tool agents interactive for end users.
Full execution context – A single call can hold every prompt, retrieved doc, and JSON tool response, eliminating brittle chunking logic.
Economics of parallelism – At $0.60 per 1M input tokens, teams can fan-out multiple specialized agents for less than one GPT-4.1 call.

Put simply, X5 brings the low latency of a chat model to workflows that involve thousands of pages, dozens of tool calls, and complex memory.

Enterprise use-case highlights

For enterprise engineers building AI agents in Agent Builder (Beta), X5 unlocks a variety of long-context use cases like:

Revenue & reporting: Ingest full RFPs, pull from Salesforce, and draft first responses automatically. Or generate fund reports by joining third-party market data with internal research—all in a single agent flow.
Support & knowledge management: Classify tickets, stage CMS updates, and publish content with review workflows baked in. Agents also keep knowledge bases fresh by flagging outdated content and suggesting revisions.
Regulatory & compliance intelligence: Analyze lengthy contracts, 10-Ks, or EHRs in one pass. Extract key clauses, identify risks, and summarize with citations—ideal for finance, healthcare, and legal teams.
Customer & research insights: Summarize thousands of survey responses or research papers. Surface themes, extract insights, and accelerate product or R&D decisions with minimal human input.

Our Agent Library also includes a growing number of X5-powered pre-built agents — large file summary, regulatory document analysis, healthcare thought leadership deliverables, medical record summary, and more. Each agent inherits Palmyra X5’s 1M-token context and sub-second function-calling, so they can digest hundreds of pages and take advantage of X5’s multi-modal inputs to deliver results without any building required.

Under the hood: Hybrid attention that scales

Palmyra X5 introduces a hybrid attention mechanism that blends linear and softmax attention to handle multi-million-token inputs with enterprise-grade efficiency.

Optimized attention

X5 blends traditional Softmax attention with a more efficient linear mechanism. This lets it handle sequences up to a million tokens without the memory and speed penalties of standard attention. The result: real-time performance even on massive inputs, with no tradeoff in accuracy.

Mixture of Experts (MoE)

Instead of activating the entire model for every task, X5 uses a dynamic routing system to engage only the most relevant expert subnetworks. That means more scale, lower latency, and reduced compute cost—ideal for multi-step agent workflows.

Together, these upgrades make Palmyra X5 not just able to handle larger context windows, but smarter and faster—engineered to meet the needs of high-volume, tool-heavy enterprise agents.

Benchmarking X5: Enterprise-ready performance

Palmyra X5 demonstrates robust performance across a suite of industry-standard benchmarks, showcasing its capabilities in reasoning, retrieval, and domain-specific tasks.

Benchmark highlights:

BBH (Big-Bench Hard): Evaluates complex reasoning and compositional logic. Palmyra X5 achieves a competitive score of 70.99%, aligning closely with top-tier models.
GPQA (Graduate-Level Google-Proof Q&A): Assesses the model’s ability to answer challenging, graduate-level questions in biology, physics, and chemistry that are resistant to simple lookup strategies. X5’s score of 47.20% indicates strong performance in scientific reasoning tasks.
MMLU_PRO: Focuses on professional-level knowledge across various domains such as law, medicine, and finance. Palmyra X5 scores 65.02%, demonstrating its suitability for enterprise applications in regulated sectors.
MATH_HARD: Tests symbolic reasoning and multi-step problem-solving abilities. X5’s score of 71.57% showcases its proficiency in handling complex analytical tasks.

These results, combined with Palmyra X5’s extended context capabilities and cost efficiency, affirm its readiness for deployment in enterprise environments requiring high accuracy and performance.

Availability

Palmyra X5 is available today on both the WRITER platform and Amazon Bedrock, making it easier than ever for enterprises to deploy high-performing, agent-ready AI.

AWS is the first major cloud provider to offer fully managed access to WRITER’s Palmyra models. This unlocks seamless, scalable deployment for enterprises in regulated industries like finance and healthcare.

“Palmyra X5 offers impressive performance over long context inputs and enterprise-grade reliability and speed,” said Atul Deo, Director of Amazon Bedrock at AWS. “Seamless access will help organizations scale agentic AI across massive enterprise data workloads.”

Get started

Spin up Palmyra X5 on Bedrock, or join the Agent Builder beta to watch million-token agents come to life in AI HQ. For developers, Palmyra X5 is also accessible via the Writer API and SDKs, enabling seamless integration into your applications. Whether you’re building with Python, Node.js, or other supported environments, the Writer SDKs provide the tools you need to harness X5’s capabilities. When your agents require a million-token memory and enterprise-grade reliability, X5 is ready.

Innovation

Introducing Palmyra X5

Meet your enterprise LLM

Enterprise use-case highlights

Under the hood: Hybrid attention that scales

Optimized attention

Mixture of Experts (MoE)

Benchmarking X5: Enterprise-ready performance

Availability

Get started

More resources

Research

Introducing self-evolving models

Product updates

WRITER is now available for LangChain: Build powerful AI applications with ease

Product updates

Introducing actions
with Palmyra X4

Innovation

Introducing Palmyra X5

Meet your enterprise LLM

Enterprise use-case highlights

Under the hood: Hybrid attention that scales

Optimized attention

Mixture of Experts (MoE)

Benchmarking X5: Enterprise-ready performance

Availability

Get started

More resources

Research

Introducing self-evolving models

Product updates

WRITER is now available for LangChain: Build powerful AI applications with ease

Product updates

Introducing actions with Palmyra X4

Introducing actions
with Palmyra X4