Meet Palmyra-Vision, our multimodal LLM with vision capabilities

Writer Team

At Writer, our focus is to deliver enterprise generative AI that makes it easier for people to produce high-quality written output. But our customers live in a multimedia world and handle mediums beyond text in their daily workflows. Today we’re thrilled to introduce Palmyra-Vision, our multimodal LLM for visual and language understanding, which can analyze and generate text based on images.

Summarized by Writer

  • Palmyra-Vision is a multimodal large language model (LLM) with vision capabilities developed by Writer that can analyze and generate text based on images.
  • It excels at tasks such as extracting handwritten text, classifying objects, analyzing graphs and charts, and answering specific questions based on visual inputs. Palmyra-Vision achieved a score of 84.4% on the VQAv2 benchmark, outperforming other prominent multimodal models.
  • Palmyra-Vision offers a range of practical applications in the enterprise, including product description generation, interpreting charts and graphs, compliance detection, improving accessibility by creating ALT descriptions, and text extraction from handwritten reports.

Achieve state-of-the-art accuracy

Palmyra-Vision excels at a range of tasks, including extracting handwritten text, classifying objects or color, and describing charts, graphs, infographics, and flowcharts. Not only can it understand visuals, it can also answer specific questions, analyze graphs, and generate new content based on your images.

We benchmarked Palmyra-Vision on VQAv2, a dataset of open-ended questions about more than 265,000 images that requires an understanding of vision, language, and common-sense knowledge. Palmyra-Vision achieves a score of 84.4%, outperforming both GPT-4V and Gemini 1.0 Ultra.

High-impact enterprise use cases

Palmyra-Vision’s accuracy and capabilities enable a breadth of high-impact use cases in the enterprise. Here are just a few examples:

  • Compliance teams must ensure that promotional materials meet applicable standards. For example, pharmaceutical companies can use Palmyra-Vision to check whether advertisements for their medications meet medical, legal, and regulatory standards.
  • Retail companies need to generate thousands of product description pages to power their e-commerce business. Operations teams can shorten time to market and increase conversions by using Palmyra-Vision to quickly draft high-quality product descriptions.
  • For teams that regularly work with charts and graphs, Palmyra-Vision boosts productivity by quickly interpreting those images and summarizing key takeaways. For example, financial advisors can create summaries of each client’s portfolio allocation and performance in a snap.
  • Customer experience teams can use Palmyra-Vision to quickly draft ALT descriptions to improve accessibility and enhance SEO performance.
  • Companies that need to digitize handwritten reports, such as insurance companies processing written reports for claims or healthcare companies processing doctors’ notes for medical reports, can use Palmyra-Vision for text extraction, even when handwriting quality is low.
  • Review advertisement images for compliance
  • Generate product descriptions in a snap using Image analyzer
  • Digitize handwritten police reports for insurance claims using Image analyzer
  • Summarize charts, graphs, and infographics for reports with Image analyzer

You can access vision capabilities with our image analyzer app, which is now available in the Writer library of prebuilt apps. We can also build custom apps with Palmyra-Vision to fit specific use cases, taking into account bespoke input requirements and structured output standards.

The latest innovation to our LLMs

Palmyra, the Writer family of LLMs, is purpose-built for the enterprise to give you accuracy and control, without high costs. Palmyra-Vision is just the latest in a series of model innovations. Recently, we announced that Palmyra supports multilingual capabilities in over 30 languages, including Spanish, French, Chinese, Hindi, Arabic, and Russian. We’re also proud that Palmyra LLMs achieved outstanding results in the latest Stanford HELM evaluation, earning top rankings among production-ready models on key benchmarks and outperforming models by OpenAI, Google, and Anthropic.

These new advancements in model technology, combined with our graph-based RAG, powerful AI guardrails, and the ability to build custom apps for any business process, make Writer the preferred full-stack generative AI platform for enterprises.

To learn more about our vision capabilities and Palmyra-Vision, take a product tour or schedule a call with our sales team today.