Meet Palmyra-Vision, our multimodal LLM with vision capabilities
At Writer, our focus is to deliver enterprise generative AI that makes it easier for people to produce high-quality written output. But our customers live in a multimedia world and work with media beyond text in their daily workflows. Today we’re thrilled to introduce Palmyra-Vision, our multimodal LLM for visual and language understanding, which can analyze images and generate text based on them.
- Palmyra-Vision is a multimodal large language model (LLM) with vision capabilities developed by Writer that can analyze and generate text based on images.
- It excels at tasks such as extracting handwritten text, classifying objects, analyzing graphs and charts, and answering specific questions based on visual inputs. Palmyra-Vision achieved a score of 84.4% on the VQAv2 benchmark, outperforming other prominent multimodal models, including GPT-4V and Gemini 1.0 Ultra.
- Palmyra-Vision offers a range of practical applications in the enterprise, including generating product descriptions, interpreting charts and graphs, detecting compliance issues, creating ALT descriptions to improve accessibility, and extracting text from handwritten reports.
Achieve state-of-the-art accuracy
Palmyra-Vision excels at a range of tasks, including extracting handwritten text, classifying objects or colors, and describing charts, graphs, infographics, and flowcharts. Not only can it understand visuals, it can also answer specific questions, analyze graphs, and generate new content based on your images.
We benchmarked Palmyra-Vision on VQAv2, a dataset of open-ended questions about more than 265,000 images that require an understanding of vision, language, and common-sense knowledge. Palmyra-Vision achieves a score of 84.4%, outperforming both GPT-4V and Gemini 1.0 Ultra.
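For readers wondering what a percentage score on an open-ended benchmark means: VQAv2 collects ten human answers per question, and the commonly used VQA accuracy metric gives a prediction full credit when at least three annotators gave the same answer, averaged over every subset of nine of the ten annotators. The sketch below is a simplified illustration of that metric, not Writer's evaluation code; the function names are hypothetical, and it assumes only lowercasing and whitespace trimming, whereas the official evaluation script also normalizes punctuation, articles, and number words.

```python
from itertools import combinations


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQAv2-style accuracy for a single question (hypothetical helper).

    Averages min(1, matches / 3) over every subset of len(human_answers) - 1
    annotators (10 choose 9 = 10 subsets for VQAv2's ten answers per question).
    Assumption: only lowercasing and trimming are applied to answers here.
    """
    pred = predicted.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    # Every way of leaving out exactly one annotator.
    subsets = list(combinations(range(len(answers)), len(answers) - 1))
    total = 0.0
    for subset in subsets:
        matches = sum(1 for i in subset if answers[i] == pred)
        # Three or more agreeing annotators in the subset earns full credit.
        total += min(1.0, matches / 3)
    return total / len(subsets)


def benchmark_score(per_question: list[float]) -> float:
    """Overall benchmark score as a percentage: mean accuracy over all questions."""
    return 100 * sum(per_question) / len(per_question)


if __name__ == "__main__":
    # Two of ten annotators answered "red": partial credit of 0.6.
    print(round(vqa_accuracy("Red", ["red"] * 2 + ["blue"] * 8), 2))
```

Under this scheme, a model's headline number is simply the mean of these per-question credits across the evaluation set, which is why a prediction that matches a minority of annotators still earns partial credit.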
High-impact enterprise use cases
Palmyra-Vision’s accuracy and capabilities enable a breadth of high-impact use cases in the enterprise. Here are just a few examples:
- Compliance teams must review promotional materials before they are published. For example, pharmaceutical companies can use Palmyra-Vision to check whether advertisements for their medications meet medical, legal, and regulatory standards.
- Retail companies need to generate thousands of product description pages to power their e-commerce business. Operations teams can shorten time to market and increase conversions by using Palmyra-Vision to quickly draft high-quality product descriptions.
- For people who regularly work with charts and graphs, Palmyra-Vision boosts productivity by helping them quickly interpret those images and summarize key takeaways. For example, financial advisors can create summaries of each client’s portfolio allocation and performance in a snap.
- Customer experience teams can use Palmyra-Vision to quickly draft ALT descriptions to improve accessibility and enhance SEO performance.
- Companies that need to digitize handwritten reports, such as insurance companies processing written claim reports or healthcare companies processing doctors’ notes for medical reports, can use Palmyra-Vision for text extraction, even when the handwriting is hard to read.
You can access vision capabilities with our image analyzer app, which is now available in the Writer library of AI apps. We can also build AI apps with Palmyra-Vision to fit specific use cases, taking into account bespoke input requirements and structured output standards.
The latest innovation in our LLMs
Palmyra, the Writer family of LLMs, is purpose-built for the enterprise to give you accuracy and control without high costs. Palmyra-Vision is just the latest in a series of model innovations. Recently, we announced that Palmyra supports more than 30 languages, including Spanish, French, Chinese, Hindi, Arabic, and Russian. We’re also proud that Palmyra LLMs achieved outstanding results in the latest Stanford HELM evaluation, earning top rankings among production-ready models on key benchmarks and outperforming models by OpenAI, Google, and Anthropic.
These advancements in model technology, combined with our graph-based RAG, powerful AI guardrails, and the ability to build AI apps for any business process, make Writer the preferred full-stack generative AI platform for enterprises.
To learn more about our vision capabilities and Palmyra-Vision, take a product tour or schedule a call with our sales team today.