AI in the enterprise

– 11 min read

Considering DIY generative AI? Be prepared for these hidden costs

Andrew Miller


Just because an enterprise company can develop custom software in-house doesn’t mean they should. There’s a perception that a DIY solution is more cost-effective or can achieve use cases perfectly tailored to the company’s needs. While DIY itself isn’t bad and can be highly beneficial, the challenge lies in the complexity of using a massive, unconnected set of technologies from many different vendors.

Generative AI is particularly complex. There are numerous technology vendors to select from. Stitching these vendors together can be slow, risky, and resource-intensive, and results often don’t meet the accuracy needed to move from Proof of Concept (POC) to production.

On top of that, technology is evolving at an unprecedented pace, which means more maintenance for the solution to remain effective. And for generative AI to be an effective enterprise solution, it must dynamically connect to internal data to provide the most accurate, high-quality responses.

Hiring, development, and maintenance all come at a price. Enterprise companies need to factor in those costs when considering DIY generative AI versus licensing a full-stack platform.

Summarized by Writer

  • Implementing generative AI for enterprise use is complex. It’s slow, risky, and resource-intensive to build a solution in-house.
  • DIY generative AI has hidden costs, such as development and production servers, databases, storage, and the need for a large language model (LLM) and retrieval augmented generation (RAG) approach.
  • A full-stack platform eliminates the costs of servers, storage, and in-house resources but involves annual platform fees, onboarding costs, and monthly subscription costs.
  • Full-stack solutions offer faster deployment, higher output quality, better security, and easier ROI success than DIY generative AI.

The hidden cost of DIY generative AI

Generative AI will have a few things in common with other solutions developed in-house. You’ll need to pay for development and production servers, databases, and storage. Critically, you’ll need to pick a large language model (LLM) and a retrieval augmented generation (RAG) approach.

An LLM is the foundation of generative AI: a model trained on massive datasets that uses machine learning algorithms and natural language processing to generate output. Various LLM builders are available, all of which have pros and cons. Skill-specific models, like those for image analysis, can be great for narrowly defined use cases but fall short for general ones. Picking a general-purpose model, like GPT-4, may seem smarter, but these models are often huge and cost-prohibitive when used at scale.

The most impactful generative AI solutions often use RAG. Rather than guessing or “hallucinating” (as AI is known to do), with RAG, responses are grounded in your internal company data, allowing users to analyze and ask questions specific to your business use cases.

A common way of implementing RAG is vector retrieval, which combines an embedding model with a vector database. The embedding model converts your content into numerical vectors, which are stored in the vector database for similarity search.

[Image: content is fed into an embedding model to generate a vector embedding, which is stored in a vector database; an application can then query the vector database to retrieve similar content.]
The connections between content, the application, and a vector database.
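
In code, the retrieval flow above can be sketched with a minimal in-memory store. This is a toy: it uses character-frequency vectors as a stand-in for a real embedding model, and a plain list as a stand-in for a vector database. `VectorStore` and `embed` are illustrative names, not any vendor's API:

```python
import math
from collections import Counter

def embed(text: str) -> list[float]:
    # Toy embedding: normalized character frequencies over a-z.
    # Real systems use a learned embedding model instead.
    counts = Counter(c for c in text.lower() if c.isalpha())
    vec = [counts.get(chr(i), 0) for i in range(ord("a"), ord("z") + 1)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """Minimal stand-in for a vector database."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc: str) -> None:
        # Embed each document once at ingestion time.
        self.items.append((doc, embed(doc)))

    def query(self, text: str, k: int = 1) -> list[str]:
        # Embed the query, then rank stored docs by cosine similarity
        # (vectors are already unit-normalized, so a dot product suffices).
        q = embed(text)
        scored = sorted(
            self.items,
            key=lambda item: -sum(a * b for a, b in zip(q, item[1])),
        )
        return [doc for doc, _ in scored[:k]]

store = VectorStore()
store.add("Refund policy: refunds are issued within 30 days of purchase.")
store.add("Shipping: orders ship within two business days.")

# The retrieved chunks would then be stuffed into the LLM prompt as context.
context = store.query("How do refunds work?", k=1)
```

In a production RAG pipeline, the same three operations happen at scale: embed content, store vectors with replicas for throughput, and retrieve the nearest chunks at query time to ground the LLM's response.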

However, building your own RAG solution can become expensive fairly quickly. Gartner found that extending generative AI models via data retrieval can cost between $750,000 and $1,000,000. A RAG solution commonly requires two to three engineers to build and maintain.

Once set up, your primary cost will be your cost per query, which can be broken down into two main parts:

  • Vector storage cost: The most important factors are how much data you’d like to store and how many replicas you’d like to maintain. Data storage amounts are relatively straightforward and depend on your use case, while the number of replicas depends on your latency and concurrency requirements. The more replicas you have, the more concurrent queries your vector retrieval solution can process.
  • LLM RAG costs: Once your vector database returns relevant chunks, normal LLM costs apply, calculated from the tokens (or characters) in your prompt and response. RAG can be particularly expensive because you’re often filling your context window with the chunks of documents your vector retrieval method identified as relevant to your query. For more expensive models like GPT-4, these costs can get quite high.
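
To see why stuffing the context window inflates per-query LLM cost, here's a rough back-of-the-envelope sketch. Every number below is hypothetical (ballpark prices and sizes, not quotes for any specific model):

```python
# Hypothetical per-1K-token prices, in the ballpark of a premium model tier
input_price_per_1k = 0.03
output_price_per_1k = 0.06

query_tokens = 50       # the user's question alone
chunk_tokens = 500      # one retrieved document chunk
chunks_retrieved = 8    # chunks stuffed into the context window
response_tokens = 400

# With RAG, the prompt carries the question plus all retrieved chunks.
prompt_tokens = query_tokens + chunk_tokens * chunks_retrieved  # 4,050 tokens
cost_per_query = (prompt_tokens / 1000) * input_price_per_1k \
               + (response_tokens / 1000) * output_price_per_1k

# Without the retrieved context, the same question is far cheaper.
bare_cost = (query_tokens / 1000) * input_price_per_1k \
          + (response_tokens / 1000) * output_price_per_1k
```

Under these assumptions, the retrieved context dominates the bill: the RAG-augmented query costs several times what the bare question would, and the gap grows with chunk size and the number of chunks retrieved.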

Let’s look at an example. Assume you’re a mid-sized enterprise looking to process about 200k queries a month. Your database has about 100k pages for product FAQs, research docs, and more — roughly equivalent to the Encyclopedia Britannica. The resulting cost is estimated to be over $190,000 per month.

[Image: a graph comparing monthly vector storage cost to the number of queries. Cost per query: $0.14 for vector retrieval and storage, $0.5040 for LLM prompt and response. Total monthly cost, including two to three engineers: $193,800.]
The expected costs associated with DIY generative AI vs vector retrieval
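
As a quick sanity check, the arithmetic behind that monthly figure can be reconstructed. The per-query rates and query volume come from the example above; the engineering cost is an assumption chosen to reconcile with the stated total:

```python
queries_per_month = 200_000
vector_cost_per_query = 0.14    # vector retrieval and storage, per query
llm_cost_per_query = 0.5040     # LLM prompt and response, per query

# Infrastructure cost scales linearly with query volume.
infra_cost = queries_per_month * (vector_cost_per_query + llm_cost_per_query)

# Assumed fully loaded monthly cost for the 2-3 engineers maintaining
# the system (hypothetical figure that reconciles with the $193,800 total).
engineering_cost = 65_000

total_monthly_cost = infra_cost + engineering_cost
print(f"${total_monthly_cost:,.0f} per month")  # → $193,800 per month
```

Note that the engineering line item is fixed while the infrastructure line item grows with usage, so the per-query costs dominate as adoption scales across the organization.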

Next, you’ll need to develop an application for your users. This might be a chat interface or a UI that walks users through a series of inputs and uses that information to create a prompt.

To be the most effective, the application needs to be “where your users are” — both for easy access and to fit into their existing workflows. It also needs to be flexible enough to support multiple use cases, which might mean a chatbot and a UI.

We’ve seen development costs range from $600,000 to $1,500,000 for a single app. When you’re ready to scale and want to develop more applications, deploying five apps can cost around $5 million. This doesn’t include recurring costs, which can fall between $350,000 and $820,000 annually for a single app.

An integrated platform approach to developing custom apps and agents can help you future-proof your initiatives. This will allow you to focus on a cycle of high ROI rather than costly, ongoing maintenance.

The total cost of ownership of a full-stack platform

With any licensed full-stack solution, you’re immediately eliminating the costs of servers, storage, and in-house resources (at least those associated specifically with the project).

But, like any licensed product, you’ll have associated costs for access to a full-stack generative AI solution, including any of the following:

  • Annual platform fees to access the product, which can vary by vendor
  • Onboarding costs to train your users
  • Monthly subscription costs, such as per seat or by query volume (or both)

You may need to rely on internal resources, such as an engineer, to help with the initial setup. But outside of that, a full-stack platform has already integrated the services required — an LLM and RAG — so your users can immediately begin testing and implementing use cases. Instead of spending millions on a handful of apps, you only need to invest a fraction of the amount for triple the output.

Unlike vector databases, Knowledge Graph from Writer connects internal company data sources for richer RAG that can preserve semantic relationships and encode structural information.

RAG storage types: strengths and weaknesses

After the initial setup, your internal resources will be minimally involved in tasks such as adding new users or staying up to speed with newly released features. Both functions would be similar to what you need for a DIY solution — and easier to manage with a full-stack platform. You also don’t need to worry about ongoing maintenance and upkeep.

With Writer, you also benefit from our expertise. We help enterprise companies with adoption and change management so you can build the right use cases and custom apps for your business.


Things to consider when comparing the two approaches

Comparing the hard costs is only one part of the equation. There are other factors to consider that are harder to quantify but definitely impact your organization’s ability to effectively implement and scale generative AI.

Writer stack vs standard generative AI stack

Time to deployment

When you build a generative AI solution in-house, you’ll need to consider the amount of time involved and the resources. It could take an engineering team months — or even years — to piece together the disparate services you need for generative AI. If you don’t have engineers with the right expertise, then on top of that, you’ll need time to hire.

Writer recently ran a survey with Dimensional Research and found that 88% of companies building in-house solutions need six months or longer to have a solution operating reliably. It can take months to years for these companies to successfully deploy up to a handful of apps.

A full-stack solution allows you to realize value faster. A solution like Writer can get you up and running with dozens or more apps in just a few weeks to a few months. You also get premium support, vetted solutions, and AI apps tailored just for your company. This means you won’t waste time figuring out maintenance, upkeep, or staying on top of the latest generative AI trends.


Quality of output

Most LLMs are built for general-purpose use cases, so they’re trained for a breadth of knowledge and skills that serves a wide audience, including casual users and personal use cases.

As our report shows, 61% of companies building DIY solutions have experienced AI accuracy issues. Most consider their output as mediocre or worse, with only 17% indicating that their DIY solution is excellent. While some enterprises are experimenting with foundation models, less than 10% have progressed from POC to scale. The process is complex and resource-intensive, with challenges like limited support for end-to-end workflow automation and issues with governance, leaving enterprises trapped in POC purgatory.

You need a model designed for business, which leads to higher accuracy and better results for your users. Without business-focused output, users may spend so much time editing that your organization realizes far less value from the DIY solution, and they may grow frustrated in the process. Partnering with a full-stack platform to support your DIY efforts helps you scale your generative AI projects from POC to production.

The Writer approach to retrieval-augmented generation, combined with Palmyra LLMs, is nearly twice as accurate as common alternatives.

Security

Our survey found that 94% of companies consider data protection a significant concern with generative AI. Depending on your industry, your data may need to meet security and compliance standards, such as SOC 2 Type II, PCI, or HIPAA. A DIY solution might struggle to meet these requirements.

When you integrate a third-party LLM into your in-house solution, you often give up control over your data. The vendor often has the right to access, retain, and use your data as part of generative AI responses served to any user at any organization accessing the same LLM. Your data isn’t segregated or protected.

You’ll also need to consider security within the DIY solution, which can be spread across the disparate components.

A full-stack generative AI platform has security built-in, from user access to how the product uses your data.

Change management

Generative AI represents a big shift in how people get work done. To get the most value out of AI, you need to identify the right use cases. You also need to future-proof your teams by training them to use the tool effectively and facilitating smooth adoption. Building an AI solution in-house means you’ll handle user adoption on your own, whereas a full-stack platform partner provides valuable support with education and training.

Slow implementation and the additional costs associated with these efforts can impact the ROI of your project.

Integrated features

Stitching together an in-house generative AI platform often means missing out on the integration and cohesive features that only a full-stack platform can provide, leading to potential gaps in security, efficiency, and user experience. With a platform like Writer, everything is designed from the ground up to work together by default. We provide flexible development tools and integrated features to make sure your entire organization sees value from generative AI faster. Here are some of the key features:

  • Integrated user and app management ensures easy onboarding, permission management, and usage monitoring 
  • Built-in security and privacy measures, like data encryption and compliance with industry standards, protect your data and guarantee regulatory adherence
  • Customizable AI guardrails allow you to set and adjust ethical and business guidelines, making sure AI operates within your desired parameters
  • Prebuilt templates for common use cases can be quickly adapted, accelerating your development and deployment
  • Knowledge Graph integration improves the AI’s context and accuracy by drawing on your existing data and insights
  • Fine-tuned models like Palmyra LLMs can be easily customized to meet your specific needs, achieving high performance without extensive data collection and training
  • Dedicated AI program management supports you from initial ideation throughout the change management process

Your total cost of ownership impacts ROI

ROI often measures the success of software, and cost is a key consideration in the calculation. With DIY generative AI, the hard costs alone can be substantial, especially as query usage increases across the organization. Include additional factors such as time, quality, and security, and it becomes even harder to reach ROI.

Meaningful ROI comes from implementing a solution that can deliver at the pace the technology develops. With a DIY solution, you may find that you’re always “playing catch up” with allocating the resources, time, and technology needed to match generative AI developments.

As a result, you’ll always have a gap between what your DIY solution can produce and what’s possible. And generative AI isn’t a technology where you want to be behind the curve.

Our approach at Writer isn’t some untested, experimental approach to generative AI — it’s a proven solution. Hundreds of enterprise customers like Uber, L’Oréal, and Intuit have used our platform to get past POC purgatory, scaling generative AI across dozens of mission-critical use cases. Writer allows you to reap the benefits of building in-house while minimizing the risks and maximizing the value of your generative AI initiatives.

Unsure how to achieve ROI with generative AI? Follow this six-step path.