Reducing the “human toil” of legal discovery with generative AI
Jim Snyder, Chief Architect
Listen to the DISCO story or read the (edited) story below.
Writer is the generative AI platform for enterprises. We empower your people — product, operations, support, marketing, HR, and more — to maximize creativity and 10x productivity by transforming the way they work.
Our secure platform enables you to build just about any generative AI application on top of your business data sources, and delivers accurate answers and content that are fine-tuned on your own data and follow your own AI guardrails.
Jim Snyder is the chief architect at DISCO. He’s passionate about building software that customers love to use and businesses that are built to last. Read or listen to the story of how Jim and the teams at DISCO use AI to help lawyers speed up e-discovery, and focus on the documents they need most.
Tell us about DISCO, Jim.
DISCO is in a category called legal tech, a relatively new market. We apply cloud technology to scale what wasn’t scalable before and help create a better way to practice law. We automate a lot of the difficult work of finding evidence and documents and making sense of what’s in them, so that customers get a better time to value.
How has your role at DISCO evolved over the last few years?
I joined DISCO about five and a half years ago, before our IPO. My job is Chief Architect, so I have a simple job description: make sure our platform converges. We have a saying here at DISCO, “DISCO magic,” for when things just seem to work that shouldn’t be able to work, and we know we’ve really achieved outcomes that matter. We take design seriously, and I think the user experience in the product bears that out.
AI is not new at DISCO. What AI features had you built before starting to use Writer?
DISCO has been driving innovation in the legal space, which is evidenced by the nearly decade-long investment in our AI center of excellence. Classification problems are very common: we give legal professionals the ability to take large amounts of text and make sense of it. We’ve got what we call predictive tagging. Think about having somebody watch what you do and then give you strong suggestions like, “Hey, I see this pattern over here.” That’s a general capability we’ve built up over time.
We’ve pushed the boundaries of how AI can help solve what I call the “human toil problem.” There’s just a lot of work in e-discovery. And when you can continue to find ways to minimize the human toil, it makes a big difference in small things and in big ways. So we’ve got an all-of-the-above approach to AI and that’s shown up as we’ve made offerings in our product.
How did you start experimenting with generative AI?
BERT was certainly a fundamental shift for a lot of people, and we took advantage of that. As we watched transformers evolve into large language models over time, their viability was in question: they were very expensive, and only a small number of organizations were really working on them in earnest.
But once the right mix of availability arrived, the computer hardware and the cost structures all started to converge, and then the economics started to make sense. Starting early this year, we pushed hard to make sure that we understood how we could appropriately and safely use generative AI in our applications to help lawyers make decisions.
What were some of the challenges you first encountered in generative AI?
First, it’s just getting your head around why this technology is different than some of the more traditional — which is funny to say — AI. It’s different because we’re able to let people talk to the technology in natural language and receive citations to supporting evidence.
That’s a huge change in mindset, and it’s really important to make sure that we both keep control of it, and also don’t get fooled by it. Because it’s still not as precise as humans in the ability to deal with context. It’s a powerful tool given the right set of constraints.
When we started looking at different technologies, the subtle differences between different large language models became clear.
It’s the differences between what I would call commercial LLMs, which are meant for a large scale, broad audience, versus LLMs that were built off data that’s much more enterprise-friendly and allows you to get things that are more relevant to the questions that people have. We’re not asking broad-based questions. At DISCO, we needed to narrow down the use cases to help people get their jobs done. A winner-take-all approach to building the giant model just isn’t how this market is going to go. We have to match problems and models together.
How are you using Writer today?
The most straightforward way to help lawyers is to help them understand a corpus of documents that they receive. The way e-discovery works is, when you get documents, the first thing you’ve got to do is figure out what’s in there by applying the different techniques that are available, which you might call off-the-shelf retrieval over the data that customers have. One of the ways we use Writer is for retrieval augmented generation. There are 10,000 variations of that, because you’ve got to really know what you’re doing to understand the data that you get.
Given a corpus sight unseen, how would we go through it and make sense of it? We spent a lot of time thinking about the human factors of what’s ambiguous and what’s not ambiguous, and how we can make understanding a corpus much more accessible to people. So we structured our solution to make that a reality. There’s a great demo on our website, part of our Cecilia offering, leveraging Writer. It’s pretty amazing what it can do for people to help them understand what’s in the corpus. That’s a big part of the exercise: I don’t have time to read every single document when I have 10 million documents, so help me find the documents that I really care about.
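For readers unfamiliar with the pattern, here is a minimal sketch of retrieval augmented generation in Python. It is not DISCO's or Writer's implementation: the document ids, sample texts, function names, and the simple bag-of-words similarity are all assumptions chosen for illustration. A production system would likely use learned embeddings and would send the assembled prompt to a language model. The sketch only shows the two core steps: find the documents most relevant to a question, then build a prompt that asks for an answer citing those documents.

```python
import math
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
DOCS = {
    "DOC-001": "Email from the supplier confirming the shipment was delayed by two weeks.",
    "DOC-002": "Lunch menu for the office cafeteria on Friday.",
    "DOC-003": "Contract amendment extending the shipment deadline for the supplier.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the ids of the k documents most similar to the question."""
    q = Counter(tokenize(question))
    scored = [
        (cosine(q, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    """Assemble a prompt holding only the retrieved excerpts, tagged by id."""
    doc_ids = retrieve(question, documents, k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the excerpts below and cite document ids in brackets.\n"
        f"{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Why was the supplier shipment delayed?", DOCS))
```

Tagging each excerpt with its document id is what lets the generated answer point back to supporting evidence, so a reviewer can check the source instead of trusting the model's wording.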
What was the role of Palmyra, the Writer LLM, in helping you build your solution?
Palmyra was high up in the rankings with respect to other models, and we could satisfy a lot of other constraints as well. First, we have to make sure that we can keep our customers’ data safe. That’s one of the criteria we use when we go to market: are we taking a responsible approach to AI? Private data needs to be kept private, and that’s something that we work really hard to do. We take great care in keeping customer data secure.
There’s a long list of things you have to look for: security, privacy, performance, the dreaded context window size, the quality of the answers, the ability to understand the prompt structures. You don’t get all of them satisfied often in one place for your problem set. When you’re approaching a problem like this, it’s really important to get all your requirements bounded well so that you can make sure you can go back and do that evaluation. We were running at breakneck speed to make this happen, but in the end, when we added up all the constraints that we were trying to satisfy, the pool of LLM vendors gets very narrow pretty fast. There were only a few that could get us over the finish line, and for us, Writer ended up being the best combination of all requirements.
What were the internal challenges as you went from the proof of concept to getting a use case of Cecilia to market?
Going from prototype to actually engineering a production-scale system, that’s the real interesting challenge. We’re very fortunate that we’ve got some seasoned engineering talent and product talent here at DISCO, and we had very tight iteration loops to make sure that we stayed on course. In retrieval augmented generation, for example, the entire flow was something that we had to make sure was on track the whole time. We did sort of the standard play of having daily standups with all the key constituents. We worried about all the concerns — security, privacy, cost, control, quality of answers — and we kept grinding through it to make sure that we were hitting our goal. I think when you take a look at that demo, you’ll see it’s getting on the border of magical.
How have you structured and/or transformed data with generative AI?
There are a couple of things. First, DISCO takes everybody’s laundry, so to speak, and gets all kinds of documents. And while parts of AI help, we also have one of the things that most people in the world really crave: a really thorough cleansing process and good-quality data that we can make work in a variety of contexts.
Traditional search is certainly not going away anytime soon, but augmenting it with generative AI is helpful because you can ask different kinds of questions. I now get to be more general with retrieval augmented generation, and then I get to be more specific with traditional search. Marrying those two together is where a lot of the magic is. It means understanding the nature of your data, particularly your unstructured data, and knowing how to carve it up in ways that are helpful (some people call this text chunking). It also means having a good handle on what your retrieval cycle is and exactly how you’re helping people augment that retrieval so that they can get more context, sometimes through summarization, sometimes through question answering.
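To make text chunking and the pairing of traditional and looser retrieval concrete, here is a small Python sketch. It is an illustration under stated assumptions, not DISCO's pipeline: the function names, the word-window chunker, the exact-phrase check standing in for traditional search, the word-overlap score standing in for semantic similarity, and the equal weighting are all hypothetical choices.

```python
def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words; each window repeats the
    last `overlap` words of the previous one so context is not cut off."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def keyword_score(query, chunk):
    """Traditional-style signal: 1.0 if the exact phrase occurs, else 0.0."""
    return 1.0 if query.lower() in chunk.lower() else 0.0


def overlap_score(query, chunk):
    """Looser signal: fraction of query words found anywhere in the chunk.
    A crude stand-in for the embedding similarity a real system might use."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0


def hybrid_rank(query, chunks, weight=0.5):
    """Rank chunks by a weighted mix of both signals, dropping zero scores."""
    scored = [
        (weight * keyword_score(query, c) + (1 - weight) * overlap_score(query, c), i)
        for i, c in enumerate(chunks)
    ]
    # Best score first; ties keep the original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for score, i in scored if score > 0]


if __name__ == "__main__":
    pieces = chunk_words("a b c d e f g h i j", size=4, overlap=1)
    print(pieces)
    print(hybrid_rank("e f", pieces))
```

The overlap between neighboring chunks is a common way to avoid splitting a sentence's meaning across a boundary, and the `weight` parameter makes the trade-off between exact matching and looser matching explicit so it can be tuned per problem.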
Doing this experimentally, just to see what it’s about, is hard without a real problem to solve, so I would highly encourage people to pick a well-targeted problem and make sure their data is in a form that lends itself to whatever technique they’re going to use. Retrieval augmented generation is super valuable if you know how to get your data out in a structured way.
How can technology leaders get transformative rather than incremental value from generative AI?
First, I would argue you have to have a pretty good hype detector. There’s a lot of me-too behavior that happens in every one of these hype cycles. I’ve lived through many of these hype cycles, and I’ve lived through two AI winters. What I think is important from a transformational point of view is to think about both up-leveling quality and time to value. Losing sight of time to value is one of the biggest barriers to making generative AI transformational.
Stepping outside the systems that you currently use, and imagining a world without all the rudimentary technology that you have, requires a little bit of imagination. How do you get outside the box? You can’t be married to your work. You’ve got to let it go and think about how you could solve problems completely differently to get better outcomes.
What advice do you have for technology leaders who don’t feel they have the talent they need to take advantage of generative AI?
It’s simple to understand, but it is hard to do. This is true no matter where you are in technology: I think being customer-obsessed is really the way to get to the kinds of change that need to happen. Wherever you are, if you have a new technology that lets users do a lot more work in the machine instead of through longer or slower processes, find out how to use it to make those customer experiences significantly better. Customer obsession is a way to galvanize clarity, and if you stay focused on that, you’ll start to separate the superfluous things from the things that actually help people.
We sometimes get too locked into rote behaviors. If you just think about your customers, and about your net promoter score for whatever you’re offering in whatever enterprise you’re in, it’ll change the way you interact with people, as opposed to just keeping the machine running. Maybe we’re the machine, figuratively. Whatever process or system you’ve got, just imagine improving your customer service, and I think a lot of it will sort itself out by being customer-obsessed.
Did anything surprise you about your generative
The human part required to get to the finish line was a big deal in a lot of ways. Your teams really wanted to work with us to remove all these barriers. It went very fast and very smoothly: the quality was there, the adaptability was there, the transparency was there, and all of that was helpful. We were able to meet that big, long, complicated list of requirements in time to get to market.