    Grammar technology has moved beyond rules and into meaning

    How has writing changed technology? How has technology changed writing? In a continual quest to make work easier, we now rely on computers to help us become better writers. They fix our grammar and spelling mistakes, suggest better words, and keep our style consistent.

    Natural language processing (NLP) is the field of artificial intelligence that focuses on teaching computers to understand and generate language. It’s used in translation, speech recognition, optical character recognition (reading text from images of writing), semantic search, and sentiment analysis of large bodies of text.

    To understand the power of NLP, think about the difference between an online Spanish-English dictionary and Google Translate. The former lists a range of possible translations for a word; it’s rules-based and not really about processing the meaning (that’s still up to you to figure out). The latter provides more helpful, human-like assistance, taking context into account to evaluate the text and provide an optimal translation.

    Grammar correctors, including those built into the Writer platform, have improved in a similar way, using deep learning to give more accurate, context-based corrections.

    Online grammar check technology of the past

    There are two steps to improving a natural language processing system, and the process is cyclical:

    1. You need to improve your datasets to train better models.
    2. You need to improve your models to learn more from the datasets.

    For a long time, the most advanced English grammar check technologies were based on a dataset derived from Singaporean university students who were learning English. They wrote essays, those essays were corrected, and those corrections were used to build a dataset that taught a model what written English should and shouldn’t look like.

    That’s fine as a starting point. But once your NLP gets good enough, building a better model only makes it better at that very specific subproblem: correcting Singaporean university students who are learning English.

    That dataset quickly became of limited use for helping native English speakers and professional writers, who were unlikely to make exactly the same set of mistakes.

    Advancing online grammar check technology

    To provide more useful writing suggestions for native speakers and professional writers, spelling and grammar correction technology requires datasets that represent the target users. In other words, it needs writing samples from people who use more complex vocabulary and sentences. To this end, Writer has been building proprietary training and evaluation datasets based on native speakers and professional writers.
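
    To make this concrete, a training example for this kind of dataset is typically a pair of sentences: the text as a writer originally produced it, and the corrected version the model should learn to output. The sketch below is a hypothetical illustration of that pair structure; the examples are invented and are not drawn from Writer’s proprietary data.

    ```python
    # Hypothetical grammar-correction training pairs: each example maps a sentence
    # containing errors to its corrected form. Invented for illustration only;
    # this is not a sample of Writer's proprietary dataset.
    training_pairs = [
        {
            "source": "The team have finalized it's quarterly roadmap.",
            "target": "The team has finalized its quarterly roadmap.",
        },
        {
            "source": "We recommend to enable the feature before you runs the migration.",
            "target": "We recommend enabling the feature before you run the migration.",
        },
    ]

    # A model trained on many such pairs learns to map "source" text to "target" text.
    for pair in training_pairs:
        print(pair["source"], "->", pair["target"])
    ```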

    Screenshot of the Writer grammar checker

    For the second half of the cycle, better models, Google Translate is again a good example. You may have noticed that its translations got significantly better a few years ago. That’s because the team implemented a new type of deep learning model, neural sequence-to-sequence (seq2seq). “Sequence-to-sequence” means that the model reads a sequence of words (for example, in Spanish) and outputs a different sequence of words (for example, in English). “Neural” means that the model doing this is a neural network. Neural networks are a mathematical technique for modeling a wide variety of situations; this flexibility, along with how well they scale on off-the-shelf hardware, has revolutionized machine learning and AI over the last decade.
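
    To make the pattern concrete, here is a minimal sketch of an encoder-decoder seq2seq model in PyTorch. It only illustrates the general idea described above (read one token sequence, produce another); the GRU layers, dimensions, and vocabulary sizes are arbitrary choices for the example and are not Writer’s or Google’s actual architecture.

    ```python
    # Minimal seq2seq sketch: an encoder reads the input token sequence and a
    # decoder generates the output token sequence. Illustrative only.
    import torch
    import torch.nn as nn


    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden_dim=256):
            super().__init__()
            self.src_embed = nn.Embedding(src_vocab, emb_dim)
            self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
            # Encoder compresses the input sequence into a hidden state.
            self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            # Decoder generates the output sequence, conditioned on that state.
            self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            _, hidden = self.encoder(self.src_embed(src_ids))
            dec_output, _ = self.decoder(self.tgt_embed(tgt_ids), hidden)
            return self.out(dec_output)  # logits over the target vocabulary


    # Toy usage: a batch of 2 "sentences", 5 input tokens in, 6 output tokens out.
    model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
    src = torch.randint(0, 1000, (2, 5))
    tgt = torch.randint(0, 1000, (2, 6))
    print(model(src, tgt).shape)  # torch.Size([2, 6, 1000])
    ```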

    Writer’s grammar corrector now uses neural seq2seq as well. The new deep learning model, combined with Writer’s proprietary training and evaluation datasets based on native speakers, has already been outperforming leading grammar correction tools.

    The comparison is based on three standard correction metrics:

    • Precision = How many of the reported errors are real errors.
    • Recall = How many of the real errors are caught.
    • F_0.5 = A single score that combines precision and recall, so different systems can be compared with one number. It weights precision higher than recall, to penalize models that report too many issues that aren’t errors. (See the sketch below for how all three are computed.)
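
    These three numbers come from simple counts: true positives (real errors the system caught), false positives (reported issues that aren’t errors), and false negatives (real errors it missed). The sketch below applies the standard formulas; the counts in the example are illustrative and are not Writer’s benchmark results.

    ```python
    # Standard precision / recall / F_0.5 computation for a grammar corrector.
    # The counts passed in at the bottom are made up for illustration.

    def precision_recall_f05(true_positives, false_positives, false_negatives):
        # Precision: of the errors the system reported, how many were real?
        precision = true_positives / (true_positives + false_positives)
        # Recall: of the real errors in the text, how many were caught?
        recall = true_positives / (true_positives + false_negatives)
        # F_0.5: weighted harmonic mean that favors precision (beta = 0.5).
        beta2 = 0.5 ** 2
        f05 = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        return precision, recall, f05

    # Example: 80 real errors flagged, 20 false alarms, 40 real errors missed.
    p, r, f = precision_recall_f05(80, 20, 40)
    print(f"precision={p:.2f} recall={r:.2f} F_0.5={f:.2f}")
    # precision=0.80 recall=0.67 F_0.5=0.77
    ```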

    Making great grammar part of your brand consistency

    According to a study completed by our team, non-professional writers create about 90% of the content developed for businesses today. That makes it difficult to build a strong, consistent brand style. Even a thorough style guide PDF is an imperfect reference: when writers can’t find guidance for their exact situation, they’re forced to make their own decisions.

    But deep learning is changing how things are done across a wide variety of industries. Problems considered unsolvable just a few years ago are now routinely solved by systems powered by neural models. Today, advanced AI writing assistants like Writer have become the Google Translate of spelling and grammar check, as well as other writing mechanics like voice and terminology usage. Technology will continue to evolve and help writers improve the way they work and meet their goals. As a rule of thumb, I always say, “Don’t bet against deep learning.”