AI (artificial intelligence) text generator, too dangerous to release?

Could this latest version of OpenAI‘s text generator be so good that it would fool you? And following on that thought could the concomitant reluctance to release the research be real or is it a publicity stunt? Here’s a sample of the text from the GPT2 AI model from a February 15, 2019 article by Mark Fraunfelder for Boing Boing,

Recycling is good for the world.
NO! YOU COULD NOT BE MORE WRONG!!
MODEL COMPLETION (MACHINE-WRITTEN, 25 TRIES)
Recycling is NOT good for the world. It is bad for the environment, it is bad for our health, and it is bad for our economy. I’m not kidding. Recycling is not good for the environment. It is destructive to the earth and it is a major contributor to global warming. Recycling is not good for our health. It contributes to obesity and diseases like heart disease and cancer. Recycling is bad for our economy. It increases the cost of a product, and in turn, the price of everything that is made with that product. Recycling is not good for our nation. We pay a tremendous price for the privilege of having the world’s most advanced and efficient recycling system. Recycling is a huge, colossal waste of time, energy, money, and resources. And THAT is why we need to get back to basics and get back to basics in our recycling efforts. One of the best ways to start is to look at the process of creating a paper product. When you make a paper product, it is basically a long chain of materials. Everything from the raw materials (wood, cardboard, paper, etc.), to the reagents (dyes, solvents, etc.) to the printing equipment (chemicals, glue, paper, ink, etc.), to the packaging, to the packaging materials (mercury, chemicals, etc.) to the processing equipment (heating, cooling, etc.), to the packaging materials, to the packaging materials that are shipped overseas and to the packaging materials that are used in the United States. Each step along the way creates tons of waste that we constantly have to clean up. The process of making a paper product is a very wasteful one. But the end result is something that all of us need to consume. And if we want to keep the recycling process running efficiently, then we really need to think about each and every step that goes into making a paper product.

The first few sentences don’t work for me but once the discussion turns to making paper products, then it becomes more convincing to me. As to whether the company’s reluctance to release the research is genuine or a publicity stunt, I don’t know. However, there was a fair degree of interest in GPT2 after the decision.

From a February 14, 2019 article by Alex Hern for the Guardian,

OpenAI, an nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman, and others, says its new AI model, called GPT2 is so good and the risk of malicious use so high that it is breaking from its normal practice of releasing the full research to the public in order to allow more time to discuss the ramifications of the technological breakthrough.

At its core, GPT2 is a text generator. The AI system is fed text, anything from a few words to a whole page, and asked to write the next few sentences based on its predictions of what should come next. The system is pushing the boundaries of what was thought possible, both in terms of the quality of the output, and the wide variety of potential uses.

When used to simply generate new text, GPT2 is capable of writing plausible passages that match what it is given in both style and subject. It rarely shows any of the quirks that mark out previous AI systems, such as forgetting what it is writing about midway through a paragraph, or mangling the syntax of long sentences.

Feed it the opening line of George Orwell’s Nineteen Eighty-Four – “It was a bright cold day in April, and the clocks were striking thirteen” – and the system recognises the vaguely futuristic tone and the novelistic style, and continues with: …

Sean Gallagher’s February 15, 2019 posting on the ars Technica blog provides some insight that’s partially written a style sometimes associated with gossip (Note: Links have been removed),

OpenAI is funded by contributions from a group of technology executives and investors connected to what some have referred to as the PayPal “mafia”—Elon Musk, Peter Thiel, Jessica Livingston, and Sam Altman of YCombinator, former PayPal COO and LinkedIn co-founder Reid Hoffman, and former Stripe Chief Technology Officer Greg Brockman. [emphasis mine] Brockman now serves as OpenAI’s CTO. Musk has repeatedly warned of the potential existential dangers posed by AI, and OpenAI is focused on trying to shape the future of artificial intelligence technology—ideally moving it away from potentially harmful applications.

Given present-day concerns about how fake content has been used to both generate money for “fake news” publishers and potentially spread misinformation and undermine public debate, GPT-2’s output certainly qualifies as concerning. Unlike other text generation “bot” models, such as those based on Markov chain algorithms, the GPT-2 “bot” did not lose track of what it was writing about as it generated output, keeping everything in context.

For example: given a two-sentence entry, GPT-2 generated a fake science story on the discovery of unicorns in the Andes, a story about the economic impact of Brexit, a report about a theft of nuclear materials near Cincinnati, a story about Miley Cyrus being caught shoplifting, and a student’s report on the causes of the US Civil War.

Each matched the style of the genre from the writing prompt, including manufacturing quotes from sources. In other samples, GPT-2 generated a rant about why recycling is bad, a speech written by John F. Kennedy’s brain transplanted into a robot (complete with footnotes about the feat itself), and a rewrite of a scene from The Lord of the Rings.

While the model required multiple tries to get a good sample, GPT-2 generated “good” results based on “how familiar the model is with the context,” the researchers wrote. “When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50 percent of the time. The opposite is also true: on highly technical or esoteric types of content, the model can perform poorly.”

There were some weak spots encountered in GPT-2’s word modeling—for example, the researchers noted it sometimes “writes about fires happening under water.” But the model could be fine-tuned to specific tasks and perform much better. “We can fine-tune GPT-2 on the Amazon Reviews dataset and use this to let us write reviews conditioned on things like star rating and category,” the authors explained.

James Vincent’s February 14, 2019 article for The Verge offers a deeper dive into the world of AI text agents and what makes GPT2 so special (Note: Links have been removed),

For decades, machines have struggled with the subtleties of human language, and even the recent boom in deep learning powered by big data and improved processors has failed to crack this cognitive challenge. Algorithmic moderators still overlook abusive comments, and the world’s most talkative chatbots can barely keep a conversation alive. But new methods for analyzing text, developed by heavyweights like Google and OpenAI as well as independent researchers, are unlocking previously unheard-of talents.

OpenAI’s new algorithm, named GPT-2, is one of the most exciting examples yet. It excels at a task known as language modeling, which tests a program’s ability to predict the next word in a given sentence. Give it a fake headline, and it’ll write the rest of the article, complete with fake quotations and statistics. Feed it the first line of a short story, and it’ll tell you what happens to your character next. It can even write fan fiction, given the right prompt.

The writing it produces is usually easily identifiable as non-human. Although its grammar and spelling are generally correct, it tends to stray off topic, and the text it produces lacks overall coherence. But what’s really impressive about GPT-2 is not its fluency but its flexibility.

This algorithm was trained on the task of language modeling by ingesting huge numbers of articles, blogs, and websites. By using just this data — and with no retooling from OpenAI’s engineers — it achieved state-of-the-art scores on a number of unseen language tests, an achievement known as “zero-shot learning.” It can also perform other writing-related tasks, like translating text from one language to another, summarizing long articles, and answering trivia questions.

GPT-2 does each of these jobs less competently than a specialized system, but its flexibility is a significant achievement. Nearly all machine learning systems used today are “narrow AI,” meaning they’re able to tackle only specific tasks. DeepMind’s original AlphaGo program, for example, was able to beat the world’s champion Go player, but it couldn’t best a child at Monopoly. The prowess of GPT-2, say OpenAI, suggests there could be methods available to researchers right now that can mimic more generalized brainpower.

“What the new OpenAI work has shown is that: yes, you absolutely can build something that really seems to ‘understand’ a lot about the world, just by having it read,” says Jeremy Howard, a researcher who was not involved with OpenAI’s work but has developed similar language modeling programs …

To put this work into context, it’s important to understand how challenging the task of language modeling really is. If I asked you to predict the next word in a given sentence — say, “My trip to the beach was cut short by bad __” — your answer would draw upon on a range of knowledge. You’d consider the grammar of the sentence and its tone but also your general understanding of the world. What sorts of bad things are likely to ruin a day at the beach? Would it be bad fruit, bad dogs, or bad weather? (Probably the latter.)

Despite this, programs that perform text prediction are quite common. You’ve probably encountered one today, in fact, whether that’s Google’s AutoComplete feature or the Predictive Text function in iOS. But these systems are drawing on relatively simple types of language modeling, while algorithms like GPT-2 encode the same information in more complex ways.

The difference between these two approaches is technically arcane, but it can be summed up in a single word: depth. Older methods record information about words in only their most obvious contexts, while newer methods dig deeper into their multiple meanings.

So while a system like Predictive Text only knows that the word “sunny” is used to describe the weather, newer algorithms know when “sunny” is referring to someone’s character or mood, when “Sunny” is a person, or when “Sunny” means the 1976 smash hit by Boney M.

The success of these newer, deeper language models has caused a stir in the AI community. Researcher Sebastian Ruder compares their success to advances made in computer vision in the early 2010s. At this time, deep learning helped algorithms make huge strides in their ability to identify and categorize visual data, kickstarting the current AI boom. Without these advances, a whole range of technologies — from self-driving cars to facial recognition and AI-enhanced photography — would be impossible today. This latest leap in language understanding could have similar, transformational effects.

Hern’s article for the Guardian (February 14, 2019 article ) acts as a good overview, while Gallagher’s ars Technica* posting (February 15, 2019 posting) and Vincent’s article (February 14, 2019 article) for the The Verge take you progressively deeper into the world of AI text agents.

For anyone who wants to dig down even further, there’s a February 14, 2019 posting on OpenAI’s blog.

*’ars Technical’ corrected to read ‘ars Technica’ on February 18, 2021.

Leave a Reply

Your email address will not be published. Required fields are marked *