Beer and wine reviews, the American Chemical Society’s (ACS) AI editors, and the Turing Test

The Turing test, first known as the ‘Imitation Game’, was devised by mathematician and computer scientist Alan Turing in 1950 to see if a machine’s behaviour (in this case, a ‘conversation’) could fool someone into believing it was human. It’s a basic test to help determine true artificial intelligence.

These days ‘artificial intelligence’ seems to be everywhere, although I’m not sure that all these algorithms would pass the Turing test. Some of the latest material I’ve seen suggests that writers and editors may have to rethink their roles in future. Let’s start with the beer and wine reviews.

Writing

An April 25, 2022 Dartmouth College news release by David Hirsch announces the AI reviewer (Note: Links have been removed),

In mid-2020, the computer science team of Keith Carlson, Allen Riddell and Dan Rockmore was stuck on a problem. It wasn’t a technical challenge. The computer code they had developed to write product reviews was working beautifully. But they were struggling with a practical question.

“Getting the code to write reviews was only the first part of the challenge,” says Carlson, Guarini ’21, a doctoral research fellow at the Tuck School of Business. “The remaining challenge was figuring out how and where it could be used.”

The original study took on two challenges: to design code that could write original, human-quality product reviews using a small set of product features and to see if the algorithm could be adapted to write “synthesis reviews” for products from a large number of existing reviews.

Review writing can be challenging because of the overwhelming number of products available. The team wanted to see if artificial intelligence was up to the task of writing opinionated text about vast product classes.

They focused on wine and beer reviews because of the extensive availability of material to train the algorithm. The relatively narrow vocabularies used to describe the products also make them amenable to AI systems and natural language processing tools.

The project was kickstarted by Riddell, a former fellow at the Neukom Institute for Computational Science, and developed with Carlson under the guidance of Rockmore, the William H. Neukom 1964 Distinguished Professor of Computational Science.

The code couldn’t taste the products, but it did ingest reams of written material. After training the algorithm on hundreds of thousands of published wine and beer reviews, the team found that the code could complete both tasks.

One result read: “This is a sound Cabernet. It’s very dry and a little thin in blackberry fruit, which accentuates the acidity and tannins. Drink up.”

Another read: “Pretty dark for a rosé, and full-bodied, with cherry, raspberry, vanilla and spice flavors. It’s dry with good acidity.”

“But now what?” says Carlson, voicing the question that often gnaws at scientists. The team wondered, “Who else would care?”

“I didn’t want to quit there,” says Rockmore. “I was sure that this work could be interesting to a wider audience.”

Sensing that the paper could have relevance in marketing, the team walked the study over to Tuck Drive to see what others would think.

“Brilliant,” Praveen Kopalle, the Signal Companies’ Professor of Management at Tuck School of Business, recalls thinking when first reviewing the technical study.

Kopalle knew that the research was important. It could even “disrupt” the online review industry, a huge marketplace of goods and services.

“The paper has a lot of marketing applications, particularly in the context of online reviews where we can create reviews or descriptions of products when they may not already exist,” adds Kopalle. “In fact, we can even think about summarizing reviews for products and services as well.”

With the addition of Prasad Vana, assistant professor of business administration at Tuck, the team was complete. Vana reframed the technical feat of creating review-writing code into that of a market-friendly tool that can assist consumers, marketers, and professional reviewers.

The resulting research, published in International Journal of Research in Marketing, surveyed independent participants to confirm that the AI system wrote human-like reviews in both challenges.

“Using artificial intelligence to write and synthesize reviews can create efficiencies on both sides of the marketplace,” said Vana. “The hope is that AI can benefit reviewers facing larger writing workloads and consumers who have to sort through so much content about products.”

The paper also dwells on the ethical concerns raised by computer-generated content. It notes that marketers could get better acceptance by falsely attributing the reviews to humans. To address this, the team advocates for transparency when computer-generated text is used.

They also address the issue of computers taking human jobs. Code should not replace professional product reviewers, the team insists in the paper. The technology is meant to make the tasks of producing and reading the material more efficient. [emphasis mine]

“It’s interesting to imagine how this could benefit restaurants that cannot afford sommeliers or independent sellers on online platforms who may sell hundreds of products,” says Vana.

According to Carlson, the paper’s first author, the project demonstrates the potential of AI, the power of innovative thinking, and the promise of cross-campus collaboration.

“It was wonderful to work with colleagues with different expertise to take a theoretical idea and bring it closer to the marketplace,” says Carlson. “Together we showed how our work could change marketing and how people could use it. That could only happen with collaboration.”

A revised April 29, 2022 version was published on EurekAlert, and some of the differences are interesting (to me, if no one else). As you’ll see, the style is less ‘friendly’ and the ‘jobs’ issue has been approached differently (Note: Links have been removed),

Artificial intelligence systems can be trained to write human-like product reviews that assist consumers, marketers and professional reviewers, according to a study from Dartmouth College, Dartmouth’s Tuck School of Business, and Indiana University.

The research, published in the International Journal of Research in Marketing, also identifies ethical challenges raised by the use of computer-generated content.

“Review writing is challenging for humans and computers, in part, because of the overwhelming number of distinct products,” said Keith Carlson, a doctoral research fellow at the Tuck School of Business. “We wanted to see how artificial intelligence can be used to help people that produce and use these reviews.”

For the research, the Dartmouth team set two challenges. The first was to determine whether a machine can be taught to write original, human-quality reviews using only a small number of product features after being trained on a set of existing content. Secondly, the team set out to see if machine learning algorithms can be used to write syntheses of reviews of products for which many reviews already exist.

“Using artificial intelligence to write and synthesize reviews can create efficiencies on both sides of the marketplace,” said Prasad Vana, assistant professor of business administration at Tuck School of Business. “The hope is that AI can benefit reviewers facing larger writing workloads and consumers that have to sort through so much content about products.”

The researchers focused on wine and beer reviews because of the extensive availability of material to train the computer algorithms. Write-ups of these products also feature relatively focused vocabularies, an advantage when working with AI systems.

To determine whether a machine could write useful reviews from scratch, the researchers trained an algorithm on about 180,000 existing wine reviews. Metadata tags for factors such as product origin, grape variety, rating, and price were also used to train the machine-learning system.

When comparing the machine-generated reviews against human reviews for the same wines, the research team found agreement between the two versions. The results remained consistent even as the team challenged the algorithms by changing the amount of input data that was available for reference.

The machine-written material was then assessed by non-expert study participants to test if they could determine whether the reviews were written by humans or a machine. According to the research paper, the participants were unable to distinguish between the human and AI-generated reviews with any statistical significance. Furthermore, their intent to purchase a wine was similar across human- versus machine-generated reviews of the wine.
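As an aside, neither release spells out the statistical test behind “unable to distinguish … with any statistical significance.” For the curious, here’s a minimal sketch in Python, with invented numbers, of the kind of check one might run; it is not the authors’ actual analysis.

```python
# Hypothetical check: can raters beat chance at spotting the human review?
# Illustrative numbers only -- not the study's actual data or method.
from scipy.stats import binomtest

n_trials = 200   # review pairs shown to participants (invented)
n_correct = 108  # correct identifications of the human-written review (invented)

# Two-sided exact binomial test against the 50% chance rate
result = binomtest(n_correct, n_trials, p=0.5)
print(f"Accuracy: {n_correct / n_trials:.2f}, p-value: {result.pvalue:.3f}")
# A large p-value means we cannot reject "raters are guessing" --
# the reviews are statistically indistinguishable in that sense.
```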

Having found that artificial intelligence can write credible wine reviews, the research team turned to beer reviews to determine the effectiveness of using AI to write “review syntheses.” Rather than being trained to write new reviews, the algorithm was tasked with aggregating elements from existing reviews of the same product. This tested AI’s ability to identify and provide limited but relevant information about products based on a large volume of varying opinions.

“Writing an original review tests the computer’s expressive ability based on a relatively narrow set of data. Writing a synthesis review is a related but distinct task where the system is expected to produce a review that captures some of the key ideas present in an existing set of reviews for a product,” said Carlson, who conducted the research while a PhD candidate in computer science at Dartmouth.

To test the algorithm’s ability to write review syntheses, researchers trained it on 143,000 existing reviews of over 14,000 beers. As with the wine dataset, the text of each review was paired with metadata including the product name, alcohol content, style, and scores given by the original reviewers.

As with the wine reviews, the research used independent study participants to judge whether the machine-written summaries captured and summarized the opinions of numerous reviews in a useful, human-like manner.

According to the paper, the model was successful at taking the reviews of a product as input and generating a synthesis review for that product as output.
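The release doesn’t include code, but the synthesis task is easy to picture: many reviews go in, one condensed review comes out. Here’s a minimal sketch using an off-the-shelf transformer summarizer and invented beer reviews; it stands in for, and is not, the team’s custom model.

```python
# Rough sketch of "review synthesis": condense many opinions into one.
# Uses an off-the-shelf summarizer, NOT the Dartmouth team's custom model.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

reviews = [  # invented reviews of a single (hypothetical) beer
    "Pours a hazy gold with a thin head. Big citrus hop aroma, grapefruit and pine.",
    "Juicy and soft on the palate, though the finish is more bitter than expected.",
    "Great tropical fruit notes, but a little sweet for the style. Medium body.",
]

combined = " ".join(reviews)
synthesis = summarizer(combined, max_length=60, min_length=20, do_sample=False)
print(synthesis[0]["summary_text"])
```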

“Our modeling framework could be useful in any situation where detailed attributes of a product are available and a written summary of the product is required,” said Vana. “It’s interesting to imagine how this could benefit restaurants that cannot afford sommeliers or independent sellers on online platforms who may sell hundreds of products.”

Both challenges used a deep learning neural net based on transformer architecture to ingest, process and output review language.
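Since the release names only “transformer architecture,” here, for illustration, is a minimal sketch of metadata-conditioned generation: prompt a small, off-the-shelf language model with tags like those described above. The prompt format and metadata values are my inventions, not the published system’s.

```python
# Minimal sketch of metadata-conditioned review generation.
# A small off-the-shelf model stands in for the team's fine-tuned system.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Metadata tags of the kind the release describes: origin, variety, rating, price
prompt = "Origin: Napa Valley | Variety: Cabernet Sauvignon | Rating: 88 | Price: $25\nReview:"

output = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.8)
print(output[0]["generated_text"])
# In the study, training on ~180,000 tagged reviews is what makes the
# outputs read like the examples quoted earlier; raw GPT-2 will not.
```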

According to the research team, the computer systems are not intended to replace professional writers and marketers, but rather to assist them in their work. A machine-written review, for instance, could serve as a time-saving first draft of a review that a human reviewer could then revise. [emphasis mine]

The research can also help consumers. Synthesis reviews—like those on beer in the study—can be expanded to the constellation of products and services in online marketplaces to assist people who have limited time to read through many product reviews.

In addition to the benefits of machine-written reviews, the research team highlights some of the ethical challenges presented by using computer algorithms to influence human consumer behavior.

Noting that marketers could get better acceptance of machine-generated reviews by falsely attributing them to humans, the team advocates for transparency when computer-generated reviews are offered.

“As with other technology, we have to be cautious about how this advancement is used,” said Carlson. “If used responsibly, AI-generated reviews can be both a productivity tool and can support the availability of useful consumer information.”

Researchers contributing to the study include Praveen Kopalle, Dartmouth’s Tuck School of Business; Allen Riddell, Indiana University; and Daniel Rockmore, Dartmouth College.

I wonder if the second news release was written by an AI agent.

Here’s a link to and a citation for the paper,

Complementing human effort in online reviews: A deep learning approach to automatic content generation and review synthesis by Keith Carlson, Praveen K. Kopalle, Allen Riddell, Daniel Rockmore, Prasad Vana. International Journal of Research in Marketing. DOI: https://doi.org/10.1016/j.ijresmar.2022.02.004 Available online 12 February 2022. In Press, Corrected Proof.

This paper is behind a paywall.

Daniel (Dan) Rockmore was mentioned here in a May 6, 2016 posting about a competition he’d set up through Dartmouth College’s Neukom Institute. The competition, which doesn’t seem to have been run since 2018, was called ‘Turing Tests in the Creative Arts’.

Editing

It seems the American Chemical Society (ACS) has decided to further automate some of its editing. From an April 28, 2022 Digital Science business announcement (also on EurekAlert) by David Ellis,

Writefull’s world-leading AI-based language services have been integrated into the American Chemical Society’s (ACS) Publications workflow.

In a partnership that began almost two years ago, ACS has now progressed to a full integration of Writefull’s application programming interfaces (APIs) for three key uses.

One of the world’s largest scientific societies, ACS publishes more than 300,000 research manuscripts in more than 60 scholarly journals per year.

Writefull’s proprietary AI technology is trained on millions of scientific papers using Deep Learning. It identifies potential language issues with written texts, offers solutions to those issues, and automatically assesses texts’ language quality. Thanks to Writefull’s APIs, its tech can be applied at all key points in the editorial workflows.

Writefull’s Manuscript Categorization API is now used by ACS before copyediting to automatically classify all accepted manuscripts by their language quality. Using ACS’s own classification criteria, the API assigns a level-of-edit grade to manuscripts at scale without editors having to open documents and review the text. After thorough benchmarking alongside human editors, Writefull reached more than 95% alignment in grading texts, significantly reducing the time ACS spends on manuscript evaluation.

The same Manuscript Categorization API is now part of ACS’s quality control program, to evaluate the language in manuscripts after copyediting.

Writefull’s Metadata API is also being used to automate aspects of manuscript review, ensuring that all elements of an article are complete prior to publication. The same API is used by Open Access publisher Hindawi as a pre-submission structural checks tool for authors.
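Writefull’s actual endpoints and payloads aren’t documented in the announcement, so, purely as an illustration of how an automated level-of-edit grading step might slot into a publisher’s workflow, here is a hypothetical sketch; the endpoint, fields, and grade labels are all invented.

```python
# HYPOTHETICAL illustration of a language-quality categorization step.
# The endpoint, payload fields, and grade labels are invented; this is
# NOT Writefull's actual API, which the announcement does not document.
import requests

API_URL = "https://api.example.com/v1/categorize"  # placeholder endpoint

def grade_manuscript(text: str, api_key: str) -> str:
    """Send manuscript text to a grading service; return a level-of-edit grade."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["grade"]  # e.g. "light", "medium", "heavy" (invented)

# A publisher could then route each accepted manuscript to the right
# copyediting tier without an editor ever opening the file.
```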

Juan Castro, co-founder and CEO of Writefull, says: “Our partnership with the American Chemical Society over the past two years has been aimed at thoroughly vetting and shaping our services to meet ACS’s needs. Writefull’s AI-based language services empower publishers to increase their workflow efficiency and positively impact production costs, while also maintaining the quality and integrity of the manuscript.”

Digital Science is a technology company working to make research more efficient. We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open and effective. Our portfolio includes admired brands including Altmetric, Dimensions, Figshare, ReadCube, Symplectic, IFI CLAIMS, GRID, Overleaf, Ripeta and Writefull. We believe that together, we can help researchers make a difference. Visit www.digital-science.com and follow @digitalsci on Twitter.

Writefull is a technology startup that creates tools to help researchers improve their writing in English. The first version of the Writefull product allowed researchers to discover patterns in academic language, such as frequent word combinations and synonyms in context. The new version utilises Natural Language Processing and Deep Learning algorithms that will give researchers feedback on their full texts. Visit writefull.com and follow @writefullapp on Twitter.

The American Chemical Society (ACS) is a nonprofit organization chartered by the U.S. Congress. ACS’ mission is to advance the broader chemistry enterprise and its practitioners for the benefit of Earth and all its people. The Society is a global leader in promoting excellence in science education and providing access to chemistry-related information and research through its multiple research solutions, peer-reviewed journals, scientific conferences, eBooks and weekly news periodical Chemical & Engineering News. ACS journals are among the most cited, most trusted and most read within the scientific literature; however, ACS itself does not conduct chemical research. As a leader in scientific information solutions, its CAS division partners with global innovators to accelerate breakthroughs by curating, connecting and analyzing the world’s scientific knowledge. ACS’ main offices are in Washington, D.C., and Columbus, Ohio. Visit www.acs.org and follow @AmerChemSociety on Twitter.

So what?

An artificial intelligence (AI) agent being used for writing assignments is not new (see my July 16, 2014 posting titled, “Writing and AI or is a robot writing this blog?“). The argument that these agents will assist rather than replace (pick an occupation: writers, doctors, programmers, scientists, etc.) is almost always made as scientists explain that AI agents will take over the boring work, giving you (the human) more opportunities to do interesting work. The AI-written beer and wine reviews described here support at least part of the argument—for the time being.

It’s true that an AI agent can’t taste beer or wine, but that could change, as this August 8, 2019 article by Alice Johnston for CNN hints (Note: Links have been removed),

An artificial “tongue” that can taste minute differences between varieties of Scotch whisky could be the key to identifying counterfeit alcohol, scientists say.

Engineers from the universities of Glasgow and Strathclyde in Scotland created a device made of gold and aluminum and measured how it absorbed light when submerged in different kinds of whisky.

Analysis of the results allowed the scientists to identify the samples from Glenfiddich, Glen Marnoch and Laphroaig with more than 99% accuracy.
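The article doesn’t describe the data analysis, but conceptually it’s a standard supervised-classification problem: optical absorbance readings in, brand label out. Here’s a toy sketch with synthetic data; it is not the Glasgow/Strathclyde pipeline.

```python
# Toy sketch: classify spirits from optical absorbance readings.
# Synthetic data stands in for the real nanoscale "tongue" measurements;
# this is NOT the Glasgow/Strathclyde analysis pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
labels = ["Glenfiddich", "Glen Marnoch", "Laphroaig"]

# Invent 60 samples per whisky: 10 absorbance readings each, with a
# different characteristic mean spectrum per brand plus noise.
X = np.vstack([rng.normal(loc=i, scale=0.3, size=(60, 10)) for i in range(3)])
y = np.repeat(labels, 60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2%}")
```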

BTW, my earliest piece on artificial tongues is a July 28, 2011 posting, “Bio-inspired electronic tongue replaces sommelier?,” about research in Spain.

By contrast, this is the first time I can recall seeing anything about an artificial intelligence agent that edits, and Writefull’s use at the ACS falls quite neatly into the ‘doing all the boring work’ category and narrative.

Having looked at definitions of the various forms of editing and their core skills, I’m guessing that AI will take over every aspect (from the Editors’ Association of Canada’s Definitions of Editorial Skills webpage),

CORE SKILLS

Structural Editing

Assessing and shaping draft material to improve its organization and content. Changes may be suggested to or drafted for the writer. Structural editing may include:

revising, reordering, cutting, or expanding material

writing original material

determining whether permissions are necessary for third-party material

recasting material that would be better presented in another form, or revising material for a different medium (such as revising print copy for web copy)

clarifying plot, characterization, or thematic elements

Also known as substantive editing, manuscript editing, content editing, or developmental editing.

Stylistic Editing

Editing to clarify meaning, ensure coherence and flow, and refine the language. It includes:

eliminating jargon, clichés, and euphemisms

establishing or maintaining the language level appropriate for the intended audience, medium, and purpose

adjusting the length and structure of sentences and paragraphs

establishing or maintaining tone, mood, style, and authorial voice or level of formality

Also known as line editing (which may also include copy editing).

Copy Editing

Editing to ensure correctness, accuracy, consistency, and completeness. It includes:

editing for grammar, spelling, punctuation, and usage

checking for consistency and continuity of mechanics and facts, including anachronisms, character names, and relationships

editing tables, figures, and lists

notifying designers of any unusual production requirements

developing a style sheet or following one that is provided

correcting or querying general information that should be checked for accuracy 

It may also include:

marking levels of headings and the approximate placement of art

Canadianizing or other localizing

converting measurements

providing or changing the system of citations

editing indexes

obtaining or listing permissions needed

checking front matter, back matter, and cover copy

checking web links

Note that “copy editing” is often loosely used to include stylistic editing, structural editing, fact checking, or proofreading. Editors Canada uses it only as defined above.

Proofreading

Examining material after layout or in its final format to correct errors in textual and visual elements. The material may be read in isolation or against a previous version. It includes checking for:

adherence to design

minor mechanical errors (such as spelling mistakes or deviations from style sheet)

consistency and accuracy of elements in the material (such as cross-references, running heads, captions, web page heading tags, hyperlinks, and metadata)

It may also include:

distinguishing between printer’s, designer’s, or programmer’s errors and writer’s or editor’s alterations

copyfitting

flagging or checking locations of art

inserting page numbers or checking them against content and page references

Note that proofreading is checking a work after editing; it is not a substitute for editing.

I’m just as happy as anyone else to get rid of the ‘boring’ parts of my work, but those parts are how I learned in the first place, and I haven’t seen any discussion about the importance of boring, repetitive tasks for learning.

Will AI ‘artists’ be able to fool a panel judging entries in the Neukom Institute Prizes in Computational Arts?

There’s an intriguing competition taking place at Dartmouth College (US) according to a May 2, 2016 piece on phys.org (Note: Links have been removed),

Algorithms help us to choose which films to watch, which music to stream and which literature to read. But what if algorithms went beyond their jobs as mediators of human culture and started to create culture themselves?

In 1950 English mathematician and computer scientist Alan Turing published a paper, “Computing Machinery and Intelligence,” which starts off by proposing a thought experiment that he called the “Imitation Game.” In one room is a human “interrogator” and in another room a man and a woman. The goal of the game is for the interrogator to figure out which of the unknown hidden interlocutors is the man and which is the woman. This is to be accomplished by asking a sequence of questions with responses communicated either by a third party or typed out and sent back. “Winning” the Imitation Game means getting the identification right on the first shot.

Turing then modifies the game by replacing one interlocutor with a computer, and asks whether a computer will be able to converse sufficiently well that the interrogator cannot tell the difference between it and the human. This version of the Imitation Game has come to be known as the “Turing Test.”

On May 18 [2016] at Dartmouth, we will explore a different area of intelligence, taking up the question of distinguishing machine-generated art. Specifically, in our “Turing Tests in the Creative Arts,” we ask if machines are capable of generating sonnets, short stories, or dance music that is indistinguishable from human-generated works, though perhaps not yet so advanced as Shakespeare, O. Henry or Daft Punk.

The piece on phys.org is a crossposting of a May 2, 2016 article by Michael Casey and Daniel N. Rockmore for The Conversation. The article goes on to describe the competitions,

The dance music competition (“Algorhythms”) requires participants to construct an enjoyable (fun, cool, rad, choose your favorite modifier for having an excellent time on the dance floor) dance set from a predefined library of dance music. In this case the initial random “seed” is a single track from the database. The software package should be able to use this as inspiration to create a 15-minute set, mixing and modifying choices from the library, which includes standard annotations of more than 20 features, such as genre, tempo (bpm), beat locations, chroma (pitch) and brightness (timbre).

In what might seem a stiffer challenge, the sonnet and short story competitions (“PoeTix” and “DigiLit,” respectively) require participants to submit self-contained software packages that upon the “seed” or input of a (common) noun phrase (such as “dog” or “cheese grater”) are able to generate the desired literary output. Moreover, the code should ideally be able to generate an infinite number of different works from a single given prompt.

To perform the test, we will screen the computer-made entries to eliminate obvious machine-made creations. We’ll mix human-generated work with the rest, and ask a panel of judges to say whether they think each entry is human- or machine-generated. For the dance music competition, scoring will be left to a group of students, dancing to both human- and machine-generated music sets. A “winning” entry will be one that is statistically indistinguishable from the human-generated work.

The competitions are open to any and all comers [competition is now closed; the deadline was April 15, 2016]. To date, entrants include academics as well as nonacademics. As best we can tell, no companies have officially thrown their hats into the ring. This is somewhat of a surprise to us, as in the literary realm companies are already springing up around machine generation of more formulaic kinds of “literature,” such as earnings reports and sports summaries, and there is of course a good deal of AI automation around streaming music playlists, most famously Pandora.

The authors discuss issues with judging the entries,

Evaluation of the entries will not be entirely straightforward. Even in the initial Imitation Game, the question was whether conversing with men and women over time would reveal their gender differences. (It’s striking that this question was posed by a closeted gay man [Alan Turing].) The Turing Test, similarly, asks whether the machine’s conversation reveals its lack of humanity not in any single interaction but in many over time.

It’s also worth considering the context of the test/game. Is the probability of winning the Imitation Game independent of time, culture and social class? Arguably, as we in the West approach a time of more fluid definitions of gender, that original Imitation Game would be more difficult to win. Similarly, what of the Turing Test? In the 21st century, our communications are increasingly with machines (whether we like it or not). Texting and messaging have dramatically changed the form and expectations of our communications. For example, abbreviations, misspellings and dropped words are now almost the norm. The same considerations apply to art forms as well.

The authors also pose the question: Who is the artist?

Thinking about art forms leads naturally to another question: who is the artist? Is the person who writes the computer code that creates sonnets a poet? Is the programmer of an algorithm to generate short stories a writer? Is the coder of a music-mixing machine a DJ?

Where is the divide between the artist and the computational assistant and how does the drawing of this line affect the classification of the output? The sonnet form was constructed as a high-level algorithm for creative work – though one that’s executed by humans. Today, when the Microsoft Office Assistant “corrects” your grammar or “questions” your word choice and you adapt to it (either happily or out of sheer laziness), is the creative work still “yours” or is it now a human-machine collaborative work?

That’s an interesting question and one I asked in the context of two ‘mashup’ art exhibitions in Vancouver (Canada) in my March 8, 2016 posting.

Getting back to Dartmouth College and its Neukom Institute Prizes in Computational Arts, here’s a list of the competition judges from the competition homepage,

David Cope (Composer, Algorithmic Music Pioneer, UCSC Music Professor)
David Krakauer (President, the Santa Fe Institute)
Louis Menand (Pulitzer Prize winning author and Professor at Harvard University)
Ray Monk (Author, Biographer, Professor of Philosophy)
Lynn Neary (NPR: Correspondent, Arts Desk and Guest Host)
Joe Palca (NPR: Correspondent, Science Desk)
Robert Siegel (NPR: Senior Host, All Things Considered)

The announcements will be made Wednesday, May 18, 2016. I can hardly wait!

Addendum

Martin Robbins has written a rather amusing May 6, 2016 post for the Guardian science blogs on AI and art critics, in which he also notes that the question ‘What is art?’ is unanswerable (Note: Links have been removed),

Jonathan Jones is unhappy about artificial intelligence. It might be hard to tell from a casual glance at the art critic’s recent column, “The digital Rembrandt: a new way to mock art, made by fools,” but if you look carefully the subtle clues are there. His use of the adjectives “horrible, tasteless, insensitive and soulless” in a single sentence, for example.

The source of Jones’s ire is a new piece of software that puts… I’m so sorry… the ‘art’ into ‘artificial intelligence’. By analyzing a subset of Rembrandt paintings that featured ‘bearded white men in their 40s looking to the right’, its algorithms were able to extract the key features that defined the Dutchman’s style. …

Of course an artificial intelligence is the worst possible enemy of a critic, because it has no ego and literally does not give a crap what you think. An arts critic trying to deal with an AI is like an old school mechanic trying to replace the battery in an iPhone – lost, possessing all the wrong tools and ultimately irrelevant. I’m not surprised Jones is angry. If I were in his shoes, a computer painting a Rembrandt would bring me out in hives.

Can a computer really produce art? We can’t answer that without dealing with another question: what exactly is art? …

I wonder what either Robbins or Jones will make of the Dartmouth competition?