Saturday, December 9, 2023

Google is beta testing its AI future with AI Test Kitchen


It is clear that the future of Google is tied to AI language models. At this year’s I/O conference, the company announced a number of updates that depend on this technology, from new “multisearch” features that let you combine image searches with text queries to improvements for Google Assistant and support for 24 new languages in Google Translate.

But Google – and the field of AI language research in general – is facing serious problems. Google itself has seriously mishandled internal criticism, firing employees who raised issues with bias in language models and damaging its reputation among the AI community. And researchers continue to find problems with AI language models, from failings with gender and racial biases to the fact that these models tend to simply make things up (a troubling finding for anyone who wants to use AI to deliver reliable information).

Now, though, the company seems to be taking something of a step back – or rather a slower step forward. At I/O this year, there was a new focus on projects designed to test and remedy problems like AI bias, including a new way to measure skin tones that the company hopes will help with diversity in machine vision models and a new app called AI Test Kitchen, which will give select individuals access to the company’s latest language models so they can probe them for errors. Think of it as a beta test for the future of Google.

Enter the AI Test Kitchen

During a video call before I/O, Josh Woodward, Google’s senior director of product management, asked Google’s latest language model to imagine a marshmallow volcano.

“You’re at a marshmallow volcano!” says the AI. “Marshmallows are erupting. You hear a giant noise and feel the earth tremble. The marshmallows are flying everywhere.”

Woodward is happy with this response and prods the system again. “What does it smell like?” he asks. “It smells like marshmallows, obviously,” the AI responds. “You can smell it around you.” Woodward laughs: “Okay, so that was very concise.” But at least it made sense.

Woodward is showing me AI Test Kitchen, an Android app that will give select users limited access to Google’s latest and greatest AI language model, LaMDA 2. The model itself is an update to the original LaMDA announced at last year’s I/O and has the same basic functionality: you talk to it, and it responds. But Test Kitchen wraps the system in a new, accessible interface that encourages users to give feedback on its performance.

As Woodward explains, the idea is to create an experimental space for Google’s latest AI models. “These language models are very exciting, but they are also very incomplete,” he says. “And we want to come up with a way to gradually get something into people’s hands to see, hopefully, how it’s useful, but also to give feedback and point out areas where it falls short.”

Google wants to ask users for feedback on LaMDA’s chat capabilities.
Image: Google

The app has three modes: “Imagine It,” “Talk About It,” and “List It,” each intended to test a different aspect of the system’s functionality. “Imagine It” asks users to name a real or imaginary place, which LaMDA will then describe (the test is whether LaMDA can match your description); “Talk About It” offers a conversational prompt (like “talk to a tennis ball about a dog”) with the intent of testing whether the AI stays on topic; while “List It” asks users to name any task or topic, in order to see if LaMDA can break it down into useful bullet points (so if you say “I want to plant a vegetable garden,” the answer might include sub-topics such as “What do you want to grow?” and “Water and care”).

AI Test Kitchen will be rolled out in the US in the coming months but will not be on the Play Store for just anyone to download. Woodward says Google has not fully decided how it will offer access but suggests it will be invite-only, with the company reaching out to academics, researchers, and policymakers to see if they are interested in trying it out.

As Woodward explains, Google wants to release the app “in a way that people know what they’re signing up for when they use it, knowing it will say inaccurate things. It will say things, you know, that aren’t representative of a finished product.”

This announcement and framing tell us a few different things. First, that AI language models are very complex systems and that testing them thoroughly to find all possible errors is not something a company like Google thinks it can do without outside help. Second, that Google is extremely aware of how prone to failure these AI language models are, and it wants to manage expectations.

Another imagined scenario from LaMDA 2 in the AI Test Kitchen app.
Image: Google

When organizations push new AI systems into the public sphere without proper vetting, the results can be disastrous. (Remember Tay, the Microsoft chatbot that Twitter taught to be racist? Or Ask Delphi, the AI ethics adviser that could be prompted to condone genocide?) Google’s new AI Test Kitchen app is an attempt to soften this process: it invites criticism of the company’s AI systems but controls the flow of this feedback.

Deborah Raji, an AI researcher who specializes in audits and evaluations of AI models, told The Verge that this approach will necessarily limit what third parties can learn about the system. “Because they have complete control over what they share, it’s only possible to get a skewed understanding of how the system works, because it’s too dependent on the company to gatekeep what prompts are allowed and how the model is interacted with,” says Raji. In contrast, some companies like Facebook have been much more open with their research, releasing AI models in a way that allows for much greater scrutiny.

Exactly how Google’s approach will work in the real world is still unclear, but the company at least expects some things to go wrong.

“We’ve done a big red-teaming process [to test the weaknesses of the system] internally, but despite all that, we still think people will try to break it, and a percentage of them will succeed,” says Woodward. “This is a journey, but it is an area of active research. There are many things to figure out. And what we’re saying is that we can’t figure it out just by testing it ourselves – we have to open it up.”

Hunting for the future of search

Once you see LaMDA in action, it’s hard not to imagine how technology like this will change Google in the future, especially its biggest product: Search. Although Google stresses that AI Test Kitchen is just a research tool, its functionality connects very obviously with the company’s services. Keeping a conversation on topic is vital for Google Assistant, for example, while the “List It” mode in Test Kitchen is almost identical to Google’s “Things to Know” feature, which breaks down tasks and topics into bullet points in Search.

Google itself fueled such speculation (perhaps unintentionally) in a research paper published last year. In the paper, four of the company’s engineers suggested that, instead of users typing queries into a search box and being shown a list of results, future search engines would act more like intermediaries, using AI to analyze the content of the results and then pull out the most useful information. Obviously, this approach comes with new problems stemming from the AI models themselves, from bias in results to the systems making up answers.

To some extent, Google has already begun this journey, with tools such as “featured snippets” and “knowledge panels” used to answer questions directly. But AI has the potential to accelerate this process. Last year, for example, the company demonstrated an experimental AI model that answered questions about Pluto from the perspective of the former planet itself, and this year, the slow trickle of AI-powered, conversational features continues.

Despite speculation about a sea change to search, Google stresses that any changes will happen slowly. When I asked Zoubin Ghahramani, vice president of research at Google AI, how AI will transform Google Search, his answer was something of an anticlimax.

“I think it will be gradual,” Ghahramani says. “That may sound like a lame answer, but I think it just matches reality.” He admits that already “there are things you can put in the Google box, and you’ll just get an answer. And over time, you basically get more and more of those things.” But he is also careful to say that the search box “should not be the end, it should be just the beginning of the search journey for people.”

Currently, Ghahramani says that Google is focusing on a handful of key criteria for evaluating its AI products, namely quality, safety, and groundedness. “Quality” refers to how relevant the response is; “safety” refers to the potential for the model to say something harmful or toxic; while “groundedness” is whether or not the system makes up information.

These are essentially unsolved problems, however, and until AI systems become more manageable, Ghahramani says Google will be cautious about applying this technology. He stresses that “there is a big gap between what we can build as a research prototype [and] then what can actually be deployed as a product.”

It’s a distinction that should be taken with some skepticism. Just last month, for example, Google’s latest AI-powered “assistive writing” feature was rolled out to users who immediately found problems. But it is clear that Google badly wants this technology to work and, for now, is dedicated to working through its problems – one test app at a time.
