One of the more unexpected products to launch out of the Microsoft Ignite 2023 event is a tool that can create a photorealistic avatar of a person and animate that avatar saying things that the person didn’t necessarily say.
Called Azure AI Speech text to speech avatar, the new feature, available in public preview as of today, lets users generate videos of an avatar speaking by uploading images of a person they wish the avatar to resemble and writing a script. Microsoft’s tool trains a model to drive the animation, while a separate text-to-speech model — either prebuilt or trained on the person’s voice — “reads” the script aloud.
“With text to speech avatar, users can more efficiently create video … to build training videos, product introductions, customer testimonials [and so on] simply with text input,” writes Microsoft in a blog post. “You can use the avatar to build conversational agents, virtual assistants, chatbots and more.”
Avatars can speak in multiple languages. And, for chatbot scenarios, they can tap AI models like OpenAI’s GPT-3.5 to respond to off-script questions from customers.
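As an illustration only, not Microsoft's actual API, the chatbot flow described above might pass an off-script customer question to a language model and then wrap the reply in SSML, the standard markup that speech-synthesis services (including Azure's) consume. Everything here is hypothetical: the `answer_offscript` helper stubs out the GPT-3.5 call, and `en-US-JennyNeural` is used as an example prebuilt voice name.

```python
# Hypothetical sketch: route an off-script question through a language
# model, then wrap the reply in SSML for a text-to-speech avatar voice.
from xml.sax.saxutils import escape


def wrap_in_ssml(reply: str, voice: str = "en-US-JennyNeural") -> str:
    """Build a minimal SSML document a speech service could synthesize."""
    return (
        '<speak version="1.0" xml:lang="en-US">'
        f'<voice name="{voice}">{escape(reply)}</voice>'
        "</speak>"
    )


def answer_offscript(question: str) -> str:
    # Stand-in for a GPT-3.5 call; a real bot would query the model here.
    reply = f"Thanks for asking about {question}. Let me check that for you."
    return wrap_in_ssml(reply)


ssml = answer_offscript("your return policy")
print(ssml)
```

The `escape` call matters in practice: user-generated text can contain `&` or `<`, which would otherwise break the SSML document.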
Now, there are countless ways such a tool could be abused — which, to its credit, Microsoft realizes. (Similar avatar-generating tech from AI startup Synthesia has been misused to produce propaganda in Venezuela and false news reports promoted by pro-China social media accounts.) Most Azure subscribers will only be able to access prebuilt — not custom — avatars at launch; custom avatars are currently a “limited access” capability available by registration only and “only for certain use cases,” Microsoft says.
But the feature raises a host of uncomfortable ethical questions.
One of the major sticking points in the recent SAG-AFTRA strike was the use of AI to create digital likenesses. Studios ultimately agreed to pay actors for their AI-generated likenesses. But what about Microsoft and its customers?
I asked Microsoft its position on companies using actors’ likenesses without, in the actors’ views, proper compensation or even notification. The company didn’t respond — nor did it say whether it would require that companies label avatars as AI-generated, as YouTube and a growing number of other platforms do.
Personal voice
Microsoft appears to have more guardrails around a related generative AI tool, personal voice, that’s also launching at Ignite.
Personal voice, a new capability within Microsoft’s custom neural voice service, can replicate a user’s voice in a few seconds, given a one-minute speech sample as an audio prompt. Microsoft pitches it as a way to create personalized voice assistants, dub content into different languages and generate bespoke narrations for stories, audiobooks and podcasts.
To ward off potential legal headaches, Microsoft is requiring that the person whose voice is being replicated give “explicit consent” in the form of a recorded statement before a customer can use personal voice to synthesize their voice. Access to the feature is gated behind a registration form for the time being, and customers must agree to use personal voice only in applications “where the voice does not read user-generated or open-ended content.”
“Voice model usage must remain within an application and output must not be publishable or shareable from the application,” Microsoft writes in a blog post. “[C]ustomers who meet limited access eligibility criteria maintain sole control over the creation of, access to and use of the voice models and their output [where it concerns] dubbing for films, TV, video and audio for entertainment scenarios only.”
Microsoft didn’t answer TechCrunch’s questions about how actors might be compensated for their personal voice contributions — or whether it plans to implement any sort of watermarking tech so that AI-generated voices might be more easily identified.
This story was originally published at 8am PT on Nov. 15 and updated at 3:30pm PT.