|Photo by Aditya Saxena on Unsplash|
Its creators speculate that VALL-E could be used for high-quality text-to-speech applications; for speech editing, where a recording of a person could be altered from a text transcript (making them say something they never actually said); and for audio content creation when combined with other generative AI models like GPT-3.

At first the fakes will be detectable, but the whole point of AI is that it improves. Combine this with tools for text, photo and video generation, and the potential for abuse by governments, corporations, political parties, extremists and conspiracy theorists is enormous. To paraphrase the famous line from Jurassic Park: just because we can develop this technology doesn't mean that we should. Do we really want to open this box? Can't we just step back?
"Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models."
So what happens when AI becomes increasingly smarter and we can no longer trust what we read, hear or see? In case you were wondering: I actually wrote this myself.