Microsoft’s VALL-E 2 can convincingly recreate human voices using just a few seconds of audio, its creators claim.

Microsoft has developed a new artificial intelligence (AI) speech generator that is apparently so convincing it cannot be released to the public.

VALL-E 2 is a text-to-speech (TTS) generator that can reproduce the voice of a human speaker using just a few seconds of audio.

In a paper that appeared June 17 on the pre-print server arXiv, Microsoft researchers said VALL-E 2 was capable of generating “accurate, natural speech in the exact voice of the original speaker, comparable to human performance.” In other words, the new AI voice generator is convincing enough to be mistaken for a real person, at least according to its creators...