
    Can a Synthetic Voice Be Taught to Sing Opera?

    Corinna da Fonseca-Wollheim

    The opera opened with amplified breathing: gasps, hisses, labored inhalations. A string quartet introduced spidery harmonics that consolidated into brighter chords and were joined, over time, by radiant voices. Exuberantly lyrical, their lines unfolded in stark contrast with those of the protagonist, who, strapped into a wheelchair center stage, had thus far contributed only a few short comments in the machine tones of a text-to-speech synthesizer.

    But then the synthetic voice began to sing, cutting through acoustic textures with a sound profile all of its own. In the upper register it seemed to combine the timbre of a boy soprano with a brushed metal finish, while the lower range had some of the compressed warmth of a countertenor. The voice, an uncanny combination of expressive directness and can’t-quite-place-it strangeness, moved from one note to the next with the slick flexibility of rubber.

    The voice belonged to Mark Steidl, the star and co-librettist, with Katherine Skovira, of “The Other Side of Silence.” The first act of this opera, composed by Robert Whalen, was presented last week in a public workshop at the Experimental Media and Performing Arts Center at Rensselaer Polytechnic Institute in Troy, New York.

    Steidl has cerebral palsy and speaks through an augmentative and alternative communication (or AAC) device, which can make ordinary interactions painfully slow. Making space for underrepresented voices has become a stated priority for much of the opera world. To tell the story of a nonspeaking disabled character in “The Other Side of Silence,” a team of creators, researchers and software developers first had to learn how to engineer the voice itself.

    The opera’s creators believe that Steidl’s singing voice is the first case of a generative synthetic voice taught to sing opera. While “The Other Side of Silence” depicts a disabled person’s struggle for creative flourishing and agency, the underlying theme of opportunity and fear in the age of artificial intelligence has a Faustian resonance that fits comfortably into this art form’s canon.

    In the work, which is being developed with Opera Saratoga, Zari, a nonspeaking, nonbinary character based on Steidl, is heavily dependent on a team of caregivers, including a mother who chafes at her child’s gender identity. Zari decides to move into an experimental smart home in a remote desert, run by an AI entity called the Chimera that promises unprecedented independence. But with access to Zari’s thoughts, the Chimera begins to take over their decision-making and, in a medical crisis, intervenes in ways that leave Zari’s mind altered forever.

    When I asked Steidl in an email what singing means to them, the answer was simple: “Singing is self-expression.” But as an activist, Steidl is also keen to show the importance of improving the mechanical speaking voices that are still standard on most alternative communication devices and that lack the nuance that gives a human voice its individuality.

    “I’m always frustrated because my DynaVox is monotone,” Steidl said, referring to their AAC device. “Because of my sass, I would like to show more emotions. When I say, ‘Darling, it’s lovely to see you, may I please have a friendly kiss on each cheek?’ my DynaVox Maestro isn’t as flamboyantly gay as I am.”

    Whalen, the opera’s composer, said in an interview that he was stunned to learn that while there are some 2 million AAC users in the United States, they can choose from only six standard voices. Companies like Vocal ID are beginning to offer customized options, but these are often prohibitively expensive for the people who need them. While teaching an alternative communication device to sing opera may seem like a luxury next to the immediate need to make basic conversations less laborious, some form of musicality is essential to any solution. In the field of synthetic voices, Whalen said, “expression and prosody are always the holy grail.”

    In creating the opera, Whalen and his colleagues turned to Dreamtonics, a Tokyo-based specialist in vocal synthesis. Kanru Hua, the company’s founder and chief executive, said in an interview that an initial challenge was to overcome the software’s bias toward the sound of pop music.

    Most users of Hua’s real-time voice transformation tool favor a breathy pop sound. And the algorithm relies on users to rate the naturalness of any synthetic voice. “But people have their own standard of what is natural and what is not,” he said, “and most people are not used to the classical genre.”

    Designing Steidl’s singing voice, Hua said, meant coming up with “something that is both recognized as genuine — a weird word in this context, but that sounds like a real professional opera singer — and something that Mark would eventually be able to accept as a substitute for their own voice.” The final product was a collaboration between Dreamtonics, Steidl and members of the team at Rensselaer.

    Toward the end of the first act, Zari sings a duet with the Chimera, a role sung with cool poise by mezzo-soprano Theo Vizcaino-Hayes. At one point during the workshop performance, their lines approached unison, the tiny dissonance creating shimmering tension. It felt like the musical equivalent of the uncanny valley, the state of unease engendered when a machine comes close to appearing natural.

    The challenge of bridging the final gap for synthetic voices will occupy scientists for some time to come. Along the way, opera does feel like an apt forum in which to explore the ethical dilemmas and expressive aspirations behind engineered voices, and it stands as living proof of what the human voice can do. As Whalen said, the ultimate goal in vocal synthesis is to “create something that behaves like it’s moving on air.”
