I've been reading about the opportunities and perils of chatbot technologies recently. This is spurred in part by Karen Hao's and Sarah Wynn-Williams's books about industry players in the sector, and in part by recent news articles on the psychological perils facing young people who engage with chatbot apps. Separately, I have also been exploring an approach to capturing three-dimensional holograms called Gaussian Splats. Gaussian Splat synthesis does not rely on diffusion-based neural networks, but the two approaches seem metaphorically similar. One is generative, the other subtractive. One helps you see the real world with greater clarity; the other can be used to create fictional images. So I've been thinking about this boundary between truth-enhancing and truth-abstracting. My views aren't so much about the software approaches themselves as about what people can, and tend to, do with them. This is a philosophical opinion more than a technical one.
Gaussian Splats stitch together thousands of photographic angles into a seamless 3D space that can simulate motion through the scene with time abstracted away, even from perspectives not present in the original photographs. It's like an average of millions of perspectives when only 10,000 points on the surface of reality were sampled; the remaining perspectives are extrapolated to infer intermediary views between the sampled images. It's akin to sculpting a cloud of photon radiance fields of what appears to be present in the original scene while subtracting any light artifact that isn't confirmed from multiple perspectives. The true form becomes more accurately depicted the more views it's seen from, as any aberration in a single image is smoothed away. The "novel views" this approach generates are the perspectives inferred and rendered in the output "ply" file as what the scene would likely look like given the consistencies observed from other angles. There is nothing dishonest in averaging 10,000 views to guess at the novel view between those points. Gaussian splats are not generative AI the way stable diffusion is, but the inference between multiple points of view is analogous. Diffusion in photo processing degrades an image with noise, then attempts to rebuild the lost detail to establish a more refined picture of what is actually in the image. It can sharpen and enhance captured images. It can also generate entirely new images from sampled pixels without identifying what the subjects in the image are, which is why it is used in generative AI to synthesize images that are not just averages of perspectives.
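To make the averaging intuition concrete, here is a deliberately toy sketch in Python. This is nothing like a real splatting pipeline (no Gaussians, no optimization, every name here is invented for illustration); it only shows the consensus idea: a "novel view" blended from several sampled views suppresses an artifact that appears in just one of them.

```python
# Toy sketch (NOT real 3D Gaussian Splatting): novel-view inference as a
# consensus average over sampled views. Each "view" is a short list of
# pixel intensities observed from a different angle of the same patch.

def blend_views(views, weights):
    """Weighted per-pixel average of several sampled views."""
    total = sum(weights)
    width = len(views[0])
    return [sum(w * v[i] for v, w in zip(views, weights)) / total
            for i in range(width)]

# Three sampled perspectives of the same 4-pixel surface patch.
view_a = [0.2, 0.5, 0.5, 0.9]
view_b = [0.2, 0.5, 0.5, 0.9]
view_c = [0.2, 0.5, 5.0, 0.9]   # pixel 2 carries a one-view light artifact

# Infer a novel view between the samples with equal weights.
novel = blend_views([view_a, view_b, view_c], weights=[1.0, 1.0, 1.0])
print(novel)
```

Pixels confirmed by all three perspectives survive unchanged, while the artifact seen from only one angle is pulled back toward the consensus, which is the "subtractive" quality described above.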
Generative AI is under heated critique these days for the risk, and in some cases the fact, of propagating disinformation beyond cosmetic image creation and creative writing tools. In the context of generating emoticons or proofreading grammar, the benefits to the user are measurable while any harmful effects are negligible. When these tools are used to deceive the general population or to sway significant points of public discourse, their risk becomes societal rather than personal. When a chatbot is set up to act as an authority in psychological or medical contexts, it can lead an otherwise uninformed user to take actions that harm themselves and thereby significantly affect the community around them. Go down this path of research and you'll arrive at the topic of grief bots, like the original Replika, which was built on the writings of a deceased person so that the founder could recover some sense of the presence of a friend they once had. A Scientific American writer, David Berreby, compared grief bots to other ways people memorialize their loved ones. Using a grief bot to converse with the specific identity of a loved one doesn't create a new relationship with them; it's pure reverie in a memory. In cases of sudden loss, a grief bot may assuage the sense of abrupt departure much as reading past letters from a departed loved one does.
These themes of synthesized real identities came together for me in light of the issue of a "reanimated past" raised by the new patent Meta published on reanimating images or content from dormant social media accounts.
"The company was granted a patent in late December that outlines how a large language model can 'simulate' a person's social media activity, such as responding to content posted by real people. 'The language model may be used for simulating the user when the user is absent from the social networking system, for example, when the user takes a long break or if the user is deceased,' the patent says." (Credit Business Insider's Sydney Bradley)
If this were a kind of auto-responder, similar to a messaging platform's suggested text replies for an active user, it would be a conventional and expected tool. Having it operate without any personal preference asserted by the depicted person, or their representatives, goes too far in my view. Would there be the equivalent of an organ donor's card for preserving and reanimating digital identity before a person's death, governing what re-depiction is acceptable post-mortem? Without the explicit consent of the source human, a tool like this risks fictionalizing real people into dishonest abstractions of themselves, outside their control.
Harrison Ford hiring a studio to de-age his face for an Indiana Jones sequel is acceptable under Screen Actors Guild agreements on AI use in cinema because Harrison is the one signing the contract for the film's release, and he is the actor being modified by the generative AI. Having studios or platforms spit out synthetic resuscitations of inactive or deceased people without any review or consent from the person or their representatives isn't acceptable in my view. Expecting people to police their own identity, or their deceased loved ones' identities, with the legislated take-down mechanisms of the Digital Millennium Copyright Act is never going to keep pace with the amount of unauthorized likeness abuse that will proliferate in the coming decade.
The issue I find provocative and concerning is the reanimation of people who haven't consented to the repurposing of their public statements. The Alan Watts Organization is an interesting example. Established in conjunction with Watts's relatives, the organization licenses his past recordings in new formats, including an upcoming interactive AI. Watts published many recordings of his philosophical lectures during his lifetime. These recordings, copyright-cleared by his relatives, are being integrated into films, music, and other new works that carry his teachings to a new generation. I am ideologically in favor of this approach of preserving words as they were spoken by a public figure.
Then there is another side of this, which is the repurposing of public statements in ways that remix and alter the meaning of what someone once said. I came across this while listening to what seemed to be lectures on physics by the scientist Richard Feynman. I was enjoying some of them; the content was already familiar to me from my past physics studies.
Then another video surfaced, seemingly from the same people. My father pointed out that something was wrong with it. Richard didn't sound like the Richard of his Caltech lectures. In this video he seemed angry and insistent, very different from how he sounded in lectures during his life. There were a couple of other odd things about it. The sentences in the video did not seem to be his literal words. In fact, they seemed to follow another book I had once read that was not written by Richard Feynman but referred to various quantum mechanical experiments conducted during his life. "Had someone borrowed Feynman's image and created a voice-synthesized version of him?" I wondered. Then the video branched into topics I suspect Feynman did not have access to during his life and likely never lectured on. "Was someone trying to associate Feynman with their favorite cosmology theory?" I wondered further. Then fake-Feynman said that atoms were not like planets orbiting the sun, while the video kept showing planets in orbit to represent atomic nuclei for the rest of its runtime. That's when I realized this was just careless AI slop. The channel owner (who obscures their own identity) had used an LLM to misappropriate Feynman's image and voice likeness to give the channel an air of authority, and was selling ads against the videos for profit. The channel had popped into existence only a few months earlier and hadn't yet been found by the Feynman family for a DMCA take-down. The channel's host does admit the videos are fake in the channel summary, and included the image below in subsequent posts, perhaps so the rest of their videos wouldn't be flagged to YouTube's DMCA team. In some of their videos the disclaimer is hidden in the transition from the pre-roll ad to the static cover image of Feynman they'd uploaded; if I backed the video up to the end of the pre-roll ad, I could see the image below.
And the anonymous profile owner's main page states blatantly that the channel only exists to appropriate Feynman's identity.
("This channel isn't officially connected to Richard Feynman or his estate. We're here to share his incredible way of explaining physics with a new generation who never got to learn from him directly. This isn't his voice — it's our tribute to his teaching style, created purely for education and inspiration. This voice is his AI voice clone. No impersonation intended, just deep respect for one of history's greatest teachers.") (Credit @imaginethePhysics)
So because they mean well, forgery should be overlooked? Couldn't they have used their own authoritative voice to render an opinion on the topic? US copyright law allows reference to others' work in fair-use cases, including parody. But this fake persona would likely not stand up in court if the Feynman estate could actually identify the channel's publishers to serve them a cease-and-desist request. This seems very similar to Scarlett Johansson's dispute with OpenAI over a voice that evoked Samantha, the virtual companion bot she voiced in Spike Jonze's film Her. But there was something more insidious about the Feynman videos. If they were literal recordings of things Feynman had actually said, I'd consider them from a different perspective. But because his forged voice print was overlaid on text that was not a direct quote of his lectures, they could make future listeners think Feynman said things he actually hadn't. Fictionalizing science is dangerous, no matter what the anonymous forgers intended.
When I found I was listening to a "Frankensteined" unauthorized replica of Richard Feynman in a "made-for-ads" video (this is how Google refers to these kinds of content generators), I had to stop and look behind the veil. There are dozens of these channels on YouTube under various names that hide the people behind them, churning out AI-cloned audio and spawning more channels for the Feynman estate to hunt down. Then, reading comments from the channel's authors, you find there are other copycat channels they claim are copying them, further misappropriating the fake Feynman and creating yet more channels built for ads. There are more fake Feynman channels on YouTube than there are real Feynman lectures!
Building monuments to people who've impacted our lives is a great thing. The celebration of Alan Watts's public lectures in their original form is totally fine by me. But bending the meaning, as I suspect is happening in the Feynman lectures, is a risky thing in my opinion. Watts was speaking about broad, open-ended philosophical ideas. Feynman was lecturing about precise and experimentally verified science. Fabricating someone's words in a chatbot can bend perception and deceive future listeners about the literal truth or the experimental science at the source.
Trying to grapple with my own objections and discomfort about these two similar cases, I realized that the context of delivery matters most. If I had an LLM like Replika that I knew to be a fake representation of a person, and that I deliberately sought out, I could accept what it says in context as an inference over a database of past writing. Having the words of a deceased person written out as text and spoken with a simulation of their voice on a YouTube channel risks bending people's perception away from what that person actually said. Among viewers, it will likely be the misappropriated text that is remembered, and possibly later cited as truth, when it isn't.
There are matters of copyright infringement involved here too, as evidenced by the Feynman estate's wrangling with the channel over its violation of the Caltech license. The channel's author contests this, asserting that Feynman's voice print is fair use and that they have the right to fuse it with published essays and lectures from his university teaching because the originals had been posted to the internet. I disagree that a voice print is something that can be reused without the consent of the speaker.
I realize we are at risk of losing influence over how our words are represented beyond our life span. This happened back when the MP3 revolution took place. Famous bands like Led Zeppelin couldn't be heard on streaming music players because replay rights couldn't be granted by a defunct band, while cover artists racked up listens and downloads because they were still alive and could grant streaming rights. It made me think that the future is going to be built on derivative misappropriations of the past, as apps and tools convince us to take content at face value without checking the validity of its derivation.
Fading into obscurity may seem a sad fate for bands and people, so perhaps the hope of others extending their stories onto new platforms is a good thing to some readers. My concern is that when someone is speaking about their actual views, their words should be preserved as they spoke them. Novel view synthesis in certain contexts, like Gaussian Splats, is a better approximation of the original situational truth; we see the source better through the holographic lens of the Gaussian Splat capture. Bending the meaning of words where critical rightness or wrongness is at stake, such as scientific topics, and borrowing the semblance of someone's voice print to lend it an air of authority, is risky to both posterity and the person whose language is being forged.
The DMCA's take-down process is much slower than GenAI-based myth-making can propagate new moles to whack. A discerning listener can throw a source's trustworthiness into doubt, but not everyone is discerning in the social media and news context. New legislation is coming into effect in the EU to prohibit deepfakes by expanding what counts as enforceable rights over personal identity in public view and past public appearances. And this risk is not perpetuated by many people; it's a small group of actors hiding behind the broad cloud of generated content they use to perpetuate their deceptions.
It is interesting to consider the right ways to learn about quantum mechanics. Is it bad to impersonate a famous scientist to explain scientific ideas? In my opinion, the rules we learned about citation and plagiarism should still apply even as the media and modes of expression shift in the AI era. Dr. Feynman's words should be quoted, or at the very least, public and licensed recordings of him should not be altered in ways that affect their meaning. The Feynman family has better things to do than continually watch for forgers leveraging Dr. Feynman's voice sample for personal profit, even if the topics he spoke about are relevant to everyone on the planet. Truths should not be hidden behind anonymous accounts using fake voices, or they will become suspect.

