The Shifting Scope of Assumed Privacy with LLMs

Ten years ago, a CEO standing on stage and joking about someone’s web query would have been shocking rather than funny. That reaction traced back to the major backlash of 2006, when AOL released the query logs of 650,000 of its users to developers, and journalists used the personally identifiable information in those logs to track down individuals. The affected AOL users did not know they were in this now-public repository. Since then, our assumptions about the scope of privacy have shifted significantly because of the social media era that followed. The advent of micro-blogging, along with vertical tools for yelping our food, foursquaring our shopping habits, tweeting our quips or instagramming our lifestyles, brought cameras and public visibility closer into our personal sphere. We grew accustomed to privacy covering a narrower portion of our daily lives. But there is still confusion with tools like browsers, where people assume their behaviors are not being broadcast in real time to multiple parties.

A few years ago, a data leak at GPT exposed the conversations of its customers. OpenAI had not asserted that conversations in its interface would be private, but some users expected that typing into a question-and-answer dialog on the service was a personal exchange in a private space. To dispel this assumption of privacy, Sam Altman, one of the founders of OpenAI and its current CEO, took care in later product release teasers to encourage people to log out of GPT if they planned to type anything of a personal or confidential nature. With that series of disclosures, we can no longer assume that an HTTPS-protected query in an open window on the web keeps our words from being blasted out to thousands of servers, like a modern-day reprise of The Truman Show. The company has since rolled out paid accounts that offer privacy firewalls and sandboxing for individuals and businesses, along with assertions that confidentiality can be protected in those paid account contexts. But advertising is about to be integrated into GPT deployments, so that text you type into the window can be paired with targeting and profile cookies that enable ad serving on the GPT service or on other sites that support the GPT advertising targeting parameters, which have not yet been announced.

If you haven’t been following the slippery slope of privacy scope degradation, please adjust where your “assumption of privacy” begins for this new context. Social network activity is obviously a public declaration, so we can’t assume privacy there. While mail and messaging platforms may not have direct access to the messages you send, they may be able to leverage targeting parameters that come from AI summaries or overviews derived from those messages. LLM interfaces like GPT, Gemini and Claude give an illusion of one-to-one discussion that many mistakenly assume is as private as an SMS message, when it is not. The New York Times recently cited legal cases where defendants had disclosed details of an alleged crime, assuming that GPT was a confidant that would keep the disclosure confidential under “attorney-client privilege.” Judges had to clarify that GPT is not an attorney, even though it may quote like one. Smart speakers, smart cars and web-enabled cameras have all been subject to subpoena in recent years to disclose details from within people's homes, spaces previously considered to fall under the umbrella of assumed privacy. The home may be under that umbrella, yet the servers these devices broadcast to are not. In this new phase of the AI digital age, these assumptions need to be adjusted. So we should take heed of the CEOs who warn us about assuming privacy when using their tools to discuss personal or confidential matters.

Exfiltration of proprietary and confidential data is a risk not just to our sense of privacy; it’s an erosion of our assumptions of trust more broadly. As the AI diaspora grows, exposure to disclosure vectors should gradually decline, because a broader distribution of devices lessens the impact of any single attack or single common point of broad vulnerability. Industry developers are rushing to bring privacy sandboxing, RAG databases and proprietary fine-tuned LLMs to individuals' devices and small businesses. For now, it's best to consider any interaction with an LLM to be as public as an interaction on a social network, until these firewalls are in place and the centralized points of vulnerability are defended. And naturally, you should assume that an app built atop a leading LLM does not necessarily adhere to the confidentiality and privacy policies of the underlying LLM vendor that powers it. So disclose carefully, as we never know how many thousands or millions of people we are talking to when we talk to what we assume is one machine.

ncubeeight
