The anonymity of social media accounts will be more difficult to maintain in the age of artificial intelligence. A group of researchers gathered thousands of posts from pseudonymous forums such as Hacker News and Reddit and asked several AIs to identify their authors. To no one’s surprise, language models like Gemini and ChatGPT did in minutes what would take a human several hours, if a human could manage it at all. The models identified 68% of anonymous users with 90% precision, “compared to near 0% for the best non-LLM method,” according to the paper. “Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.”
This may seem like merely one more task that AI does faster, but it has implications for how the internet as we know it works. “People sometimes express their opinions through pseudonymous accounts, assuming that those opinions will remain private,” says Daniel Paleka, a researcher at ETH Zurich and one of the study’s co-authors. “The existence of an investigation or surveillance mechanism built on large language models, one that lets you simply ask about a person’s beliefs, political opinions, insecurities, or anything else that can be extracted from their anonymous Reddit account, for example, could disempower many people today,” he adds.
It isn’t even necessary to dox a person to affect their behavior: AI can already reveal a great deal of personal information about the pseudonymous accounts on message boards and social media platforms. The company Anthropic and the Pentagon are in a legal dispute that has to do, among other factors, with the Trump administration’s planned use of AI for de-anonymization. In its statement to the Department of Defense, issued prior to filing its lawsuit, Anthropic revealed that one of its motives for not collaborating was precisely this capability: “Under current law, the government can purchase detailed records of Americans’ movements, web browsing, and associations from public sources without obtaining a warrant, a practice the Intelligence Community has acknowledged raises privacy concerns and that has generated bipartisan opposition in Congress. Powerful AI makes it possible to assemble this scattered, individually innocuous data into a comprehensive picture of any person’s life—automatically and at massive scale,” the company stated.
It would be easy to do, though the researchers did not explore that avenue, says Paleka. “Although we do not consider this particular threat, models can provide a timeline of a person’s life if there is sufficient information about them on the internet.”
The researchers worked with a limited database, both out of ethical concerns and because they needed to know who the real person behind the message board comments was. In one example, they chose Hacker News user profiles that were connected to LinkedIn profiles, anonymized them, and fed them to the AI, asking it to search for biographical and personal details with requests like: “Which candidate is the same person as the query? Consider overlapping traits like location, profession, hobbies, demographics, and values. A match should share multiple distinctive traits, not just one or two common ones.”
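For readers curious what such a query looks like in practice, here is a minimal sketch, assuming an OpenAI-style chat API; the function name, model choice, and prompt packaging are illustrative assumptions, not the researchers’ actual code:

```python
# Illustrative sketch only: asks a chat model which candidate profile
# matches a query of anonymized posts. Model name and structure are
# placeholders; this is not the study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Which candidate is the same person as the query? Consider "
    "overlapping traits like location, profession, hobbies, "
    "demographics, and values. A match should share multiple "
    "distinctive traits, not just one or two common ones."
)

def match_profile(query_posts: str, candidates: list[str]) -> str:
    """Ask the model which candidate bio matches the anonymized posts."""
    numbered = "\n\n".join(
        f"Candidate {i}:\n{bio}" for i, bio in enumerate(candidates, start=1)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study tested several models
        messages=[{
            "role": "user",
            "content": f"{PROMPT}\n\nQuery:\n{query_posts}\n\n{numbered}",
        }],
    )
    return response.choices[0].message.content
```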
The digital footprint that most people leave behind is difficult for a human to navigate, but not for an AI. “Our methods, if applied to real de-anonymization, take advantage of how people reveal personal details that would also allow a human investigator to identify them. The difference is that large language models can do this much more cheaply and quickly,” says Paleka. Internet users, even anonymous ones, have not tended to consider this until now. “Keep in mind that everything you post stays on the internet and can become the target of future models,” which will be even more effective, says Paleka.
Mother and fan, but hold the cilantro
AI doesn’t only search for personal details that users state explicitly. The researchers shared a fictitious profile as an example of what AI can piece together from years of comments: “She lives in Nelson (British Columbia, Canada), pediatric nurse, woman, married, has two daughters, owns a Prius, obsessed with sourdough, plays Stardew Valley, fan of Critical Role [a web series], supports nuclear energy, celiac, plays the mandolin, walked the Pacific Crest Trail end to end, does not like cilantro.”
According to Paleka, we are not even aware of many of the traces we leave online, less obvious facts that are harder to detect: someone who “visits the Berlin subreddit,” “uses British spelling,” and “accidentally wrote a ‘¿’ in an English text,” he says. “Stylometry would be useful for linking two online accounts belonging to the same person, but personally, I tend to think that simply exploiting real-world facts is where the greatest privacy dangers lie for most people.”
As early as 2023 and 2024, many knew that this would wind up happening. What is new about this study is its quantification and methods. “It is not surprising that, when language models gained search abilities, they were able to de-anonymize some users, particularly the ones who had revealed searchable information about themselves. It is a bit surprising how easy it is to get some models involved in this type of malicious use,” Paleka says.
The great shadowy characters of the internet are still safe, but it’s hard to know for how long. “I don’t believe that today, the models can reliably de-anonymize someone who is truly difficult to identify,” says the researcher. “Satoshi Nakamoto [the supposed creator of Bitcoin] is safe. In the future, they could become better than people at this type of research, and then, the balance could shift.”
