If you spend any time socializing on the Internet, especially via Facebook, you're surely familiar with BuzzFeed, the website best known for its “list” articles and various quizzes: Which spice are you? Which [character from popular book, movie or TV series] would you be?
Indeed, “BuzzFeed Quiz” has its own Facebook page with over 192,000 “likes.” Its most recent quizzes as of June 25 include which museum you should visit, what your favorite band in high school says about you (prediction: something flattering), and how likely you are to survive the “zombie apocalypse.”
In light of this, perhaps nobody should be surprised to hear what British e-commerce blogger Dan Barker announced on June 24: “BuzzFeed is watching you.”
How do they do that?
Barker identified two ways BuzzFeed watches its readers, which he labeled “The Mundane Bits” and “The Scary Bit.”
The “mundane” news is that, yeah, BuzzFeed is tracking you. Not that it's unique in this regard: the “Do Not Track” movement has so far proven spectacularly unpopular with advertising executives and with most websites and browsers.
Barker provided a screenshot of some code (which you probably won't know how to read unless you're very “good with computers”), explaining: “Here’s a snapshot of what BuzzFeed records when you land on a page. They actually record much more than this, but this is just the info they pass to Google (stored within Google Analytics).”
He then translated some of the code into English. Among other things, BuzzFeed is recording whether and how often you've visited their site before; whether you've connected Facebook and BuzzFeed; whether and how often you've shared BuzzFeed links via email, Twitter or other social media; which country you're in “and about 25 other pieces of information.”
All of this is, as Barker said, thoroughly “mundane” by Internet standards. The “scary bit” involves those ever-present quizzes:
Most quizzes are extremely benign – the stereotypical “Which [currently popular fictional TV show] Character Are You?” for example. But some of their quizzes are very specific, and very personal.
Here, for example, is a set of questions from a “How Privileged are You?” quiz, which has had 2,057,419 views at the time I write this. I’ve picked some of the questions that may cause you to think “actually, I wouldn’t necessarily want anyone recording my answers here”.
Among other things, those questions ask if you, the quiz-taker, have ever been treated or taken medication for mental health problems, suffered from learning disabilities, contemplated or attempted suicide, been raped or sexually assaulted, experienced racial discrimination, or felt dissatisfied with your gender or sexual identity.
As Barker wrote, “When you click any of those quiz answers, BuzzFeed record all of the mundane information we looked at earlier, plus they also record this:” followed by more code, an explanation of what it means and its implications:
In other words, if I had access to the BuzzFeed Google Analytics data, I could query data for people who got to the end of the quiz & indicated – by not checking that particular answer – that they have had an eating disorder. Or that they have tried to change their gender. Or I could run a query along the following lines if I wished:
Show me all the data for anyone who answered the “Check Your Privilege” quiz but did not check “I have never taken medication for my mental health”.
.... I suspect this particular quiz would have had less than 2 million views if everyone completing it realised every click was being recorded & could potentially be reported on later – whether that data is fully identifiable back to individual users, or pseudonymous, or even totally anonymous.
What do you think?
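Barker's hypothetical query works because a completed quiz with no click recorded for a particular answer tells you, by omission, that the reader did not choose it. The minimal Python sketch below illustrates that logic on a handful of made-up event records; the field names, reader IDs and quiz label are hypothetical and are not BuzzFeed's or Google Analytics' actual data format.

```python
# Hypothetical click log: each record is one event sent to an analytics tool.
# Field names and values are illustrative only, not any real schema.
events = [
    {"user": "anon-1a2b", "quiz": "privilege", "action": "checked",
     "answer": "I have never taken medication for my mental health"},
    {"user": "anon-1a2b", "quiz": "privilege", "action": "completed", "answer": None},
    {"user": "anon-9f8e", "quiz": "privilege", "action": "completed", "answer": None},
    {"user": "anon-7c6d", "quiz": "privilege", "action": "checked",
     "answer": "I have never taken medication for my mental health"},
]

ANSWER = "I have never taken medication for my mental health"


def completed_without_checking(events, quiz, answer):
    """Return IDs of readers who finished `quiz` but never clicked `answer`."""
    finished = {e["user"] for e in events
                if e["quiz"] == quiz and e["action"] == "completed"}
    checked = {e["user"] for e in events
               if e["quiz"] == quiz and e["action"] == "checked" and e["answer"] == answer}
    # Set difference: finished the quiz, yet no click on this answer was recorded.
    return finished - checked


print(sorted(completed_without_checking(events, "privilege", ANSWER)))  # ['anon-9f8e']
```

Only `anon-9f8e` is returned: that reader finished the quiz without ticking the answer, while `anon-7c6d` ticked it but never finished. The query needs nothing more than a consistent ID per reader, whether or not that ID carries a real name.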
Barker's blog post got enough attention that within a few hours of it going up, a BuzzFeed executive named Dao Nguyen posted this in the comments:
… we do not in fact record that it is “you” browsing the site. The string sent to GA is not your username but an anonymized string that is not linked in any way to your account, email address or other personally identifiable information. Also, about 99% of our readers are not even logged in.
We are only interested in data in the aggregate form. Who a specific user is and what he or she is doing on the site is actually a useless piece of information for us. We know how many people got Paris or prefer espresso in the “Which city would you live in?” quiz, but we don’t know who they are or any of their PII.
Yet other commenters on Barker's blog did not seem reassured by Nguyen's remarks. One man posted this in response: “Theoretically, how hard would it be for someone at Buzzfeed to connect someone to their Buzzfeed quiz answers?”
Another person asked Nguyen “If the 'username' string is NOT associated with the individual’s account, then why is the same username string used for two different sessions?” and “Can we interpret your final paragraph as meaning that none of your data analysis requires the username string in order to give you meaningful results? If “who a specific user is” is “useless” to you, then why bother including the username string in the GA data at all? If it’s useless, why not remove it? Contrariwise, if it is unremovable, for what use is it necessary?”
Update: BuzzFeed responds
We anonymize all usage data and have strict internal policies around only accessing data in the aggregate form.

Background:

- About 99% of our readers are not logged in, so we do not have a "username" or any PII (personally identifiable information) associated with those quiz takers. For the small number of people who are logged in, we anonymize the data as mentioned above. All in all, all usage data is anonymized through this process.
- It's actually against Google Analytics' terms of service to store any personally identifiable information (PII).
- We are only interested in data in the aggregate form. Who a specific user is and what he or she is doing on the site is actually a useless piece of information for us. We know how many people got Paris or prefer espresso in the "Which City would you live in?" quiz, but we don't know who they are or any of their PII.

After further correspondence, Dan Barker recently shared some additional thoughts with The Independent and amended his point of view:
Speaking to The Independent, Barker noted that despite the fact that data had been 'pseudonymised' (ie, assigned random user IDs) "from a technical point of view it would be really easy to link pseudonyms to real users, and is a fairly common practice."
Barker continues: "But BuzzFeed say specifically they do not and, as a fairly transparent company, I would be inclined to take their word for it. It's also worth mentioning that this is a total minefield and lots of website owners don't fully understand what data they're recording.
"For example, looking at an article elsewhere on The Independent, I can see the site loads 42 different third party tracking technologies, a few of which have assigned me a unique user ID in a similar way to BuzzFeed. I'd be amazed if most staff know that's happening, let alone readers."
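Barker's remark that pseudonyms are "really easy" to link back to real users, and the commenter's question about the same string appearing in two sessions, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a scheme in which a site derives each reader's ID by hashing an account identifier with a secret key (HMAC-SHA256). BuzzFeed says its IDs are not linked to accounts, so nothing here describes its actual system; the key and email addresses are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the site operator (illustrative only).
SECRET_KEY = b"site-operator-secret"


def pseudonym(account_id: str) -> str:
    """Derive a stable pseudonymous ID from an account ID with HMAC-SHA256."""
    return hmac.new(SECRET_KEY, account_id.encode(), hashlib.sha256).hexdigest()[:16]


# The same account yields the same pseudonym in every session,
# which is what lets an analytics tool group separate visits together.
session_1 = pseudonym("reader42@example.com")
session_2 = pseudonym("reader42@example.com")
assert session_1 == session_2

# Anyone holding the key and a list of accounts can rebuild the mapping
# and re-identify "pseudonymous" analytics records.
accounts = ["reader42@example.com", "reader7@example.com"]
lookup = {pseudonym(a): a for a in accounts}
print(lookup[session_1])  # reader42@example.com
```

The property that makes such an ID useful for analytics, namely that it stays constant across visits, is the same property that makes re-identification possible for whoever holds the key.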