Sunday, April 22, 2018

Data Mining for Fun and (Mostly for) Profit

So, big to-do over Localblox, a Washington state-based data mining firm that built 48 million personal profiles by scraping data from sites such as Facebook, Twitter, Zillow, and the like. Much grumbling and gnashing of teeth over the "loss of privacy" and especially over the company's sloppy security: the collection of profiles, over 1.2 TB in size, was left unprotected on a public (though unlisted) Amazon storage server, where it was discovered by a security researcher. (Damn those pesky researchers, anyway!)

A couple of months earlier, it was Cambridge Analytica in the crosshairs of privacy supporters everywhere, after the data mining firm collected similar data and used it to help Republican candidates in their various bids for election. The catch there was that CA may have drifted into the realm of illegality by assigning non-US citizens to work on the campaigns. Technically, those non-citizens could in fact collect and analyze data, but the company was advised that it must not allow those non-citizens to "play strategic roles including the giving of strategic advice to candidates, campaigns, political parties or independent expenditure committees." Which, of course, is exactly what they did. According to former CA employee turned whistleblower Chris Wylie, ". . . there was no one American involved in [one campaign] . . . it was a de facto foreign agent, working on an American election."

Alexander Nix, CEO of Cambridge Analytica at Web Summit 2017
in Lisbon. Image used under the Creative Commons Attribution 2.0
Generic license.
There's a big difference between these two scenarios. To take the second case first, had CA simply collected and analyzed data and then handed it over to a campaign (in this case, a Republican one) to use, no laws would have been broken; the data itself is out there for anyone to collect, aggregate, and use for any (legal) purpose. The company's mistake was in assigning foreigners to strategic duties that could affect a US election campaign. Some privacy supporters may have chafed at the use of PII (personally identifiable information) for purposes counter to their beliefs (read: we Democrats were not happy that the data was used to help Republican candidates) and without their knowledge or consent, but the reality is that you put that data out there. Why would various companies not collect and use it? (And let's be honest: Would the hue and cry among us libs have been quite as loud had the data been used to help Democratic candidates? I don't think so.)

Which brings me to the first case, one which involves a far more common (and perfectly legal) activity. We worry a great deal about privacy, but seemingly not enough to stop us from putting our entire lives out there on the Internet for the world to see. It's been said many times, but apparently it needs repeating: NOTHING YOU DO ON THE INTERNET IS PRIVATE. This is especially true when you post information that can be used against you or can be used to help people profile you and then use that profile to sell you things, including political candidates.

Alternatively, but quite commonly, such information is used by social engineers to scam you or talk you or one of your contacts into giving up names, passwords, and other data that ought to be kept confidential. Most so-called hacks actually start with a social engineering exploit, and most of those are predicated on data the scammer found on social media.
He looks so young and innocent, doesn't he? And he might
have been, back then. This is Mark Zuckerberg in his
Harvard dorm room back in 2005. Image used under the
Creative Commons Attribution 2.5 Generic license.

The thing is that a seemingly insignificant and innocuous Facebook, Instagram, or Snapchat post can tell marketers and thieves (one assumes that there is a difference) a great deal about you, your habits, your location, your typical travel plans, etc. I can trawl (not troll; well, I suppose I could do that, too) through Facebook and know who is on vacation, where they went, and when they'll return. I know what you like to eat. What you like to wear. I know your marital status—and if I dig a little, I might even be able to figure out if that marriage is in trouble. Got a public Amazon wish list? I know your hobbies, your future renovation plans, what musical instruments you play, what pets you have, and what kind of books you read. I also know if you're into essential oils and have an Apple Watch. (Why else would you list an Apple Watch Stand and a wireless charger on your wish list?) I know if you curl your hair, and whether you're a Mac person or a Windows person. (All of this goes double if you click the little Facebook "I just bought…" icon that pops up when you make an Amazon purchase.)

I have security tools that some may lack, but I'm no security whiz. Nonetheless, if your phone has its geolocation turned on, I can see many of the Facebook, Pinterest, Snapchat, and other posts you make, and I can track them by name, by date, by keyword, and by the location from which they were sent. To do this, I use a piece of software that's often used by law enforcement officers, though anyone can buy access to it. The only reason the software can access the data to begin with is that we put that information out there to be found.

If I were smart enough to write an algorithm that could collect, aggregate, and collate all of this information, I could know all about you. (No worries; I'm not smart enough. Then again, I know people who are.) That's all that Localblox did: the company wrote software that scraped information from social media sites, aggregated it, and created code that pieced the information together and built individual profiles. (And then they stupidly left it out there unprotected for a security researcher to find.)
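
For the curious, the "piecing together" step is less exotic than it sounds. Here is a minimal Python sketch of the general idea, and only that: it is not Localblox's code, the records and source names are made up, and matching people by a lowercased name is a deliberately crude assumption (a real system would need far more careful matching). It does no scraping at all; it simply merges records that are already in hand.

```python
from collections import defaultdict

# Made-up records standing in for already-collected data.
# No real site or person is represented here.
records = [
    {"source": "social_site_a", "name": "Pat Example", "city": "Springfield"},
    {"source": "social_site_b", "name": "pat  example", "employer": "Acme"},
    {"source": "wish_list", "name": "Pat Example", "interests": ["guitar", "essential oils"]},
    {"source": "social_site_a", "name": "Lee Sample", "city": "Riverton"},
]


def normalize(name):
    """Crude matching key (an assumption): lowercase, whitespace collapsed."""
    return " ".join(name.lower().split())


def build_profiles(records):
    """Group records by normalized name and merge their fields into one profile each."""
    profiles = defaultdict(lambda: {"sources": []})
    for record in records:
        profile = profiles[normalize(record["name"])]
        profile["sources"].append(record["source"])
        for field, value in record.items():
            if field in ("source", "name"):
                continue
            # Keep the first value seen for each field.
            profile.setdefault(field, value)
    return dict(profiles)


profiles = build_profiles(records)
for key, profile in profiles.items():
    print(key, profile)
```

Three harmless-looking scraps (a city, an employer, a wish list) become one profile once something ties them to the same person. Scale that up to dozens of sources and millions of people and you have the kind of collection described above.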

If you lived in New York in 1948 and were one of the few people
who owned a television, this woman really wanted to sell you a
faucet aerator. Image in the public domain.
All of this data is valuable. Why do you think Facebook exists? It's not a charity. It's not out to make the world a better place by enabling more people to communicate. (Though some might argue that such a thing could happen. If it did, it would be a byproduct, a happy accident.) Facebook's purpose, and the purpose of all social media, is to collect marketable data and sell it. To anyone: retailers, other marketers, even political parties. That's how they make money. LOTS of money.

Facebook has over 2 billion users. WhatsApp has 1.5 billion. Instagram has almost a billion. These are big numbers. And big numbers translate to big dollars. Dollars they make by selling information about us, because we were dumb enough to put that information out there to be sold. As others have said, when it's being given to you for "free" (and this includes network television), you're not really a customer—you're the product.