Skynet, Smugglers and The Gift of Fear: What we can learn from snap judgements, and machines can learn from us

So, in the day or two since I posted the piece about “Big Filter”, I’ve gotten several calls, comments and emails that all seemed to focus on the scary notion of “machines that think like us”.  Some folks went all “isn’t that what Skynet and The Matrix, and (if you’re older, like me) The Forbin Project, and W.O.P.R. were on about?”  If machines start to think like us, doesn’t that mean all kinds of bad things for humanity?

Actually, what I said was, “We have to focus on technologies that can encapsulate how people, people who know what they’re doing on a given topic, can inform those systems… We need to teach the machines to think like us, at least about the specific problem at hand.”  Unlike some people, I have neither unrealistic expectations for the grand possibilities of “smart machines”, nor do I fear that they will somehow take over the world and render us all dead or irrelevant.  (Anyone who has ever tried to keep a Windows machine from crashing or bogging down or “acting weird” after about age 2 should share my comfort in knowing that machines can’t even keep themselves stable, relevant or serviceable for very long.) 

No, what I was talking about, to use a terribly out-of-date phrase, was what used to be known as “Expert Systems”, a term out of favor now, but that doesn’t mean the basic idea is wrong. I was talking about systems that are “taught” how someone who knows a very specific topic or field of knowledge thinks about a very specific problem.  If, and this is a big if, you can ring-fence the explicit question you’re trying to answer, then it is, I believe, possible, to teach a machine to replicate the basic decision tree that will get you to a clear, and correct, answer most of the time.  (I’m a huge believer in the Pareto Principle or “80-20 rule” and most of the time is more than good enough to save gobs and gobs of time and money on many many things.  More on that in a moment.) 

A few years ago now, I read a book called “The Gift of Fear” by Gavin de Becker, an entertaining and easy read for anyone interested in psychology, crime fighting, or the stuff I’m talking about.  The very basic premise of that book, among other keen insights, is that our rational minds can get in the way of our limbic or caveman brains telling us things we already “know”, the kind of instantaneous, can’t-explain-it-but-I-know-I’m-right, in-our-gut knowledge that our rational brains sometimes override or interfere with, occasionally to our great harm.  (See the opening chapter of The Gift of Fear, in which a woman’s “little voice”, as I call it, told her there was something wrong with that guy, but she didn’t listen, and was assaulted as a result.  Spoiler alert: she did, however, also escape that man, who intended to kill her, using the same intuition. Give it a read.)

De Becker, himself a survivor of abuse and violence, went on to study the evil that men do in great detail, and from there, to codify a set of principles and metrics that, encoded into a piece of software, enabled his firm to evaluate risk and “take-it-seriously-or-not-ness” for threats against the battered spouses, movie stars and celebrities his physical security firm often protects.  Is this Skynet taking over NORAD and annihilating humanity? Of course not.  What it is, however, is the codification of often-hard-won experience and painful learning, the systematizing of smarts.

I was thinking about all this in part because, in addition to the comments on my last post, I’m in the middle of re-reading “Blink” (sorry, I appear to be on a Malcolm Gladwell kick these days.)  It’s about snap decision making and the part of our brain that decides things in two seconds without rational input or logical thought.  A few years ago, as some of you know, my good friend Nick Selby of (among many other capes and costumes) the Police Led Intelligence Blog, decided he was so passionate about applying technology to making the world better and communities safer that he both founded a software company (streetcred software – Congrats on winning the Code for America competition this year!) and became a police officer to gain that expertise he and his partner would encode into the software.  He told me a story from his days at the Police Academy.  I may have the details wrong on this bit of apocrypha, but you’ll get the point. 

During training outside of Dallas, there was an experienced veteran who would sometimes spend time helping catch smugglers running north through Texas from the Mexican border.  “Magic Mike”, as I call this guy (I can’t remember his real name), could stand on an overpass and tell the rookies, “Watch this.”  He’d watch the traffic flowing by beneath him, pick out one car seemingly at random and say, “That one.” (Note that, viewed at 60 mph and looking at the roof from above, the age, gender, race or other “profiling” characteristics of the occupants are essentially a non-issue here.)

Another officer would pull over the car in question a bit down the road, and, with shocking regularity, Magic Mike was exactly right.  How does that happen?!  And can we capture it?  My argument from yesterday is that we can, and should.  We’re not teaching intelligent machines in any kind of scary, Turing-Test kind of way.  No, it’s much clearer and more focused than that.  Whatever went on in Magic Mike’s head – the instantaneous Mulligan Stew of car make, model, year, speed, pattern of motion, state of license plate, condition etc. – if it can be extracted, codified and automated, then we can catch a lot more bad guys. 

I personally led a similar effort in Cyber space.  Some years ago, AOL decided that member safety was a costly luxury and started laying off lots of people who knew an awful lot about Phishing and spoof sites.  Among those in the groups being RIF’ed was a kid named Brian, who had spent untold hours sitting in a cube looking at Web pages that appeared to be banks, or PayPal or whatever, saying, “That one’s real. That one’s fake.  That one’s real, that one’s fake.”  He could do it in seconds. So, we hired him, locked him in an office and said, “You can’t go to the bathroom til you write down how you do that.”

He said it was no big deal – over the years he’d developed a 27-step process so he could teach it to new guys on the team.  Just one of those steps turned out to be “does it look like any of the thousands of fake sites I’ve gotten to know over the years?”  Encapsulating Brian’s 27 steps in a form a machine could understand took 400 algorithms and nearly 5,000 individual steps.  But… so what?  When the weeks of effort were done, we had the world’s most experienced Phish-spotter built into a machine that thought the way he did, and worked 24×7 with no bathroom breaks.  We moved this very bright person on to other useful things, while a machine now did what AOL used to pay a team of people to do, and it did it based not on simple queries or keywords, but by mimicking the complex thought process of the best guy there was.
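For the curious, here is roughly what “write down how you do that” looks like once it becomes code.  This is a minimal sketch and emphatically not the system we built: the three checks, their weights, the threshold, the brand list and the sample URLs are all invented for illustration.  The only point is the shape of the thing, where each of the expert’s steps becomes a small test and the machine adds up what it sees.

```python
from urllib.parse import urlparse

# Hypothetical brand list, invented for illustration only.
KNOWN_BRANDS = {"paypal": "paypal.com", "examplebank": "examplebank.com"}


def host_of(url):
    """Return the lowercase hostname of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()


def brand_on_wrong_domain(url):
    """A brand name appears in the host, but the host is not the brand's own domain."""
    host = host_of(url)
    for brand, domain in KNOWN_BRANDS.items():
        if brand in host and not (host == domain or host.endswith("." + domain)):
            return True
    return False


def host_is_ip_address(url):
    """The host is a bare dotted-quad IP address instead of a name."""
    parts = host_of(url).split(".")
    return len(parts) == 4 and all(p.isdigit() for p in parts)


def no_https(url):
    """The page is not served over HTTPS."""
    return urlparse(url).scheme != "https"


# Each rule stands in for one "step" the expert wrote down; weights are made up.
RULES = [
    (brand_on_wrong_domain, 3),
    (host_is_ip_address, 2),
    (no_https, 1),
]
THRESHOLD = 3  # assumed cut-off; a real system would tune this on labeled examples


def score(url):
    """Sum the weights of every rule that fires for this URL."""
    return sum(weight for rule, weight in RULES if rule(url))


def looks_fake(url):
    """Flag the URL when its combined score reaches the threshold."""
    return score(url) >= THRESHOLD
```

Three toy rules are a long way from 400 algorithms, and a step like “does it look like the fakes I already know?” needs far more than a lookup table, but the structure scales the same way: more rules, better weights, same loop.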

If we can sit with Brian, who can spot a Phishing site, or De Becker who can spot a serious threat among the celebrity-stalker wannabes, or Magic Mike who can spot a smuggler’s car from an overpass at 70 miles an hour, when we can understand how they know what they know in those instant flashes of insight or experience, then we can teach machines to produce an outcome based not just on simple rules but by modeling the thoughts of the best in the business.  Whatever that business is – catching bad guys, spotting fraudulent Web sites, diagnosing cancer early or tracking terrorist financing through the banking system, that (to me) is not Skynet, or WOPR, or Colossus.  That’s a way to better communities, better policing, better healthcare, and a better world. 

Corny? Sure.  Naive? Probably.  Worth doing?  Definitely.  

 

 

“Big Filter”: Intelligence, Analytics and why all the hype about Big Data is focused on the wrong thing

These days, it seems like the tech set, the VC set, Wall Street and even the government can’t shut up about “Big Data”.  An almost meaningless buzzword, “Big Data” is the catch-all used to try and capture the notion of the truly incomprehensible volumes of information now being generated by everything from social media users – half a billion Tweets, a billion Facebook activities, 8 years of video uploaded to YouTube… per day?! – to Internet-connected sensors of endless types, from seismography to traffic cams.   (As an aside, for many more, often mind-blowing, statistics on the relatively minor portion of data generation that is accounted for by humans and social media, check out these two treasure troves of statistics on Cara Pring’s “Social Skinny” blog.)

http://thesocialskinny.com/216-social-media-and-internet-statistics-september-2012/

http://thesocialskinny.com/100-more-social-media-statistics-for-2012/

In my work (and occasionally by baffled relatives) I am now fairly regularly asked “so, what’s all this ‘big data’ stuff about?”  I actually think this is the wrong question.

The idea that there would be lots and lots of machines generating lots and lots… and lots… of data was foreseen long before we mere mortals thought about it.  I mean, the dork set was worrying about IPv4 address exhaustion in the late 1980s.  This is when AOL dial-up was still marketed as “Quantum Internet Services” and made money by helping people connect their Commodore 64s to the Internet.  Seriously – while most of us were still saying “what’s a Internet?” and the nerdy kids at school were going crazy because, in roughly 4 hours, you could download and view the equivalent of a single page of Playboy, there were people already losing sleep over the notion, even then, that the Internet was going to run out of its roughly 4.3 billion IP addresses.   My point is, you didn’t have to be Ray Kurzweil to see there would be more and more machines generating more and more data.
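The ceiling those folks were worried about is plain arithmetic: an IPv4 address is 32 bits long, so there can only ever be 2 to the 32nd power of them.  A quick sketch of the sum, using nothing but Python’s standard library (the variable names are mine):

```python
import ipaddress

# An IPv4 address is 32 bits, so the whole address space is 2**32.
IPV4_BITS = 32
total_addresses = 2 ** IPV4_BITS

# Cross-check against the standard library: 0.0.0.0/0 is "every IPv4 address".
whole_internet = ipaddress.ip_network("0.0.0.0/0")

print(f"{total_addresses:,} possible IPv4 addresses")
print(f"standard library agrees: {whole_internet.num_addresses == total_addresses}")
```

The usable number is smaller still, since sizable blocks are reserved for private networks, multicast and the like.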

What I think is important is that more and more data serves no purpose without a way to make sense of it.  Otherwise, more data just adds to the problem of “we have all this data, and no usable information.” Despite all the sound and fury lately about Edward Snowden and NSA, including my own somewhat bemused comments on the topic, the seemingly omnipotent NSA is actually both the textbook example and the textbook victim of this problem.

It seems fairly well understood now that they collect truly ungodly amounts of data.  But they still struggle to make sense of it.  Our government excels at building ever more vast, capable and expensive collection systems.  Which only accentuates what I call the “September 12th problem.”  (Just Google “NSA, FBI al-Mihdhar and al-Hazmi” if you want to learn more.)  We had all the data we ever needed to catch these guys.  We just couldn’t see it in the zettabytes of other data with which it was mixed.  On September twelfth it was “obvious” we should have caught these guys, and Congress predictably (and in my opinion unfairly) took the spook set out to the woodshed perched on the high horse of hindsight.

What they failed to acknowledge was that the fact we had collected the necessary data was irrelevant.  NSA collects so much data they have to build their new processing and storage facilities in the desert because there isn’t enough space or power left in the state of Maryland to support it.  (A million square feet of space, 65 megawatts of power consumption, nearly two million gallons of water a day just to keep the machines cool?  That is BIG data my friends.)  And yet, what is (at least in the circles I run in) one of the most poignant bits of apocrypha about the senior intelligence official’s lament?  “Don’t give me another bit, give me another analyst.”

It is this problem that has made “data scientist” the hottest job title in the universe, and made the founders of Splunk, Palantir and a host of other analytical tool companies a great deal of money.  In the end, I believe we need to focus not just on rule-based systems, or cool visualizations, or fancy algorithms from Israeli and Russian Ph.Ds.  We have to focus on technologies that can encapsulate how people, people who know what they’re doing on a given topic, can inform those systems to scale up to the volumes of data we now have to deal with.  We need to teach the machines to think like us, at least about the specific problem at hand.  Full disclosure, working on exactly this kind of technology is what I do in my day job, but just because my view is parochial doesn’t make it wrong.  The need for human-like processing of data based on expertise, not just rules, was poignantly illustrated by Malcolm Gladwell’s classic piece on mysteries and puzzles.

The upshot of that fascinating post (do read it, it’s outstanding) was in part this.  Jeffrey Skilling, the now-imprisoned CEO of Enron, proclaimed to the end he was innocent of lying to investors. I’m not a lawyer, and certainly the company did things I think were horrible, unethical, financially outrageous and predictably self-destructive, but that last is the point.  They were predictably self-destructive, predictable because, whatever else, Enron didn’t, despite reports to the contrary, hide the evidence of what they were doing. As Gladwell explains in his closing shot, for the exceedingly rare few willing to wade through hundreds or thousands of pages of incomprehensible Wall Street speak, all the signs, if not the out-and-out evidence, that Enron was a house of cards, were there for anyone to see.

Jonathan Weil of the Wall Street Journal wrote the September, 2000 article that got the proverbial rock rolling down the mountain, but long before that, a group of Cornell MBA students sliced and diced Enron as a school project and found it was a disaster waiting to happen.  Not the titans of Wall Street, six B-school students with a full course load. (If you’re really interested, you can still find the paper online 15 years later.)    My point is this – the data were all there. In a world awash in “Big Data”, collection of information will have ever-declining value.  Cutting through the noise, filtering it all down to which bits of it matter to your topic of choice; from earthquake sensors to diabetes data to intelligence on terrorist cells, that will be where the value, the need and the benefits to the world will lie. 

Screw “Big Data”, I want to be in the “Big Filter” business.

Mad Magazine, the NSA and Chinese Army Hackers

A quick follow up to yesterday’s post, continuing the “Jeez, you just can’t keep a good secret anymore” meme for the week.  If you follow politics or business news you may have seen lots (and lots and lots) of headlines lately regarding US economic losses, political wrangling and business executives’ hand-wringing over enormous, far-reaching and, by all accounts, incredibly effective Chinese hacking and cyber penetration of American companies, research labs and government agencies.  (Reading like a list of B-grade spy movies, feel free to read about “Operation Shady Rat” or “Byzantine Foothold” for some eye-opening facts and figures if this stuff isn’t your normal beat.)

 

Recently, there was great sturm und drang after the folks over at Mandiant produced a very detailed and revealing public report about just how big, bad, widespread and effective these efforts have been (which wasn’t entirely news to those in the know), and much more interestingly, great specifics on how it was done, and by whom, (which was).

 

A division of the Chinese People’s Liberation Army known by the not-entirely-inspirational moniker of “PLA Unit 61398” has since been the topic of much discussion in the press, the government and the security community.  (Not that a sexy moniker is all that important I suppose.  I hear it’s a great place to work with great benefits.  You can read one of their recruiting notices here if you’d like – see aforementioned “Jeez, can’t anybody keep a secret anymore?” discussion.)

 

Not to be outdone, (and in a piece that made me feel a bit like I was seeing a media version of the old Spy vs. Spy cartoons) FP just published a story headlined “Inside the NSA’s Ultra-Secret China Hacking Group”.  When the article includes a description of the inside of the building and the door into the room housing said “Ultra-Secret” unit, I’m pretty sure the folks who work there had a pretty significant hand in un-secreting it.

 

Still, given that the Chinese have long said they have their own mountains of data showing that we’ve been doing the same to them, perhaps this was just a timely PR use of information that, like Unit 61398, was about to enter the public conversation anyway.  The more I think about it, the more resonant that old cartoon strip seems.  They do it to us.  We do it to them.  Both sides know it, and the game goes on.  My guess is that what is a little bit different now is that both sides have to learn to play a game of shadows on a field that’s far more brightly lit than ever before.

Big Ears, Little Ears: One article, three layers of blown secrecy, and how Edward Snowden proves my point

Well, I haven’t had much time to write here for quite a while, but the Edward Snowden affair – and more specifically this piece in the Guardian – were such a terrific display of the Digital Water concept and “a world awash in data” that I couldn’t resist, despite my current schedule.  This story is kind of a delicious “triple play” on the concept.

I suppose before I dive in I should probably comment on using the word “delicious” in this context since I know there is an awful lot of outrage and shock on all sides of this debate.  Some are appalled by Snowden’s revelations, i.e. the supposed extent of the NSA’s electronic eavesdropping on everyone and everything including American citizens.  Others are appalled by Snowden’s actions and consider it nothing short of capital treason.  Those two viewpoints need not even be fundamentally in conflict – I’m sure there are folks out there who are both appalled by the NSA’s supposed activities and would like to see Snowden executed for treason.

I confess that, on the first point – the extent of the data collection and the agency’s capabilities – I myself am relatively unfazed. I’ve been in the Open Source Intelligence business for almost 15 years.  Given the shock many people express at what I could find out about them with nothing but a laptop at a Starbucks, I just can’t be wowed by what must be possible for a huge entity with a mania for secrecy, almost no oversight and an 11-digit budget.  The Echelon, or “Big Ear” controversy of the late 1990s(!) outed many of these supposed capabilities, and anyone who has even flipped through a James Bamford book would probably be slightly less bewildered at the ability (though perhaps not at the willingness) of NSA to do the things alleged. Anyway, wherever you stand on the particulars of the Snowden case, this article in the Guardian (which originally broke the story in an earlier piece) illustrates exactly the kind of world I have been trying to noodle over with this blog.  Here’s the “can’t anybody keep a secret any more?!” meme hat trick for this one little Web page.  Ready….

1. The NSA – The most obvious.  If you take him at his word, “The NSA has built an infrastructure that allows it to intercept almost everything. With this capability, the vast majority of human communications are automatically ingested without targeting. If I wanted to see your emails or your wife’s phone, all I have to do is use intercepts. I can get your emails, passwords, phone records, credit cards… The extent of their capabilities is horrifying.”  While we can argue the legal and moral issues, as a technological matter, this hardly should be a shocker given that we live in a world where your department store can tell when you’re pregnant (even if your parents can’t yet).   So – Level 1: John Q. Public can’t really keep a secret in the digital world.  Almost anything you say, send or type outside a locked, airtight room can be captured, analyzed and recorded if someone deems you interesting enough. 

2. Edward Snowden – So the NSA is, by its very nature, ultra-secretive, institutionally paranoid and famously tight lipped (Jim Bamford’s books notwithstanding). Yet every organization is made up of people, and like any group of the NSA’s estimated 40,000 employees, they will hold a diversity of views.  Now by all accounts to date, Snowden was a patriotic, smart kid who joined the Army Reserve and worked for the CIA.  He obviously had been scrutinized, checked out and picked apart.  You don’t get to play inside The Puzzle Palace if you’re an anti-government radical.  Yet what Snowden saw working as an NSA contractor motivated him to leak, speak, and flee the country.  Level 2?  For all the supposedly terrifying ability to spy that Snowden witnessed, one insider with a moral objection meant they couldn’t keep their secrets secret either.

3. The guys at the airport – My absolute favorite (and why I found this page so delicious).  So in this sometimes-bizarre corner of the world here inside the DC beltway, it is not at all uncommon for lots of people with plastic ID badges on lanyards to be overheard talking about the sorts of things that, in most of the country, would seem at home only in a Tom Clancy novel.  You can walk through certain shopping mall food courts at lunch  and hear phrases like “I’m cleared up the wazoo – TS-SCI with lifestyle poly plus some special stuff” or “sure, anybody can read a license plate from outer space, but we can do it at night!”.

Like cars in Lansing or Dearborn, surveillance and Intelligence and secret-squirrel military programs are just kind of the local business, and this is a factory town.  A lot of people here take this stuff veeeery seriously.  So it is not entirely remarkable when the guys at the bottom of the page opine that Snowden, that dirty, rotten, no-good treasonous so-and-so ought to be “disappeared”.  The part I love so much was the extreme low-tech surveillance system that outed their conversation.  They said it out loud and in public, and a “Little Ear” (you know, the biological one attached to the guy sitting across from them) in the airport captured it.  He then used a few hundred bucks worth of smartphone to record part of the conversation and Tweeted about it to the whole world.

So-   Quis custodiet ipsos custodes?   Apparently any employee with a conscience or every jackass with a cell phone.  I think that’s probably reassuring, but I have to think some more about it.  The world really is full of dangerous people who hate us.  Meanwhile – my own personal take on the Snowden thing?  (I’m speaking technologically here, I leave the constitutional and legal questions for others to debate.)  IF you matter enough to someone, there are no secrets.  Most of us just enjoy security through obscurity.  The only reason our privacy is safe for most of us is we’re utterly uninteresting.  You may not like it, but information and technology are inextricably linked.  The capability to do what NSA does can’t be uninvented.  We can do it… so can other countries. We can only decide as a society whether we can strike the appropriate balance between protecting ourselves from those without and those within.

IMO, China’s welcome to lead the world in some things…

A week or so ago, I noted, via an awesome slide from Bit9 Security, that Chinese hackers are just workin’ stiffs like the rest of us.  Then I had a quick piece that even here in the West we see increasing indications they face some of the same concerns we do with regard to the trouble of keeping information bottled up.  (This was further emphasized today by the stories, backed by pretty strong evidence, claiming that a hacker going by “Hardcore Charlie” has penetrated China Electronics Import & Export Corporation or “CEIEC”, China North Industries Corporation, WanBao Mining, and others.)

Well, today, (OK it was actually Friday, but apparently I forgot to hit “Publish” before I sat down to dinner on Friday) another in the trickle of “China has now surpassed the US” stories, and this one they’re welcome to.

The Anti-Phishing Working Group reported today that China’s Taobao.com e-commerce site “Surpasses PayPal as the World’s Most Phished Brand”. Seems not even the (I should say alleged) world leaders in the theft of sensitive information are immune to even the simplest forms of stealing sensitive data. This includes both intentional doxxing like Hardcore Charlie’s, and the inadvertent revelations that simply can’t be stopped in a world full of camera phones and Twitter (and Weibo) accounts.  (See the TV documentary that caught Chinese army personnel using click-to-play Cyber attack tools in the background as a fun example.)

Being trained in macroeconomics and generally favoring the Darwinian benefits of competition, I have to say this is one crown I’m happy to hand over.

Thanks again to the APWG for some very useful stats and reporting in today’s release.  Full report is at:

http://apwg.org/reports/APWG_GlobalPhishingSurvey_2H2011.pdf

Disclaimer: The views expressed on this blog are mine alone, and do not represent the views, policies or positions of Cyveillance, Inc. or its parent, QinetiQ-North America.  I speak here only for myself and no postings made on this blog should be interpreted as communications by, for or on behalf of, Cyveillance (though I may occasionally plug the extremely cool work we do and the fascinating, if occasionally frightening, research we openly publish.)

Social Media and the Military – keeping secrets keeps getting harder

I work with a group of fantastic Open Source Intelligence (OSINT) analysts.  One of them, who both reads this blog and knows I’m a pilot/airplane junkie, sent over this link under the heading of “Digital Water in China?”.  It talks about how, days before it ever made the Western press, the first confirmed sighting/evidence for a Chinese fifth generation fighter came not from the massive US intelligence apparatus but from a cell phone camera hung out a car window and posted to a Chinese military fanboy forum.

Now I recognize that China has an infamous, massive and essentially limitless-budget Web censorship program, which might well lead one to conclude that this evidence was found online because it was allowed to stay online. China decided it was time to let the world know so they intentionally let the drip-drip-drip start ahead of the (blatant thumbing-of-the-nose) first flight while Defense Secretary Robert Gates was in town.

Still, I happened to get this email the same week that LinkedIn discussions introduced me to both www.nosi.org (a naval OSINT blog maintained by, of all people, a physician) and osgeoint.blogspot.com, a blog both discussing and analyzing publicly available geospatial intelligence.  There are many more like these of course, but it’s still amazing that on any given day you can now read posts by people who (for free by the way) identify ships, spot aircraft and analyze other military assets from Google Earth or satellite imagery. We can learn about ship construction from employees’ blogs, twitpics from dog-walkers and minutes from town meetings.  And let us not forget the first person to (albeit unknowingly) inform the world about the raid that killed Bin Laden – a Pakistani programmer up late writing code who Tweeted about the ruckus happening a few hundred yards away.

Look down the road another ten years at everything from augmented reality goggles to the questions raised for Law Enforcement and espionage by Facebook’s facial recognition.  I don’t know exactly what will and won’t be possible, but it certainly seems to me that keeping ANYTHING, from Special Ops that last an hour to weapons programs that run decades, secret is going to get a lot harder.  From the intentional  wiki-leaking to the inadvertent disclosure, the Digital Water is pushing and probing, finding its way out the cracks and crevices.  I suppose I take some comfort from the J-20 Stealth Fighter story at least in knowing our likely adversaries will have to tangle with the same problems.


How to Hack Like Homer Simpson…

A few weeks ago, I gave a talk to a room full of police chiefs. I was talking about the goods, bads and unknowns of Social Media use by and for Law Enforcement (#LESM or #SM4LE).

One of the slides looked like this:

[Image: slide showing the metadata attached by default to Tweets and photos]

It shows how, unless you explicitly change the default settings, in many cases everything from Tweets to photos are tagged with a variety of metadata.  In some cases this can include geotags for the location of the device that produced the photo, tweet or update, the model number and make of the camera or phone, etc.

I suppose if you flip the “goods” and the “bads” I could have given the same speech to hackers, but of course they are way too tech savvy to need any such guidance.

Well, most of them. There’s always the exception:

http://www.informationweek.com/news/security/government/232900329

I couldn’t help but smile.  A hacker implicated in the recent Texas DPS breach, in painfully clichéd fashion, decided that a bit of geek chest thumping was in order.  In a Bugs-Bunny-esque “you’ll never catch me coppers! Mwah hah hah!” moment, he decided to post pics on Social Media of his girlfriend holding signs taunting law enforcement.

The only problem?  Hacker-genius-computer-expert guy neglected to remove the geotagging from the photos, which were taken in her back yard. Police took the arcane and Star-Treky step of reading the lat/long coordinates on the files and looking them up on a map.
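Just how arcane is that step?  A geotagged photo typically carries its position in the EXIF metadata as degrees, minutes and seconds plus a hemisphere letter (N, S, E or W), and turning that into something you can type into a map is grade-school arithmetic.  A minimal sketch follows; the function name and the sample coordinates are made up for illustration and have nothing to do with the actual case, and pulling the tag out of the file itself (which needs an image library) is left out.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style degrees/minutes/seconds value to decimal degrees.

    ref is the hemisphere letter: 'N' or 'E' gives a positive value,
    'S' or 'W' gives a negative one.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


# Made-up sample values, shaped like what a geotagged photo carries.
lat = dms_to_decimal(40, 30, 0, "N")
lon = dms_to_decimal(74, 15, 0, "W")
print(f"{lat}, {lon}")
```

Paste the resulting pair of numbers into any online map and you are looking at the spot where the shutter clicked.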

What I wouldn’t have given to be a fly on the wall when he was told how they got him.
