In a long post this weekend at The Washington Post, Barton Gellman added some protein to his big story on the NSA. He also directly addressed my and others' criticism, writing:

Ambinder based his conclusion that our story was 'a bust' on incorrect assumptions about our data set and erroneous descriptions of the systems the NSA uses to intercept and process communications. [The Washington Post]

First: an apology to Bart. I went for the flashy headline. (Sometimes, the editors at The Week write them, but I can't blame "bust" on them: it was my word.) I didn't think the article adequately captured the complexity of intelligence collection and I did not think it warranted the implications that many drew from it. But "bust" is a slur of sorts, and the story wasn't a bust. I was being churlish and snarky, and I'm sorry for that.

Now, back to the NSA and the meat of his criticism:

The point of his piece, as I understand it, was to demonstrate how the NSA's "collect it all" policy inadvertently vacuums up an enormous amount of Americans' content, that the NSA's minimization processes are incomplete and insufficient, and that the NSA keeps a disturbingly large amount of "raw," unminimized, incidentally collected data in its PINWALE internet content database, allowing analysts to search through it later without any special permission.

As Gellman says, "There is no way to prevent incidental collection, but policy choices decide how much of it will happen and what the NSA and other agencies are allowed to do with its fruits." [Italics my own.]

So far, Gellman and I are on the same page. In order to collect on foreign targets, the NSA has to collect a lot of completely useless communications too, and a lot of the useless communications are to, from, or about Americans in some way. This is just the way bulk collection works, whether it's via file transfers from content providers (PRISM) or directly off the backbone of the internet itself (Upstream). Under Section 702 of the FISA Amendments Act of 2008, the NSA collects intelligence on foreign targets who communicate on services owned by U.S. companies or whose communications transit the United States.

As I've written before, the NSA does itself a huge disservice when it tries to make a bright-line distinction between foreign and domestic when it talks about collection. In order to collect on foreign targets, it must collect on many domestic targets as well.

Unanswered question one: when a company like Microsoft gives the NSA e-mails or chat room logs under the PRISM program, what exactly does that mean? In other words, if the NSA is certified to collect information on Russia, does Microsoft simply provide the NSA with every single communication originating from customers it knows to be based in Russia?

In all the Snowden releases, we have no idea. We don't know, really, how narrow or broad the requests are, and we don't know how the digital exchange of data works. What appears to happen is that the NSA develops a lead on something, uses the certification as justification to ask for a subset of Russian communications, gets them, and then analyzes them. Neither the FISA court nor Congress sees these requests before they're executed. Should they? What difference would it make to the enterprise if the FISA court saw the requests themselves? How can a judge independently decide how narrow or broad an intelligence request should be?

The company should absolutely be able to challenge the scope of the requests and then ask another branch of government to get involved. But so far, I don't think anyone has directly proposed allowing the FISA court to approve each and every PRISM request.

Setting aside the policy question, which is whether the NSA can or should significantly narrow the classes of foreign targets subject to surveillance under section 702 authorities, Gellman makes several points about what the NSA does with its incidentally collected data.

Specifically, after pointing out an admittedly confusing reference I made to the type of certification that Section 702 collection requires, Gellman takes issue with my description of the process of minimizing communications.

Everything in the sample we analyzed had been evaluated by NSA analysts in Hawaii, pulled from the agency's central repositories, and minimized by hand after automated efforts to screen out U.S. identities. I describe the data more fully near the end of this post.

Had our sample not been evaluated, far more than 90 percent of the people in it would have been non-targets. Had it not been minimized, we would have found far more Americans than we identified on our own. [The Washington Post]

The NSA tries to minimize this content automatically by screening out IP addresses and other metadata identifiers known to be domestic. Contrary to what my garbled syntax implied in the earlier post, the agency does not immediately destroy automatically minimized content. As I understand the process, automated minimization flags the content of suspected U.S. persons for analysts. That's one layer. Then the analysts themselves screen or evaluate the content — flagged and unflagged. That's not a new practice; the minimization procedures have not changed significantly since 1998, even though the entire telecom structure and the nature of intelligence collection have evolved disruptively.

The NSA got itself into trouble because the "Upstream" data it took in under Section 702 included transactions to and from Americans who had no connection to the target other than an incidental sharing of the packets used to transmit the digital content. We don't know whether the communications Gellman analyzed came from "Upstream" collection, from the PRISM program, or, if a mixture, in what proportions. We also don't know how many of the communications were analyzed under actual FISA orders on individuals, although Gellman implies that some FISA content was included in the Snowden tranche.

What about the targets themselves? Were they legitimately chosen? The story suggests that they were.

There is nothing in Gellman's original or follow-up pieces that would suggest that the NSA analysts were looking at the communications to and from Some Random Guy.

Indeed, the original article contained hints that the trove of data included a good deal of critically relevant foreign intelligence information before bringing us the story of an Australian woman whose plaintive chats with a target were collected by the NSA. Gellman's point is that the woman is innocent, that the chats had nothing to do with foreign intelligence, and that her intimate feelings are now stored in PINWALE for any NSA analyst to query down the line. That is, indeed, disturbing. What the NSA ought to do with the stuff it doesn't need is a question that Edward Snowden has forced the intelligence community to grapple with in a meaningful way, especially because it can include a lot of stuff we find personal and sacred. This is beyond dispute.

But just as relevant to this discussion is the fact that the targets were valid. Joshua Foust tweeted on Saturday about the example Gellman used.

Later, he likened the chat room messages to a mob wife being recorded talking to her capo husband.

Yes, it's highly unfortunate and totally unfair to her that the government intrudes into her private life without her knowledge, but it is neither surprising nor avoidable. Indeed, even for incidentally collected Americans, Gellman acknowledges how carefully the NSA minimized their identities. A layer of privacy protection is built into a process that, by its nature, violates privacy.

As targets become harder to track, the NSA collects more and more to figure out where they are. Then analysts have to look at the communications to or from the targets to determine whether they're useful. Then they have to determine whether U.S. persons are part of the relevant or incidental communications. Then they have to analyze the content itself. Then the content is stored away.

How do they determine whether a target is foreign? Because the world is so interconnected, it's becoming tougher to figure this out. A foreign IP address? A target who writes in a foreign language? The analyst of course has the discretion to determine, based on the content, whether the targets are legitimate. Earlier stories have described some of the ways an analyst can make this determination. They seem reasonable because there exists no technological or analytic rubric that works better.

Gellman does not ascribe to the NSA the motives that, say, whistleblower William Binney does. Binney thinks that NSA's goal is "total population control." Gellman judiciously avoids drawing conclusions about the intelligence collection process, and while the headlines and structure of his articles suggest that the NSA does not have to be doing what it does in order to collect on foreign targets today, Gellman does not say so.

The policy debate lacks this dimension. But it is the most important dimension. The Privacy and Civil Liberties Oversight Board, the same entity that found the NSA's bulk collection of domestic telephone metadata to be relatively useless, found that Section 702 collection provides extremely valuable intelligence, even while noting that the incidental collection and storage of American communications — the collection described in Gellman's piece — complicates the question of its execution.

The PCLOB has a bunch of ideas for reform, and I especially like the ideas that would give Americans information to act on, like providing the public with the number of Americans whose communications were incidentally collected. None of the reforms suggest that the NSA end bulk or large-scale collection of data. Incidental collection on Americans does not diminish the program's value.

The NSA ought to get rid of stuff it doesn't need. Can it reasonably collect the stuff it DOES need without bulk collection that incidentally includes a lot of U.S. persons' communications?

I would like to say that, regardless of whether it can, it should try, because collection on U.S. persons is a de jure violation of my privacy. Knowing about it makes me uncomfortable. But that's a pat answer, and not particularly useful for those charged with actually doing the collecting.

In sum, the Snowden documents reveal (or confirm) only that the NSA has pretty broad capabilities to pull information from pretty much wherever it wants. They suggest that the NSA finds itself drowning in irrelevant communications and has made it a priority to send to human eyes only the communications that are relevant for intelligence gathering purposes. They suggest that the NSA's complicated minimization system is taken seriously even while its execution is imperfect. They suggest that policy-makers wanted more from the NSA than the NSA could reasonably deliver, and that in the effort to catch up, the NSA often got out over its skis. They show that the NSA takes privacy seriously, though not nearly as seriously as perhaps it should or could. But evidence of an Orwellian secret surveillance state? If that exists, it's not in the documents Snowden gave Bart Gellman.