This article was written on 22 Jan 2014, and is filled under Science / Technology.

Seeing Spam: On the Logic of Permeable Binaries

Martin Wagner

On Christmas Day 2013, the New York Times printed “An Ode to Spam” by long-time columnist Gail Collins. In her article Collins presents some of the gems that she found on her occasional visits to the spam folder. I very much sympathize with Collins’s interest in her junkyard of unsolicited messages. There is a lot to be said about the poetry of spam: about the artistry of its bad English, its conmanship, and its parody of communication in the Digital Age. But instead of analyzing the content of the messages overflowing from our spam folders, I want to use this blog post to direct our attention to the technology of electronic text filtering, which alone makes it possible for us to speak of spam as a unified category.

What’s so interesting about the filter is that it creates spam through the combination of two seemingly contradictory techniques. On the one hand, spam exists only because the filter inaugurates a strict binary system in which every given message is defined either as spam or as ‘ham,’ i.e. ‘legitimate mail.’ At the same time, however, we only encounter spam because the filter is prone to failure. Indeed, the filter is centrally constructed around the possibility that its decisions will be revoked by the user. The probabilistic mechanism of the filter never operates with absolute certainty and, therefore, must allow for mistakes. Were the filter perfect, we would never encounter spam—neither in the main inbox nor in the spam folder, which only exists so that we can search it for falsely classified legitimate mail.

Spam, in short, only exists because of the permeable binary system inaugurated by the filter. I define a permeable binary system as a system that enforces a relatively strict classification into two opposed categories while simultaneously allowing for an occasional mistake in the classification, such that there always remains a slight chance that each entity in the system belongs to the opposite category.

Importantly, permeable binaries differ from other theoretical models that either transcend a binary logic (dialectics) or question binary constructions (deconstruction). A permeable binary does not transcend or question the distinction between its respective categories. In a permeable binary system the very existence of the binary relies on the slight and unlikely possibility of the failure of the binary distinction.

The purpose of the present post is to construct in the permeable binary system of the spam filter a paradigm for a technology of classification that is mutatis mutandis in operation in a wide range of cultural phenomena, from the distinction between fiction and reality in the modern novel, to the gender divide in our contemporary society.

Before Spam

In a certain sense, the phenomenon of email spam is nothing new. In his recent book The Tyranny of E-Mail, John Freeman tells the story of a group of British entrepreneurs who founded the British-American Claim Agency in New York City in 1886. The businessmen hired 14 women to type up letters that were sent to thousands of private individuals around the United States. The recipients were encouraged to hire the Agency to look up claims to estates of long-lost relatives in England. According to these letters, inheritances worth a total of £77,693,769 waited for their heirs in America. The fees that customers had to pay to reach out to possible testators or their representatives ranged from $2 to $24.75. As the New York Times later reported, the agency’s income soon averaged $400 to $500 a day. Needless to say, the British entrepreneurs never intended to do any research and, ignoring all complaints from their customers, kept the money for themselves.

The Invention of Spam by the Filter

The striking similarities between the content of the letters sent by the British-American Claim Agency and some modern-day spam messages, however, should not blind us to the fact that spam is, structurally, a fundamentally new phenomenon. The difference between the spam of today and fraudulent mail in the Victorian age is not merely the scale involved, although scale surely matters. The seven trillion spam messages sent out in 2011 alone, equaling about 85 percent of all email communication (the numbers have gone down since), far exceed the production of scams and advertisements in ordinary mail at any single point in history. More important, however, is the introduction of an electronic text filter that governs our encounters with spam. The way we experience spam through the filter differs fundamentally from mass mailings of the nineteenth century, as well as from the early electronic ‘spam’ of the seventies and eighties, which predates the filter. With the invention of the automated filter, our experience of spam is itself filtered through a binary classification of email into spam and ham. For the first time in the history of mail, the plurality of different forms of mail such as private mail, business mail, political and religious brochures, advertisements, and scams is divided into a dualistic system. To be sure, in traditional mail there are already different categories, including bulk mail, first class, and priority mail. In the end, however, all snail mail is deposited in the same physical mailbox.

Looking in dictionaries for a general definition of spam, one finds in most cases the terms “unwanted” or “unsolicited” and “mass mail” or “bulk mail.” But these attributes, especially the central category “unsolicited,” are highly problematic. It would be a stretch to say that all messages that we accept as legitimate are solicited in any meaningful way. Think of an email from an old acquaintance you have not spoken to in years (and don’t much care to speak to, for that matter). Think of the request from a colleague, sent out to all co-workers, asking them to do his or her work. It seems odd to call these messages solicited. But neither we nor any electronic filter would normally define them as spam.

Spam, I argue, is fundamentally defined not by any particular intrinsic quality of messages, but by an institutional or media-specific practice that enforces a distinction between spam and ham. Were it not for the filter, we would not experience spam messages as spam, but, instead, as advertisements, scams, or, even more generally, simply as more or less interesting messages.

However, we do not perceive spam simply because of the binary divide between spam and ham, but precisely because this binary divide is prone to failure. Because of this possible failure, we check the spam folder for legitimate mail and have to expect spam in the main inbox.

False Positives

The key to understanding why the filter does not designate messages unambiguously as either spam or ham lies in the phenomenon of false positives. False positives are legitimate messages falsely classified as spam; they are the opposite of false negatives – spam messages falsely classified as legitimate mail.

All talk about the great dangers of spam notwithstanding, email providers and software engineers generally agree that false positives inhibit efficient email communication more than false negatives. The user would rather see a spam message in his or her main inbox than miss a legitimate email from a friend or business partner because it has been wrongly classified as spam. False positives are crucial to understanding the phenomenology of spam, for it is the effort to avoid false positives that accounts for the fact that the user encounters spam at all. Were it not for the problems of false positives, spam would, phenomenologically speaking, not exist—neither in the main inbox nor in the spam folder.

In order to avoid large numbers of false positives, the spam filter is programmed such that it tends to err on the side of classifying emails as ham rather than as spam. As a consequence, false positives are less likely, but more actual spam (false negatives) comes into the main inbox. Understand that most spam filters are based on a mechanism of probabilistic reasoning. For any incoming message, the filter calculates its probability of being spam. Based on the score assigned by the filter, the message is directed either to the spam folder or to the main inbox. The law or theorem of probability used in the process is “Bayes’s theorem”—named after the English mathematician Thomas Bayes (1701-1761)—and spam filters of this type are called “Bayesian filters.”
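To make the calculation concrete, here is a minimal Python sketch of Bayes’s theorem applied to a single word. The function name `posterior_spam`, the 50 percent prior, and the word frequencies are illustrative assumptions, not figures taken from any actual filter.

```python
def posterior_spam(p_word_given_spam: float,
                   p_word_given_ham: float,
                   prior_spam: float = 0.5) -> float:
    """Bayes's theorem: probability that a message is spam, given that it contains a word.

    P(spam | word) = P(word | spam) * P(spam) / P(word)
    """
    # P(word): total probability of seeing the word in any message.
    evidence = (p_word_given_spam * prior_spam
                + p_word_given_ham * (1.0 - prior_spam))
    return p_word_given_spam * prior_spam / evidence


# Hypothetical numbers: the word appears in 20% of spam and 1% of legitimate mail.
p = posterior_spam(p_word_given_spam=0.20, p_word_given_ham=0.01)
print(f"P(spam | word) = {p:.3f}")
```

With these invented frequencies, a single word already pushes the probability well above one half, but not to certainty, which is the point: the filter only ever deals in degrees of likelihood.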

In its most naïve form, the mechanism of a Bayesian filter is fairly simple. In what is called the “training phase,” the filter is provided with a certain number of emails that have been classified as either ham or spam. Based on this corpus of pre-categorized messages, the filter learns to calculate for every single word that appears in the corpus how likely it is that an email containing this word is spam. Each word is assigned a specific score of ‘spamicity’ that corresponds to the likelihood that the message containing this word is spam. Viagra, for instance, traditionally has a high spamicity, while the spamicity of proper names is very low. Based on this dictionary of spamicity, it is possible to calculate the probability that a new incoming message is spam simply by combining the spamicity of each known word in the entire message. The filter then directs the message to the inbox or to the spam folder based on the score that the message receives.
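The training phase and the scoring step described above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the four-message corpus is invented, words are split on whitespace, add-one smoothing is used so that no word gets a spamicity of exactly 0 or 1, and the per-word scores are combined as the product of the spamicities divided by that product plus the product of their complements. Real filters differ in all of these details.

```python
import math
from collections import Counter


def train(messages):
    """Build a word -> spamicity dictionary from (text, is_spam) pairs."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        words = set(text.lower().split())  # count each word once per message
        if is_spam:
            n_spam += 1
            spam_counts.update(words)
        else:
            n_ham += 1
            ham_counts.update(words)

    spamicity = {}
    for word in set(spam_counts) | set(ham_counts):
        # Add-one smoothing (an assumption of this sketch) keeps values strictly in (0, 1).
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        spamicity[word] = p_word_spam / (p_word_spam + p_word_ham)
    return spamicity


def spam_probability(text, spamicity):
    """Combine the spamicity of every known word in the message."""
    log_spam = log_ham = 0.0
    for word in set(text.lower().split()):
        if word in spamicity:  # unknown words are ignored
            log_spam += math.log(spamicity[word])
            log_ham += math.log(1.0 - spamicity[word])
    diff = log_ham - log_spam
    if diff > 700:  # avoid overflow in exp(); the probability is effectively 0
        return 0.0
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(diff))


# Invented training corpus: True marks spam, False marks ham.
corpus = [
    ("cheap viagra offer", True),
    ("viagra discount offer now", True),
    ("lunch with martin tomorrow", False),
    ("martin sent the draft", False),
]
table = train(corpus)
print(spam_probability("viagra offer", table))
print(spam_probability("draft from martin", table))
```

A message made up entirely of words the filter has never seen scores exactly 0.5 in this sketch: the filter has no evidence either way.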

Spam filters typically include several mechanisms to create a high threshold for a message to be treated as spam. First, messages are not redirected to the spam folder as soon as their probability of being spam exceeds the 50 percent mark. Instead, messages must attain a score as high as 99.9 percent to be classified as spam. The result is that there are few false positives, but more false negatives, that is, more spam in the inbox.
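The trade-off created by the threshold can be illustrated with a small Python sketch. The six scores and their true labels below are invented for the purpose of the example, and `count_errors` is a hypothetical helper, not part of any real filter.

```python
def count_errors(scored_mail, threshold):
    """Return (false_positives, false_negatives) for a given spam threshold.

    scored_mail is a list of (spam_score, is_actually_spam) pairs.
    """
    false_positives = false_negatives = 0
    for score, is_spam in scored_mail:
        flagged = score >= threshold  # the filter's binary decision
        if flagged and not is_spam:
            false_positives += 1      # legitimate mail sent to the spam folder
        elif not flagged and is_spam:
            false_negatives += 1      # spam delivered to the main inbox
    return false_positives, false_negatives


# Invented scores: three spam messages and three legitimate ones.
scored_mail = [
    (0.9995, True),
    (0.97, True),
    (0.80, True),
    (0.62, False),  # a legitimate message that happens to look spammy
    (0.10, False),
    (0.02, False),
]

for threshold in (0.5, 0.999):
    fp, fn = count_errors(scored_mail, threshold)
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

On this made-up data, raising the threshold from 50 percent to 99.9 percent removes the one false positive at the price of letting two spam messages through to the inbox.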

The second and more important way in which the concern for false positives creates spam is that false positives are responsible for the existence of an accessible spam folder. Were it not for false positives, we would not need to check the spam folder at all. If false positives were never to occur, email services would not need to provide a spam folder and could, instead, simply suppress all spam completely so that spam would never reach the user in any form. But even today, over 15 years after the first Bayesian filter was presented to the scientific community in 1998, filters still produce false positives, and the spam folder still exists precisely so we can check it for false positives. Again, were it not for false positives there would be no spam folder and the binary between spam and ham would, phenomenologically speaking, not exist. Even if false negatives were still to occur, in the absence of a visible spam folder filled with false positives the binary of spam and ham would lose its guiding importance. False negatives would simply appear as more or less annoying messages in our inbox, but not as ‘spam.’

Spam as Paradigm

The binary of spam and legitimate mail that the user encounters is thus itself the product of the binary filter’s failure to implement its own binary system, namely its failure to keep spam fully separate from ham.

As I suggested at the beginning, the idea of a permeable binary challenges traditional ways of thinking in and beyond binary oppositions. Thinking in terms of a permeable binary doesn’t question the existence of binary divisions. But it acknowledges the seemingly paradoxical fact that the binary divide depends on instances of its failure.

Permeable binaries, I argue, exist not only in the spam filter. It may be fruitful, for instance, to rethink the gender divide in terms of a permeable binary. That would imply that the binary of masculinity and femininity is upheld precisely by the knowledge that this binary can never fully be stabilized. In other words, cases where gender cannot immediately or unambiguously be recognized do not primarily undermine the distinction between male and female; they reinforce and motivate the discourse that applies the division. One may think here of parents who dress their babies from the first day in blue or pink.

Another example of a permeable binary may be found in theories and practices of fiction. Pere Borrell del Caso’s famous 1874 trompe-l’oeil painting Escaping Criticism is a case in point.

Pere Borrell del Caso, Escaping Criticism, 1874. Courtesy Banco de España, Madrid (from Wikipedia).

The painting shows a boy climbing out of the picture’s frame. With one foot the boy is already outside the frame while the other foot is still deep within the painting. I read this painting as an explanation of how a permeable binary works. The boy’s crossing of the boundary of the frame doesn’t question the distinction between inside and outside—between artwork and reality; it establishes this distinction. It is the boy’s transgression of the limit of the frame that orients us around this limit and allows us to designate individual points of the painting as being inside or outside the frame.

-Martin Wagner

Martin Wagner is a Ph.D. candidate in the German Department at Yale University. His dissertation analyzes procedures of observation in European narrative texts from the eighteenth and nineteenth centuries. His publications include articles on Goethe, Schiller, Büchner, and Orhan Pamuk.
