Death & Auto-Tune:
Presented at the American Anthropological Association, Denver CO, November 2015
“Everybody’s wanting a piece of Michael Jackson
Reporters stalking the moves of Michael Jackson
Just when you thought he was done
He comes to give it again
They could print it around the world today
They want to write my obituary”
(Michael Jackson – “Breaking News”)
Michael Jackson has released eighteen new songs since his death in June 2009. Several of these have been the subject of controversy. Many of Jackson’s family members suspected early on that some of the vocal performances on the record were those of an imposter named Jason Malachi. Leaked vocals-only versions of the tracks in question began making their way around the internet – the a cappellas from the song “Breaking News” in particular faced intense scrutiny from Jackson’s fans and family members alike. Skeptics pointed to specific sonic qualities of the recording. First, the signature Jacksonian non-verbalisms (Owww! Unhh!) seemed repetitive and out of place, as though they had been dropped in from other, more genuine recordings. Second, whereas Jackson would clap his hands and stomp his feet during vocal takes, here there is just silence between the lyrics. Third, the vibrato was fast and prominent, similar to Malachi’s, whereas Jackson’s vibrato was generally slower and less pronounced. Upon hearing this, Jackson’s nephew TJ (son of Tito) wrote on Twitter: “Sounds like Jason Malachi to me too. The vibrato is a dead giveaway that it's not my uncle.”
In response, Teddy Riley, a producer on the posthumous album Michael, claimed in an interview that the voice was, in fact, “the true Michael Jackson,” and that if it sounded strange, it was because they had had to use pitch correction software to bring certain recordings up to scratch:
“When they first had it they did the processing and they did Melodyning because there were a lot of flat notes, um, because Michael was going in to do the the ideas, you know. He would never consider it being a final vocal. But, because he’s not with us he cannot give us new vocals, so what we did was we utilized the Melodyne to get them in key, and uh we didn’t go with the uh Auto-Tuning, we went with the Melodyne ‘cause we can actually move the stuff up. Which is the reason you know why some of the vibrato sounds a little, you know, um, off, or, uh processed, over processed. Um, we truly apologize for that happening but still in all you’re still getting the true Michael Jackson.”
Conspiracy theories still abound, but for my purposes I want to bracket the imposter question and attend to Riley’s apology at a dramaturgical level. In particular, we can see that he deploys an admission of artifice (the use of pitch correction) in order to secure a deeper claim to authenticity: that this is really Jackson’s voice. Riley’s working relationship with Jackson, mediated through vocal processing techniques, thus came into conflict with familial kinship as a criterion for knowing “what is going on inside of Michael” and for judging whether a recording is a true expression of that interiority. Later in the interview, Riley situates the decision to tune Jackson in terms of their personal relationship (here we may think of the “personal” in terms of per-sona, or sounding-through, dialectically masking or amplifying a voice), saying of the period following Jackson’s passing:
“I kinda felt like I didn’t want to go on with life. Because the greatest man of my life, you know, passing away you know. He was hope for me. You know. He was hope for me and he still is, you know, ‘cause he’s given me hope, he’s given me a second life in the music industry.”
Though his intention was to return the favor by giving Jackson a second life in the music industry through an act of digital repair, in doing so Riley came up against a set of strong associations between vocal tuning and death, artificial life, and other varieties of otherworldliness. Auto-Tune, the most iconic digital pitch correction tool, had in fact been declared dead a mere three weeks before Jackson’s passing, in Jay-Z’s hit single “D.O.A. (Death of Auto-Tune).” Musician and producer Brian Eno had recently likened the Auto-Tuned voice to that of an angel, a voice that is “too perfect” and “moves in strange ways.” More recently, Phillip Smyth has examined the popularity of the Auto-Tune effect among producers of “nasheed” YouTube videos for groups such as the Islamic State and Al Qaeda, organizations with strong symbolic resonances of death and otherness in the Western popular imagination.
Prior to the tuning controversy, Riley was himself a well-established connoisseur and trendsetter of technologically mediated voices in pop music. He is widely credited with creating “New Jack Swing,” a 1980s subgenre that made prominent use of vocoded voices and that, Riley claims, “gave R&B a new lifeline.” T-Pain, whose use of Auto-Tune would largely define the sound of mid-2000s hip hop, claimed a techno-aesthetic kinship with Riley by adopting the messianic alter ego “Teddy Pain, the son of Teddy Riley” (T-Pain, “Karaoke,” Thr33 Ringz, 2008). In a 2008 interview with the New Yorker, Faheem Najm (T-Pain’s legal name) noted that he had decided not to use Auto-Tune on a song about his son. For both Riley and T-Pain, vocal manipulation technologies served as key resources for the material-discursive making and unmaking of affective life.
This context helps us understand why Riley was so careful to specify that he had used Melodyne, a newer, less notorious, and putatively more precise tuning tool widely regarded as having a “cleaner” sound, instead of Auto-Tune, which had already become strongly associated with the robotic vocal aesthetic made famous by Cher’s late-’90s work. In this story we can see how digitally correcting vocal performances appears as one of several ways of socio-technically managing the affective and perceptual boundary between life and death in the recording studio. Experiences and performances of deadness and liveness, as Sterne, Stanyek and Piekut, and others have noted, have long been key material-discursive resources for recording engineers (Stanyek and Piekut 2010; Sterne).
“Barry,” a Los Angeles-based recording engineer specializing in the production of advertising music, describes his work as having “a certain aspect of simulating life to it where there really isn’t.” He is referring specifically to the common practice of “ref work,” or working from reference recordings: say, another song with a particular vocal sound that a client wants their own record to resemble. This entails a mode of listening that reverse-engineers performances into their component techniques. Barry relates, with ambivalence, that he has gotten so good at ref work that he hasn’t really been able to “listen to music” since he was 19, when he started hearing recordings as assemblages of production choices. Bob, a mastering engineer, similarly describes himself as a fan not of music but of production.
Anthropologist Rachel Prentice has written about the tension surgical students experience during dissection between encountering cadavers as assemblages of body parts and as formerly living people. Through “tactical objectification” and the cultivation of affective relations with patients, surgeons must learn to “mutually articulate” their own bodies with those of their patients. Indeed, surgical metaphors are commonplace in the recording studio. This is largely an echo of the profession’s earlier days, which frequently involved the literal cutting and splicing of magnetic tape, but the surgical imagery also discloses the decidedly somatic qualities that engineers perceive in vocal recordings. Tape has been replaced by the scrolling audio regions of the Pro Tools display, though cutting up and carefully reassembling the digital inscriptions of embodied performances still makes up a large part of an engineer’s day-to-day work. Carl, who runs studio A, describes a colleague skilled in vocal editing as a “surgical ninja.”
Within the pop music production community, the vocal track is generally considered the most emotionally expressive and thus most important part of a recording. It typically occupies the loudest, most stereo-central place in the mix, and it receives the greatest amount of attention in the production process. Digital vocal tuning’s popularity is largely attributable to the unprecedented degree of control it allowed over the vocal performance, which intensified this pre-existing conception of the voice as the quickest route to the singer’s personality. A widely held rule of thumb is that the most emotionally compelling, though technically flawed, performance usually appears in the first couple of takes, with emotion decreasing and pitch accuracy increasing with each subsequent take. It was on the basis of this inverse framing of the relation between emotion and pitch accuracy that Auto-Tune in particular marketed itself: don’t worry about the pitch, just record a good, emotionally expressive take and tune it up quickly after the fact. Auto-Tune thus remains primarily a labor-saving technology: a species of fixed capital (dead labor) meant to substitute for the living labor of the singer and/or engineer, so as to extract relative surplus value from a recording session (until, of course, everyone starts using Auto-Tune, the socially necessary labor time contracts, and the relative surplus value evaporates). Auto-Tune’s slogan is “it’s about time”; I’d offer a Marxian elaboration: “it’s about socially necessary average abstract labor time.”
This narrative, of the inverse relation between emotion and technical accuracy, is reproduced in even the most mundane forms of studio work. As a de facto intern (they had no use for ethnographers), I had to check microphone signals by repeating the same phrase over and over again. Check check check, for what seemed like several minutes, while they tinkered with the signal chain in the other room. A repeated word or phrase, even one as simple as “check,” quickly loses its meaning, and as a speaker you begin to lose your sense of agency in conveying that meaning. Check check check check check check check. Soon it breaks down into its component sounds, its component production choices, which come to seem insurmountably arbitrary: check check check – am I even vocalizing the K sound, or simply implying it? – check check check – I could just as easily be saying ka-che ka-che ka-che – why am I even doing this in the first place? The phon-etic and phon-emic decouple over time, shaken loose through repetition. When you are checking a mic your task is a purely phonetic one: making the sounds, preferably with as little variation and meaning as possible. More precisely, the meaning of the mic check (that it is supposed to be uniform and continuous) lies outside of whatever is being said. The same sort of thing happens when you’re recording, for example, several takes of a single verse of a song: it becomes harder even to notice the words as words, let alone mean them. This is the practice that the emotion-accuracy indifference curve, as a professional ideology, both produces and is produced by.
Wittgenstein wrote about the “deadness” of a “sign by itself” and the process by which it can come to life “in use” (Wittgenstein 1998, Philosophical Investigations §432). A repeated sound, whether from the body of an intern or a Pro Tools session set to loop, is both more and less than its component elements. Spoken phrases or random sounds reveal inadvertent melodies; musical phrases break down into their component sounds and become collections of production choices. Auto-Tune’s inverse law of emotion and technical accuracy is not so much an accurate description of how voices and emotions actually work as it is a collectively held heuristic: a particular rule for generating and interpreting vocal recordings in practice.
Exceptions to this rule are common, however, and the negotiation of these exceptions is in large part how qualities of emotional expressiveness and in-tuneness get articulated in practice. Engineers take great care in tuning voices so as to preserve what they regard as the emotional qualities of the performance. The key thing is that what counts as “in tune” is always changing from case to case – it needs to be negotiated and performed relationally between the engineer and the client. They need to be tuned-in to one another. Grammy-winning engineer “Lucille” describes the “tuning in” process in terms of learning the client’s language and inviting them to approach the performance in terms of feeling rather than technical precision:
“you have to understand the language that of whoever you’re working with. They’re gonna be like ‘my vocal sounds dead’ or something. Maybe they want me to brighten it. Or maybe it’s overcompressed. But you kind of also have to tailor your language to that person. Especially like when you’re in talkback… I really enjoy doing vocals because I can’t sing but I think the way I describe things to people because I can’t sing is like they really are receptive to it ‘cause I kind of have to describe things in terms of feeling, or like ‘try to sing it like you’re talking’ or like stuff like that.”
Lucille’s lack of experience with the technical language of skilled singing becomes a resource in this case because it allows artists both to ground their performance in their own technical skill (set in contrast to Lucille’s performance of a relatively unskilled perspective) and to give that performance an emotionally legible, everyday scenic contexture.
Engineer “Harold” specializes in the production of hip hop and death metal, genres with radically different vocal aesthetics, each of which complicates the Auto-Tune framing. The traditional vocal style of death metal, which ranges from a guttural demonic grunt to a choked pig squeal, is almost exclusively non-periodic in character. This means that it has no real pitch regularities and is thus illegible to the pitch detection and manipulation algorithms of most vocal tuning software. The imaginary economies of voice and emotion inscribed within Auto-Tune, similarly, go out the window.
Death metal vocals in fact require careful technique so as not to damage the vocal cords. When tracking bands like this, Harold – a metal vocalist himself – sits quietly at the recording console while the singer is cordoned off in the vocal booth on the other side of the studio. They do a take, and after a quiet pause Harold sets up another take and hits record again. I ask him about that moment afterwards and he explains:
“Sometimes it’s just that breath that you take that you’re breathing for the other person in there, and it’s just this kind of symbiotic energy thing that’s happening. And I think that flow and kind of being really in tune with their energy helps.”
Being in tune in this way is equally important once the editing process begins. Harold explains the careful practices of stage management required to avoid alienating the client from their performance:
“Well, some people want it. Some people don’t want it at all…. Some people want to see you use the tools to fix it, but don’t want to see it happen. Some people want a little bit of it fixed. It goes all over the place”
Hip hop verses are similar in that they usually don’t require pitch correction (unless, as is increasingly the case, it is being applied as part of the vocalist’s persona), but being in tune with the performer in this context involves a very different mode of care:
“hip hop sessions are more psychological, it’s more like a therapy psychology session in that you’re taking these nuances and kind of creating this atmosphere to create, for everything to be fine. Where you’re doing the same thing in these metal productions, but it’s a very high paced, very technical thing.”
Cultivating a particular atmosphere is extremely important in the studio. Tara Rodgers has noted the importance of maritime imagery and navigational metaphors in the technological worlding of sound recording, and sure enough, Carl often compares his studio to a pirate ship. A well-known engineer named Eric Sarafin likes to refer to the control room as “the womb”: a plush, dimly lit space filled with racks of warm, blinking equipment and humming banks of speakers. Studios, like grocery stores or casinos (Schüll), are typically free of windows and visible clocks. Carl believes in burning sage exactly 45 minutes before the beginning of his sessions. Meals are ordered in and retrieved through the silent agency of studio interns. All these practices are directed towards the goal of “making everything fine” for the client and facilitating the capture of a lively performance.
This work occasionally extends to the materiality of the studio walls themselves. On a quiet day in studio A, with no sessions booked, Carl and I get to work upgrading the acoustic paneling in the vocal booth. Its walls are currently lined with a fiberglass material designed to absorb echoes that might color a vocal take. Carl paces the length of the booth, calling out periodically to locate the more or less reverberant spaces, the room’s “live” or “dead” zones. Towards the back of the booth he stops and notes that it is far too dead. “Try it, it hurts,” he says. I stand in his place and feel a sort of dull, disorienting pain at the lack of ambience. We spend the rest of the day installing wooden panels to liven up the dead zone.
I later learn that the designers of Auto-Tune, before they began making vocal correction and tuning software, had helped pioneer a technique referred to as “room correction” or “tuning the room.” Using a special microphone and software package, room correction analyzes the reverberant properties of a room and designs compensating filters that can be applied to the studio’s speakers in order to “cancel out” the room’s acoustic signature: rendering it undead, so to speak. It was meant to allow engineers to mix recordings from the most neutral acoustic position possible, a sort of acoustic “view from nowhere.” I reflect that Auto-Tune carries a similar spatial logic over to the vocal tract: it tries to separate the irregularities out of the singer’s voice so as to make it available for correction. A voice from nowhere. I run this thought by Carl, and he thinks about it but doesn’t really say anything.