
ROCKING QUALITATIVE SOCIAL SCIENCE


ROCKING QUALITATIVE SOCIAL SCIENCE An Irreverent Guide to Rigorous Research

ASHLEY T. RUBIN

Stanford University Press Stanford, California

Stanford University Press Stanford, California ©2021 by the Board of Trustees of the Leland Stanford Junior University. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or in any information storage or retrieval system without the prior written permission of Stanford University Press. Printed in the United States of America on acid-­free, archival-­quality paper

Library of Congress Cataloging-­in-­Publication Data Names: Rubin, Ashley T., author. Title: Rocking qualitative social science : an irreverent guide to rigorous research / Ashley T. Rubin. Description: Stanford, California : Stanford University Press, 2021. | Includes bibliographical references and index. Identifiers: LCCN 2020052613 (print) | LCCN 2020052614 (ebook) | ISBN 9781503611399 (cloth) | ISBN 9781503628236 (paperback) | ISBN 9781503628243 (ebook) Subjects: LCSH: Social sciences—Research—Methodology. | Qualitative research. Classification: LCC H62 .R735 2021 (print) | LCC H62 (ebook) | DDC 300.72/1—dc23 LC record available at https://lccn.loc.gov/2020052613 LC ebook record available at https://lccn.loc.gov/2020052614 Cover design: Kevin Barrett Kane Cover image: Mat Reding Text design: Kevin Barrett Kane Typeset at Stanford University Press in 10/14 Minion Pro

For all the Dirtbaggers


Table of Contents

Acknowledgments ix
1 Introduction to Dirtbagging 1
2 Topo: What Exactly Are Qualitative Methods? 13
3 Picking Your Proj: Identifying Your Research Question 35
4 On Belay: Connecting Your Work to an Anchor 59
5 Mapping out the Route: How and When Research Design Matters 84
6 Starting on the Right Foot: Making and Justifying Your Case Selection 108
7 Flaking out the Rope: How to Check Your Sample 138
8 Bivvy Time: The Fieldwork Model of Data Collection 164
9 The Crux: Content Analysis, Analytic Memos, and Other Tricks 179
10 Placing Pro: Making Causal Claims with Qualitative Data 208
11 Living on the Sharp End: Dealing with Skeptics of Qualitative Research 231
12 The Sweeper 251
Notes 257
Bibliography 269
Index 279


Acknowledgments

In writing this book, more than anything else I’ve written, I have been overwhelmed by how much my thinking has been shaped by other people—­by my professors in graduate school, my colleagues, my students, and the studies I have read. I frequently had to think carefully about whether my ideas were original—­or rather in what ways they are original—­when I am so much the product of my training, the people who trained me (formally or informally), and the excellent research that I admire and aspire to emulate. Although the rock climbing imagery is all mine, I owe much of my inspiration to Calvin Morrill for instilling in me these ideas (although he should get none of the blame should this work not be well received). Cal was my primary qualitative methods teacher in grad school and, along with Malcolm Feeley, one of my most helpful graduate (and, later, dissertation) advisors. During my graduate career, there were several moments when his intervention made the difference between my sinking and swimming. More than just being part of the village that got me through graduate school (and then multiple stints on the job market), Cal’s training continues to shape the way I think about things. While many qualitative scholars are open to a range of approaches, Cal imparted a Big Tent ethos that makes room for everyone as well as a commitment to rigorous research when doing methods that some folks might perceive as unrigorous. These themes were captured in a three-­hour workshop hosted by UC Berkeley’s Center for the Study of Law and Society (at Berkeley Law) in 2007 in which he offered an overview of qualitative fieldwork. After drafting this book, I rewatched the recording of this training session1 (which I had attended in person early in graduate school) and realized how much of his approach had shaped the way I approach qualitative methods. As a not-­quite-­ historian, sometimes labeled a criminologist and sometimes a sociologist, ix


interdisciplinary-­trained sociolegal scholar, I never really belonged anywhere. So I both gravitated to and appreciated this Big Tent, live-­and-­let-­live approach that embraces flexibility. More than this ethos, however, Cal also introduced me to the tools I would need for doing the type of qualitative research I wanted to do. Earlier in my grad career, I had formulated ideas about how to do qualitative research—­primarily as someone using archival and other historical documents but without much formal training in historical methods. I took a historical methods class only to learn that methods had a different meaning in history—­something closer to what I would call theory or even epistemology. Aspiring to be a methodologist, I took as many methods and research design classes as I could, but most of these were quantitative. For a long time, I didn’t have the tools to do the type of research I wanted to do—­at least, not well. Cal’s ethnography class gave me my first rack (like a climber’s utility belt) for doing solid qualitative research. While Cal had the biggest impact on how I view qualitative methods, my methods training has also benefited from other mentors. In particular, Justin McCrary, Rob MacCoun, and Kevin Quinn were my statistics and quantitative research design professors in grad school who helped me develop a keen interest in causal inference that has shaped my overall approach to methodology. I also wish to gratefully acknowledge the training I received through the Institute for Qualitative and Multi-­Methods Research at Syracuse University. I attended their two-­week methods camp (primarily organized by Colin Elman) in summer 2011 and was given a crash course in how qualitative political scientists, using a variety of methods and working in very different literatures from my own, pursue and evaluate research. This training program, led by multiple scholars including Andrew Bennett, David Collier, James Mahoney, and Jason Seawright, alerted me to qualitative techniques for causal inference of which I was previously unaware—­and that I used in my dissertation and later—­as well as a vocabulary for defending qualitative research against skeptical quantitative scholars—­another rack that continues to be useful to me and now to my students. Finally, I wish to recognize the influence of Kristin Luker (Krista) who was both one of my methods professors in graduate school and wrote the book that was the inspiration and model for this one. Although her book, Salsa Dancing into the Social Sciences: Research in an Age of Info-­Glut, came out in 2008, basically in my first year of graduate school, I only came across it while starting my second tenure-­track job when I was tasked with teaching qualitative


methods in the sociology department at the University of Toronto. I had previously taught qualitative methods in a criminology department, where I had very little oversight or meddling with my syllabus and where I had opted to teach in an explicitly interdisciplinary way, blending the tools and techniques from a variety of fields that could be useful to criminology grad students. But teaching qualitative methods in a sociology department—­when I had no formal training in sociology (I was trained by a variety of scholars, including sociologists like Cal and Krista, in an interdisciplinary graduate program)—­I was worried I would miss something crucial my sociology students would need. So when one of my colleagues suggested I use Krista’s book, I eagerly agreed. Before the semester began, I read the whole book, feeling bouts of nostalgia as I recognized Lukerisms that she had shared when I took her class on research design. But while reading the book brought up warm feelings from remembering Krista’s comforting manner, I also found myself frustrated at times, thinking, “Huh, I don’t actually agree with that.” When I held seminars with Krista’s chapters as required reading, I found myself clarifying some of her statements as one approach, but not the only approach, or as good advice, but not the only good advice. This feeling was not unique to Krista’s book by any means. There were several methods chapters or articles I would assign and then find myself contextualizing the reading, trying to soften the sense in which it felt like it was the Only Right Way. For the most part, it was only the ethnography texts where I was less critical, in part because those texts were written in a way that seemed more flexible and less hegemonic in their outlook, recognizing that theirs was one among many techniques for engaging in qualitative social science. It was out of this frustration with certain methods texts and a growing awareness that ethnographers’ approach to methods could be extended to non-­ethnographic research projects that I began writing this book. In the process, I found myself both intentionally and unintentionally copying Krista’s style and strategies. In many ways, then, this book is very much inspired and shaped by her book.2 Once this project was underway, the proverbial village was out in full force. I owe the biggest debt to the many people who took the time to read and/or discuss Rocking Qualitative Social Science while I was writing, editing, revising, and rewriting. Early on in the process, my friends/colleagues Beth Hoffman, Jooyoung Lee, and Keramet Reiter read my proposal and gave me helpful feedback for its revision. I also mentioned the project to a number of friends/colleagues (including Marianne Quirouette, Hadar Aviram, and Noga Keidar), almost all of


whom were very supportive and encouraging about the project. One of these mentions led to a wonderful, in-­depth conversation with Josh Page who was a major supporter of the idea. His enthusiasm for the project so early on gave me some much needed confidence at a time when I was still hesitating to tell people I was working on this project. All of this early support was immeasurably helpful because the initial idea for the book—­and that I would be the one to write it—­seemed a little crazy, but no one really seemed to think so (or if they did, they kept it to themselves). I continued to benefit from numerous conversations with colleagues (including Katie Young, Jennifer Darrah, David Dagan, and Veronica Horowitz) and students (including Grace Tran and David Riedman), whose interest in and support for the project helped to buoy me up when doubt crept in. I also owe a big debt of gratitude to five amazing undergraduates from the University of Toronto–­Mississauga: Rumsha Daimee, Quianna Lim, Fatima Minhas, Nabiha Rasool, and Elli Manzella. As part of UTM’s Research Opportunity Program, they helped me with several research projects in 2018–­ 2019, part of which involved reading completed chapters of my manuscript to identify what parts didn’t make sense, were boring, seemed overly irrelevant to an undergraduate audience, or were generally inaccessible in some way. Their feedback was incredibly valuable and has made the book much more accessible and readable. Additionally, I owe another big debt of gratitude to three graduate students: Isabel Arriagada (University of Minnesota), Jennifer Peruniak (University of Toronto), and David Riedman (University of Hawai‘i at Mānoa) who also read my manuscript. Their feedback was aimed at ensuring the book would be suitable and useful for graduate students, as well as providing substantive and stylistic points. I am indebted to them for taking the time to read the entire manuscript and give me feedback, especially amidst the busy demands of grad school. I am speechlessly grateful to my U of T colleagues Ellen Berrey, Jerry Flores, Philip Goodman, Neda Maghbouleh, and Gail Super for participating in my book workshop. They read the manuscript in its entirety and then spent a whole day discussing the book with me and strategizing ways to make it better. It was super fun and helpful, and I look forward to returning the favor! I am also grateful to the University of Toronto–­Mississauga Sociology Department for providing funds that made the workshop much more enjoyable because we had good food to eat while discussing my book all day.


Stanford University Press arranged for three very useful, constructive reviews of my manuscript from anonymous reviewers who could see things I was not able to and suggested helpful directions to push the manuscript. Their thoughtful feedback is reflected in the final manuscript. Additionally, their comments were supportive and encouraging, which again was crucial for me. Finally, I am so thankful to the two Stanford University Press editors with whom I worked on this project: First, Michelle Lipinski, for her enthusiasm for the project, her encouraging feedback on my proposal, and her helpful comments following the initial review of my manuscript, as well as her support along the way. Second, I am grateful to Marcela Cristina Maxfield for helping me get across the finish line—­during a global pandemic—­with her support, edits on the revised manuscript, and help working through reviewer comments. I’m also indebted to the whole team at SUP who helped with this process, including Stephanie Adams, Sunna Juhn, Emily Smith, Nora Spiegel, and Kate Wahl. Last but not least, a huge thanks goes to Jennifer Gordon for her fabulous and careful copyediting of the final manuscript! I worked on this book for two years, during which time my family heard a lot about “Dirtbagger methods.” I am grateful to my parents, especially my mom, who read some of the early chapters and gave me feedback, and to my husband for his multifaceted support for the project, including repeatedly offering to photograph me climbing for the cover (I declined).


ROCKING QUALITATIVE SOCIAL SCIENCE


1

Introduction to Dirtbagging

Dirtbag
A climber who lives modestly and often itinerantly, supporting themself through odd jobs in order to maximize the amount of time climbing.
Wikipedia (2020), "Glossary of Climbing Terms"

There’s No One Right Way A few months after I had started climbing at a local gym, I was stuck on a 5.9 route. Climbing routes are graded on their level of difficulty; intermediate begins at 5.9 (with increasing levels of expertise from 5.10a to about 5.13d).1 If I could send this one route (i.e., climb in one go without falling), I would no longer be a newb. But, after weeks of working this route, I kept getting shut down at the same place, the crux—­the most difficult section of a climb. One day I was talking with one of the gym staff about it. On the one hand, I was kind of stoked about this move lower down on the route that always felt really snazzy. I would swing both of my feet off the wall and get my left foot onto a small foothold and then get my right foot onto another while holding on to a hornlike handhold on an overhang above. But, on the other hand, I’d recently seen a number of other, more advanced climbers doing this route, and no one seemed to do it like I did. So in telling the staff member about the moves I used, I nervously added, “I know this isn’t really how you’re supposed to do it, but 1

2

Chapter 1

I can’t seem to do it any other way.” But the staff member assured me, saying that there’s no one right way to get up the wall, generously adding that he was curious to see me do it. A few days later, I got past the crux and climbed my first 5.9 route. Okay, this is kind of a weird way to start a qualitative methods book, but this story—­like so much about rock climbing—­is a useful parable for thinking about methods.2 As in rock climbing, there is no one right way to do qualitative methods. As I’ll say again and again, there are certainly wrong ways to do them, but there’s also a plethora of right ways. All too often, people get pretty judgy about methods, particularly when someone does something in a different way—­either a different way from how the critic was trained, a different way from how the critic typically does things, or generally a different way from what the critic considers The Right Way. This narrow-­mindedness about methods contributes, I think, to our anxiety about methods. In fact, I’ve been struck by a frequent gap between what people think is The Right Way—­as manifested in books and articles about how to do qualitative methods, peer reviewers’ critical feedback, audience members’ skeptical questions at research presentations, or general conversations with other academics—­and how a lot of people actually do qualitative methods. Certainly many scholars follow The Right Way, but many others do not. Their deviation is particularly evident in numerous methods sections and methodological appendices. More striking, I’ve had many conversations with colleagues—­including both junior and senior faculty—­who have said things like, “I know this isn’t really how we’re supposed to do it, but it’s what I did.” The differences they describe seem to be fairly consistent across a range of people (suggesting their choices are not aberrational) and are always expressed with a dose of shame because they’re not doing it The Right Way. This insecurity about otherwise fantastic methods is problematic and ultimately unnecessary. It’s one of the many areas where we can learn from climbing.

Different Climbing Styles

When pro-climber Chris Sharma was doing sport climbing competitions in his teens, he used a unique climbing style. He would swing his body and use the momentum to help push himself up the wall, "much like a monkey swinging between branches." Today, this is known as dynamic climbing, in contrast to static climbing, which involves slower, more controlled movement. Initially, people criticized Sharma's style because static climbing had long been the
norm. As his friend, fellow pro-climber Tommy Caldwell, remembered, "[T]he old-schoolers chastised him for poor footwork and lousy body control" (Caldwell 2017, 73). But as Sharma increasingly crushed it at comps, people eventually shut up. In fact, dynamic climbing became, and still is, pretty popular among competitive climbers.

As with climbing, social scientists have different approaches to qualitative research. Some of these differences have to do with people’s individual strengths, personalities, and preferences, while some differences relate to the challenges of a particular project. Some people are extroverts and exceptional conversationalists who excel at interviewing—­the prospect of which might terrify many an introvert. Some people have an obsessive attention to detail and enjoy reviewing page after page of archival documents, which would make others sneeze or itch. Likewise, some research questions demand a particular method that the researcher has never before pursued and needs to learn (or get a coauthor) in order to conduct. Give ten people the same transcript of an event, moreover, and they may each focus on different elements—­the role of gender, the role of race, how power operates, the social constructedness of a situation, the cultural meanings of exchanges, speech patterns, and so on—­and they all may produce useful, insightful accounts. The multiplicity of valid approaches may be overwhelming, but it’s also liberating. Too often, advice about qualitative methods is too narrow or rigid and ultimately constraining. This book is about removing those artificial constraints and letting loose but doing so in a way that is still rigorous. Indeed, overly rigid advice gives way to overly rigid criticism of anything that exceeds the prescribed bounds. The central message of this book is: Rigorous research does not have to be rigid.

Dirtbagging: The Alternative, Inclusive Approach

This book describes an approach to qualitative methods that is less rigid, arguably more creative, and generally more rewarding than some of the other, more mainstream approaches out there. By mainstream, I basically mean quantitative methods and those qualitative approaches that look at quantitative methods as a model for how to do research. Or, more generally, those approaches that look to the scientific method as the ultimate model of doing research.3 I call these approaches mainstream (or conventional) because these seem to be
the dominant approach when it comes to how qualitative work is evaluated—­ despite the multiplicity of valid approaches. In fact, there’s a pretty big mismatch between mainstream ways of thinking and certain types of qualitative projects. For example, mainstream approaches tend to rely on a series of inviolable rules about how to do research. In a lot of cases, these inviolable rules make sense for quantitative scholars and for some qualitative or mixed-­methods scholars. But for the rest of us, that script doesn’t really fit what we want to do, the type of research we want to pursue, the type of articles and books we want to produce, or just our general orientation to research. We want to do things differently. Borrowing rock climbers’ most cherished sobriquet (for reasons I explain below), I call this other approach the “Dirtbagging approach” to qualitative social science. I’m saying “approach,” but it’s not really a single approach. It’s more a collection of options in the research process, or even an attitude toward research, that gets left out of mainstream guidebooks to qualitative methods. A scholar working within this approach often starts with a broad research question that gets refined in the field—­whether that is in the archive, in a café reading online forum comments, at a desk interviewing a stranger, or embedded in a unique social space for long-­term observation. Their4 research question evolves over time or leads to important insights beyond their original question. In some cases, their final research question emerges after data collection is complete. For this scholar, data analysis is a deeply personal process that requires following their intuition, puzzlements, and even emotional reactions to their data—­but in a systematic and rigorous way that allows for useful, compelling, and generalizable insights. Such scholars analyze familiar objects in novel ways, develop exciting new concepts and theoretical frameworks, or constructively challenge the norms of their (sub)field. Ultimately, this type of research yields opportunities for generating important, creative, even paradigm-­shifting insights for one’s field of inquiry. There is no real defining feature of this approach beyond the idea that it diverges from The Right Way. A key theme within this approach is flexibility, in opposition to the rigidity demanded by some other, more mainstream approaches. In fact, one of my goals with this book is to help dismantle the idea of The Right Way by presenting the range of good options. In the process, I borrow from several different disciplines, methods, and approaches, from ethnography to econometrics, using what’s helpful and leaving behind what’s not.


As a consequence of this eclecticism, every substantive label I (and later my friends) could think of to characterize this approach fell short: inductive, micro-­level analyses, case study, Grounded Theory, ethnographic, theory generating, historical, and so on. The approach I describe can apply to these methods and techniques, but it’s not limited to them. For example, I often draw on the intuition and techniques ethnographers and historians use; but they don’t hold a monopoly on these ideas. People doing interviews, reviewing contemporary documents online, or pursuing multi-­methods projects also use (or can benefit from) the techniques I’m going to describe. Relatedly, a lot of stuff I’m describing can be put under the “inductive” banner (essentially moving from your data to theory)—­but I’m also describing stuff that’s deductive, too (moving from theory to data—­don’t worry, I’ll explain what this means later). So, in the interest of methodological inclusiveness, I ultimately decided to skip these various (more methodologically substantive but exclusionary and narrow) labels and go for something a little more radical.5 Early on in this project, I adopted the Dirtbagging label as a shorthand to refer to my approach. It was a term used by mid-­twentieth-­century US-­based rock climbers who committed themselves fully to a life of climbing, doing whatever it took to get by—­including sleeping under the stars and scrounging for food rather than working a day job that would keep them from climbing full time. These “dirtbags” followed a countercultural ethos that was rejected and policed (literally) by mainstream society, a useful analogy for the type of qualitative social science that is likewise rejected and policed by mainstream scholars. It evokes a certain sense of deviance that I have often felt doing my research—­that feeling of doing it wrong, of breaking some rule, of being judged for something that feels natural and necessary but that is not The Right Way. It’s definitely possible to overstate the countercultural characteristics of the approach to qualitative research I’m describing. As I’ve mentioned already, it turns out this approach is really common among people who actually do qualitative methods. And if so many people actually do this approach, how countercultural can it really be?6 In some circles—­some friend groups, departments, subfields, and disciplines—­the approach I’m describing is both common and normalized. In fact, some of my friends who read this book before it was published suggested this approach isn’t deviant at all. They were lucky. They were trained and/or worked in departments where this approach was the norm—­or if not the norm for the whole department, it was the norm for the advisors who


trained them. There are lots of departments and advisors like that. There are also lots of departments and advisors not like that. And when working in those contexts, you can really start to feel like you're a rule breaker, even if you didn't intend to be. Being a Dirtbagging social scientist is a bit like being a rock climbing hobbyist in the late 2010s: Rock climbing has also gone pretty mainstream (see the explosion of climbing gyms, the widespread release of two climbing documentaries in 2018, and the 2020 Olympics' inclusion of climbing), but some people still look at you a little suspiciously if you say you climb. It's that feeling of otherness and the anxiety that comes with it—despite the objective reality that we're surrounded by people like us (even if we don't know it yet)—that I'm going for by using the Dirtbagger label. However, the term "dirtbag" can definitely be off-putting. It raises connotations of, essentially, an asshole. Or maybe a hobo. In climbing, the term is much closer to the second. Indeed, climbing dirtbags and hobos have things in common (low resources, a sparse or uncertain shower schedule, living in situ); but for climbers, hoboing is a means to an end. Some general definitions of the term capture the point:

dirtbag: A poor climber, alpinist, skier or other outdoorsman [sic] who lives cheaply, without normal employment, and with few amenities in order to spend as much time on their sport as possible. Used praisingly. (Wiktionary 2020)

dirtbag: A person who is committed to a given (usually extreme) lifestyle to the point of abandoning employment and other societal norms in order to pursue said lifestyle. Dirtbags can be distinguished from hippies by the fact that dirtbags have a specific reason for their living communaly [sic] and generally non-hygenically [sic]; dirtbags are seeking to spend all of their moments pursuing their lifestyle. (Urban Dictionary 2020)

As these definitions indicate, the label is really about commitment, dedication, and passion—­but doing so in a way that breaks those social rules that simply seem irrelevant or hold a person back from pursuing their goals. Hopefully this brief discussion makes something else clear that is really important: “Dirtbag” is not a derogatory term, but one climbers embrace and use with a certain amount of reverence. As explained in an advertisement for the 2017 documentary, Dirtbag: The Legend of Fred Beckey (a pioneering climber who was still dirtbagging around until his death at age 94), “In the climbing


world, being called a ‘Dirtbag’ is a badge of honor, a hard-­earned title not for the meek” (@DirtbagMovie 2018). It takes grit to be a dirtbag. However, to emphasize the character, behavior, and positive meaning intended, and to avoid the negative connotation of being an asshole, I purposely use the more active-­sounding dirtbagger (or dirtbagging) rather than dirtbag. And when I’m describing my methodological approach, I capitalize the term to be clear I'm using it in a formal sense rather than as the informal label given to climbers. This is a good time to explain that, befitting the dirtbagger image, I’m going to swear throughout because research is fucking stressful. Swearing has been shown to help people cope with stress—­and to help athletes achieve really difficult feats (Byrne 2018). (In real life, I definitely swear the most while my mental health is the worst and when climbing really hard routes.) In this book, I’ll try to limit it to those places where we’re talking about something really painful, annoying, or difficult (or all three). It would certainly be strategic for me to adopt a formal tone and avoid bad words; that would help this book appear more legitimate and help to convince skeptical audiences that the approach it describes is indeed appropriate. But this book is not written primarily for that audience; it is written for people who are already attracted to this approach but who struggle with doubt, anxiety, and sometimes feelings of helplessness (or hopelessness) when it comes to their research.

What We Can Learn from Rock Climbing

One of the things that has helped me most with my more recent mental health struggles (research related and otherwise) has been rock climbing. Consequently, this book is something of an homage to rock climbing. But there are two reasons why I think others will find it a helpful thematic choice. First, I've found rock climbing to be filled with metaphors and stories that have been useful for thinking through challenges in qualitative (and some quantitative) social science research. So I'm going to use rock climbing metaphors and stories throughout. Second, this book describes a style of doing qualitative methods as a high-risk, high-reward endeavor that can be simultaneously fun and super scary up to that exhilarating moment where you have finished the project—much like rock climbing. However, as one of my undergraduate research assistants noted while reading a draft of this book, "Any physical sport—not just rock climbing—can be used to understand the concepts in this book."7 So if you don't like the rock climbing metaphors, feel free to think about other activities where the same ideas hold. Use what's useful, ignore what's not.


Here’s an example. Before starting a route, a climber might study their project, tracing the path they are going to take. This is kind of like the research design phase where the researcher plans out what steps they are going to take, what route they are going to follow. But then they get on the wall and realize they can’t do that particular move, or they need to use a different handhold. That happens with research, too—­you get to your fieldsite (whether that’s a neighborhood, an archive, or an internet database), and you realize you can’t get permission to ask a particular question, that person isn’t available to talk to you, you just can’t find any information about that important event, or the data available to you just don’t speak to your research question as well as you expected. Sometimes, you have to lean back and study the wall you’re climbing or even get off the wall to get a better view; likewise, sometimes you need to rethink your path while you’re in the field—­or even after leaving the field for a bit. This is just part of the climbing/research process. As this example suggests, the rock climbing motif offers some pretty healthy norms and attitudes that are helpful when it comes to research. Contrary to the asshole connotations, dirtbaggers—­indeed, most of the climbing community as a whole—­are a generally nice, supportive, non-­judgmental group of people. Sure, there have been disagreements during moments of major transition—­for example, debates about the right way to scale a mountain or whether people were sellouts for embracing the model of professional climber with corporate sponsorships and abandoning the model of penniless, scraggy dirtbagger running from the law. But there is something about the vulnerability and intimacy of rock climbing that breeds a supportive camaraderie. Rock climbers who show up at the gym or crag without a partner will often ask a stranger to rope in and climb together or spot them as they try to ascend a boulder so they don’t snap their neck when they fall. In other cases, climbers will share beta (advice) about sending a particular route. While watching someone climb a difficult route, climbers at the base will say things like, “Yeah, you got it. . . . Nice! . . . Send it! . . . ¡Venga!” Perhaps as an extension of this supportive atmosphere, rock climbers—­and especially the current generation of rock climbers—­have a strong belief in the possible that I find super inspirational. No one encapsulates this mentality better than Tommy Caldwell, who reminds us, “We are capable of so much more than we could ever imagine” (Caldwell 2015). Caldwell is famous for sending the seemingly impossible-­to-­climb Dawn Wall, an incredibly smooth, steep rock face in California’s Yosemite Valley with climbing partner Kevin Jorgeson.


By the way, Caldwell survived being held hostage for nearly a week in Kyrgyzstan and (later) accidentally chopping off his index finger—­pretty awful for a pro-­climber—­all before completing what is probably the world’s hardest multi-­pitch climb (about 3,000 feet of climbing). He might be one of the best examples of this anything-­is-­possible mentality fostered in the climbing community. Who else would look at a crazy steep rock wall and say, “I’m going to climb to the top”? This belief in the possible is a really nice ethos to surround oneself in when facing something that frequently does feel impossible—­like when you are slogging through a seemingly endless dissertation or when you have a mountain of data to analyze. This ethos is particularly welcome when you are working on a project that others—­maybe your dissertation advisor, a colleague, an editor, or a reviewer—­dismiss as unrealistic or a bad project. Sometimes they are right. But projects that break out of the mold often get these sorts of reactions unfairly. Rather than prematurely judging a research project as infeasible, I much prefer the dirtbagger attitude: Okay, that’s going to be really hard; let’s think about what’s necessary to make it possible. So, with this dose of climber ethos added in, let’s revisit what it means to be a Dirtbagging methodologist. In the narrow sense I first introduced, it means a researcher who does things a bit unconventionally—­who does things outside of The Right Way, basically doing whatever is necessary to complete their project even if it violates those inviolable rules. In a broader sense, though, I mean embracing a certain ethos about research adapted from climbing: the idea that there is no one right way and that it’s okay—­good even—­to be flexible rather than rigid. It also shapes what it means to be a good colleague (friend, advisor, reviewer, audience member): We should be supportive of one another and think about how to make difficult projects doable rather than dismissing that they are possible. It also means evaluating projects on their own terms, recognizing the norms of the method or (sub)field, rather than evaluating projects by narrow criteria that we impose on them. As this book progresses, the Dirtbagger spirit will come through in each chapter, but as we get into the nitty gritty mechanics, I’ll have specific advice for those Dirtbaggers who want to follow their own line up the mountain as well as folks who embrace the Dirtbagger spirit but prefer to follow the established routes in their own research. Indeed, consistent with the Dirtbagger spirit, no one says you have to use the same approach all the time—­sometimes you might go full-­on Dirtbagger and other times you might just respect the ethos.


The Topographical Map

This book follows the research process in a semi-chronological manner: general overview stuff, followed by advice about selecting and justifying your project, setting up (or later defending) your research design, and figuring out your data collection and analysis. Chapter 2 starts us off by considering how to define qualitative methods. Chapter 3 shifts our focus to the actual decisions that structure the research process, starting with identifying a research question. This chapter is central to the Dirtbagging approach because it addresses several misconceptions, or differences of opinion, about research questions beginning with what counts as a research question. Chapter 4 confronts the sometimes annoying but always necessary challenge of relating your work to something bigger than your project. That something bigger might be a policy-relevant issue (or something the public just finds absolutely fascinating) or theory (or some ongoing discussion or debate in the literature of your field). Whether and how you connect up to these bigger things will shape how well your project is received and what sort of impact it will make. Chapter 5 addresses the crucially important issue of research design. How you set up your study determines your ability to make timely progress and not waste scarce resources (time, money, energy—ours and others'). It also determines your ability to insulate yourself from potential critique of the finished work, so you need to get it right. At the same time, Dirtbagging social scientists require a certain amount of flexibility and the recognition that designing your research is an ongoing endeavor that you don't only do before you start collecting data. Chapter 6 focuses on case selection, or who, what, where, and/or when will be the main focus of your study. This chapter reviews the major selection strategies you might use—whether you think through these options before beginning your data collection or whether you look at them afterward to explain why your case selection makes sense (even if the reasons you give are not the ones that led you to your case in the first place). Chapter 7 rounds out our discussion of research design by focusing on sampling, or generally how you plan to collect your data. Unlike the other two research design chapters, this chapter gets a bit more serious because sampling is pretty serious. While you can be flexible about a lot of things, there are certain checks you need to consider before, during, and after you collect your data.


Chapter 8 moves into the process of data collection. In this chapter, I describe a general model of “fieldwork” for qualitative scholars, whether they are ethnographers, interviewers, historians, or others working in an archive, a school, a prison, a neighborhood, an NGO, a community nearby or in some other part of the world, or sitting in front of a computer literally in pajamas eating M&Ms between keystrokes. The fieldwork model of research—­which emphasizes a commitment to reflexivity and taking fieldnotes—­offers a way to structure data collection, regardless of one’s specific method. Chapter 9 turns to the process of data analysis. Although recognizing that data collection and data analysis are intimately connected rather than two separate phases of research, this chapter discusses specific techniques for data analysis. Chapter 10 discusses strategies for causal inference with qualitative data. While recognizing that most qualitative research involves causal statements—­ and is particularly good at demonstrating causality without relying on logic and inference—­some approaches require more of a leap of faith, and that’s nerve-­racking. Chapter 11 addresses the elephant in the room—­the major criticisms of qualitative research—­both the valid critiques and the dogmatic misconceptions. It also prepares the reader for the related challenges of presenting qualitative research to the world. Chapter 12 reviews what it means to be a Dirtbagger and closes the book with some of the lessons for breaking out of the mold sometimes provided, sometimes demanded, by other approaches. It discusses the advantages of allowing and enabling multiple right ways in academic research. Finally, it offers some practical tips for moving forward. As might be apparent from this overview, there are a lot of things I am omitting. One of the biggest omissions is ethics. Okay, it’s not entirely omitted, but I don’t devote a whole chapter to it. Here’s why: This is an area that is rapidly changing, particularly with calls to make data publicly available amidst simultaneous concerns with protecting research participants’ privacy, all against a growing backdrop of community resistance to data collection. Basically, things are changing too quickly for me to make claims I’m doubtful will be useful five years from now. Additionally, research ethics is one area where we really need to pay closer attention to the norms and needs of a given subfield rather than applying general rules—­too often, ethics requirements are set by medical research or quantitative social studies with harmful consequences for


qualitative research because the requirements do not translate as well across these disparate venues. There’s another reason why I’m reticent to spend a whole chapter on ethics. While there are some clear cases of over-­the-­line shenanigans or malfeasance, and other cases that feel problematic even if we can’t always articulate (or agree) as to why, people can legitimately have profoundly different views on this subject. I have my views on this subject, certainly, but so do a lot of other people who have thought about it more or who have stronger ideological commitments. Indeed, as someone with a flexible, live-­and-­let-­live approach, I don’t like making strong, specific statements about what people should or should not do in their research—­beyond some vague basics like don’t harm (or let harm befall) your research participants. But people can disagree even over what counts as harm or what are acceptable levels of risk of harm. So I’ll mention ethics when it seems appropriate, and where I am confident in what I’m saying, but a lot of the time I won’t, because it will vary so much across research topics, methods, locale, your own preferences, and the year in which you’re reading this. I am also not going to offer an in-­depth discussion of specific methods or other issues unique to specific methods, such as access to fieldsites, the details of positionality, how to perform QCA, or the complete mechanics of process tracing. I’ll discuss many of these issues in passing, but this book is intended as a guide—­a kind of text-­based mentor for someone new to qualitative methods—­to get you through the research process, focusing on the biggest hurdles, especially the hurdles other books tend not to talk about. But there are simply too many qualitative methods and important facets within any given method to cover in one book. For those specific issues unique to a particular technique, I recommend finding articles and books that cover that specific technique (or issue). For general issues that traverse the mountain range that is qualitative methods, I recommend this book.

2

Topo
What Exactly Are Qualitative Methods?

Topo
Topo in climbing is a term which refers to the graphical representation (sketch drawing or a photograph with routes depicted) of a climbing route. It is also used for a climbing guidebook of a crag or climbing area in which most routes are described graphically by such topos.
Wikipedia (2020)

Definitions’ Limits If I wanted to climb El Capitan in Yosemite National Park, there are different routes I could take. There are more than a dozen, in fact. Some routes can be climbed in a variety of ways, and others rely on a particular climbing style—­ like old-­school aid climbing (where you hammer in bolts and climb on the gear rather than climbing the rock itself) or the relatively newer but still very well-­ established free climbing, now called “trad” for traditional (where you climb the rock while connected by ropes to removable anchors). Using either of these two distinct climbing disciplines, people will take about five or six days to complete a climb on any of El Cap’s routes. Additionally, some folks speed climb where they will use whatever method they can to get to the top as quickly as possible (in practice, it’s about on par with running a marathon—­the record now stands just below two hours!). Some climbers have climbed El Cap as part of a triple linkup—­where they climb El Cap, Mount Watkins, and Half Dome (three of Yosemite Valley’s most iconic mountains) in under a day, the climbing equivalent of an ultra-­marathon. And others free solo parts or all of one of the routes (that is, skipping the ropes and anchors altogether).


There is a lot of diversity there, and that's just when talking about one particular (really famous) mountain. We could also throw in ice climbing and mountaineering (such as climbing K2 and Everest), or off-width crack climbing where you wedge your entire body into a thick space between two rock walls and kind of inchworm your way up. There is also sport climbing—both indoor and outdoor—which is extremely gymnastic and relatively safe, and bouldering where you climb up about 10–30 feet without a rope and land on crash pads to cushion your fall. There's also deep-water free solo (where you free solo over water so you don't die when you fall). Trying to define the category that includes this great diversity of climbing is kind of difficult. If I were to define rock climbing as making your way to the top of a mountain, I would get mountaineering, aid climbing, free climbing, and free solo—plus speed climbing and such. But I would be leaving out ice climbing, bouldering, sport climbing, and any other single-pitch (or short) climbs where you get to the top of something, but not necessarily a mountain. You can come up with a definition, certainly, but people won't always agree on it. In fact, you might even be one of those people who recognizes all of these types of climbing as related but not necessarily actually all the same category called "climbing." Some people think mountaineering, for example, is really its own category, and rock climbing is different—maybe ice climbing splits the difference. Likewise, plenty of people initially thought bouldering wasn't real climbing—you're "only" climbing a big rock, not a mountain. Qualitative methods is a lot like rock climbing in this respect (and others). There is a lot of disagreement over what counts as qualitative methods and difficulty in defining qualitative methods in a way that is sufficiently inclusive without being useless. So before stepping on the wall, let's first get on the same page about what we mean by qualitative methods.

*   *   *

This chapter tries to define qualitative methods, while discussing some of the difficulties with the most common definitions. We begin with a rundown of the typical methods of qualitative data collection—­but note that qualitative data can also be quantitatively analyzed. We then review a lot of the traditional ideas or even stereotypes about qualitative methods—­but note that they have been repeatedly challenged lately. Consequently, the easy markers of qualitative methods recited in various texts no longer hold up very well. Finally, we discuss when qualitative methods are appropriate and what type of research they let you do.


Qualitative Methods of Data Collection

One reason that defining qualitative methods is kind of tricky is that we usually conflate two very different (if sometimes overlapping) activities under the umbrella of qualitative methods: qualitative methods of data collection and qualitative methods of data analysis.1 Let me start by explaining the typical qualitative methods of data collection.

• Ethnography/Participant Observation: Generally, an ethnography involves embedding yourself in a particular place for some period of time. Part of this embedding involves participant observation—that is, interacting with people while you are observing them as opposed to lurking from the edges and just watching. There is some slippage between what counts as an ethnography and what counts as a participant observation study, since ethnography involves participant observation but not every participant observation study can be called an ethnography. Sometimes both are just lumped together under the category of field methods or fieldwork. Both ethnographies and participant observation studies (where you aren't embedded for long periods of time) likely involve interviews—either informal chatting with people or a formal sit down in which you ask questions relevant to your study. Usually, your observations of the setting and interactions therein, recorded in fieldnotes (see Chapter 8), are the primary source of data.

• Interview-Based Studies, In-Depth or Intensive Interviewing, and Focus Groups: While there is some overlap with the prior category, I (like others) separate interviews out from ethnography because you can conduct interviews without participant observation or embedding yourself as you would in an ethnography. This method is sometimes called in-depth or intensive interviewing to signify that interviews are the primary source of data rather than a supplement to participant observation or an ethnography. Interviews are often one-on-one, but they can also be two-on-one, and so on. Even though they are a distinct category from interviews, focus groups look like large-scale interviews: These are groups of people who are essentially interviewed together, usually involving a fair amount of conversation among that group, sometimes self-led and sometimes led more by the researcher. Interviews and focus groups can both take at least three forms: (1) structured (must cover a specific list of pre-specified questions asked in a particular order);2 (2) semi-structured (should cover a list of questions, but the respondent's answers really guide the interview; the interviewer asks "followup" questions about the respondent's answers, pushing the interview in different directions or asking questions in a different order than initially expected); or (3) unstructured (no specific questions laid out ahead of time; the interview itself might be spontaneous and more conversational). The primary source of data is the interviewer's notes recording the answers or an audio or video recording—or more precisely the transcription (either taken in real time or later)—of the interview or focus group. Note that the notes or transcript will include not just answers to the interviewer's questions but also recorded observations of the respondent's appearance and reactions.

• Text-Based Studies: This category might have the largest variation as it can include going to archives, libraries, or online to find texts. These texts might be in digital databases, on microfilm, in books or old newspapers (these days, both can be paper or digital). The texts you use may come from curated or uncurated collections—for example, a carefully assembled box of documents with an inventory or a mishmosh of papers that seem to have little in common; an online database of nineteenth-century periodicals or an online forum with many user comments, threads, and subthreads. Sometimes you use search terms to find your documents, and sometimes you use everything available (e.g., all New York Times articles on French cooking or just all NYT articles). Whatever their provenance, some set of text (possibly including pictures or other graphics) will be the primary source of data. With texts, you also have a greater flexibility on the time range you study, including anything from the contemporary period (current organizational documents, newspapers, online forum discussions, tweets) all the way back to wherever you can find documents.

• Newer, More Inclusive Methods: There are some methods that, if not entirely new, are becoming more popular these days. Part of what they have in common is that they reduce the hierarchical relationship with your research participants. These methods include the Diary Method, in which participants use a (written or oral) diary to record their feelings or actions at specific times or whenever they feel like it (or according to some other criteria laid out by the researcher). In this case, the diary is the primary data source. Relatedly, one can use mixed-media approaches like Photovoice. This might include having the participants take videos or pictures of themselves (either with their own equipment, like smartphones, or with equipment provided, like disposable cameras). In this method, the videos or pictures are the primary data source, although in most cases, these sources supplement other forms of data collection. Finally, more researchers are using Collaborative Research Projects, in which initial participants are later included as research assistants or co-researchers, who in turn interview other participants or observe other participants or settings. These participant-researchers might further collaborate with writing up the project and present the work in public and/or professional fora. The primary data source is again the ethnographic fieldnotes or interview notes/transcript.

This list is not comprehensive, but these seem to be the most common methods of qualitative data collection. Importantly, using these methods does not necessarily mean you're doing a qualitative study. One can use qualitative methods to collect data that are then quantitatively analyzed. For example, you could potentially keep track of how many interactions you saw during your ethnography and create a quantitative dataset. Or you could perform qualitative content analyses on newspapers, then convert the results into numbers in a spreadsheet. In both cases, you could run regressions (or other statistical analyses) on your data. Increasingly, people are using machine learning strategies for analyzing millions or billions of lines of text—for example, tweets. Consequently, I feel more comfortable defining qualitative methods by the types of analysis one does rather than by how the data are collected. (However, we still need to talk about data collection because how the data are collected affects the types of analysis one can perform.) For reasons that should become clearer below, I will define qualitative methods broadly to include (potentially) any social science analyses that do not involve tests of statistical significance (e.g., regressions of various kinds). This is a Big Tent book (many ideas welcome—we're not judgy), but I draw the line at (sophisticated) statistical analyses. Even though I love statistics, I think we can all agree they're not a qualitative method.

Characteristics of Qualitative Methods

Beyond this minimum boundary drawing, what are qualitative methods? There are some standard descriptors or criteria for qualitative methods, but there are exceptions to each. So let's not think of them as criteria or as part of
the definition, but rather as general characteristics of those methods within qualitative methods’ fuzzy boundaries. Let’s also not cling to them too tightly, because doing so can leave out important examples of qualitative work. One thing that most qualitative methodologists would probably agree on (although some quantitative scholars would dispute) is qualitative methods are empirical (e.g., Lamont and White 2005). To use a basic, dictionary definition, empirical means “based on, concerned with, or verifiable by observation or experience rather than theory or pure logic” (Oxford English Dictionary 2020). Said differently, it has some relationship with data; you cannot just sit in a chair and make shit up. Okay, that’s a pretty minimal qualification. But good qualitative methods are empirical in the sense that they involve the systematic collection and analysis of data (Morrill 2007). This is a point ethnographers are pretty clear about: You don’t just go into the field whenever you feel like it (“I went to the same café for a year to get coffee and I noticed this pattern” is not going to cut it), you can’t just talk to some people (“She seemed easier to approach than that guy” is a bad justification), and you can’t focus only on some observations (“This was just more interesting than that so I didn’t record that” might be problematic). Instead, there needs to be some justification, some consideration of how your data can show you the big picture and—­this is arguably the most important part—­where their limitations are. We’ll talk about how to do all that later, but for now, the takeaway is that qualitative methods involve doing things methodically and according to some defensible logic. Beyond this one characteristic, things start to get a little complicated. Small Sample Size? “It’s Not the Size of Your n That Matters” Let’s begin with one of the most common attributes: People often define qualitative methods as having something to do with “small-­n” research, meaning research using a small sample size (n, short for number, signifies the number of cases or observations in a study). For example, a National Science Foundation report on qualitative methods states, “Qualitative research stresses in-­depth contextualization, usually with small sample size” (Lamont and White 2005, 4). Traditionally, n is determined by your unit of analysis—­that is, the thing you are studying in order to understand some larger phenomenon. In a case study of a prison or classroom, the prison or classroom is your unit of analysis; in an interview study, a person is your unit of analysis. Count up each of these units in your study and you’ve got your n. Thus, case studies—­extended
analyses of a particular setting like a neighborhood, a business organization, a classroom, a prison, a hospital wing—are commonly defined as having a sample size of one (also written n = 1). Alternatively, one might perform a series of interviews with perhaps somewhere between 20 and 200 people. This is what people mean by small-n research: you are studying one case of something or you are interviewing 200 or fewer people (or maybe 300 or fewer—there is no bright line distinguishing small from large). The underlying consideration is you don't have enough cases or observations to run a regression (i.e., generate sophisticated statistics) on your data.3

Already, you can see how we define qualitative methods in contrast to quantitative methods. But this habit is problematic on a number of levels: For instance, why aren't quantitative methods the comparison case and qualitative methods the norm, especially considering that so many classic studies in most disciplines were qualitative studies? But one of the biggest problems with this habit is its tendency to create mistaken conceptions, including misconceptions about sample size. Indeed, as some scholars have pointed out (e.g., Brady and Collier 2010), qualitative scholars actually end up with copious amounts of data. It's a huge mischaracterization (and not just a little insulting) to say a historian or an ethnographer, for example, has few observations. The historian may review boxes upon boxes of texts across multiple archives. Likewise, the ethnographer may spend thousands of hours in the field and end up with fieldnote pages that number in the tens of thousands. That is, they have a lot of data.

Rock Climbing Is Dangerous

Just as people mischaracterize qualitative methods, so too people mischaracterize rock climbing—usually in similarly judgmental ways. A lot of folks will characterize rock climbing as this particularly dangerous activity only done by crazy, thrill-seeking adrenaline junkies (just watch the news any time someone completes a really exciting climb). These labels are most often associated with free soloist Alex Honnold. Certainly, what he is doing is dangerous: He is climbing without a rope, so if he falls from, say, a thousand feet up, he would die. But as he points out, his climbing is not at all about chasing an adrenaline high; instead, "If I get an adrenaline rush, it means that something has gone horribly wrong" (Honnold and Roberts 2016). Moreover, while rock climbing poses some objective hazards—like possible falling rocks or the chance that your protection fails—there are many other things we do in
life that are more dangerous. Indeed, it’s often the case that we can flip things around and demonstrate that the very things being critiqued are problems for those doing the critiquing.

We can also flip things around and, using the same criteria, apply the small-n label to a lot of quantitative work. My favorite subfield in criminology examines sentencing disparities—differences in sentencing across convicted criminals who have otherwise similar legal attributes but different demographic or contextual attributes. It's a big subfield, probably the biggest one outside of studies examining the causes of crime. (It's like the criminology equivalent of the health disparities studies in sociology or voting behavior studies in political science.) The standard method is to use some kind of regression analysis on a rectangular dataset (such as the type of data you might open with an Excel spreadsheet) obtained from state officials. These datasets are usually pretty big—there may be several hundred thousand people in the dataset. Sometimes, they are smaller, with closer to ten thousand observations. So large n, right?

If we apply the same metric that is applied to qualitative case studies, I'd say no. For example, usually these datasets are confined within a particular setting—such as the state of Maryland, Pennsylvania, or Florida—or maybe a particular federal district. If I have a dataset from the state of Maryland, is that a case study? Officially, the unit of analysis (that is, the things we've collected data on in order to study sentencing disparities) is the person, so we would say the sample size is the number of people in the dataset (the thousands or hundreds of thousands of people). But I would argue that it really is a case study (n = 1) because we're looking at sentencing disparities within that particular jurisdiction—or case. There are lots of differences across states, as sentencing disparities scholars are frequently aware of and acknowledge. My hypothetical study of sentencing disparities in Maryland might not tell us much about sentencing disparities in Florida (let alone Montana or Connecticut) if they have different court systems, politics, and demographics.4 In that sense, our sample size really is just one because we're investigating sentencing disparities in one state, however much statistical power we derive from our several hundred thousand observations. As with qualitative case studies, we have great confidence in our findings, but we have to think carefully about how things might change if we had selected a different site.

So how we answer this question of sample size actually comes down to a matter of perspective and convention: Some quantitative scholars get a little annoyed if you tell them they are really just doing case studies. But that feeling of annoyance is also how those of us performing “mere” case studies feel when someone says our sample size is one, ’cuz it’s not. To take my research as an example, which I will be doing throughout, I have written a book about a particular prison, Eastern State Penitentiary, that opened in Philadelphia in the early nineteenth century. It is a case study and, in that sense, my n = 1. But it’s also an institutional history of the years when the prison employed long-­term solitary confinement. This “Pennsylvania System” was authorized in 1829 and deauthorized in 1913, so my study focuses on that period (although, for context, I start a few decades before 1829). Putting it conservatively, we might say my n = 85—­after all, I read and analyzed all of the prison’s annual reports from its opening to 1913. I was also interested in and collected information on the 60-­plus men who ran the prison, the 10,000 or so people incarcerated within its walls, and the dozens of local penal reformers involved with the prison during this period. Zooming out a bit, I also collected information on the roughly 30 modern state prisons that were authorized nationwide in the 1820s to the 1850s. At the end of the day, my study of this one prison actually involved a lot of research on other prisons, places, and people; and for different analyses, I ended up with different n’s. Moreover, having analyzed several thousand pages each of a daily journal kept by Eastern’s wardens, the monthly meeting minutes of its board of inspectors, and the monthly meeting minutes of a local penal reform group—­and having consumed about a hundred pamphlets on penal reform—­I get a bit annoyed when people call this a small-­n study. And I’m not the only one, certainly. To deal more constructively with this annoyance, political scientists Henry Brady, David Collier, and Jason Seawright (2010) have developed the term “causal process observations” (CPOs) as distinguished from “dataset observations” (DSOs) (see also Brady 2010). DSOs refer to the individual entries in a rectangular dataset that one can feed into statistical software like STATA, R, or SPSS to analyze via regressions and such. CPO is a more scientific sounding term for all of those documents I reviewed and the rich information within them—­the tens of thousands of individual entries in the warden’s daily log, for example. The point is to realize that just because our data are not as easily
quantified as large, computer-readable "datasets"—because we're doing a case study (or even a multi-case study with maybe three or four cases)—or our data can be quantified in misleading ways—because people count the number of interviews rather than the amount of data they yielded—doesn't mean we don't have a lot of data. (The term CPO also helps remind people that qualitative methods can be used for causal inference, as we'll discuss more below and in Chapter 10.)

So this focus on the quantifiable amount of data, your n, is somewhat misleading and can prompt some cheeky responses. When I participated in a two-week training on qualitative methods hosted at Syracuse University by the Institute for Qualitative and Multi-Methods Research, we received awesome commemorative shirts, the back of which read: "It's not the size of your n that matters." I wore mine until it had holes in it—and even after that.

Generalizable in a Different Sense

The emphasis on small-n is related to another factor people use to distinguish qualitative from quantitative methods: People will sometimes say that qualitative research has limited external validity. That means it's basically invalid to apply your conclusions to cases external to your study—that is, to generalize beyond your study. People who make this critique are usually thinking about how qualitative studies focus on a "small" area or group of people—an ethnography of a neighborhood or a specific hospital, interviews with 30 people going through a reentry facility, transcripts from an internet forum with several dozen participants. Because of a study's "small" focus, its findings, however interesting, will not tell us much beyond that venue. Such criticism assumes that the study participants or locale aren't representative of a larger collection of people or places.

Another version of this critique deals with efficiency: What people really want to know is, if you expend a lot of resources (time, money, energy) collecting and analyzing data, what is the payoff? Sure, you can tell us something about your data—your place, your people, your period—but will it be true beyond that group? The fear is that small-n research is going to be insufficiently representative of a larger population to be generalizable and thus won't be worthwhile.5 Quantitative studies get less pushback because people tend to have a general (if sometimes mistaken) expectation that quantitative data, if randomly sampled, will be representative of some population of interest. If you take a good-sized random sample of the national population, as many phone-based
surveys underlying public opinion polls do, you can have some confidence that, yes, it will be generalizable—­if you’ve done things correctly (for example, thought about whom you can’t reach via phone and taken steps to address that limitation). Of course, this isn’t always true. For example, sometimes people try to generalize beyond the population from which they sampled. But if you took a random sample of Californians, you shouldn’t expect that sample to be a representation of some other group living elsewhere unless you are measuring something really fundamental that is not expected to change across state lines. Even then, however, you might want to make sure by sampling from some other states. (This is one of the points I was hinting at in my sentencing disparities example above—­who is to say the sentencing disparities in Pennsylvania will be generalizable to Oregon?) These considerations are the types of generalizability problems quantitative scholars deal with. But as qualitative scholars, we don’t typically draw random samples—­in a lot of cases, that might be a bad idea. And, often, we don’t want to study a representative case. In fact, a lot of us qualitative people are interested in some fairly unique cases. The prison I studied was the only prison to continue to use long-­ term solitary confinement. Mona Lynch (2010) studied the history of punishment in Arizona, a state that was known for decades (almost a full century) for its rejection of national penal norms. Jooyoung Lee (2016) studied young Black men living in the Los Angeles area who hung out and performed in a particular rap club—­the fact that they lived in the US’s entertainment epicenter was particularly significant to his account. Randol Contreras (2012) studied a unique gang in New York who couldn’t make money selling drugs so they specialized in finding and torturing drug dealers in order to steal their money—­this strategy is certainly not the dominant one within the underground economy. So, in some sense, when people pester us about how generalizable our studies are, their question kind of misses the point of the research.6 Each of these studies is interested in a particular case (and in that sense, we don’t even care about generalizability in the sense of representativeness), but we are also interested in creating theoretical insights that will be true in other settings. In my study, the uniqueness of my case helped to crystalize the role of anxiety among prison administrators dealing with this new technology of putting people in boxes for long periods of time (i.e., prisons). Ultimately, I realized that this anxiety was stronger at my prison (Eastern), but it wasn’t unique to my prison—­penal reformers, prison administrators, legislators, and ordinary citizens around the country (and across the ocean) were apprehensive
about the new prisons and how prisoners would respond to their captivity. That insight became apparent because I was studying this unique prison: It would have been harder to see if I had studied the more mainstream prisons because, while this anxiety was present, it wasn’t as strong. Once you know what to look for, though, you start to notice it in these other cases. Something similar was going on in Lee’s study of the Los Angeles rappers. He found that the pressures of being a young Black man in the contemporary United States, with a low chance of upward mobility and a high chance of having one’s life getting derailed or ended by violence, gave these young men a sense of urgency about making something of themselves. In studying this unique population, he identified a more general struggle that affects other groups facing a similar time crunch like professional ballerinas, baseball players, or mathematicians: Because there is such a small window of opportunity before one can make their breakthrough, people in these groups make choices that appear irrational to outsiders but that are perfectly rational given the constraints they are working under. In both studies, these otherwise unique settings or groups reveal useful insights about other settings or groups. They are thus generalizable not in their specific features or attributes—­which prison system Eastern relied on or the geographic locale and socio-­economic status of the people in Lee’s study—­but in concepts, processes, or mechanisms that transcend the specificities of time, place, or group. Kristin Luker (2008, 126) has a nice phrase for this style of generalizability that focuses more on abstracting from your study and its concrete particularities. She tells us to “bump up a level of generality. . . . Once you know—­at the most abstract level—­what your study is about, consider how it is informed by other studies that think about things on this same level of abstraction.” This is why someone studying nineteenth-­century prisons still might care about my study of an entirely atypical nineteenth-­century prison, or—­getting to a higher level of abstraction—­why someone studying aspiring ballerinas or academic mathematicians might read Lee’s study of aspiring rappers. The higher the level of abstraction, the more generalizable the study because it can speak to underlying processes that happen outside of your unique case or sample. Said differently, qualitative work tends to “generalize theoretically” (Luker 2008, 127). Here’s another way to think about it. If you are a climber who spends all her time climbing giant granite cliff faces in California’s Yosemite Valley, can you also climb in Morocco’s limestone cliffs? Yes! Will there be differences?
Yes! How much do they matter? It depends on what you really care about! You need the same basic gear (harness, climbing shoes, chalk bag, rope). Depending on the weather, you might wear the same clothing. In Morocco, you are more likely to do some sport climbing—more common on limestone—than, say, other forms of free or aid climbing, as you would in Yosemite. That means you might not need a haul bag in Morocco, but you would in Yosemite. But, for the most part, you're going to use a lot of the same moves: If you know how to place your toe on a small hold or bend your knee in the right way to help you reach a distant handhold, you would use these moves in both places. There are always going to be similarities and differences; the question is will there be (big) differences in the thing you care about. The climbing might feel different in Morocco on a limestone sport climbing route than if you free climb up Yosemite's granite cliffs, but in many ways, it's still the same activity. Likewise, whatever insights you come up with by studying some unique group might also be true of some other group, even if they are different in a lot of ways.

Theory Generating and Theory Testing

Thinking about the relationship between qualitative research and theory raises a third common misconception about qualitative methods. A lot of times, people will say that qualitative work is concerned with generating theory rather than testing theory. Sometimes, they'll use other terms like "inductive" and "deductive." Again, this is part of an effort to distinguish qualitative work from quantitative work: Quantitative work is for testing theory; it is deductive, or it moves from the general (i.e., some theoretical proposition that should be generally true) to the particular (i.e., some dataset with all of its uniqueness). Qualitative work is for generating theory; it is inductive, or it moves from the particular to the general.

But like the small-n claim, or the lack of generalizability claim, this too is problematic. In fact, political scientists have built a number of qualitative tools for people to perform theory testing. Process tracing is one of the most popular of these tools: As I understand them, the basic steps are to create a research design that allows you to test a hypothesis, and then, getting into the nitty-gritty data, trace the mechanism hypothesized to cause some outcome of interest (Bennett 2010; George and Bennett 2005). I think of it as akin to following the trail of whatever thing we suspect is responsible for that outcome of interest, according to some theory (ours or someone else's). If we can't find the trail that the theory says should be there, then that suggests there is a problem
with the theory. There are also a number of checks one can run, with different levels of confidence, to avoid confirmation bias and such (see Chapter 10), but sniffing out a trail is the basic idea. It’s hypothesis or theory testing, but with qualitative data. Here’s a very simple example from my book on Eastern: When I tell my research question to people who are somewhat familiar with the prison I study, a lot of them try to answer it for me based on information in textbooks (almost every criminology textbook talks about this prison, and much of it is wrong). These speculations are not the most theoretically informed, but I can treat them as hypotheses nonetheless. One hypothesis is this: Quakers are responsible. Quakers, or more properly the Society of Friends, are the Christian sect known for their pacifism and history of persecution at the hands of other Christian sects. Those two factors made them particularly active in penal reform, especially opposition to capital and corporal punishments and support for incarceration as an alternative (going back to the 1600s at least). Pennsylvania is often called the Quaker State; Philadelphia, where the prison is located, is likewise called the Quaker City; lots of Quakers were involved in early penal reform; it must be the Quakers. (Again, not the most sophisticated explanation, but let’s run with it.) To use a process tracing approach, I’d set up a series of facts that I should expect to be true when I get into the archival data and go behind the scenes. Some of these expectations also rely a bit on logic, but some of it is also just about checking to see if the assumptions behind this hypothesis are accurate.7 We can start off at the most superficial level of these assumptions. Pennsylvania is the most well-­known Quaker state, but actually there were a lot of Quakers in New York, too, and they were also active in penal reform. New York pioneered a different method of incarceration, so having Quakers around does not automatically cause a state to adopt the Pennsylvania System. But it is true that New Jersey and Rhode Island were also Quaker states, and they did adopt the Pennsylvania System, which seems to support the Quaker hypothesis. Not so fast: These states also got rid of the Pennsylvania System pretty quickly. In fact, there was another prison in Pennsylvania that also followed the Pennsylvania System only to abandon it later. So while Quaker stronghold states may have been more likely to adopt the Pennsylvania System, their Quaker status was not enough to keep it around. We can also go deeper, looking for other indicators of what role Quakers played beyond the superficiality of state-­level correlations. There were a lot of
Quaker reformers in Philadelphia, including about one third of the city's major penal reform society's members—that society was pretty important in getting the Pennsylvania System authorized. Now, if one third of the society's members are Quaker, is that enough to say the Quakers were responsible? That might be a matter of opinion. But we could also keep looking around for more straightforward information. It turns out that same society also eventually criticized the Pennsylvania System and tried to alter it, getting rid of its most criticized features. So Quaker support—if we want to call it that—wavered. Finally, we can step back and examine the logic underlying the hypothesis: Quakers supported solitary confinement. It turns out they didn't. Quakers were pacifists and preferred incarceration to capital or corporal punishments, but they didn't have any unique preference for solitary confinement over other types of incarceration. In the end, we can safely say Quakers played a role, but there is enough other stuff going on that this answer does not seem to be a particularly viable explanation for Eastern's unique history.

I did the same exercise with a number of other hypotheses. Each one is based on a grain of truth, but when you go through the actual history, they are not very convincing—until you get to my explanation, but my explanation is a new explanation. I had to generate it. And this gets us to another complicating factor: A lot of research, both qualitative and quantitative, has a degree of both theory testing and theory generating to it. It is not uncommon for quantitative scholars (often associated with theory testing) to recommend alterations to the original theory based on their statistical findings. Likewise, qualitative scholars (often associated with theory generating) usually pursue some amount of hypothesis testing in the sense of looking at competing explanations as a way of evaluating the relative strength of their primary explanation, whether they use process tracing or not. There is even a third or middle way of testing and generating theory, popularized by Michael Burawoy (1998), which is to "reconstruct" theory—essentially adjusting the theory for the specifics of the case in question. But the main point is that it simplifies things too much to say that qualitative methods are only useful for generating theory. We'll return to these ideas later, especially in Chapter 10.

Mostly Words, but Numbers, Too

Finally, perhaps the simplest method for dividing qualitative and quantitative methods (which is also the easiest to dispense with) is the idea that qualitative scholars use words while quantitative scholars use numbers. Yes, people say this
(e.g., Jacques 2014), and it’s one of those things that is kind of true—­qualitative scholars tend to work with words and quantitative scholars tend to work with numbers. The problem is that there are too many exceptions to use this characteristic as defining the difference between the two approaches. For example, there are people doing statistical analyses on text (whether tweets or New York Times articles); their work necessarily includes both words and numbers, but I’d still call this quantitative work because their primary analytical tools are statistical analyses. Likewise, a lot of qualitative studies include some numbers. For example, David Snow and Leon Anderson’s study of homeless people in Austin, Texas, identified the number of different types of homeless people in their study. One of their key findings was that there wasn’t just one standard type of homeless person, but actually a rich diversity of types and different reasons and pathways by which they became homeless. The numbers associated with each type gave a sense of how common each type was in Austin (Snow and Anderson 1993). Likewise, Alice Goffman’s study of a Philadelphia neighborhood she called Sixth Street included counts of how many times she saw police officers do certain activities like stop people, arrest people, raid houses, and so on. She also included the results of her surveys of the neighborhood’s residents where she went from door to door asking people a series of questions (Goffman 2014). Both of these studies are ethnographies that involved long periods in the field, participant observation, and interviews that allowed the authors to present rich descriptions of the people, places, and experiences they studied, as well as compelling quotations from their research participants. But they presented both numbers and words, even though few people would call these studies quantitative (and there were no regressions). There are some other examples of qualitative studies that rely on numbers. Two of my favorites are included in Brady and Collier’s book, Rethinking Social Inquiry. One is by the late statistician David A. Freedman (2010) in which he reviews several historical cases of disease outbreaks and how qualitative reasoning, especially logic and a detailed knowledge of the context, in combination with statistical and experimental data, enabled people to solve the mystery of how these diseases were spreading and what their sources were. The other is a study from Henry Brady himself examining the 2000 US presidential election. The question was whether the premature announcement that Al Gore had won the state of Florida caused voters in the state’s western Panhandle (in a different time zone, so polls had not yet closed by the time of
the announcement) to avoid voting in sufficient numbers and thus cost George W. Bush a clean win in the state. (Eventually, the question of who won the state, complicated by other considerations, went to the Supreme Court, which declared Bush the winner and secured him the presidency.) Brady produces a variety of estimates for how many people heard the announcement, how many of those people were still planning to vote when they heard the announcement, how many of those people were Bush supporters, and so on, to calculate an estimate for the number of votes Bush actually lost due to the early call. But rather than using regressions or high-­level statistics, he uses logic and knowledge of the context. Ultimately, it’s a qualitative study that uses numbers (Brady 2010). Using numbers doesn’t automatically make it a quantitative study—­but using high-­level statistical techniques like regressions does.
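To show the flavor of that reasoning, here is a small sketch (in Python) of how a chain of context-informed estimates can be combined; every number below is invented for illustration and none of them should be read as Brady's actual figures.

```python
# Hypothetical back-of-the-envelope estimate in the spirit of Brady's analysis.
# The logic, not the made-up values, is the point.
voters_still_expected = 20_000      # Panhandle voters yet to vote at the early call
share_heard_call = 0.20             # fraction who plausibly heard the announcement
share_then_stayed_home = 0.10       # of those, fraction discouraged from voting
share_bush = 0.60                   # partisan split among those late voters
share_gore = 0.40

discouraged = voters_still_expected * share_heard_call * share_then_stayed_home
net_votes_lost_by_bush = discouraged * (share_bush - share_gore)

print(f"{discouraged:.0f} voters discouraged")          # 400 with these inputs
print(f"{net_votes_lost_by_bush:.0f} net votes lost")   # 80 with these inputs
```

Each input comes from logic and local knowledge (time zones, poll closing times, media reach, the area's partisan lean) rather than from a fitted model, which is why Brady's piece reads as a qualitative analysis even though its output is a number.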

So, When Are Qualitative Methods Appropriate?

This chapter has reviewed—and argued against—definitions that characterize qualitative methods as small-n research that is not generalizable, that is only useful for theory generating, and that sticks to text without numbers. Instead, I've suggested that qualitative methods are an empirical approach to social science research that involves collecting and analyzing a lot of data, is broadly generalizable through theoretical concepts and mechanisms, can engage in both theory generation and theory testing, and mostly involves studying words, but often includes numbers, too.8

What can you do with this broad field of methods? Lots of things. Pretty much everything except answer the precise numerical questions for which statistical methods have been perfected. That is, if you are interested in measuring things or calculating the numerical effect of something on something else (or the association between two things), statistics are best. Statistics are great for answering "how much," and qualitative methods are fantastic for answering the "how" and "why" questions. For example, how does a particular group or population make sense of [something interesting that happens to them]? Why did a particular group [do this anomalous thing]? How was this state or organization able to [do this really bad thing]?

Qualitative methods excel at making sense of puzzles. For me, it was why this one prison retained its exceptional approach to incarceration despite intense criticism for doing so. For Jooyoung Lee, it was why young urban Black men sometimes put aside the few, modest economic opportunities available in order to pursue their highly unlikely dream of becoming a rap star.

So far, these are all question-based approaches to research, but there are other things you can do with qualitative methods. Foremost among them is describing and documenting new trends. Statistics are great at providing numerical estimates, which are attractive for news soundbites—a recent headline referred to the "surprising" number of women who are harassed when they go jogging outside. But qualitative methods can be summarized, too: Michael Gibson-Light's (2019) ethnographic dissertation was picked up by a variety of news outlets, noting his finding that prison labor reinforces existing inequalities found in society. But the real power of qualitative methods lies in textured descriptions.

One of my favorite articles is Mona Lynch's account of attending an execution by lethal injection in Arizona (Lynch 2000). When her article was published in 2000, lethal injection was still fairly new—pop culture references to executions still tended to focus on the electric chair and gas chamber, methods that were still around but largely fading from the penal landscape. While there were some newspaper articles about lethal injection, they were fairly short and tended to focus on a common set of information—who the condemned and their victim(s) were, whether the victim's family was in attendance, what the condemned had for their last meal, what their last words were, and how long the execution took or time of death—not rich descriptions of how the execution ritual was prepared and conducted. Lynch's article was the best account of what these new, modern-day execution rituals looked like, as well as what happened before and after the execution itself. (She also related this information to other trends in US criminal justice and contributed to our theories on punishment and penal change, but the descriptive material always stood out to me.) It was a timely, important, and beautifully written article that became quite popular among punishment scholars.

Qualitative methods are also very useful for identifying causal mechanisms. An exemplary illustration of this ability can be found in Katherine Beckett and Steve Herbert's analysis of spatial exclusion ordinances in Seattle, Washington (Beckett and Herbert 2010). Over the last few decades, cities in the United States, UK, Australia, New Zealand, and elsewhere have passed civil ordinances that prohibit unwanted behavior that is not technically criminal behavior—loitering in a park, sleeping on the sidewalk, or trespassing on public property (yes, you read that right). People who violate these ordinances (mostly homeless people) can be excluded from other areas of the city (e.g., other parks, shopping areas) and even sent to jail.

Because these ordinances are part of civil or administrative (and not criminal) law, exclusion orders are not technically criminal punishment (even though they can lead to jail time). Beckett and Herbert interviewed people who had been processed for violating these ordinances and found that they very much experienced the sanctions as punitive. In fact, the things they talked about as consequences of these violations corresponded to a classic account of the pains prisoners experience as a result of their incarceration. For example, prisoners are deprived of goods and services (they only have access to what is sent to them and what is available in the prison commissary, if they can afford to pay) and safety (there are some dangerous people in prison who can in turn prey on other prisoners). Likewise, Beckett and Herbert found that people processed under these exclusion ordinances no longer had access to goods (like the clothing and food delivered to certain parks for homeless people) or services (such as the Veterans Administration or Social Security office), and they often felt more vulnerable because they were forced to sleep in less populated areas where they were more likely to be harassed, assaulted, or worse. That is, these people, who were not officially getting punished, experienced some of the same pains as people who were officially getting punished. When I first taught this article to a group of criminology graduate students, one of them criticized the article because its findings were “confounded,” meaning something could have been causing the deprivation other than what the authors claimed. The student argued that because the people were homeless, they were already deprived of goods and services and safety. If this study had been a quantitative study (for example, a survey research design), the student would have been correct: The relationship between the pains the people experienced and their exclusion orders could potentially be a spurious (or misleading) relationship. But because the authors had used interviews, they could identify the reason for the deprivation and thus demonstrate that the people’s deprivation of goods and services, for example, was not simply a result of their poverty: Before the spatial exclusion orders, they were able to go to various parks and receive donations; the spatial exclusion orders prohibited them from going to those places, and there were no donation drop-­off points in the areas where the people were permitted to go. The qualitative nature of the project enabled the authors to trace the causal connection between the exclusion orders and the pains rather than just demonstrate that a correlation existed. Qualitative methods also let you trace or explain processes by which some outcome occurs. In my book, I unpack the process by which Eastern, the prison
I study, became (what I call) a deviant prison—­that is, a different and highly criticized prison. Eastern certainly didn’t start out that way. Eastern was built in Pennsylvania, which was an early bellwether state responsible for producing many of the leading laws and innovations that other states had adopted. Eastern was also a highly anticipated, much-­discussed prison that reformers from around the country were initially very excited about. Moreover, most early commentators did not distinguish very strongly between the Pennsylvania System that Eastern used (long-­term solitary confinement round the clock) and the Auburn System that other prisons used (keeping prisoners together during the day and putting them in solitary only at night); to early commentators, they were pretty similar. But a series of pamphlets changed that perception: They documented some earlier experiments with a different type of solitary confinement at other prisons in which prisoners became physically and mentally ill, mutilated themselves, attempted suicide, and some died. One particular reform society that had been trying to convince states to build a particular style of prison used these episodes to criticize any use of long-­term solitary confinement, even though there were substantial differences between the types of solitary confinement that proved fatal and the kind used at Eastern, which was designed to avoid these health problems and had a far lower fatality rate. (Keep in mind that none of these approaches, even the approach they advocated, were good for prisoners’ health.) Very rapidly, opinion turned against the Pennsylvania System, and criticism of its approach increased. My qualitative research allowed me to map out the process by which a widely anticipated prison, which had started as the heir to previously popular innovations, quickly became a penal pariah. Qualitative work is also excellent for constructing typologies and taxonomies. These are both ways of subdividing some population—­of people, organizations, places, or other phenomena—­into different types or categories. I often conflate these two terms or say “typology” when I should say “taxonomy,” and I think other folks do that, too. But, to be clear, they are different. A taxonomy is essentially an enumeration of the different types of things within a category (Lofland et al. 2006 [1971], 146–­147). It’s basically a list of similar but different things—­or rather different versions of a particular thing. One of the most famous taxonomies in criminology is Gresham Sykes’s (1956) taxonomy of the types of prisoners, sometimes called “argot roles”—­extending a smaller list created earlier by Donald Clemmer (1940). These roles are based on prisoners’ behaviors: For example, “merchants” sell things, “hipsters” talk tough, “punks”
and "wolves" engage in different types of same-sex relationships, and the "real man" maintains his dignity amidst all the challenges of doing time. Another example was Sykes's taxonomy of the deprivations of imprisonment that I alluded to above: The deprivations of liberty, autonomy, goods and services, security, and heterosexual relationships (Sykes 1956).

By contrast, a typology enumerates the different types of things within a category according to some set of criteria or variables (Lofland et al. 2006 [1971], 148–149). A typology is kind of like a two- or three-dimensional list of similar things sorted into several categories of difference. The easiest version of a typology is a 2x2 table, the simplest version of which has two binary (or two-category) variables, where the columns and rows are basically yes/no or present/absent for each variable. In my work on the early prisons, I set up a 2x3 table that looked something like Table 2.1 where I used a binary variable and a three-category variable. Basically, I divvied up the US states by whether they had adopted a proto-prison (the rows) and when they adopted a modern prison—early, late, or not at all in the period examined (the columns). There were many ways I could have done this, and I did experiment with several other versions: north/south by early/late adopter, urban (states with big cities)/not urban by early/late adopter, and so on. (I'll come back to this example in Chapter 10.) These typologies let me see the relationship between different variables of interest, or in Table 2.1, the relationship between when states adopted a modern prison (second-generation prison) and whether they previously had adopted a proto-prison (first-generation prison).

Table 2.1  Typology Using a 2x3 Table. This typology sorts states, identified as "o," by the date ranges when they adopted a modern prison (columns) and whether they previously had adopted a proto-prison (rows).

                   Early Adopter   Late Adopter   Non-Adopter
                   (1820–1834)     (1835–1860)
Proto-Prison       ooooooooo
No Proto-Prison                    ooooooooo      oo

Source: Rubin 2015a.

Typologies and taxonomies can be useful in a variety of ways. At the first order, they can be descriptively useful, as when Snow and Anderson showed
that—contrary to conventional wisdom—there isn't one single type of homeless person but about a dozen. Typologies and taxonomies can also be useful for causal inference (i.e., making claims about how something causes something else) by examining the different types and relating them to some outcome of interest or treating the type itself as an outcome of interest. For example, Snow and Anderson showed, for each type of homeless person, the different pathways for becoming homeless and also how different types of homeless people spend different lengths of time as homeless people (helping to answer questions about how one becomes and remains homeless).

As these examples illustrate, qualitative research does a variety of tasks really well—answering "how" and "why" questions, resolving puzzles, describing new trends in rich detail, identifying causal mechanisms, unpacking complex processes, and constructing taxonomies and typologies. But there's something else that qualitative research excels at: It can be really compelling. While numbers can be very persuasive, even when they shouldn't be (Merry 2016), qualitative research often involves a kind of storytelling that brings the human experience to life in powerful ways. This lends itself particularly well to advocacy work, by the way. While there can be tensions between rigorous scientific research and political or advocacy efforts, qualitative methods are a useful way to illuminate aspects of society and social life that can affect readers or listeners at an emotional or visceral level. I've also found after teaching several dozen undergraduate courses that qualitative studies are much more digestible, engaging, and instructive for college students than quantitative studies. Overall, qualitative work tends to be more readable for non-professional audiences.

* * *

Qualitative methods are a powerful toolkit that let you do a diverse array of social science research, from theory testing to theory generating, and produce work that is generalizable far beyond your unique study site or group of people. They excel at many things and, like all methods, they have pretty clear limitations (some things are best left to statisticians). In the next chapter, we discuss how to figure out, among this vast array of things qualitative methods do well, what you want your project to actually be about by figuring out your research question.



3

Picking Your Proj
Identifying Your Research Question

Project
1. A potential new route or bouldering problem that is being attempted but has not yet seen a first ascent.
2. An established route or bouldering problem that an individual is repeatedly attempting to ascend over a period of time but which has not been successfully sent by that climber. Sometimes slang in the form proj.
Wikipedia (2020), "Glossary of Climbing Terms"

The Project List

A lot of pro-climbers, like Alex Honnold and Margo Hayes, keep a list of "projects" or routes (or combinations of routes) they want to send (climb). These lists might be, essentially, a climbing bucket list—really aspirational rather than practical—or a systematic plan of fairly realistic, if difficult, climbs. Other climbers, like Tommy Caldwell, don't necessarily keep a written list, but they do set their sights on a project and then go about determining whether it is possible. Without seeing a pro-climber's list, an observer can piece together some entries on these lists retrospectively by looking at the climber's big accomplishments: For Alex Honnold, it included things like free soloing Moonlight Buttress and Half Dome (two big walls), completing Yosemite's Triple Crown (sending Half Dome, El Capitan, and Mount Watkins—Yosemite's three biggest walls) in under 24 hours (first with Tommy Caldwell, then on his own), and free soloing El Capitan. For Margo Hayes, it included La Rambla in Spain and Biographie in France—both 5.15a sport routes (some of the hardest routes in the world). For Tommy Caldwell, it included the first ascent (with Alex Honnold) of the multi-peak Fitz Traverse in Patagonia on the border of Chile and Argentina and then
the first free ascent (with Kevin Jorgeson) of the Dawn Wall, El Capitan's super smooth eastern face, both of which were multi-day climbs. These climbs represent some of Honnold's, Hayes's, and Caldwell's major life accomplishments; of course, when they were starting out, their goals were more modest (e.g., Caldwell 2017; Honnold and Roberts 2016; Reel Rock 12: Break on Through 2017).

Likewise, a lot of scholars, myself included, keep a list of projects we want to do at some point in the near or distant future. Some of these projects might be potentially discipline-changing, and others might be bite-sized projects that lead to a nice but modest paper; some are books and some are articles. Most of the projects on my list are really just nascent ideas—something I want to study but without a clear plan for how I will go about it. For example, on my list right now, I have an entry that just says "myths" and another that says "professionalization/bureaucratization." (I mentally refer to every paper I have written with a one- or two-word label.) These entries are shorthand representations of distinct project ideas. I have only a vague idea about what data, methods, and literature I would use, let alone what my argument or contribution will be or what my research question should be. Some of the hardest intellectual work in a project is moving beyond that initial idea to a vision—that is, figuring out what the project is really about or what it is going to be. The first step in that process is identifying your research question.

* * *

This chapter seeks to do four things: First, it will start by discussing the role of research questions in the larger research process. Second, and building on that first discussion, this chapter will dispel some misconceptions about research questions, especially what counts as a research question and why people disagree about this. Third, it will discuss strategies for coming up with a research question. Finally, it will address some of the secrets about research questions relating to challenges and opportunities that can arise, particularly when you are Dirtbagging about in the field.

Why Do I Need a Research Question?

Having a research question seems to be a distinguishing feature of social science research, relative to the humanities. As social scientists generally draw on the scientific method, at least loosely, we have copied that formality of stating a research question—and doing this at the outset. Part of the justification
for this requirement, at least for empirical research, is this: If we don’t have a question, we risk the possibility of going out into the field thinking we know what we’ll find and looking only for proof of what we expect to be there. That’s not empirical research (it’s not systematic). So, we need a question, however provisional, before going into the field to remind ourselves that we are exploring, discovering, or investigating. (Near the end of the chapter, I’ll make some points about when you don’t need a research question, but that tends to come later in the research process.) There is another justification: How do you know what to collect data on if you don’t have a research question? You might collect data that don’t answer your question, so pick your question first. The problem is that this advice really applies only to certain types of research—­research that falls under the “deductive” banner (again, deductive is the idea that you move from theory to data, or from big, general ideas to particular situations). If you set out to test a theory, you absolutely want to pick your data after you select your research question—­and if you are testing a theory, the theory itself will supply a research question. For example, let’s say I want to test group threat theory in criminal sentencing—­the idea that the larger a minority presence in some jurisdiction, the more threatening that minority group will seem to the majority group, and thus the more punitive sentencing agents (judges, juries, prosecutors) will be toward members of that minority group (relative to non-­ minorities) and maybe toward everyone else (relative to jurisdictions with smaller minority populations). Possible research questions are already embedded there: What is the relationship between minority population size and sentencing? What is the relationship between minority identity and sentencing? Are minorities sentenced more harshly in jurisdictions where there are more of them? Is sentencing harsher in jurisdictions with larger minority populations? These research questions also give us clues about what sort of data we want to collect—­sentencing data in a variety of jurisdictions with differently sized minority populations. But what if you don’t know what theory you want to test? What if you’re more interested in making new theory or extending or tweaking existing theory? What if you’re not really interested in theory at all? What if you want to start studying something, then figure out what theories seem relevant, and then think about testing them, applying them, combining them, or adding to them? These are more inductive approaches in the sense that they move from the data (your case or the thing you want to study) outward to more general things like
theory. For these types of orientations to a project, selecting your (official) research question first doesn't make a whole lot of sense. In fact, if you are like me (i.e., a Dirtbagging social scientist), you will start with a pretty simple provisional research question: "I want to study a in b context." That's not even a question yet, so you massage it into a question and ask, "How does a function/operate/work in b context?" Maybe that's not the best research question, but it will get you started. You might not come up with a more interesting question until you are in the field, where you will also refine your research question. Sometimes, you might come up with a better question that requires a somewhat different focus or shifting to an adjacent topic. You might even change your entire project in the field or come up with spin-off projects. But you're not beholden to that one, early, provisional research question or project idea.

To be honest, I've felt a lot of guilt about this approach over the years. I was sure this was the wrong way to go about it, even when it proved to be a pretty successful model for me and even when I found out that other people did it this way, too. I was taught that you start with a research question—or you start with a research interest—but you don't really have a project until you identify the research question. And, again, if you are a more deductive-oriented scholar, or especially a quantitative scholar, there are some good reasons underlying this idea. There is an adage that you choose your research question first and then your data, but never choose your research question based on the data you have available. Choosing your research question based on your data might force you into asking suboptimal questions, especially if the data you have available are suboptimal. But this advice also assumes that you are constrained by your data, which implies a comparatively small or limited amount or type of data. That tends to be true for quantitative projects: If you have downloaded a dataset of survey data and have to come up with a research question based on the survey questions other researchers asked, you might be screwed because their survey questions don't speak to the things you are actually interested in. But if you have a whole archive, a lot of different websites, or a really interesting neighborhood to study, there are bound to be many useful research questions in there. The available data still might push you in a lot of different directions, including some you don't want to go into, but you will have much greater choice. The point is, this advice about choosing your research question before your
data, which is good advice for many projects, is not the best advice for people like me—­the Dirtbaggers. There’s another reason why this is sound advice—­for some. A lot of successful scholars choose their research question based on the literature and existing theory, rather than by a substantive research interest in a particular topic, place, population, or historical context. They thus first pick their question and then figure out what data they’ll use to answer it—­they don’t start with an interesting dataset and then pick their question. For example, the award-­winning sociologist of law and organizations Lauren Edelman didn’t start with an interest in organizational compliance with civil rights laws or with discrimination more generally (her main substantive topics), but with a much more literature-­driven frustration: Both organizational scholars studying law and legal scholars studying organizations get some pretty basic points wrong. She then used the case of organizational compliance with civil rights laws to set org theorists and law scholars right. That’s the way you are supposed to do it: The locale or specifics of your study are just the example for your larger, more important theoretical contribution, the generalizable point that is useful to others in the larger field. By contrast, I always just wanted to study punishment. I was interested in studying the death penalty, or a particular prison, or the criminal justice system in early Massachusetts. These were inherently interesting things to me that I wanted to learn about; they were not more generalizable phenomena or trends that other people would necessarily care about. This meant that I always struggled to find ways to justify the project—­that is, to come up with a way to explain why any resulting book or paper should be published. Essentially, I was doing it backwards from how you are “supposed” to do it. But again, it turns out I’m not the only one. Again and again, in conversations with colleagues, I learned that others do it this way, too. And here’s the thing: I like doing it my way.

Tommy Caldwell’s Vision One morning in Yosemite Valley, Tommy Caldwell looked up to see the sun’s early light hitting El Capitan’s east face. This aptly named Dawn Wall is an amazingly smooth wall of granite. It took Caldwell years to identify a route—a single line to follow in a sea of rock—and then another several years to actually climb it. Some people, Caldwell included, thought that it might not be possible to free climb such a smooth route—that is, to climb the rock itself rather than hammering nuts and pitons and holding on to loops connected to
those pieces of aid. But he looked at that beautiful mountain and said, “I want to climb it” (or something like that). He didn’t know how exactly he would do it, but he figured it out over time and as he went. On his final and successful attempt to climb the Dawn Wall, he even had to improvise while climbing and follow a different route than he was expecting on one pitch. His pursuit of this project was a long, drawn-out affair full of uncertainty and change. When he succeeded, it was awesome—quite literally awesome. It was a moment that demonstrated human potential. Even President Obama congratulated him and his climbing partner, Kevin Jorgeson, for the accomplishment.

Maybe it is better to do it the other way—­it’s faster, it’s less stressful, and it guarantees a more impactful project (because you know you are selecting a project of interest to a broad audience). But the other way is not the only way to reach the same result. It’s also not the way that makes sense for Dirtbaggers. Dirtbaggers are visionaries. We look at something and say, I want to send (i.e., successfully climb) that route. I don’t know how I’m going to do it, but I’m going to do it. I might have to improvise. I might have to change course. I might have to downclimb in the middle—­basically backtrace my steps—­so I can take a different course. But damn it, I’m going to find a way. Having this mindset as academic social scientists does make our lives a little harder, and a little more uncertain, but it also lets us do amazing things.

Competing Definitions of a Research Question

This variation in how people approach their research projects, and thus advice about when and how you come up with your question, is actually directly related to how people define and talk about research questions. That’s because there are different definitions of what a research question is. As you can probably guess, I define research question fairly broadly: Any central question your research seeks to answer—the question that drives your research. Clearly, I’m really flexible about what counts as a research question. If you’re like me, you might start with an interest in some topic—this is sometimes called a “research interest.” (For example, my project list basically contains research interests—penal myths, the professionalization of prison administrators.) But you can turn any research interest into a research question pretty easily. “I want to study social control on the subway” becomes “What does social control look like on the subway?” Alternatively, “I want to explore
coping mechanisms people use in [name your favorite high-stress environment]” becomes “What coping mechanisms do people use in [high-stress environment]?” Boom. You’ve got a research question.

Some other folks wouldn’t count these as research questions, though. While I prefer a broad definition, many scholars prefer a narrower definition. It’s important to know that these other definitions (or versions) exist for at least two reasons. First, you might find that one version works better for you, or for your current project, than the other—and you’re not obligated to stick with one definition of research question over the course of your research career. Second, it’s important to know how other people might interpret your work and thus what definitions and assumptions they will be working with: Particularly if you gravitate to the Dirtbagger side of the spectrum, then those folks who have a narrower definition of a research question (pretty much everyone who isn’t a Dirtbagger) might express some skepticism toward your work—you need to be ready for it, both in the sense of knowing that it’s coming and knowing how to respond.

To take a fairly typical example of the mainstream camp, Kristin Luker defines research questions more narrowly in her book Salsa Dancing into the Social Sciences. For Luker, a research question is not just a question guiding your research, but a question that has to satisfy certain requirements. As she defines it, a research question requires both an explanandum and an explanans—a thing to be explained and a thing that does the explaining, or more formally, a Y (dependent variable) and an X (independent variable) (Luker 2008, 52). Anything else is just a “research interest.” Her definition is certainly a fair characterization of research questions in quantitative work and in some of the more structured qualitative work. However, many ethnographies, histories, studies using open-ended interviews, heavily theoretical empirical studies, and other qualitative works do not begin with a two-variable research question, nor do they ever establish such a question. Instead, they begin with what Luker would call a research interest (but what I’m comfortable calling a research question). A lot of times, these are “why/how” questions, and consequently they don’t have both a Y and an X: Why/how did/does Y (i.e., whatever we care about) happen? What forces allowed/allows Y to happen? Why does Y keep happening? How does Y function in this context? What was the role of X (i.e., something interesting that might be influential) in this context? How did some group of people make sense of going through X process? For some of these questions, it’s kind of hard to
decide whether the thing of interest is an X or a Y because we’re not set up to think that way. In fact, it’s pretty hard to fit these broader research questions that us Dirtbaggers gravitate toward into a two-­variable specification without being a bit misleading to yourself and your readers. As an example, the research question in my first book was, “Why did Eastern State Penitentiary retain its unique system of long-­term solitary confinement despite intense criticism from local and international penal reformers and prison administrators?” Stated more theoretically and broadly, “How are deviant organizations able to maintain their deviance despite intense pressures to conform to the norms of their field?” To transform my question into a research question in Luker’s meaning, I would have to say something like, “How do organizational administrators enable deviant organizations to maintain their deviance?” Here, ongoing organizational deviance is my Y (the outcome of interest), and administrators are my X (the possible explanation for the outcome). However, this rephrasing highlights a variable I didn’t know was important until halfway through data collection and narrows the focus unnecessarily—­administrators were crucial, but they were part of a much larger story. Moreover, this rephrasing leaves out other X’s like administrators’ motivations for supporting their organizational deviance. A proper rephrasing would be something like, “Why did administrators continue to retain this highly criticized approach to incarceration?” But in that (more accurate) restatement, administrators’ retention is the Y, and I still don’t have my X. Just to be clear, this difficulty isn’t unique to research on historical settings. In fact, reading some of the best ethnographies, one would often be hard-­pressed to find a research question that actually fits the more mainstream definition. Instead, these scholars set out to understand how a group of people “make sense” of a particular phenomenon, or how they navigate the particular challenges of their existence. In fact, a lot of studies—­especially ethnographies and histories—­don’t frame their main research question as a question, but as a description of what the study is about or what its main goals are, and then they follow up with more explicitly stated, but smaller questions. For example, in the preface to their ethnographic study of homeless people in Austin, Texas, David Snow and Leon Anderson (1993, ix–­x) explain, This book reports on a cross-­section of Americans who found themselves homeless. . . . It is a sociological case study of a subculture of street life among the unattached adults, mainly males, who lived in or passed through Austin, Texas, between the fall of 1984 and the summer of 1986. . . . [W]e focus our
attention primarily on the streets as it is experienced by the homeless, that is, on their strategies and struggles to subsist from one moment to the next, materially, socially, and psychologically.

Later, in their introduction, Snow and Anderson (1993, 6) list some smaller-­ scale or more specific questions about homeless Americans: “Who are these people? Where do they come from? What are their lives like? How do they manage to survive physically, socially, and psychologically in this netherworld of the streets that is so alien to most Americans? How do they manage to make sense of lives that strike most of us as waking nightmares?” Finally, they state the primary goal of their study: “Our goal is to provide a detailed description and analysis of street life as it was lived by the homeless in Austin, Texas, in the first half of the 1980s” (Snow and Anderson 1993, 7). At least in my read, none of these statements or questions would count as a research question when narrowly defined as containing two variables. But I do see these questions as slightly different, more-­or-­less-­specific versions of the same research question: How is homelessness experienced by those going through it, specifically in 1980s Austin, Texas? Or stated even more simply, What is homelessness like? This is a fairly open-­ended, complex question that one can take in a number of different directions. Thinking about this question in relation to two variables would be overly restrictive and probably result in a much less interesting book. Luker’s recipe for a two-­variable research question certainly takes some of the stress and inefficiency out of qualitative work. Lining up a question about the relationship between two variables offers a much cleaner approach to your work in a lot of ways. But there are also problems with thinking the two-­ variable research question is the only legitimate type of research question. First, one of the most useful aspects of qualitative research is the ability to look for new variables, to find new mechanisms, and generally to be open to what we don’t know before going into the field (whatever our field may be—­neighborhood, prison, office building, classroom, park, archive, online database of newspapers, internet forum). This characteristic is what makes qualitative research so creative and exciting. Second, normalizing a two-­variable research question breeds unnecessary critique of exceptional works as failing to have a research question when, in fact, they did have a research question. Especially when teaching introductory qualitative methods, but also in other contexts, there have been so many times where I’ve heard both students and professors complain that a project doesn’t
have a research question. In many cases, the project did have a research question, it just didn’t “count” because of this narrower definition of research question to which many people subscribe. Finally, this narrow approach to research questions makes those of us who use broader research questions feel bad, mistakenly believing our research is unrigorous, or of low-­quality. And that’s just unacceptable. Where does this difference in definitions come from? A lot of it has to do with differences in our training—­from the discipline or subfield in which we specialize to what methods we use. Ethnographers and interviewers, although using overlapping methods, can have very different approaches to research. If your subfield is heavily dominated by quantitative scholars, you might be prone to presenting your work in ways that preempt some of their skepticism; if your subfield is filled primarily with qualitative scholars, you might be less defensive in your approach. Sometimes, though, it just comes down to preferences, which have been molded by our advisors, teachers, and other exemplars to which we are exposed throughout our (ongoing) training. This difference in our training can also lead to very different approaches for coming up with a research question. Luker notes that many graduate students will seek out their advisor with a particular research interest (again, one variable in mind, not two), but not a research question, only to be turned away with the instruction to identify an explanatory variable of interest. Without that guidance, Luker argues, graduate students will go into the field noting everything and will suffer “The Damnation of the Ten Thousand Index Cards”—­that is, having endless notes about what they see but no clear direction or means of wrangling those notes (Luker 2008, 19). They will ultimately fail (either giving up out of exhaustion or taking years to produce their first study). Her book provides guidance on how to move from a research interest (or what I see as a broad research question) to a research question (defined as having an X and a Y) and avoid that failure. In fact, ethnographers have produced well-­honed methods that sound very similar to these thousand index cards; it is a viable strategy that Luker dismisses, and consequently she fails to provide useful guidance for those who want to use it. In fact, research questions play a pretty different role in ethnography, history, and some other types of qualitative work steeped in theory, to take some examples. One of the best guides to ethnography (originally published in 1971, although now unfortunately out of print after four editions), John Lofland,
David Snow, Leon Anderson, and Lyn H. Lofland’s Analyzing Social Settings, provides an interesting illustration. Their book is organized into three parts: gathering data, focusing data, and analyzing data. There is no chapter devoted to research questions per se; instead, in the section on focusing data, there are chapters on coming up with topics, asking questions of the data, and arousing interest in your topic—­but nothing about coming up with a research question. Indeed, one won’t find an entry in the index for research question. I believe their approach to laying out their book derives from the ethnographer’s approach to research: They go to a site, see what they find, and then construct their project. The final research product ultimately comes from the data themselves, rather than forcing the data to answer some question they perhaps can’t answer. This is an inductive approach. I really like this approach—­as a Dirtbagger, it speaks to me and is basically how I like to do things. It’s also more similar to how historians approach things. But it’s not just these distinct fields. In their book, Case Studies and Theory Development, political scientists Alexander George and Andrew Bennett (2005) take a somewhat similar approach to the ethnographers even though they fall more on the deductive side of the research spectrum. Like Lofland et al., George and Bennett do not use the term “research question” but instead “research objective.” A research project may have “one or more” of these research objectives, each relating to “an important research problem or ‘puzzle.”’ (We’ll talk more about puzzles below, but for now think of it as just an interesting quandary you want to explain. Your research question, even though George and Bennett don’t call it that, would be: What explains or accounts for this puzzle?) Reflecting their more deductive orientation, George and Bennett advise that the problem or puzzle should come from the literature or, more specifically, existing theory (a key difference from the ethnographer’s approach where you don’t necessarily need your problem to come from the literature). Whatever this problem from the literature, one should be able to engage in theory testing (of either “a well-­established theory or competing theories”), establish scope restrictions on the theory (e.g., identify under what sets of conditions the theory is true or adequate), identify new variables, examine the phenomenon at a different level of analysis, or “move up or down the ladder of generality” (George and Bennett 2005, 74). As with Lofland et al., rather than one overarching research question imposed at the outset, several questions are asked of the data: Specifically, George and Bennett recommend “asking a set of standardized,
general questions of each case, even in single-­case studies” where the questions “reflect the research objective and theoretical focus of the inquiry” (2005, 69). The point is that there are different ways to treat the issue of a research question—­how to come up with one, when to come up with one, and whether you even need to think about one. Whether you lean inductive or deductive, you might depart from the more mainstream advice about coming up with a narrowly defined research question at the outset. So, rather than emphasizing one right way, this chapter lays out a variety of approaches and tries to stay in a middle ground that lets us keep the Dirtbagging spirit, not feel bad about doing it, and prepare ourselves for the common pitfalls and annoying critiques that come with this approach.

Common Strategies for Selecting a Research Question

I’ve always found that having a template or model to borrow, copy, or build from is really helpful. There’s no need for us to reinvent the wheel, and it makes our life so much easier to know that there are standard ways of doing things—several in fact—that we can choose from and copy. This is especially true in the case of research questions. I can think of four fairly common techniques for coming up with a research question, although I’m sure there are others. They vary in terms of how well respected they are in the mainstream—that is, which techniques are actually the ones advisors and books tell people to follow, and also how some people will react when you tell them your technique. But it also seems that different people gravitate more toward one technique over another—although they might use different techniques for different projects or at different times in their lives. But one key point I’m trying to get across is you don’t have to stick to one of them, or any of them. If you feel stuck, try out this list; if you don’t, feel free to use a different trick for coming up with a research question.

Mix and Match

The first three techniques involve some variation of a mix-and-match game in which you mix and match a topic and some theoretical stuff. By “topic,” I mean a policy (three strikes laws), a population (homeless people), a group or organization (March of Dimes), or an organization type (hospitals), usually bounded implicitly or explicitly by a period (modernity, the nineteenth century, 1980–2000, the present moment—whatever that means to you), a place (the Sixth Street neighborhood, Southside Chicago, Kansas City, California,
Western Pakistan, France, the Global South), and maybe some additional characteristics (e.g., age and gender restrictions for the population, public/private or large/small for the organizations). By “some theoretical stuff,” I mean a particular variable, concept, mechanism, or process (race, gender, class, agency, the life course, social control, the social construction of deviance); a theory or theoretical framework (life-­course criminology, the pains of imprisonment, social constructivism, feminist theory); a debate (nature v. nurture, agency v. structure, importation v. deprivation); or a literature (the sentencing disparities literature, the voter turnout literature, the punishment literature, the prisons literature, the law and society literature). The first technique is to start with some theoretical stuff—­by which, again, I mean a literature, a debate, a theory, a framework, a concept—­and then select a topic in which to apply or explore that theoretical stuff. When using this technique, you want to choose a topic that is a particularly good place in which to explore this framework—­that is, you are choosing the topic based on the theoretical stuff. This is the way you are “supposed” to come up with a research question, according to some folks (especially those who prefer deductive approaches, but not just them). For example, let’s say I’m interested in understanding how organizations respond to legal regulations. This is my theoretical stuff, because it’s not tied to anything super specific—­I’ve not specified a type of legal regulation or a type of organization, so we’re working at that general level at which theory tends to operate. There’s a lot of literature on this theoretical stuff written by legal scholars in law schools and organizational theorists in sociology departments and business schools. Now, let’s say I have realized that what these groups of scholars are saying contradicts each other or at least that they have different expectations about how organizations respond to law. That’s a pretty good position to be in: I’ve found an intervention into the literature I might be able to make, which is that I can get two literatures to talk to each other and think about how they might learn from each other. That’s a solid contribution. Now I need a topic. So, let’s say I choose large companies’ responses to the US Civil Rights Act that forbids employment discrimination on the basis of race, color, sex, and other protected statuses. Boom. I’ve got my theoretical stuff—­how organizations respond to law—­and my topic—­compliance with the Civil Rights Act. (This roughly describes the strategy Lauren Edelman used with her research agenda in the late 1980s and early 1990s, which helped her land articles in top journals of her discipline and interdisciplinary fields,
setting up her several-­decade career trajectory.) This is a highly respected approach, and it is the standard way of pursuing mainstream research: Look first to the literature/theory, then find a good topic in which to pursue the literature/theory’s claims. The second technique is to start with a topic of interest and then find the theoretical stuff to go with it. This is my preferred technique, and I think this is the technique preferred by most Dirtbaggers. Basically, we start off with the thing we want to study: a place, phenomenon, population, historical period. But then we have to do the hard work of figuring out how to dress it up in such a way that people will pay attention to it. (We’ll come back to this practice of dressing things up in Chapter 4 when we talk more about writing and framing your research. But for now, think of it as dressing up your research by enrobing it in a nice theoretical outfit that lets it get invited to the party—­if you aren’t dressed appropriately for the venue, you can’t get in.) So, for example, let’s say I want to study prisoner resistance—­it’s just really interesting to me, I can’t explain why. But my study can’t just enumerate all the examples of prisoner resistance I find in a site of interest; it needs to have some sort of angle or question to go with these examples—­an angle or question other scholars care about—­and that’s often determined by the theoretical stuff. So, I go through and read articles or review my notes on articles I’ve already read about prisoner resistance, and I realize that prisoner agency keeps coming up: Resistance is seen as an expression of agency. I know from the larger sociology literature that agency is usually discussed in contrast to structure. There are common debates about the relative roles of agency and structure (or individuals and their social context) in determining certain outcomes. So, I’m going to look at the relative roles of agency and structure in cases of prisoner resistance. Boom. I’ve got my research question: It’s the thing I really want to study (prisoner resistance) plus the theoretical stuff (agency v. structure) that other people will care about—­ even people who don’t care about prisoners, prisons, and resistance. I want to pause for a moment to emphasize an important difference between the first and second strategies. You have more flexibility with the second strategy than with the first strategy. With the first strategy, you read the literature first, then choose your topic, then do your research, almost in lockstep. With the second strategy, you might choose your topic, then read the literature, and then collect your data. Alternatively, you might choose your topic, start to collect your data, and then read the literature (and perhaps keep collecting more data). You might go into the field saying I want to see what resistance is
like in my prison and then later add in the bit about agency and structure. So you end up with a lot more wiggle room with the second strategy than with the first. (We’ll see this dynamic continue to play out as we get into research design, data collection, and data analysis in later chapters.)

The third technique is a slightly different variation of the second technique. It’s another one of my preferred techniques for coming up with a research question. Following this technique, you select a question that combines the topic and theoretical stuff. I think of it as basically one step, rather than two—you could split it up, but I think of these questions as appearing in their entirety, all at once, rather than in separate steps. With the third technique, your topic and theoretical stuff are combined because you decide you want to study a well-known concept. In sociology, that might be inequality or social control. In criminology, it might be sentencing disparities or the age-crime continuum. In law and society, it might be the gap between law on the books and law in action. In political science, it might be voter turnout. There are theories about all of these things, but they are big enough—by which I mean standard or well known—concepts that they are topics unto themselves. But, of course, you don’t just want to study the topic everywhere—you’re not going to study social control, sentencing disparities, or voter turnout in all times and all places—but in a particular context that you think is interesting.1 For example, I want to study social control on the subway: What does it look like? What are its features? What are its variations? I can come up with any number of subsidiary questions, but basically I’m curious about what social control is like on the subway. That’s my research question. (Why the subway? Who knows—it seems interesting to me. We’ll come back to that question in the next chapters.)

The Puzzle

There is a fourth technique, but I hesitate to call it a separate technique because it could also be used with the first or second techniques: That is, you can either start with the theory stuff and then find a topic, or you can start with a topic and then find some theory stuff. In either case, this technique involves identifying a puzzle or a problem. The puzzle might be theoretical or empirical, but it’s ultimately something that makes you step back and say, “Huh? Why did that happen?”

By a theoretical puzzle, I mean a puzzle that is not necessarily attached to a specific, real-life situation, but arising from theory or the academic literature.
You might find a recurring trend in the literature that doesn’t quite make sense and doesn’t have a clear explanation yet. It’s not limited to a particular time or place—­instead, it seems to be pretty general—­but no one has really explained it or perhaps even acknowledged that it is puzzling. By an empirical puzzle, I mean a puzzle arising from some specific, real-­life situation. It is something concrete—­as in, it actually happened in (or over) a particular time or place—­that you want to explain. It might be a statistical outlier (Why does the United States execute so many people relative to other first-­ world western nations?), an interesting historical episode (What caused the lynchings in 1850s San Francisco?), a unique case/person/organization (What explains the rise and fall of Berkeley’s school of criminology?), a notorious disaster (What factors caused the Attica prison riot?), and so on. A lot of us Dirtbaggers really prefer the empirical puzzle. For example, in the preface to their study of the homeless population in Austin, Texas, David Snow and Leon Anderson explain how they each came to study homelessness through separate but related puzzles (in addition to their own personal interests in the topic). The book project “originated largely out of situational circumstances and our particular life experiences rather than as the result of studied, conscious formulation” (Snow and Anderson 1993, x). One of the authors, who had repeatedly driven from Texas to California over the course of a decade, noticed that “the scenery changed noticeably” somewhere in the middle of that decade, as a growing number of “hitchhikers” and “vagabonds” became commonplace alongside the roads and at truck stops (Snow and Anderson 1993, x). The other author, who had a long-­time interest in tramping about since late childhood, noticed a disconnect between the types of hobos and tramps he came across later in life compared to those he knew earlier: “[I]t became increasingly clear to him that their world was strikingly different from the romanticized transient communities of his past. This disjuncture became the source of a gnawing curiosity” (Snow and Anderson 1993, xi). In both cases, the authors wanted to know more about this new population of hitchhikers, vagrants, vagabonds, hobos, or tramps—­where were they coming from and why did they seem to be categorically worse off than earlier waves of transients? There are some important variations depending on which type of puzzle you start with. If you start with an empirical puzzle, you need to add in some theory stuff (technique two); if you start with a theoretical puzzle, you need to add in some empirical limits—­some context in which you are going to look (technique one). (That’s why I say this is a variation of techniques one and two.)
If you start with an empirical puzzle though, you might only add the theoretical stuff later after you start collecting your data (just like with technique two). Don’t worry about how to actually do that, since we’ll cover that in Chapter 4. One thing I hope is clear from these four strategies is that you really need to be familiar with the literature in your area (defined theoretically or topically or both). This is a defining feature of academic research, after all. But this also means that it can be harder to come up with solid research questions earlier in your career or even when starting out in a new subfield. The good news, though, is this is a solvable problem: If you feel stuck when trying to come up with a good research question, reading more of the literature or rereading a favorite set of articles or books can help trigger ideas. We’ll talk more about this in Chapter 4.

Some Dirty Little Secrets About Research Questions

Your Research Question Might—Probably Will—Change in the Field

We put a lot of pressure on our research questions. In many cases, the research question determines the research design—essentially, your plan for how to collect and analyze data. If you wish to make a causal argument, anticipate heavy scrutiny on the match between your research question and research design. But what a lot of this very important—and, frequently, correctly placed—emphasis misses is the practical realities of doing research: Your research question might (probably will) change in the course of your research. Unfortunately, however, this part of the process often gets left out of our methods discussions. If you are lucky, you’ll simply refine the research question, making it sharper. But there are a lot of reasons why you might go in a completely different direction, jettisoning your initial question. A lot of people start off with a particular research question only to realize they have to change their question dramatically even though they had done everything right—matched their methods to their question, selected an excellent site, perhaps fielded a pilot study. And there are even more people who either were insufficiently advised at the outset or . . . were too headstrong to listen to their advisors’ concerns (ahem, me) who later had to change their research question.

My dissertation was going to be on penal cycles—shifts from rehabilitation to not rehabilitation (e.g., deterrence, retribution, incapacitation)—but when I got into the “field” (started reviewing the archival data), I realized there was
no significant change in the state I was using for my case study. The penal cycles I wanted to study were well established in the literature—­at least, they seemed to be. I also had national-­level statistical data showing interesting trends that were consistent with the descriptions in the literature. Moreover, I had actually done a pilot study (collecting and analyzing a small bit of relevant data) that suggested there was a change in the particular state I was studying. Basically, my dissertation proposal looked great. But once I really dove into the local-­level data, the apparent trends all went away. Disaster. I was so embarrassed that my magnum opus wasn’t going to work out that I abandoned the project entirely. Funny thing, though. This was actually a really important finding, and I should have written it up. Rather than viewing all the ways in which my data didn’t match expectations as my own personal failing, I should have used it to expose a significant problem in how we had been conceptualizing penal change. In fact, a few years later, a book came out doing just that, and it revolutionized the field (Goodman et al. 2017). Had I written up my findings, I would have been a rockstar. But, at the time, I didn’t yet realize that such unexpected findings—­of the kind that prevent you from even analyzing a particular phenomenon of interest—­are important findings. So back to the drawing board. I spent several weeks, maybe months, trying to come up with a new research question after I’d already started collecting my data—­the epic no-­no I had been warned about for the reasons I’d been warned about. This also seemed like the worst case scenario because a different research question might require different data, which would slow my progress way down. In fact, it was a bit inefficient. I’d been doing a content analysis of a particular prison’s annual reports. For my original research question, I had been paying attention to how prison officials described the purpose of punishment. This proved less useful to my new research question: why this prison retained a unique, heavily criticized approach to incarceration. I ended up not using a lot of that data, but it wasn’t the end of the world. Because I had started off by open coding my documents (see Chapter 9), I had been coding other things, too, which turned out to be relevant to my new research question. There was a silver lining to changing my research question: I spent so long trying to come up with a new research question that I generated a bunch of other research interests and questions. They weren’t right for my dissertation project, but they did lead to published papers and are still keeping me occupied years later.


The point is, sometimes, you get into the field, and you realize your data can’t answer your question. Sometimes, you get into the field, and you can’t even get data to answer your question. Sometimes, you eventually decide that your question is not particularly good. It’s not the end of the world when this happens, although it can (and likely will) be painful. In fact, there are many good reasons to change your research question after you’ve started data collection. In his book about the policing of LA’s Skid Row, Down, Out, and Under Arrest, Forrest Stuart explains that he “did not set out to study policing.” Instead, he was planning “to study residents’ informal economic survival strategies” (Stuart 2016, 271). But his field research led him to study policing, Stuart says, because of “ethnography’s demand that researchers take their participants’ preoccupations, motivations, and resulting actions seriously” (Stuart 2016, 26–­27). In a context with intense policing, it was almost impossible for Stuart to avoid the topic. Here’s another example. Anthropologist and socio-­legal scholar Sally Merry examined the legal consciousness of people bringing their interpersonal problems to lower-­level courts in two Massachusetts cities in Getting Justice and Getting Even. She described the process by which her research interest and questions shifted from mediation to the relationship between mediation and court proceedings: When I began, I was interested in the mediation process itself: who participated, what happened in the sessions, where these people came from, and how the process differed from the court. I wanted to know the extent to which mediation was linked to informal community social order. But as I studied mediation, I became aware not of differences vis-­á-­vis the court but of similarities to it. Moreover, I began to wonder how it was that the people I saw in mediation had gotten to court in the first place, not just how they had gotten from court to mediation. (Merry 1990, 19)

Merry also explained that her book builds on two distinct projects that she had conducted with separate coauthors, again illustrating the way in which qualitative projects can lead to a multiplicity of research questions, taking your work in new directions. There are other reasons why your research question might change. If you are studying a contemporary setting, a significant change might take place during your research. This change may be generative and give you a ground-­ level look at how some revolutionary change affected people’s lives. When I
was in grad school, the Occupy Wall Street protest took place. All around the United States, scholars had been studying settings that perceptibly changed as people began “occupying” spaces as part of this broader movement. While they could continue to focus on whatever their original question was, this change provided the opportunity to ask new (possibly more interesting) questions. Other things might also happen. You might lose access to your site. Maybe the program you are studying loses its funding and closes suddenly. (At least if you have some lead time before closure, you could examine how funding problems and closure affected daily practices, staff interactions, staff–­client relations, etc.) Your archive may disappear. In her book about the Attica prison riot, Heather Ann Thompson mentions that some of the collections she had been working with were gone the next time she tried to use them (Thompson 2016, xvi). Most recently, a lot of qualitative scholars had to leave the field because of the coronavirus epidemic. Obviously, you have to adjust to new circumstances and that means likely changing your research question (and possibly your research design). Things can change closer to home, too. Your primary advisor may pass away, and now you have a new advisor who is not an expert in your topic or theory; now, you are encouraged to pursue a similar but different theoretical framework. More positively, a new faculty member may join the staff and introduce you to a new theoretical framework that revolutionizes how you see your research. The point is, shit happens: Your research question might change, either out of necessity or opportunity, for better or worse. The best thing you can do is salvage what you can of the work you’ve already done—­perhaps write a smaller article or plan to do more data collection from a different site later on to supplement what you have. We’ll talk more about how it’s possible to change your question in the field when we talk about data collection. But for now, know that it happens and that it’s going to be okay. Also know that changing your research question doesn’t mean you did something wrong. You probably did it right: Holding on to your original research question, even one selected from the literature and that motivated your research design (what people say is the right way), might actually be the wrong call after experiencing these types of changes. So, if you do change your research question after a major change, good job adapting!


It’s Okay If You Don’t Have a Research Question

Something that often gets left out of methods classes and texts is that you don’t actually need to have a research question. In fact, some of my favorite articles do not have a research question. Scholars can make excellent interventions into the literature without a research question. “Blasphemy!” I hear someone say. Yes, I will add important caveats to this statement, but for now, I state it boldly because it is a point that is too often forgotten. For example, the highly respected NSF report on qualitative research includes “Articulates a clear research question” among its shared standards across four disciplines for evaluating qualitative research (Lamont and White 2005). But, once again, the expectation that research must have a research question is an importation from the quantitative approach and from the scientific method more generally. One cannot conduct good quantitative research without a research question, but one can conduct good qualitative research without one.

How is this possible? If you are a deductive scholar, interested in theory testing or in making precise claims about the causal effects of something in particular, it’s probably not possible. If you are just starting off with your research, particularly if you are working with a vulnerable population—for example, children, incarcerated people, or Indigenous communities—it also might not be a great idea to set out without a research question, even of the broad type that I’ve defined. Or, if you are a grad student about to go on the market, given current norms in most social science disciplines (at least in the US), presenting a job talk without a research question really won’t go over well. That’s because stating your research question clearly, locating it in the literature (showing how it was motivated by or builds on prior work and how it will be important for others down the road), and connecting it to your research design is one of the key points of evaluation for your talk. So there are actually a lot of contexts in which it’s a bad idea to not have a research question. But if you’re already knee-deep in your data collection, it’s totally possible to start another project without a research question.

Here’s one example. (To my mind, this scenario is one of the beauties of qualitative research.) You can come up with a spin-off project that doesn’t have an explicit research question but that still makes an important intervention in the literature. Let’s say you have a broad research question driving your work, and you’ve gone into the field. But at some point, as you are collecting or analyzing your
data, you realize that your data are showing you something that (a) completely contradicts the literature and (b) has little to do directly with your research question. In the course of collecting lots and lots of data for your project—­ because that’s how we roll in the Dirtbagging world—­you have probably collected enough data for several smaller projects on more specific topics. Yes, you definitely want to make sure that you have not systematically overlooked relevant information to this new interest or topic because it wasn’t your focus. But let’s assume you haven’t been systematically excluding data because you did what a good Dirtbagging qualitative researcher does: You recorded everything! (You even go back and re-­analyze your data just to be sure.) This situation allows you to intervene into the literature and say, “Hey guys, we’ve been doing it wrong,” or “We’ve been overlooking this other thing,” or “We’re calling this stuff X, but it’s actually Z.” You didn’t set out to study this thing, but you noticed it in your data. Basically, you come up with the important point after looking at the data but before you even thought to ask the question! There are a number of excellent articles where I suspect the authors came up with their intervention in this way, but this is not the sort of thing people usually put in writing, so I’ll use my work as an example. In the course of doing my dissertation research, I ended up with a lot of data on prisoner (mis)behavior. In fact, I had made a point of including all examples of prisoner (mis) behavior—­and the (mis)behavior of guards, administrators, and reformers—­in my search for some answer to my main research question: why my unique prison persisted in its unique approach to incarceration. But prisoner (mis)behavior was not really part of my research question; it was just one of the factors I was coming across and thought I should pay attention to. With all this data, though, I really wanted to do a side project on resistance to the prison regime and maybe assemble a typology of the various ways prisoners, guards, and others resisted the prison’s rules and norms. I’d long found the resistance literature fascinating, and I enjoyed flagging examples of rule violations as “resistance.” But in trying to write some memos about this material, I got to the point where I realized I had no data on people’s motivations. Did they see their own acts as resistance, or was this just my label? When I went back to the literature, I realized this problem wasn’t unique to my research or even to archival data, but instead it was ubiquitous—­even with people working with interview and ethnographic data where they could have gotten data on prisoners’ motivations. We were all just assuming that actions that broke the rules must be acts of resistance. This struck me as highly problematic methodologically
in a way that also had implications for our theoretical understanding of resistance, not to mention the normative issues it raised as well. I wrote up the problems with the literature, used some examples from my work, and made my case. It was an empirically grounded theoretical article (Rubin 2015b). In many ways, there was no research question driving this spin-­off project. There was my original research question about the prison’s exceptional retention of its criticized approach to incarceration that drove my data collection, but my “resistance” project was ultimately unrelated to that question, which does not appear in the resulting article. We could say that this article was driven by a broad question about the nature of prisoner resistance, but even that question only emerged after I collected the data; in that sense the broad question wasn’t driving the project, at least not the whole time. There was my question about whether we were imposing the label without empirical evidence, but that question was less a research question and more a question I asked as part of the data analysis. More importantly, it wasn’t a question I actually asked in that article (although perhaps I asked it implicitly).2 And, just to be clear, producing a research article that doesn’t have a research question is something other people have done—­and they are successful and their articles are popular. For example, one of my favorite articles is that 2010 Law & Social Inquiry article by Katherine Beckett and Steve Herbert (see Chapter 2). Recall that their article is about exclusion orders in Seattle, and how these exclusion orders, which are officially deemed administrative or civil orders rather than criminal punishments, resemble a classic punishment. The article is really creative because it pushes back on narrow definitions of punishment and reminds us that many civil or administrative behaviors can essentially act like punishment—­something scholars studying immigration are also talking about these days. This important and prescient article does not state a research question. Instead, like my resistance article, theirs promotes an analytical framework. As they explain, “We use this article to argue for greater recognition of legally imposed spatial exclusion—­banishment—­as a (re)emerging and consequential social control practice” (Beckett and Herbert 2010, 1). If forced to identify a research question in that article, we could come up with one that is stated implicitly: What are the consequences of new urban social control techniques? What are “the effects of banishment on those subjected to it” (Beckett and Herbert 2010, 18)? Or, what are the “implications” of these practices? But that article’s
central intervention really answers a different question: How should/can we (as scholars or even as citizens) understand or make sense of these practices? This approach, whereby new insights can arise even if they do not relate directly to your original research interest or research question, is part of what is great about qualitative methods. Unlike research for quantitative projects, you do not have to close your eyes to other interesting research questions or projects. It can be overwhelming, and sometimes you might want to batten down the hatches, flag the data in a memo for yourself to come back to later, and otherwise ignore it for the time being. (This advice is best when you really need to finish your thesis or dissertation, some other deadline is approaching, or if you are just getting overwhelmed and want to focus on the project in front of you.) But you have the freedom to choose whether to do that or not—and the freedom to come back to it later.

* * *

As this chapter hopefully makes clear, there is a lot of flexibility in how one comes up with a research question, at what stage in the research process you come up with your (final) research question, and ultimately what counts as a research question. Don’t let people shame you for choosing your research question to fit your data: A lot of us do it that way, and guess what? It leads to more exciting projects than whatever we set out to study in the first place. Don’t let people give you a hard time because your research question doesn’t specify the relationship between two variables and is therefore more “properly” considered a research interest: Tell them to go read some ethnography and get back to you.

4

On Belay
Connecting Your Work to an Anchor

Belay
To protect a roped climber from falling by controlling the movement of the rope. This usually involves the use of a belay device. A belay can also be achieved using a Munter hitch, a hip belay, or by passing the rope around a rock or tree to increase friction.
Wikipedia (2020), “Glossary of Climbing Terms”

Anchor
The two bolts, usually equipped with chains or fixed lowering gear, at the top of a route. The anchor is where the climb ends, and the goal in sport climbing is to reach the anchor without falling.
Evening Sends (2020), “Climbing Dictionary”

Academic Researchers Shouldn’t Free Solo

Currently, the type of rock climbing that is probably the most familiar to most people—at least in the sense that they have seen it or heard about it—is free soloing. Free soloing means climbing without a rope. If you fall, you will hit the ground and very likely die—or, on lower climbs, get grievously injured. This style of climbing is so well known because people rock climbing in movies seem to always be free soloing—the first five minutes of Mission: Impossible 2, an end-of-the-movie scene in the remake of Point Break, or even a brief clip in the fifth Twilight movie. (Seriously, it’s the rarest form of climbing, and yet it seems to be the most common version in Hollywood movies.) This type of climbing is even better known because the documentary (aptly named Free Solo) about the world’s best free soloist, Alex Honnold, won an Oscar in 2019.


But most climbers don’t free solo. They free climb. Although it sounds similar, it’s really different. The big difference is they use ropes to catch themselves when they fall. Climbers refer to this as being “on belay.” Here’s how it works if you want to try “top roping,” one of the safer types of climbing. You go to a crag, hike to the top of a cliff face, and set an anchor—­usually two big carabiners placed at the very edge of the cliff but connected by ropes to a tree or stake in the ground or something really, really sturdy—­to run the rope through as it dangles off the side of the cliff face. When you get back down to the base of the cliff, you connect your harness to the rope, and your climbing buddy attaches the rope to a belay device (metal tool thing) attached to their harness. You start climbing. Let’s say 15 feet up, you slip and fall. You don’t hit the ground—­ because you are connected to that rope that runs up to your anchor and down to your climbing buddy’s belay device that’s helping them hold on to the rope. Instead, you just hang out in midair. Now you also see why you want a sturdy anchor: As long as the anchor holds (and your belayer is paying attention), you won’t fall to the ground. Consequently, except for super talented climbers like Alex Honnold, who is really good at not falling, it’s a terribly bad idea for most climbers to attempt to free solo. I tend to think about academic research in this way. Early on in graduate school, I really wanted to just write about interesting things. I wrote several “articles” in my first few years where I described interesting trends. Some of them had snappy titles like, “The Origins of Long Prison Sentences.” I posted descriptions of them online (which people were just starting to do) and, once, a really famous criminologist emailed me for a copy of the paper because it sounded so interesting. Unfortunately, these papers were not very good. It wasn’t so much that my research wasn’t good (although at that stage, it certainly wasn’t great); the problem was, as a nascent Dirtbagger, I was purely driven by an empirical interest and, therefore, so were my articles. Clearly, some people were interested in these topics, too. But I had done nothing to make the case for why my research was interesting. Nor had I done anything in these articles to relate my work to what other people were doing (except maybe two or three famous scholars whose work I’d read, but which wasn’t quite on point). Consequently, these weren’t really articles—­they were more like essays or perhaps even memos. I tried submitting them for publication only to get rejected repeatedly. I also hadn’t shown them to any of my advisors so no one was able to pull me aside and say, “Ashley, you are doing it wrong. Here’s what you need to do to get these published.”


Basically, I was trying to free solo when I really should have been roped up and on belay. Consequently, I failed repeatedly—­and certainly if I’d kept it up, I would have killed my career. Part of the problem was I didn’t know there was a difference between free soloing and free climbing, academically speaking. During my undergraduate education, it wasn’t super clear to me that scholars were talking to one another, not just writing about interesting stuff. I would read a history monograph on the American Revolution much like you might read a textbook: It wasn’t that this was a particular author’s take, I thought, or that they were making an important intervention into an existing literature about the American Revolution; rather, I thought this is just how things happened. (I was not a particularly critical reader until late in grad school.) Additionally, I had a bad habit of not reading the footnotes or endnotes in books, where a lot of the conversation was taking place. This wasn’t just the case for history books. Some of the best or most classic books in my fields of social control and punishment studies—­Erving Goffman’s Stigma (2009 [1963]) or David Garland’s Culture of Control (2001)—­mostly engaged with the existing literature not in the main text that I was reading but in their notes that I was skipping. (This is less true for articles, but I wasn’t reading so many articles at the time.) From my vantage point, it really seemed like people just wrote about inherently interesting things—­which was, after all, why I was reading these books. Likewise, I believed that as long as I had an interesting topic, I was good to go. Instead, I was doing the academic version of free solo, when I wasn’t even that good of a climber. The truth of the matter is, academics don’t really like it when people free solo.1 It’s really bad form to omit talking about the other people who are doing or have done research in your area. Partly, I mean we need to cite their work, but I also mean we need to respond to it—­agree or disagree, clarify or extend. It’s also really bad form to talk about your research in a way that does not make it understandable to other academics. (This is, in part, why some academics can be fairly dismissive of books written for a “popular” audience—­meaning, for normal people—­or of books written by journalists even when they cover some of the same topics that we study.) Basically, you can’t just tell an interesting story, no matter how interesting it is. (Some academics will actually derisively call such work “journalism,” which is, apparently, intended as a grave insult.) You have to explain to your readers what your story is really about in terms they care about. This means using



certain terminology, referencing debates in the literature, and citing relevant works—that is, connecting your work to something else.

* * *


This chapter focuses on how to rope in and set an anchor so you don’t fall off the wall before you’ve even gotten to the crux of the climb. Even if you are a pure Dirtbagger and you are driven solely by an empirical interest, there are ways to make your research relevant by connecting it to an anchor. Importantly, there is a lot of advice out there on how to do this. Unfortunately, as someone who has long struggled with this problem, I’ve found a lot of this advice—­particularly the more clichéd versions of it—­to be unhelpful. So this chapter discusses some of the key ways in which people tend to evaluate research—­not so much in its nitty gritty details of research design and analysis, but in terms of whether your entire project is worthwhile. I maintain that you can pretty much make any project worthwhile, but you have to be able to do certain things to convince people of your project’s worth. If you can’t do those things, then maybe it’s not actually a good project.

A Note on Timing

In a perfect world, you shouldn't have to worry about this stuff until you are actually drafting an article or a book chapter. Unfortunately, pretty much even before you start to collect your data, people will start bombarding you with questions—what is this a case of? so what?—sometimes rather annoyingly, especially if you are still trying to figure this out yourself. Particularly for Dirtbaggers, these are questions you might not be able to answer until you have collected all your data and are working on analyzing it. Additionally, if you are at all like me, some of these questions won't even make sense at first. And when people ask you questions you aren't ready or able to answer, it can be rather anxiety inducing.

Part of the problem is a disconnect between how conventional scholars approach research and how you might approach research: As a generally curious Dirtbagger, you might be doing your research in a different order than others expect. People might ask these questions without realizing that it's actually the wrong time to ask you. In other cases, they might be right to ask you these questions because, for example, you need to submit a grant proposal to complete your data collection, and these are the types of questions funding agencies will be asking. It can also be helpful to keep these questions in mind, even if you don't want to answer them yet.

The point of this chapter, then, is not to force you to answer these questions prematurely, but to explain the underlying issues behind these questions so you can handle these situations—­and start to think about the answers. Understanding those issues is particularly helpful if you feel like you should be able to answer these questions, but you don’t know how to go about it. Until you are ready to answer these questions, you can keep this chapter in the back of your mind.

What Makes a Good Project: The Theory–Policy Matrix

In Chapter 3, we talked about strategies for coming up with research questions and how these questions can change. Certainly, research questions are not created equal. Most scholars will tell you that a good research question is theoretically informed and empirically answerable. The thing is, if you are a true Dirtbagger at heart, you aren't driven by theory; for you, theory is an afterthought—it's what you add in when you are knee-deep in your data as you are writing memos and trying to make sense of your coding categories (tricks we'll discuss in Chapter 9). That's the stage where you link up your empirical study with theory. There are also some folks who don't like theory at all.

Here's the really interesting thing: Much of the advice about what makes a good research question ignores that there are a lot of successful people who do "atheoretical work"—meaning work that doesn't use theory. It's still tied into the larger literature, but it just has no need for theory. How do you get away with that? Not everyone can—I can't, for example. It really depends on your project. To figure out if you are one of the lucky ones, use what I call the Theory–Policy Matrix (Figure 4.1).

According to the conventional wisdom for evaluating research, the gold standard is a project that is a theoretically motivated empirical study of a topic that is either sexy or relevant to policy. By "theoretically motivated," I mean it pushes the academic literature forward not just empirically but also by extending or creating new theory—I'll say more about what counts as theory in a bit. This concern with theoretical motivation is connected to some extent with the general social science preference for following (or trying to follow) the scientific method as the preferred model. By "sexy," I mean it is genuinely interesting to a lot of people—think sex, drugs, and rock and roll. Or it is "policy relevant," meaning it is of interest at a high level because it can have implications for official policy, especially a policy that is particularly salient right now. If the gold standard requires that you check both of these boxes—sexy/policy-relevant topic and theoretical—a project for which you can't check
either box is very likely to fail. You cannot be successful if you do research on an obscure, boring topic that no one cares about and that has no bearing on existing theories.

[Figure 4.1 is a two-by-two matrix: one dimension is whether the topic is sexy/policy relevant, the other is whether the project is theoretically motivated. Projects that are both fall in the Gold Standard quadrant, projects that are only one or the other fall in the two Also Great! quadrants, and projects that are neither fall in the No-Go Zone.]

Figure 4.1  How We Judge Research Questions. The most successful projects combine empirical appeal (sexy or policy-relevant topic) with theoretical appeal. However, projects can also be successful if they are either empirically or theoretically interesting. Projects that are not interesting along either dimension are generally not successful.

Here's the part that often gets left out, though. A project that does only one of these things—either it covers a sexy/policy-relevant topic or it is highly theoretically motivated—can also be quite successful, and that's something too many people overlook. The part of this equation that is surprising to people, and that departs from the standard advice, is that you can be successful without having a theoretically motivated study. In fact, there are plenty of examples of successful research (and researchers) from the qualitative, quantitative, and mixed-methods traditions that is sexy or policy relevant but not very theoretical.

For example, sociologist Devah Pager (2003) produced an audit study that revealed the extensive discrimination during job searches for people with a criminal record, and for Black people—and especially Black people with a criminal
record. Her study is rather famous because it addressed an extremely important issue—­racial discrimination in employment contexts—­with a nice twist: the growing problem of criminal records. Very policy relevant. Additionally, she used a strong research design—­we all wish we came up with it. However, this was not the most theoretically motivated study; while we have theories about under what conditions race matters and in what ways it matters, her study was more about looking at how much it matters in the employment context and introducing a new variable: the role of criminal records. It is a super famous, important study—­and a great example of a successful but not very theoretical study. Another scholar, Keramet Reiter, has produced a number of qualitative, quantitative, and mixed-­methods studies of supermax prisons, particularly looking at their rise in the 1980s and later, how litigation has shaped their refinement over the years, and what life is like in a supermax—­a facility that keeps prisoners in solitary confinement for about 23 hours a day for long periods of time (e.g., Reiter 2012a, 2012b, 2016). Each of her studies interacts with the existing literatures on punishment, penal change, and prison litigation, but most of them are not particularly theoretically motivated studies. Basically, she is interested in the empirical phenomenon of the supermax and addressing all of its many angles. Because supermaxes are extremely important in the current context—­a growing number of people are spending time in solitary confinement for about 23 hours a day for days, weeks, months, and years on end—­ her studies are likewise important. They don’t really need theory: It’s important enough to describe these important phenomena. I once went to an Author Meets Critic (AMC) panel for a policy-­relevant, but not super-­theoretical book at a national conference in one of my fields and was really annoyed by what I heard. An AMC panel is usually organized around a particular book and then three to five “critics” (or “readers” in the nicer-­sounding version) respond to the book—­what it made them think about, how it changed the way they do their research, and any criticisms of the book. At the AMC panel in question, one of the critics turned to the author and said he was really disappointed by how little theory was in the book. This comment really pissed me off, on behalf of the author (whom I don’t know personally), because the author had been clear in the beginning of the panel about what her goals were. There was an important, policy-­relevant problem out there, and she wanted to better understand what the operating forces were. She was not interested in theory development, but in increasing our empirical knowledge
about this problem in part to help change policy. That is a perfectly acceptable and successful model for people to use, but not everyone sees it that way—­the critic sure didn’t. But before you run out to do your study, gleefully thinking about how you can jettison theory from your work (let’s be honest: a lot of people don’t like theory), there are some important caveats. First, in placing your work in the Theory–­Policy Matrix, you have to be really honest with yourself about how exciting your topic is. The sexy/policy-­relevant measure is not based on how exciting you find your own work; otherwise we’d all be safe. It’s what other people think that really matters—­for publication and presentation purposes, anyway. People tend to have some biases about what counts as interesting. For example, historical work, generally speaking, is not super sexy. There are some extremely good historical works out there, and they rarely get cited. I won’t name names, but I’m disappointed on behalf of a lot of folks when I read a fantastic historical book and then see their book hasn’t been cited more than 100 times in more than a decade, whereas some new (non-­history) books come out and meet that measure in a few years. As someone who also tends to do historical work, it gives me pause: It’s a good reminder that my work will never be sexy enough (and certainly won’t be policy relevant enough) to enable me to jettison theory. Bottom line: If you know your work isn’t policy relevant, and you aren’t sure it’s sexy, add in theory to be safe. Certainly there are exceptions. Sometimes you luck out because something happens that makes your research suddenly important—­because whatever is considered policy relevant or sexy is determined by fads and fashions, which change over time. For example, for a long time, people in the United States who studied Russia and the Russian language were seen as hanging onto an older decade, doing research that was no longer useful given then-­current geopolitics; but once US–­Russia relations started to disintegrate again, those folks were in demand. This was ochen horosho (very good) for those of us who speak Roosky yaszeek. Overall, though, having some big event change the relevance of your topic is a bit like having lightning strike, so I wouldn’t count on it. Now, for folks who are studying the sexy/policy-­relevant topic, there is an important consideration, which is how new your topic is. If you are writing the millionth study of a particular topic, the bar is set pretty high—­you need a new angle. (It will also impact the bar for evaluating your methods, which we’ll discuss in the next chapter.) A basic research question, such as one that lets you simply document the frequency of a topic, is not going to be sufficient. If,
however, you are one of the first people to do so, then you’re golden—­especially if you give it a name. I’ll use some examples from the quantitative and mixed-­methods world again. For example, if you are the first person to point out in the late 1990s that incarceration rates have really gone up, and that they are at historically and internationally unprecedented levels, and that they are really overrepresenting certain groups of people (especially young Black men from urban areas), that’s pretty big. I don’t know if he actually coined the term, but David Garland (2001) is generally the go-­to citation when people talk about mass imprisonment as he was certainly one of the earliest scholars to define it. More recently, punishment scholars have turned to “mass probation” (Phelps 2017) or (referring to both probation and parole) “mass supervision” (McNeill 2018). Even these studies, however, do more than simply document the new phenomenon and give it a name: They also analyze it theoretically and connect it to other developments. In general, the less novel your research topic, even if it is really sexy and/or policy relevant, the more work you need to do on the theory side of things.

What Do I Mean by Theory?

I've been talking a lot about theory and "the literature," so it's time to also clarify those terms. We often refer to the literature as that body of scholarship2 that someone in your field is expected to know because it's relevant to your research. In what way is it relevant? It might be the scholarship that everyone like you reads or finds important; here, "like you" might mean in your discipline (a sociologist, a political scientist, a criminologist) or in your subfield (someone interested in the sociology of inequality, voting behavior, or life-course criminology). What is relevant also might be more empirically determined—what do we know about your particular topic? In this case, it might be a fairly broad range of literature that spreads across different fields and disciplines. I'm interested in prisons, so I read works about prisons by historians, sociologists, criminologists, political scientists, psychologists, and geographers—as well as works by currently or formerly incarcerated people, activists, and journalists. When you write articles or books, you need to be sure you address this larger literature—whether that's the scholarship that everyone in your discipline reads (or recognizes is important) or the scholarship on your particular topic. But doing so does not necessarily mean you have engaged with theory.

People have a lot of different ideas about how to define theory or what counts as theory. I think of theory as generalizable statements that are expected
to be true in a variety of different contexts. (I should note that historians have a different understanding of theory, but we’re focusing on social science, so I’m going to skip that one.) Basically, theories are ways of helping us make sense of the world, but usually in a somewhat narrow way. Indeed, we can go a step further, as do some people (e.g., Karl Popper), and say theories are not just generalizable statements but generalizable and falsifiable statements—­meaning, we can go out and test whether these statements are actually true or false in a variety of contexts and thus evaluate the strength of the theory. So, for example, in the sentencing disparities literature, some scholars test group threat theory, which suggests that minority groups will receive longer prison sentences than their majority counterparts in jurisdictions that have larger minority populations (up to about 40% when they are usually no longer a minority group). If I go out and find that Black people get shorter sentences, on average, in counties with a 35% Black population than they do in counties with about a 12% Black population, I have falsified the theory: I have shown it isn’t true, at least in some context. Another way of thinking about this is theories of this type (i.e., falsifiable) go hand in hand with prediction: We want to be able to say, with some amount of certainty, that under X conditions, Y will happen. The problem with the requirement of defining theory as containing falsifiable statements is that a lot of good theoretical work doesn’t include falsifiable statements. This tends to be especially true of critical, post-­modernist (or late-­ twentieth century) theories. More generally, it’s also true of theories interested in questions about how we make sense of the world in a much broader way. These theories might contain statements that call our attention to particular variables or processes that are important—­maybe they aren’t important in all times and all places, but they will improve our ability to understand social life. For example, a lot of feminist theories call our attention to the importance of gender in many different contexts. The theories aren’t saying that sexism is always a problem or that gender always affects outcomes in a particular way; instead, they are saying we should look at how gender is playing a role in a particular context. They aren’t really predictive theories other than to say gender will probably be important in some way, and we just need to figure out what that way looks like here. People sometimes refer to these types of theories as using a particular lens. So, in my mind, I envision these types of theories as different types of glasses—­ glasses with colored lenses, 3D glasses, or even night-­vision goggles. When you put them on, you see the world in a different way. Stuff that was always there
is now more visible, and you can see that it’s doing something you didn’t know was happening before. Since the movie The Matrix was really popular when I was growing up, I also describe this as akin to seeing the Matrix—­once you take the red pill, like Neo, you have a bit of a rude awakening and realize that social life doesn’t work the way you thought it did, and now you see certain social patterns everywhere—­and you can’t turn it off.3 Standard lenses in a lot of social sciences include race and ethnicity, gender, class, culture, economic or market forces, politics, institutions, agency v. structure, nature v. nurture, legitimacy, and power (which can often touch on each of the preceding categories). Using a different lens can help us reveal different aspects of a given setting or social phenomenon. For example, when Malcolm Feeley examined how a New Haven criminal court processed cases, especially low-­level misdemeanor cases, he realized that the processing itself resembled punishment. Even the people who were never convicted of a crime, Feeley found, still lost money, stayed in jail, and sometimes lost their jobs because of their court case (Feeley 1979). While inequality, mostly class-­based inequality, was a big theme of his work, he did not pay much attention to race and gender as specific foci in his study. Since then, however, his study has been updated and expanded by scholars studying the Chicago criminal court and misdemeanor processing in New York City; these newer studies reveal the punitive dimensions of court processing, but they also uncover the racialized and gendered ways in which these dimensions play out (Gonzalez Van Cleve 2016; Kohler-­Hausmann 2018). They look at similar empirical phenomena (albeit in different settings), but with different theoretical lenses. When describing these lens-­based types of theory, to avoid confusion for folks who define theory narrowly, I sometimes use the phrase “theoretical framework” instead of “theory.” Theoretical frameworks are tools that help you organize or arrange the world in a way that makes sense but does not necessarily include firm (e.g., falsifiable) statements about how the world works. This type of theory tends to be particularly common for qualitative methods (although it’s certainly not the only type of theory that comes up). It’s still theoretical, as in related to theory, even if we aren’t making or testing falsifiable statements.

Different Uses of Theory

The other thing that sometimes leads to some confusion is how, exactly, you are going to use theory. In previous chapters, I've already mentioned two standard uses: theory testing (applying existing theory to your case to see if the theory is
accurate or not) and theory generating (making new theory). But these aren't the only or even the most common approaches in qualitative methods. Additionally, both theory generating and theory testing actually have different variations that get lost in this binary. Very helpfully, Alexander George and Andrew Bennett, drawing on earlier scholars Arend Lijphart and Harry Eckstein, identify the various ways people use theory in case studies (although we can extend their insights to other types of studies because, as I keep arguing, pretty much all studies are case studies). For the most part, they are oriented toward existing theory. Here are the types of case studies George and Bennett (2005, 75–76) enumerate that I think are the most interesting:

• "Atheoretical" or "configurative idiographic case": This is a purely descriptive study in which you are describing some case of interest; it's not using theory at all, but it can be used later by those who want to make new theory. (This is one of the options I laid out above—remember, to get away with this, your topic really has to be sexy or policy relevant.)

• "Disciplined configurative case": This is a study that applies theory to explain a particular case or topic. It's not testing the theory, but rather assuming the theory is correct and letting it give you insights about your case or topic. (This might include using theory as a lens or framework.)

• "Heuristic": This is a study that identifies new variables, hypotheses, mechanisms, or paths. For example, a theory might say X causes Y through a, b, and c mechanisms, but this study points out that d is another important mechanism. This is a type of theory generating study, but by extending existing theory.

• "Theory testing": This is a study that uses a particular case to see if an existing theory is accurate. An important variation is using a case not necessarily to see if the entire theory is accurate or not, but to see if it is accurate under some additional set of circumstances, which helps to further refine the theory.

So when you think about different theories you might use, you can also think about different ways you might use theory. You will often want to start off with the existing theory, but you can also think about ways in which existing theory is insufficient and engage in some theory generation, whether with
a heuristic case study where you extend existing theory and create new theory in the process or with a theory-­testing study where you refine existing theory by creating new scope conditions. Another strategy is to use a theory that no one has applied to your broad empirical topic and see what you can learn about your topic (disciplined configurative case), see what you can learn about your theory when extended to this new venue (heuristic), or see if the theory holds up in a new venue (theory testing).

Framing: Connecting Your Project to Theory, or Dressing Up

Whether your project is theoretically motivated or focuses on an interesting/important topic, you always need to connect your work to the larger literature. Because a lot of us, Dirtbaggers or not, won't luck out and examine a really important or interesting topic—or be the first one to study such a topic—we need to connect our work not just with the existing literature but also with existing theory. Before I get into some specific strategies for how to go about doing this, I want to offer a motivating example to explain the underlying dynamics of why you want to connect your work to theory and what that process looks like.

Dressing the Part

Academics are a pretty clubby or cliquish bunch. Is this because so many of us didn't get to sit at the cool-kid table in high school, and those who are now the cool kids are mimicking that familiar tendency to exclude those who are uncool? Maybe—I have no idea, but that's what it feels like, sometimes. What does this mean for our research projects? When we want to talk to other people about our work, or submit a manuscript for publication, we need to present it in such a way that the cool kids, and the wannabe cool kids—or whatever clique you are trying to reach—will be most likely to let it into their circle. Especially for scruffy Dirtbaggers like me, that's a tall order.

What does this mean in practice? First, it means you have to know your audience—whom do you want to read (and maybe even cite) your work? It's okay to be aspirational here. I often imagine Really Big Name People in the field (whichever field or subfield I'm working on at the moment). You might not care about Big Name People's reactions—maybe you're really impressed by a relatively junior person in your field who has been doing interesting work and you'd be thrilled if they were excited by your work, too. Either way, once you have a person in mind, think about what you need to do to get them to pay attention to your project or maybe like it or cite it. To even have that thought
experiment, you have to actually know what they care about—­what sorts of questions, conversations, and other scholars they pay attention to. Second, it means you need to present your research in a way that speaks to those concerns or considerations. Going back to the high school analogy, acceptance doesn’t necessarily depend on how smart or talented you are. So how do you get classmates to take you seriously? One way is to dress the part. If you want to hang out with the goth kids (who dress in black and such), maybe don’t walk up to them wearing an orange shirt or a sport jersey. If you want to hang out with the cool kids (let’s imagine that at this fictional school, the cool kids are basically the wealthy kids who have expensive clothes from some name-­ brand store), you don’t want to walk up wearing hand-­me-­downs. Basically, you want your intended clique to take you seriously enough to listen to you—­ then they can accept you (or not) based on the quality of what you have to say. And to get them to listen, you need to dress the part. When it comes to research, this means you have to dress your project up in a way that gets other scholars to take it seriously—­we call this framing. This is what I was failing to do in my early research projects by not engaging with what had already been written on my topic. Basically, you want to use the terms and theories that are familiar to these scholars. But you also don’t want to do this superficially, name dropping Big Names and jargon here and there, and especially (inadvertently) misciting said Big Names and misusing jargon. Going back to the analogy, you also don’t want to dress in a way that is clearly inauthentic: For example, if the cool kids have designer clothes with strategically placed holes and tears in their jeans, they will see right through your attempt to fit in by wearing jeans with holes in non-­strategic places and that are clearly created by overuse rather than a designer’s eye. They will know you are an imposter. Pay attention to how they dress so you can emulate it as well as possible—­and then have other folks vet your outfit for you and let you know if you’ve missed some subtle detail that might prevent you from fitting in. Now, let’s talk examples.

Dressing up Your Empirical Puzzle

As a Dirtbagger driven by sheer interest in things, with little natural interest in theories and debates, I tend to start with the empirical puzzle. This is how my book on Eastern State Penitentiary started. Eastern was something of an anomaly: Why would a prison retain a unique and highly criticized approach to incarceration? Now, that's a puzzle. But, as an empirical puzzle (because it's
just about a set of facts in a particular situation with no clear connection, yet, to something larger), it’s one that might not be interesting to people outside of my field—­that is, to people who don’t care about prison history or prisons more generally. It does help that this is a prison that most punishment scholars have heard about—­but most other people haven’t. To get people to really care about my question, I needed to reframe it (or dress it up) in theoretical language. About midway into my research, I turned to organizational theory, specifically to neo-­institutional theory, which predicts that under certain circumstances, we should really expect all similar organizations (e.g., all prisons) to look the same and adopt the same policies: Organizations that reject norms are expected to do very poorly and ultimately either conform or fold (i.e., close down). Eastern seems to be a puzzling case for neo-­institutional theory, so that’s how I reframed my empirical research question: How was this exceptional organization able to withstand pressures from the field and avoid conforming to the field’s norms for so long? Now, that’s a question that organizational scholars might care about, even if they really don’t care about prisons. Here’s another example, this time from ethnography (after all, most ethnographers are Dirtbaggers at heart, too). Randol Contreras’s book on the “stickup kids”—­a group of men who torture and rob drug dealers—­starts with the following puzzle: Violent crime dropped significantly in the early 1990s and continued to stay low (or, at least, lower than it had been), but “unreported violence within the drug world seemed to be rising.” Contreras added a layer of theoretical interest to this original empirical puzzle by noting that the unreported crime that was increasing was largely committed by men who were past their “‘crime-­prone’ years” (criminologists put peak crime age at late teens to early 20s). This is a double puzzle, or as he puts it, a “double irony” (Contreras 2012, 2). The initial empirical puzzle is just the disjuncture between what he was seeing on the news (and in academic literature) and what he was seeing in real life, behind the scenes. The theoretical puzzle was that standard accounts of criminal behavior (i.e., theories) couldn’t account for why grown men commit crime, particularly in large and growing numbers, and particularly when young men were committing less crime. This is a nice way to dress up his study for criminologists for whom the age–­crime dynamic is really interesting. Contreras went further, though, to dress his study up for sociologists not interested in crime. In the process of trying to explain this doubly unexpected surge of violence, Contreras also asked other theoretical questions. For example, he asked about the relative roles of macro-­and micro-­level social processes
in shaping this interesting development. Stated more specifically, he asked how much of this outcome was related to changes in the US economy, which affected the men in his study, and how much was related to their own agency or personal desires for excitement and thrill. Thus, he reframed his original empirical puzzle in theoretical language that would interest a larger audience—that is, people who don't care about drug dealers in New York City, or even about questions regarding age–crime dynamics, but might be interested in the macro v. micro, or economics v. agency, question.

Strategies for Connecting Your Project to Theory

So now that maybe you have a sense of when and why you need to connect your project to theory, and some sense of what the end result might look like, how do you go about doing it? In this section, I'm going to review some of the advice people usually give on how to frame or connect your research to existing theory. I actually find a lot of this advice (at least in its original version) unhelpful. I'm going to cover it anyway because people will often ask you these questions—sometimes trying to be helpful, sometimes in a condescending way, and sometimes to trip you up—so it's good to know these terms. But I'm also going to offer my own concrete advice for how to go about linking your project to the literature—your lifeline—so you don't fall off the cliff.

The Reigning Advice: "What Is This a Case Of?"

I once presented an early version of a research project at a national conference. It was one of those 15- or 18-minute presentations where you can't cover much ground: You might have one or two slides on the prior literature, one or two slides on your methods, and the rest of the time you spend talking about your findings with a final slide on their implications. I had been a bit nervous about this presentation because it was a brand-new project that I had started only months before. It was a project examining the relationship between the authorization of the first state prisons or "proto-prisons" and the decline of capital punishment in the United States. In the literature, and even while teaching, people often refer to the rise of the prison and the decline of capital punishment as linked developments. I decided to collect the legislation that governed each of these developments and see what I could find.

My findings were really interesting, but I couldn't think of a good way to package them—that is, to make them theoretically interesting. I was familiar with the literature on prison history and the literature on penal change, but
I did not have a particular theory or theoretical framework in mind. At the beginning of the talk, I gave the standard caveat that I was still early in the research process and that I was looking forward to any suggestions people might have. I was truly hoping someone in the audience might see something I was missing. A friend of mine was in the audience and, trying to be helpful, said, “Well, what is this a case of?” I remember feeling a bit disheartened at this because I knew this question well, but I had never found it to be helpful.4 To be honest, it took me a really long time to understand this question at all. People asking this question always seem to think it’s self-­explanatory, but it really isn’t—­at least not to me. Does it mean is this a case of historical sociology or organizational theory or something else? Does it mean is this a case of X pattern—­like net widening, cognitive dissonance, or self-­fulfilling prophecy? It was always so ambiguous and since—­as my fourth-­grade teacher pointed out—­I tend to overthink things, I came up with too many directions to actually answer the question. Ultimately, I just find this question, at least phrased in this way, to be utterly unhelpful. A better way of getting at an answer is to ask this question more concretely: What are other examples of the same thing in totally different contexts/circumstances? Stated in the same terminology as the cliché but reworded to be helpful: What are other cases like yours? Answering the question in this way forces you to think about different examples—­other periods, other countries, other populations, other professions—­where you see similar mechanisms or processes at work. Here’s a bonus move. Once you can identify some other, diverse examples to go along with the specific thing you are studying, then you can work on naming the larger, generic pattern you are talking about. Naming the pattern might mean using the name someone else has given it—­if it has already been recognized in the literature—­or literally giving it a name yourself, because no one has identified it yet. Remember my discussion in Chapter 2 of Jooyoung Lee’s work on amateur rap artists and how he likened them to professional ballerinas, baseball players, or mathematicians because they faced similar constraints and made similar choices despite very different contexts? That’s a great illustration of this practice of identifying other examples to go with your main empirical example. Moreover, because Lee was the first scholar to discuss this pattern, he named it: These different groups experienced “existential urgency” (Lee 2016).

So how do you find other examples, in different contexts, of the underlying pattern you are studying? You read the literature. This can be your disciplinary literature; that is, you will need to look more broadly than the literature on your empirical topic. You might read some articles in the flagship journal of your field (usually the one sponsored by your professional association or the one with “American,” “British,” “Canadian,” or “Australian and New Zealand” in the journal name, depending on your country). Look for articles about really different contexts, but that have something in common. I study prisons, but maybe I can learn from people studying hospitals and schools (other people-­ processing organizations): What theories, frameworks, scholars, and literatures are they building on? Does any of it resonate with what I’m looking at, even though I don’t really care about health/medicine and education/child development? Or you might read some articles in the journals of your subfield—­the ones that are closer to your area(s) of interest; in my case, these are primarily Punishment & Society and Theoretical Criminology. That’s actually what ended up solving my problem. I was reading an article and something in its literature review struck a chord. The authors were reviewing the various examples of how punishment is always a mixture. There has been a tendency for punishment scholars over the last two or maybe three decades to describe the many ways in which punishment is “variegated,” “braided,” “bifurcated,” or resembles an “assemblage” (e.g., Hutchinson 2006; Maurutto and Hannah-­Moffat 2006; Seeds 2017). Each study basically argued that the picture was much more complicated than how we usually described it. I realized that what I was seeing in my study was similar to these descriptions. Rather than proto-­prisons emerging in response to capital punishment’s decline, I saw a lot of variation. For example, some states expanded their list of capital crimes when they added a proto-­prison, and some states authorized a proto-­prison before they reduced their list of capital crimes. Importantly, this pattern of complication didn’t fit any of the existing versions of how punishment is complicated (the list of variegated, braided, etc.), so I came up with a new one—­and I gave it a name—­“penal layering.” Basically, I argued different types of punishment can become “layered”—­we can have both prison and capital punishment, for example—­and those layers will be thicker in some places (places that rely on them more) than in others. Using some references to geological layers, I then related my metaphor to another set of scholars’ metaphor of how penal change was like plate tectonics (Goodman et al. 2017) and showed how these two metaphors could work together to give a more accurate
description of how penal change occurs (Rubin 2016). Boom. I tied my empirical findings to theory, and I actually made new theory in the process. What helped me figure out how to link my paper to the literature was realizing that I am not restricted to the existing theories, frameworks, and even concepts out there. Asking “what is this a case of ” sometimes makes me think in that narrower way—­what other person’s concept or theory is my case or finding an example of. Instead, it can be helpful to ask yourself first, “What are some ways that other people might explain what I’m seeing—­or what I might expect to see (in my case, the many variations in how penal change plays out)?” And, second, allow yourself to say, “The literature hasn’t covered this yet—­I’ve reviewed the major articles or books, here are the standard concepts people use, and nothing seems dead on—­so here is a new way to look at this finding.” What’s somewhat annoying about this strategy is it takes a bit of confidence. I don’t think it’s an accident that my first two papers, which I published in graduate school, were both theory testing—­I lacked the confidence to come up with my own theory or concept to make sense of the world. After grad school, I got a lot more confidence to say, hey, they’re doing it wrong (although I tried to be a bit less abrasive in how I eventually wrote it up).

The Reigning Advice: "So What?"

Trying to be helpful (or sometimes not), people frequently raise the "so what" question as in, "This is all interesting, but so what?" What they mean is, "Why should I (or someone else) care about your research?" I hate this question. I think it's an asshole way of asking a legitimate question about poorly motivated work. By "poorly motivated," I mean when the author/presenter has not made a sufficient case for what it is that their project does for people who don't care about the intricacies of their data and/or empirical topic. This happens when the author mistakenly thinks they have a sexy topic, when they really don't, and thus end up in the no-go zone in the Theory–Policy Matrix. To avoid such outcomes, mentors or friendly audience members sometimes ask this question early on in a project to encourage the researcher to explain why their project is valuable to a broad audience of people who don't care about the empirical topic.

Because It’s There When I hear people ask this “so what” question, I’m reminded of George Leigh Mallory’s response to a reporter’s question of why he wanted to climb Everest (Sagarmatha or Chomolungma, as it should be known), particularly after
several failed attempts, back in the 1920s. His response: “Because it’s there!” It is a celebrated quip in the climbing community, especially among mountaineers, partly because of its implicit exasperation at having to justify something that they think is obviously desirable. Even so, it’s actually a subject of deep self-reflection for a lot of climbers: “Why do I want to do this thing that so many people think is crazy (and [for some] that I myself recognize as crazy) because it very well might kill me?” (Mallory died on Everest not long after.) Some climbers who have asked themselves this question have come up with interesting answers that range from the high of challenging yourself, pushing yourself to your limits, and seeing what you are made of to the sense of peace one feels—and feels only—on a snow-covered mountain, on a wall 1,000 feet off the ground, or even on the 30-foot wooden wall of a climbing gym as your mind shuts off and you mechanically execute the moves. But most climbers recognize that there is some sort of drive that they struggle to articulate fully. For a lot of researchers, especially us Dirtbaggers, there is a similar sort of drive: I can’t tell you why I find prisons interesting, I just do. Unfortunately, that’s not good enough if I want to get published.

When someone asks us the “so what” question, a lot of us social scientists probably wish we could respond like Mallory did. Some social scientists are driven by fundamental concerns about society—­the deep inequalities that they want to see reduced and ultimately ejected. For them, these questions can be a bit easier to answer. But if you aren’t driven by policy-­relevant concerns, you might be driven by the general enjoyment of the research process, the feeling of accomplishment, the joy of discovery, or the desire to push forward basic knowledge—­not trivia, but fundamental truths about humanity. It’s helpful to remember this last one in particular: Sometimes we can get lost in the weeds, or we can buy into some of the bullshit about the irrelevance of ivory tower academics set off from the “real world” by their (our) highly specialized interests. Toward the end of graduate school, I suddenly had a major crisis of faith in my research. How does one justify social science research that is not policy relevant? Policy relevance is often emphasized by funding agencies, and certainly it seems like people who do more policy-­relevant work get invited to give more talks at universities and more TV and radio appearances. Some advisors tell their students that their work should always be policy relevant. It’s also a mandate I’ve noticed that a number of academics make on Twitter in 2019 and 2020 (the
years when I was writing and revising this book). But I do historical research; it’s not generally policy relevant, at least not directly. Indeed, people often ask me how my work relates to the present, and I frequently answer, “It doesn’t; the contexts are too different.” In grad school, I was just starting to experience this disconnect—­between my policy-­irrelevant research and a clear demand for policy-­relevant research—­and I started to feel like my work was worthless.5 I would often compare my work to my husband’s: He is an astrophysicist specializing in cosmology (essentially studying the history of the universe). Because physics is within the “hard sciences” and clearly rooted in the increasingly popular STEM fields, I needed no convincing that his work is important, regardless of whether it ever reveals anything policy relevant (which is unlikely). His work tells us something about the universe—­what we call “basic knowledge”—­and about where we, all of us, come from. But I could not, for the life of me, make the same case for my own work. I had previously explained to myself (and others who asked) that my work would never be policy relevant, but rather it would be used by other people who were doing policy-­relevant research. I argued that my type of foundational or basic research is necessary because it makes the job easier for scholars doing policy-­relevant research. I would feel a bit like the betas in Brave New World who did the work necessary for the alphas to do their more important work. This is a depressing way to think about one’s research, but it did give me a clear sense of where my work ranked and a way to articulate my sense of purpose. Ultimately, I was driven by interests and curiosity rather than policy questions, and that was good enough for me, even if it didn’t seem quite as valuable to others in society. But for whatever reason, at the end of grad school, this reasoning was not working anymore. In light of my doubts, I sought out one of my advisors, Cal Morrill, who set my mind at ease. In a totally genuine tone (I wish I had recorded this because it was so reassuring), he said that as social scientists, we are working to understand basic truths about the human condition—­“What can be more important than that?” I think of this when I feel guilty about not studying something that is more policy relevant, something that would more directly relate to reducing the suffering of others or that at least had the chance of getting read by a policymaker (or, realistically, someone who works for policymakers). One way, then, of answering the “so what” question is to think about what your study tells us about humanity—­human interactions, society, organizations, political behavior, cultural formation, and so on. Each field has a set of
core concerns. For the last several decades in sociology, I would say the most central concern has been understanding inequality (its causes and consequences), but there are others. Think of ways to connect your research question to these core concerns. You might have to get a little creative, making a few leaps at first, but you want to be able to articulate that connection. If you can’t make the connection to your field’s core concerns, start with the core concerns of your subfield; these subfield core concerns should be relatable to your field’s core concerns, but that can sometimes be difficult to see, especially when you are just starting out. Here are two strategies for doing that.

Remember Why You Got into This Mess

One way to figure out what theory, theoretical framework, or literature you want to engage is to think about what attracted you to your topic in the first place. I ask students this question, and they usually have a story to go along with how they came up with their interest. So, step one: Sit down at your computer and start typing, or go for a walk, or get coffee/dinner with a colleague, friend, or loved one and start a voice recorder. Step two: While typing or talking into a recorder, explain why you started working on this project. This exercise can be useful in figuring out what your research question is, how to frame your work, or even just reinvigorating your love for your project.

Sometimes, we get lost in the weeds and forget what the main point or goal of the project was, so go back to the beginning. Maybe you selected your research site (e.g., trains in the local subway system) and an associated phenomenon (e.g., social conflict), and you have a broad research question. Remind yourself why you thought your site would be a good place to study that phenomenon or maybe just what drew you to that setting. For example, maybe you (or a friend or relative) were sexually or racially harassed on a train, and you wanted to understand why people didn't intervene. Or maybe you (or a friend or relative) were scolded (or even ticketed) for "manspreading," and you wanted to understand why this behavior is being regulated when other behaviors (perhaps putting one's purse or bag on an adjoining seat) are not. In these cases, you wanted to see the range of conflictual behaviors or actions taking place and understand their underlying dynamics, but what really brought you to these cases was a particular story or event. Let that story or event guide your thinking about how this study matters or what literature(s) you should use. For example, these stories probably have gender and racial
components, and perhaps age dynamics as well. Maybe rather than focus on all instances of social conflict, you want to focus on cases that also involve demographic factors (gender, race, and/or age), or you want to focus on the gender, race, and/or age elements of the cases of social conflict you have found. Spending some time thinking about your original interest and explicitly writing or telling how that interest started this project off can be really useful for connecting your research to something greater than your site or topic of interest.

Ask Yourself the (Other) Asshole Question

When a university department wants to hire new faculty, they bring in three to five people to interview; part of the interview process involves the job talk. During a job talk, an applicant (often a graduate student near the end of their dissertation) presents some piece of research to the faculty (and maybe graduate students) of the department they want to join. After the presentation part of the talk, the faculty in the audience start to ask questions about the research. In one of the universities where I've spent a period of time, a certain professor—a fairly famous professor, let's say of sociology—would ask the same question at every job talk in his department. At the end of each talk, he would ask, "How is this sociology?"

Although opinions vary on the acceptability of this question, I call this "the asshole question" (which is admittedly confusing because all of these questions can be pretty assholish, but this is the most aggressive of them all and therefore gets the special titular status). It's an especially bad question in this context of a job talk: The poor unsuspecting graduate student gives a fantastic talk and then has to explain how their obviously sociological talk is actually sociology. For some speakers, answering this question might be an easy thing to do. But sometimes, the more obvious the linkage, the more difficult it can be, and the grad student ambushed in this way tends to sputter a bit.

But it's a question I keep in my arsenal for thinking about projects, particularly as an interdisciplinary scholar in a disciplinary world: Even when you sit at the intersection of multiple fields, you often send your work to disciplinary journals and seek jobs in traditional-discipline departments (soc, poli sci, crim, psych, history, anthro, and so on). It's a helpful question for interdisciplinarians to think about to make sure we don't lose track of that broader disciplinary audience or two that we may sometimes want to speak to. But it's also helpful even for squarely disciplinary scholars because they too can lose track of the broader disciplinary audience beyond their subfield.

So even though it’s an asshole question, I sometimes (nicely) ask graduate students this question when they are trying to decide if they’ve found a good research question. It’s useful in at least two contexts. The first context is when they have a topic or a puzzle and even a pretty clear set of variables they want to explore, but no theoretical framework yet. One way to quickly find a theoretical framework is to ask yourself, “How is this sociology—­or political science, criminology, etc.?” To answer this question, ask yourself, “What is one really big question all sociologists (or political scientists, or criminologists) care about that this topic might shine light on?” That can direct your attention to the correct literature. If the answer is not immediately clear (after thinking about it for a bit), go back to your comp exam reading list. Most doctoral programs in the United States and other countries require their students to take a written exam, known as the comprehensive (or comp for short) exam6 to demonstrate competence in some area or areas before they can officially begin writing their dissertation. The idea is once you know the area, then you can be unleashed to write your dissertation because you can be trusted to write something of interest to the larger field—­that is, something that can be published and is ultimately necessary to get an academic job (if that’s what you want). Your comp exam will cover most, maybe even all, of the key debates in some recognized area or subfield of your discipline. For example, a criminology grad student in the United States might be tested on all of the (leading) theories about what causes crime. A sociology graduate student in the United States might be tested in one of the several dozen major areas of the discipline—­race and ethnicity; gender; stratification; social theory; culture; work and organizations; professions; knowledge, science, and technology; crime and deviance; law; and so on. There would be similar breakdowns for political science, psychology, anthropology, and other social science programs. Knowing your area is one of the ways you establish yourself as a disciplinarily trained scholar. To “know” this area means to be able to summarize (and maybe even weigh in on) its big debates, questions, and theories. These debates, questions, and theories can then be your toolkit for analyzing your topic. (Sometimes, you will take a comp exam in more than one area, in which case both areas are fair game.) The other context in which the asshole question becomes relevant is when you come up with a research question that no one in the field has written about. I don’t mean that no one has combined your topic and theoretical framework; I mean no one in your field is even asking this type of question or exploring this
topic at all. (If this happens, definitely double-check that you are right by asking as many profs as you can if they know of anyone who has done this type of project.) Sometimes finding out you are in new territory means you've hit the jackpot, and sometimes it means you've traveled out of bounds for your discipline. The jackpot scenario is wonderful: You are doing truly innovative research that is combining multiple literatures or is addressing a new or under-examined phenomenon of interest, and your research has the potential to be groundbreaking. Congrats! But that's really hard to do, and it might be more likely that you've traveled out of bounds, by which I mean, you are no longer in your discipline. It might be that no one has written about this thing—at least within your field—because no one in your field actually cares about this topic. This is especially true if, when social scientists divvied social life up into distinct disciplines, your topic of interest is actually considered the domain of another discipline.

For example, you might be interested in studying courts. Great! They are super important to study, and they are of interest to many disciplines—political science, sociology, criminology, history, psychology, anthropology, and probably others. But each discipline tends to be interested in certain types of questions, so if you ask a question about courts, make sure it's a question people in your discipline care about—otherwise, it might be really difficult to publish in your discipline's journals, to get your dissertation committee to agree to your project, or to get your department to agree to give you tenure because you can't make the case that your research is really sociology, or whatever discipline.

* * *

Going out and studying something just because it's interesting is a perfectly fine strategy when you are just starting your project. But as you start to talk about it with other people, or transition from analyzing your data to drafting an article or book chapter, new considerations emerge. If you are one of the lucky few people doing fairly novel research on a sexy or policy-relevant topic, you can chase your interests. Most of us—especially Dirtbaggers, but also people interested in important things that have already been studied quite a bit—are not in that situation. For most of us, then, we need to connect our research to the existing literature, especially to existing theories in our subfield(s) or discipline(s). This process of framing our work allows us to stay connected to an anchor, connected to our field, which is so much safer for our careers than trying to go solo.

5

Mapping out the Route: How and When Research Design Matters

Route  The path of a particular climb, or a predefined set of moves.
Rock Climb Every Day (2020), "Glossary of Climbing Terms"

Finding Your Path Forward When big-­wall climbers and alpinists approach a mountain, they look for a line up the mountain. In the case of cliff faces like Moonlight Buttress, Half Dome, or El Capitan, big-­wall climbers look for cracks or fissures and veins of different types of rock—­places where you will be able to find good holds—­and trace a path from the base to the top. In the case of snow-­covered mountains like K2, Everest, or Cerro Fitz Roy, alpinists also look for cracks and changes in the rock composition, but also for ridges and valleys—­places where you can trek or climb more easily. The line up the mountain is your route—­basically a path that you follow to the top. In general, you want to have a sense of what route you are taking before you start to climb. Some people study these routes very intensely before they start the climb. They plan out the sequence of their climb, what sorts of gear they might need, what moves they’ll need to use, how long it all should take. Some will collect beta, or a set of step-­by-­step instructions of how to climb that route, available for many, especially well-­established, climbs. Climbers attempting a first ascent, however, won’t have that beta and have to figure it out either from the ground or by just going up there and trying it out. Research design—­that is, your plan for how you will collect and analyze your data—­is just like planning your climbing route. You want a rough map

before you start and to make some decisions about how you are going to proceed. As with climbing, there is no one right way to do a particular research project—different people might go about it differently—but there might be some popular trends that some people gravitate toward. For some projects, moreover, you might be able to perfectly predict the exact set of steps you'll need because you're completing a fairly well-known climb. For other projects, you might end up making some changes while you are on the wall; just as your research question may change in the field, so too might your planned research design, and you have to adapt. In the worst case, the route you laid out from the ground might not actually be doable.1 But for some projects, the type that Dirtbaggers like, you might just look at the wall and start climbing, figuring out the moves as you go—sure, this way may take longer, but it's more fun and, as Alex Honnold said once, more sporting.2

* * *

This is the first of three chapters on research design. People seem to define research design differently, but I think of it as your plan for collecting (and analyzing) your data. I'm using "plan" loosely here because you might initially plan things out in broad-brush strokes, do some research, and then make a more detailed (or different) plan. Another way of thinking about this is that research design is your account of the research process—how you explain or justify your decisions about how to collect and analyze your data. Your explanation may not actually be what guided your decisions (the conventional idea of research design that takes place before you collect and analyze your data). In fact, thinking of your research design as what you planned to do from the beginning is misleading, particularly if some part of your design involves piecing together what you already did. But your ability to defend your choices is key to how we evaluate research. So before getting into the nitty-gritty of creating your research design, this chapter addresses general things you want to keep in mind as you plan and execute your research, whether you want to map everything out carefully ahead of time or be a true Dirtbagger and play it by ear. In either case, keeping these things in mind—not necessarily acting on them immediately, but letting them inform your decisions—will lead to a better project. Conventional advice would say you have to address them immediately. Instead, I'm going to point out that as long as you come back to them at some point, you're all good.

Research Design, or the Many Layers of Sampling At some point (especially when you are writing up your project), you need to be able to summarize (and justify) what you did to collect (and analyze) your data. That sequence is your research design: It will be central to how people will evaluate your research and determine the credibility of your findings. Regardless of when exactly you make or update your research design, it should include not just what method(s) you use (interviews, ethnography, close reading of texts, etc.), but most of it should be about how you decide whom to interview, whom to observe, which documents to read. These questions boil down to questions about sampling. Skeptics have a lot of hang-­ups about sampling in qualitative methods. And this will be true regardless of your approach—­ethnography in the sense of embedded, long-­term participant observation; an interview-­based study; an analysis of an internet forum; or an archival/historical study. Each study type will engage in sampling of some kind. In fact, any given study will engage in sampling at multiple levels, including at the very least your case(s) or site(s) (Why this case or site and not a different one?) and your sources of data within or about that case or site (Why did you interview this person and not that one? Why did you read this document and not a different one?). This discussion might already be a bit surprising to some readers. Certain subfields or methodologies don’t speak in terms of case selection or sampling, but these concepts are still relevant. So let me back up and clarify these terms. First, in every study, you will have one or more “cases” or “sites.” These terms are sometimes used interchangeably, but they are distinct. A case can refer to a population, event, state/province/country, organization, some phenomenon common in a particular period, or court case. A site usually refers to a specific location—­a particular organization, neighborhood, or even a website. Sometimes the difference is determined by whether you physically visit the location (a site) or if it’s something you study from afar (a case). But really, it doesn’t matter. Illustrating another difference, some subfields or methodologies don’t use the term “case selection” or “site selection” because, for example, they are doing interviews or archival analysis. But you are still studying something that is bounded by time and space, so you are, in effect, studying a case or a site. And I count studies focusing on either cases or sites as “case studies” because every study is a case study. I think the artificial distinction between studies that are seen as case studies and those that are not sometimes leads to disproportionate criticisms of

self-­identified case studies. As with the example I gave before in Chapter 2 of sentencing disparities in Maryland, it might be a study of the full population in Maryland, but it’s still a case study of Maryland. Likewise, when people do research on another country, it’s often called a case study, without people realizing that the same type of research done on the United States is also a case study of the United States even though we don’t call it that. I emphasize these similarities in part to illustrate that, despite the different terminology, the same considerations are at play. So even if you don’t see yourself doing a case study (as in a study of a single case—­a single prison, hospital, neighborhood, clique, event, policy, country, etc.) because, for example, you are going to interview 100 people, you still need to be ready for these questions. Next, for each study, you will have some sort of sample. This term might also feel weird to people who don’t see themselves engaging in any type of sampling—­for example, because they are examining the whole population or all the data available (ethnographers and historians, I’m looking at you). Your sample is basically the immediate source of data—­the people you interviewed, the interactions you observed, the documents you read. You also might have one or more samples—­for example, you might have a sample of prisoners and a sample of correctional officers, or students and teachers, or shoppers and salespeople; you might have a sample of newspapers and a sample of official reports, or a sample of laws and of recorded debate that happened on the floor of the legislature. Again, whether or not you see yourself as engaging in sampling, you still need to be ready for questions about how and why you selected these samples. Ultimately, a good chunk of your research design needs to be able to answer many questions about how you selected your case(s)/site(s) (and not other cases or sites) and how you decided to collect data from your particular sample (and not other samples).

The Blurry Line Between Case Selection and Sampling So far, I’ve (hopefully) made it sound like cases/sites and samples are distinct ideas. But now I’m going to complicate things. There are many decisions you make when designing your study, and they don’t always fall neatly into categories of case and sample. In fact, I pretty much think everything is just a type of sampling—­because there is not always a clear line between your case and your sample. Let’s take David Snow and Leon Anderson’s study of homeless people in Austin, Texas. We can ask multiple questions about their sample: Why did they choose to study homeless people in Austin and not, say, Los Angeles, New York

City, or Kansas City? Taking for granted their choice to study Austin, how did they decide to study that city’s Salvation Army as one of their primary fieldsites for ethnographic observation? Finally, how did they select their 100+ participants to interview—­why these people and not others? Why did they focus primarily on men and not both men and women or just women? Snow and Anderson answer all of these questions in their book, and we’ll come back to some of their answers. For now, the point is that each of these questions can be asked about any study—­and they need to be answered. Likewise, if you are doing an archival study on a particular prison, like I did, these types of questions are still relevant. Why this prison? One could study my prison, Eastern, a unique prison; the more influential Auburn State Prison in New York; a typical model of that prison like Connecticut’s Wethersfield State Prison; or an example of an Auburn-­style prison from a southern state. Why this period? The modern prisons of the Jacksonian era (emerging in the 1820s–­1840s, roughly) get a lot of attention, but one could also focus on the proto-­prisons of an earlier era (emerging in the 1780s–­ 1800s, roughly); or what the modern prisons were like as they aged over time, such as after the Civil War; or we might even study prisons in the twentieth or twenty-­first centuries. Similarly, why this specific range of time? Should you trace the history of this prison from its opening to its closure, or its first few decades, or from its inception well before it opened to some other end point? Why these archives? There are numerous archives, large and small; some are located in the same state as the prison of interest, and others are out of state; in addition, there are many online databases one can consult. Why these periodicals? One can look at popular literary magazines, newspapers (both short-­ lived and those with longer lifespans—­again, from in state and out of state), and city or state governmental reports. Why these pamphlets? The penal reform literature from the nineteenth century (my period) was vast: One can examine the most popular pamphlets (and explain in what sense they are the most popular), all the pamphlets in certain databases or archives, or extend beyond penal reform literature to other concerns (such as discussions about the theater, the family, or slavery). There are many questions that arise regarding sampling in each project. For precision, I’m going to continue to distinguish somewhat artificially between the sampling we do at the level of the case (or site), which most will call case selection (or site selection), and sampling at the level of immediate data collection, which is usually what people mean when they say sampling.

However, it’s important to remember that a lot of the same logic is going on, which is why I like to think of both of these strategies as levels of sampling. You get a lot of the same types of criticism and advice about your case/site selection and sample. For example, in both cases, you might consider random sampling—­or people might ask why you didn’t use random sampling.3 So, I like to think about case/site selection and sampling as being on a continuum rather than as distinct activities. Whether you are dealing with your case/site or your sample, there are certain key considerations you want to keep in mind: What are we leaving out, or what might we be overemphasizing, by studying this case and not this other case, or this source of data and not this other source of data—­and (most importantly) should we be worried about it? That is, what are the sources of bias and how bad are they for what you are trying to do? It’s good to be really familiar with these questions—­and, more importantly, their answers, because all too often someone will accuse you of selection bias, and you want to be able to respond, confidently, that they are wrong, or that it just doesn’t matter for what you are doing, and why. So those general types of criticism are what the rest of this chapter is about.

How Will You Be Evaluated? Research design is so important because it’s the key thing people (and that includes you) are going to use to evaluate your research—­and especially to decide whether or not to trust your findings. So this section will discuss some of the main considerations that come up when people evaluate your research design.

Ask Yourself: Is This a First Ascent or a Fairly Ordinary Send? Doing research projects is, once again, a lot like rock climbing: If you are the first person to do something, and if it’s really interesting or amazing or difficult, you can get a lot of fame (relatively speaking, of course). But if you are the second person to do it, no one cares—­okay, not no one, but it’s not going to make the same type of splash. Tommy Caldwell and Kevin Jorgeson have a feature-­length film about their first free ascent of El Capitan’s Dawn Wall that made it into theaters around the world. I don’t think people are going to make a big-­screen movie of Adam Ondra repeating the climb almost two years later, even though it’s one of the most difficult climbs in the world.

From First Ascent to First Nude Ascent Climbers get a lot of attention for doing something first: The first person to climb a mountain. The first person to climb a new route on a mountain. The first person to climb a route in some record amount of time—say, under a day, then maybe under 12 hours, and so on. As climbers start ticking off firsts, the requisite levels of accomplishment and creativity increase before another person can reach some sort of fame. Basically, you won’t get media coverage for climbing a route that many people have already climbed unless you do something audacious. Climbing commentators now recognize the first nude ascent of some routes—as in, yes, someone went climbing buck naked (Flashman 2017). The more people who have gone before, the more amazing or flamboyant one has to be to get any recognition for something.

Here’s the thing about the research equivalent of a first ascent: In a lot of ways, the bar is lower. In grad school, I once went to a talk that was really interesting, but it had some pretty clear methodological problems. My stats professor was also in the audience, and afterward I asked him why no one was pointing out these problems. He explained that because this project was so innovative—­what climbers call “next-­generation” or “futuristic”—­people aren’t going to quibble over the methods. But any studies that come next will have to step it up methodologically. I’m not sure this is the best way to do research, particularly given all of the attention to the so-­called replication crisis in psychology and other disciplines. But it’s important to keep in mind. Basically, if you have a really sexy project that is well framed and is a question of broad interest, your methodology still has to be pretty good, but it doesn’t have to be perfect or even close to perfect. In fact, certain top journals specialize in this type of article—­really snazzy topic or theory but a bit lacking on methodological rigor because the article is just that novel. Of course, few of us are ever in that (very rare) situation of producing such an article or book, so the standards remain pretty high.4

Audience: Who Are Your Gatekeepers? When figuring out what those standards are, a good place to start is to think about the norms of your field. Your field can mean a lot of different things. It might be methodologically like-­minded folks (all qualitative scholars, all ethnographers), people with your epistemological outlook (positivist scholars, critical scholars), people in your subfield (all folks interested in sentencing

disparities, all prison scholars, all gender scholars, all social inequality scholars, all education scholars), or people in your discipline or interdisciplinary field (criminology, sociology, political science, law and society), and other distinctions (these things might vary by country—­if you plan to travel or publish abroad). Importantly, your field probably is not composed only of people who think exactly like you. For example, if you think about people in your subfield, this will include people with different epistemological approaches (critical or positivist) and different methodological training (qual v. quant, ethnographic v. in-­depth interviewing, plus old school and new school variations). Each of these different fields will have different norms to keep in mind as you research and write. These norms can vary quite a bit—­even among qualitative methods. It can be tricky when people read your work according to their own sets of norms that differ from the norms among your more immediate peers. As sociologist Mario Small (2009, 7–­10) has pointed out, in some fields, ethnographic work is particularly likely to be evaluated—­for grants, publication, and post-­publication reviews—­by quantitative demographers, who are likely to use a different set of criteria. Part of your awareness of this audience involves knowing what sorts of things they will be looking for. That doesn’t mean you have to buy into it and do what they say (particularly if they are unthinkingly transplanting mantras from one method to another). But it does mean you’ll be better off if you can anticipate or at least be ready for those criticisms.

Selection Bias and Generalizability I would say the two biggest critiques that come up—­or perhaps are at the basis of more specific critiques—­involve concerns about selection bias and generalizability. When we talk about things like case selection and sampling, these considerations are usually the elephant in the room. They aren’t always big problems, but before we can figure out if they are problems, we have to make sure we understand what they are. So for that, I’ll start with some examples where they would be a potential problem. When you interview, go out observing, or read various documents, you must address selection bias: Are you systematically (and perhaps unknowingly) overrepresenting pieces of data that have an atypical representation of or relationship with the thing you care about? For example, let’s say I am interested in why Norway is so good at rehabilitating their incarcerated people. (Nordic countries are often praised for their prison systems, but I don’t actually know if Norway is particularly good at this; I’m just using it as a hypothetical example

for the sake of illustration.) So I find a Norwegian prison that has a particularly low recidivism rate (meaning few people commit a new crime after their incarceration). I find that this prison has some variety of programs focused on group therapy, skills-­based job training, and education in which people can finish their primary and secondary education. Through my fieldwork (observations and interviews), I’m able to see how these programs have a direct impact on incarcerated people’s rehabilitation—­for example, I can see them getting better (along some dimension) over the course of their programs and, in interviews, the incarcerated people attribute their own self-­growth to these programs. But then it turns out that the incarcerated people in this particular prison are unique in some way—­before they came to this prison, they spent time in an intensive cognitive therapy program. No one mentioned it in the interviews, but it reframed their approach to life and taught them how to get the most out of their current programming. I’ve inadvertently gotten a unique group, and I didn’t even know it. Importantly, that hidden uniqueness is letting me think the training in the three programs is the cause of their success, when it was really the cognitive therapy. Another question that comes up is generalizability. Is your sample going to be representative of some larger population that we care about? Let’s stick with the Norwegian prison example, but let’s pretend there was no selection bias—­it was a totally normal Norwegian prison. Prisons in Norway are pretty different from prisons in the United States, and even from those in the UK and Canada. So now the concern is whether what I learned about this Norwegian prison has any application to US prisons (or prisons in other countries that are substantially different). For example, maybe the rehabilitative process works so well in Norway because there is a strong framework of resources in the community to which formerly incarcerated people return after prison. Or maybe the rehabilitative process works well because prison sentences are shorter in Norway, so incarcerated people haven’t been fully transformed in the many negative ways that happen when people spend long periods of time in prison. Whatever the cause, if we tried to transplant this rehabilitative program to a US prison, we might not expect it to work as well. So I might not learn much from my Norwegian prison study if I’m really interested in improving rehabilitative strategies in US prisons. These two examples illustrate how selection bias and generalizability concerns can impede your analysis and limit the utility of your study. In reality, though, people tend to overstate these problems. For example, in the first

example, I think it’s rather unlikely that a researcher in this situation really wouldn’t have known about this earlier cognitive therapy program: Someone would have mentioned it at some point, and a good researcher would have asked about prior treatments. But even if they didn’t, the study is still tracing the importance of this programming on rehabilitation. It might be the case that for such programming to be particularly successful, it needs to follow such intensive cognitive therapy, but that doesn’t mean this current programming is useless. It’s possible that someone does another study—­comparing prisoners going through the same set of programming, but with some who had the cognitive therapy and some who didn’t—­and finds that those who had the cognitive therapy actually did worse overall. Likewise, in the second example, the study is actually not useless to someone interested in US prisons. The programming might not work as well without those additional Norwegian societal factors, but it still might work to some extent—­ and possibly better than other alternatives. Additionally, a followup study examining such a program in US prisons might be able to determine that it’s not the programming itself that matters so much as the larger conditions of incarceration and the social context to which formerly incarcerated people are returning. In both of these cases, concerns are overblown. Such studies aren’t worthless; rather, they give us a piece of the puzzle. In both cases, we can put the one study in a larger context of other studies and get a more complete picture. We can always learn more from more studies, and individual studies should be put in that larger context of other studies. Sometimes people forget this when it comes to qualitative research, and instead they place a lot of pressure on one study to do more than it can or should. The goal of any individual study is not to solve the world’s mysteries, but to get us one step closer to that goal.

Sometimes, Parts of Your Research Design Just Aren’t That Important Let’s go back to the concerns with selection bias and generalizability. These are huge issues for quantitative research, and they are fairly important for some types of qualitative research. But for other types of qualitative research, these issues aren’t that big a deal. If you work with concepts, mechanisms, and processes, you don’t need a perfectly representative or generalizable sample to trust that your findings will be true for other groups. For example, I’m particularly interested in understanding prisoner behavior, especially that behavior typically described as resistance. Scholars are

studying this behavior in a variety of historical and international settings, some of which are highly unique. As a consequence, researchers have pointed to a diverse set of behaviors as examples of resistance or resistance-­like activities (what I prefer to call “friction”: activities that break the rules but that, as far as we can tell, prisoners themselves did not intend as resistance). One scholar studying prisoners who are disproportionately visible minority immigrants—­in contemporary Norway actually—­described how they illicitly spiced their bland prison food (Ugelvik 2011). In my research on this subject, I talk about a diverse group of prisoners in nineteenth-­century Philadelphia who illicitly spoke to one another (Rubin 2015b). These are two very different activities, which can be motivated by very different desires (appetite v. boredom). But both activities are about making life in prison bearable and continuing to act as a human being—­in ways that happen to be illegal because of prison rules. In this type of study that is interested in understanding general phenomena, it’s okay if the details are different from case to case. It’s not just okay, it also adds to our understanding. For example, there are some important, unique differences in the activities of immigrant prisoners in contemporary Norway that add to our understanding of resistance and friction. Their status as immigrants, a subordinate status in Norway, contributes to the symbolic meaning of their food consumption practices as a form of cultural rejection, a small way of demonstrating their distaste for the country’s mistreatment of them and people like them (Ugelvik 2011). The point is not that we should expect this to be true of all immigrant prisoners, but rather to see whether similar dynamics are operating elsewhere—­that is, do we see other immigrant prisoners engaging in similar practices or, even more broadly, do we see prisoners engaging in other forms of cultural refusal (spoiler alert: we do!). When we care about the underlying mechanism or concept, especially something that happens at an abstract level, some considerations don’t matter. We’re not interested in the exact number of X in our sample or whether specific actions are going to be the same elsewhere. In this type of study, which qualitative methods excel at, the things we do care about are common, so the details really don’t matter as much. Of course, the boundaries between this type of research and other types aren’t always clear—­especially to people in the audience or potential reviewers—­which is why it’s a good idea to be aware of these issues, acknowledge them, and be able to speak to whether or not they are a (potential) problem in your study, especially before you start collecting your data.

Other Common Pitfalls: Internal Validity, aka Did You Get It Right? In the quantitative world, you hear a lot about internal and external validity. Validity is essentially the extent to which we can believe one’s findings. When we talk about generalizability and selection bias, we are actually getting at concerns over external validity: Is whatever you found in your study going to be true of other studies in other times, other places, other people? In this section, though, I want to focus on internal validity: Is whatever you found in your study actually an accurate representation of what you are studying—­not just of your data, but of what your data are intended to describe and explain about the people, process(es), organization(s), or place(s) you are directly studying? In experimental and survey research, people spend a lot of time talking about internal validity—­especially “threats” to internal validity. Not all of those threats will be relevant to all qualitative studies, but they are helpful to keep in mind.5 That’s because someone (a reviewer, an audience member, someone interviewing you for a job) will ask about one of these threats to your study’s internal validity anyway, and you need to know how to respond, explaining how a particular threat isn’t actually so threatening. Other times, these threats will be relevant to your study even though you’re doing the most Dirtbagging type of qualitative study. Threats to your study’s internal validity will be particularly relevant to people studying something over time—­including a lengthy embedded ethnography or historical research—­but they can also be relevant to the practices you employ when analyzing your data, particularly when you do your content analysis. So for multiple reasons, these threats are good things to know about. They kind of make you better at life. All of these threats to internal validity are essentially versions of confounding—­something else is going on, that we might not be aware of, to cause the outcome we observe. In quantitative research, confounding can occur when comparing the outcomes of two groups that have not been randomly assigned. For example, let’s say you compare the heart health of people who regularly eat chocolate and those who don’t and then find that people who regularly eat chocolate have good heart health. One confounding factor might be that people who regularly eat chocolate do so because they feel they have earned it from otherwise living a healthy lifestyle, and those who do not regularly eat chocolate are on a deprivation diet after years of an unhealthy lifestyle. Maybe it turns out eating chocolate has nothing to do with heart health, and differences in heart health are really related to the overall healthfulness of people’s

lifestyles. In other words, there are other differences going on that we’re not picking up on. How does this relate to qualitative research? On the one hand, in qualitative research, we rarely randomly assign people (or other entities) to control and treatment groups that we then compare; consequently, some people think qualitative research is always confounded. They’re wrong. As I’ve mentioned before, confounding is typically less of a problem for qualitative methods. (Recall Chapter 2’s discussion of Beckett and Herbert’s study of trespass ordinances used on homeless people.) Unlike much quantitative work, qualitative work does not rely on correlations (where there might be unobserved confounding); instead, we frequently trace the process by which some factor causes an outcome of interest. While people can mistakenly use this idea of confounding to criticize research, there are other types of confounding that do come into play, even when you aren’t dealing with correlations. Indeed, people have identified about a dozen or more causes of confounding, or threats to internal validity, but there are about five that I think are relevant to qualitative researchers. One type of confounding factor—­or threat to internal validity—­is the cause and effect (aka chicken and egg) problem. Sometimes we observe something happening, and we think A is the cause and B is the effect, but it’s actually the reverse. This chicken and egg problem was a big consideration I had to contend with in my book on Eastern. I noticed that over time, Eastern’s administrators started engaging in behavior sociologists call “professionalization.” Basically, they increasingly referred to themselves as experts, claimed to be particularly well versed (especially compared to others) in a body of knowledge, and highlighted the special training they established for the rest of the prison staff. Versions of this professionalization were also happening in the field at large: Penal reformers and other prison administrators were making similar claims. Initially, I wasn’t sure if Eastern’s administrators were trendsetters (although trendsetters that no one was acknowledging) or if they were responding to and copying trends in the field—­which came first? Indeed, anytime we try to claim something happened first can be a bit tricky, because we always might overlook a predecessor. By looking across time and different places, though, I ultimately determined that it seemed like the administrators I was studying did engage in this behavior before others (in part because most prison administrators in other states were basically entrepreneurs contracted out by the state, not typical

governmental administrators). But I also saw that some of the strategies the Eastern administrators used later did come from the field—­sometimes they led and sometimes they borrowed. It was a complicated picture because these things were interacting rather than proceeding in a linear, unidirectional fashion (this thing first and then that thing). So it’s always good to keep in mind: Which really came first (e.g., X → Y or X ← Y), and do we need to consider the possibility that there are other causal arrows (e.g., X ←→ Y)? A second and related type of confounding is called history—­the other things going on in the world at the same time that are influencing the outcome of interest. Again, this is particularly important for studies that examine a long period of time. This was another big consideration I had to contend with in my book on Eastern. After the US Civil War, there was a lot of change all at once: Prisons became overcrowded; rehabilitation became interesting again to policymakers; there was a growing concern for young adult offenders; statistics were increasingly used; knowledge was increasingly professionalized (albeit in a rather primitive way); and there was a growing discussion of repeat offenders, families of criminals, and eventually what penologists called the “crime class” (people who were, officials believed, congenitally destined to be criminals). At first, I wasn’t aware of all of these developments. And once I became aware of them, it was really difficult to determine what role any particular development was playing in shaping administrative behavior at the prison because all these developments really were happening at the same time. Unlike the last example where I wasn’t sure if the fieldwide behavior was the cause or effect of the behavior at the prison, here the complication was that I wasn’t sure which of the fieldwide behaviors was shaping the behavior at the prison, if at all. But as a qualitative researcher, I didn’t just assume that things happening at the same time were shaping what happened at my prison. Instead, I looked for clues to see how, if at all, things happening around this time were shaping developments at my prison. I examined how the people I studied referred to some of these trends in supporting or justifying their decisions, or at the very least if they referenced these trends elsewhere so I knew they were at least aware of these trends. But, since I could never be certain if something was or was not shaping behavior at my prison, I sometimes just mentioned the context without making strong claims about which of these factors actually had a direct impact. They all probably mattered to some extent, but I couldn’t always demonstrate that impact, so I didn’t make that claim.

A third consideration is maturation—­whatever you are studying was going to change anyway, but not for the reason you thought. It might be easiest to think about this in terms of your education or life experience. There is a certain amount of growing up we do because we just get older and our brains develop, and some of it is because of specific experiences that force us to grow up. I sometimes wonder for myself how much of my personal development is related to my specific experiences and how much of it is just time—­had I gone to school A instead of school B, what would I be like now? In grad school, I really came out of my shell. How much of that was just aging and how much was the supportive environment? Over my first few years as faculty, I really developed a lot of confidence as a scholar. How much of that was aging (and other things like moving across country and living with my husband away from our families—­basically, having to completely grow up) and how much of it was from publishing more papers, teaching more classes, and generally getting established? Because we can’t see the counterfactual—­who I would be today if I really had gone that other route—­and because I didn’t keep careful notes, I don’t think I’ll ever have an answer to those questions. But with good qualitative research, we can come up with viable answers. In studying Eastern, for example, there were some interesting changes over time, and it wasn’t always clear how much of that was related to the changing cast of men in charge and how much of it would have just happened as routines at the prison solidified with time. Organizations often experience a certain amount of drifting away from the original mission as time goes on: Were the changes at Eastern just natural drift or were they the result of a different generation of men at the helm in later decades who introduced new ideas and values? Because I had a lot of data, though, I could investigate whether things changed when those specific men took charge and whether they changed at these men’s behest—­that is, demonstrating some clear connection—­or whether these changes had precursors, suggesting the change was already underway before these men arrived? A fourth consideration is called instrumentation—­that is, the way you measure something changes over time. This one is a bit different from the previous sources of confounding because it’s less about the research context and more about how you engage with your context and data. This is a big consideration across the board, especially for interviewers (Are you changing how you ask your questions over time?) and for ethnographers and people using texts (Are you focusing on different things over time?). It is also important during the content analysis stage when you are analyzing your interview transcripts,

ethnographic fieldnotes, or your documents (Is your coding schema changing over time?). To a large extent, this change over time is natural and even encouraged. In interviewing, you might come across new questions or better ways to ask your questions. In ethnography and text-­based research, you begin to focus on some things more than others. The only reason this can be a problem is if you also seek to make a causal argument about some change over time, but that change is actually a result of your interests or techniques changing. Thankfully, there are some easy fixes: As you collect your data (in your fieldnotes), keep track of how your interests change; make a note to yourself saying, “I realized that I’ve not been recording instances of this thing happening so much anymore (because I don’t think it’s important or it’s not so interesting anymore).” This can help you later on realize that it wasn’t a change in your data but a change in how you collected your data. When you do your content analysis, recode your documents again and again, consciously aware of how your interests change. (This is actually standard good practice, but it’s important to emphasize here when discussing confounding because skeptics are often worried about such things.) Let’s go back to the professionalization I mentioned that happened at Eastern. This pattern became really clear after the US Civil War, but it didn’t actually begin there. One of the challenges I had while doing my content analysis (since I was coding the documents in chronological order) was whether the trend I was seeing after the Civil War was new or if I’d just become more aware of it; the answer was actually both. I was able to resolve this puzzle by going back and (a) reviewing my coding categories and (b) recoding earlier documents. In the process, I realized that professionalization was there all along: It’s just that I had coded it as other things in the beginning when I didn’t think of it as professionalization (because it was such a primitive version of professionalization, it was barely recognizable as such). But despite the difference in the administrators’ professionalization strategies before and after the Civil War, the underlying claims of expertise and of authority over this area of policymaking were the same.6 Without doing additional analyses, however, I might have incorrectly stated that professionalization only occurred after the Civil War. Instead, I was able to publish a really nice paper (or I thought it was really nice) developing the concept of “primitive professionalization,” or the strategies to claim expertise and authority in ways that differ from those strategies that we see later in more established fields like medicine and law (Rubin 2017a).

A final consideration, and one that is especially relevant for people working with living subjects (participants) or archival records, is testing (aka the Hawthorne effect). Testing recognizes that our presence as researchers always disturbs the natural order of things—­simply by measuring something, or even watching something, we change it. For example, if we start filming people, and people know they are being filmed, they might act differently. Awareness of this problem goes back to an old experiment in the 1920s at the Hawthorne Western Electric plant where researchers found puzzling results. They were trying to understand how various working conditions—­like how much light there was—­affected worker productivity. They followed a careful (for the time) experimental protocol where a control room had normal lighting, and a treatment room had low light (and variously medium, high, etc.). The problem was that the treatment and control rooms showed no difference—­but both were more productive than they had been previously (and continued to be more and more productive in subsequent experiments until the researchers imposed such low lighting that the workers simply couldn’t see). When the befuddled researchers finally talked with the workers, they learned that the workers in both control and treatment groups, knowing they were being watched by the researchers, wanted to make their company look good, so they all worked extra hard. Voilà! The Hawthorne effect. As with so many classic studies, followup studies cast doubt on this story, but nonetheless it illustrates a serious point. Indeed, lots of ethnographers talk about this challenge because they are hyperaware of how their presence changes things (see, e.g., Duneier 1999). For example, if you are listening to or watching police interrogations, you might worry that the police officers are not proceeding as they would if you weren’t in the room. Particularly because universities have a reputation for being liberal enclaves, there is the possibility that people might sanitize their behavior or speech (e.g., Leo 1996). Consequently, people can be skeptical that ethnographers are picking up “the truth.” To counteract this critique (and also convince themselves about the veracity of their data), ethnographers often report how things changed over time as their participants got more comfortable with their presence, sometimes even forgetting they were there or that they were researchers. This then creates further problems including ethical dilemmas: If people forget you are a researcher and do something in front of you that they wouldn’t do if they remembered you were a researcher, is it ethical to include it in your study? To avoid those problems, some scholars aggressively remind

their participants that they are researchers (e.g., Contreras 2012; Stuart 2016). So there is always some sensitivity around the question of how your presence affects your data. Beyond analyzing living participants, though, document analysis can have related problems. Here, historians are really good about thinking about these issues. What gets written down, by whom, and for what audience? Here, it’s not so much our physical presence but the presence of others in the writer’s imagination. Letters, reports, newspapers, and other documents are always written with some audience in mind, which can shape what those texts’ authors include and how they include it. The prison administrators I studied did not admit publicly to violating the system they so defended—­that would have hurt their claims that the system was superior in every way. I only knew about these violations because I had access to private documents as well as to the documents written by non-­administrators (some of whom had an ax to grind, and others were administrative allies). Ultimately, in both cases—­ethnography or history—­using multiple sources or getting different perspectives on the same thing (aka triangulation) is a helpful trick for dealing with this issue. Of course, we can never fully overcome this challenge (no researcher can), but we can take steps to ameliorate it. That’s why it’s good to develop a conscious awareness of these possible issues.

A New Headspace: Moving Away from a Linear Model of the Research Process I used to tell people that research design was the most important part of the process. What I really meant was that your case selection and sampling (things I’ll address in the next two chapters) have to be perfected early on. For quantitative research and some types of (more deductive and literature-­driven) qualitative research, that’s true—­there’s no point in analyzing your data before you know it’s good data. And whether you’re a quantitative or a qualitative researcher, if you collect your own data, it’s extra important: in general, you want to get your research design nailed down before collecting your data or else you might waste your own and others’ time, energy, money, and other resources. But with so many types of qualitative research projects, particularly those projects Dirtbaggers like, you aren’t going to perfect your research design early on. There’s more flexibility built into the process: You don’t need to know—­nor can you know—­all the details of your research design before you start. The problem is that a lot of advice out there doesn’t recognize this reality. A lot of

advice aimed at qualitative researchers is adapted from quantitative research and the more deductive and literature-­driven qualitative research. And that just doesn’t fit the type of research I like to do—­the Dirtbagging variety.

Hangdogging Sport climbing, a transplant from Europe, entered the US climbing scene in the 80s. It refers to climbing up fixed routes (the protective bolts are already placed so you safeguard your falls by just clipping in as you go). These routes tend to be more difficult and require more gymnastic positions and really good form—what climbers call “technically demanding” climbs. Sport climbers can do really amazing things on fairly short (usually single-pitch) routes. But because of the difficulty level of this impressive climbing, a lot of climbers will descend from the top from a fixed anchor and work the moves of each section until they have the whole route dialed in (mentally and physically memorized). At the time, some of the more traditional climbers saw this as cheating and referred to this process pejoratively as “hangdogging”—because you are hanging on the rope, often repeatedly falling and reworking the move until you get it. Certainly, this process is not as cool as “flashing” the route (climbing it cleanly in one go, without falling or stopping once, on your very first try), but in the end, what really matters is that you send it—and, really, that you are sending a fucking awesome route. For some people or for some projects, it’s just going to work better to hangdog a bit. (Almost no one flashes really hard routes!)

Hangdogging is a lot like how Dirtbaggers set up their research design. You might need to go into the field or hangdog repeatedly and get a better look, try some things out, see that they don’t work, and take a while to really understand the moves that are required. Sure, it would sound more awesome if you had figured it out from the start, but that’s really hard to do, especially with more difficult, complex, or innovative projects. Exactly when in the process you nail down your research design reflects a larger distinction between Dirtbaggers and other types of researchers. Quantitative and some types of qualitative research tend to look to the scientific method for inspiration (and legitimacy), but that model doesn’t fit the more Dirtbagging type of qualitative research. Interestingly, the scientific method doesn’t really fit any research—­in borrowing the scientific method, we’ve

borrowed some of the myths that go along with it. Even “hard science” researchers don’t follow the scientific method as perfectly as we sometimes believe. But that’s another story (Latour 1986). Still, these myths are the root of the problem. A lot of people seem to think that research is a linear process in which you do one thing and then the next. First you come up with a research question, then you design your research (originally, your experiment), then you collect your data, then you analyze your data, then you write up your findings (Figure 5.1). You might have noticed that I’ve even set up this book replicating that general process so the outline is like a roadmap. This roadmap is an idealized description of the research process, but the reality is not so clean. More often, each stage in the research process will loop around and intersect; it’s much more iterative than linear, messy than clean (Figure 5.2). As we’ve already seen, the research question “phase” can basically span the entire duration of your project, right up to the writing stage. So while this is a good map to follow, at least loosely (it’s often good to try to do or start certain steps before others), we don’t have to make our creativity and curiosity subservient to it. Importantly, how carefully you need to follow this map depends on the type of research you are doing. If you are doing research that is closer to the normal science side of things—­narrow research questions, working with existing theory (or theories), more deductive than inductive, and particularly comparative research projects (as in those projects comparing two or more cases in order to make claims about why they differ)—­then you want to hew fairly closely to the map. But for a lot of other types of qualitative research—­particularly the type to which the Dirtbagger is attracted—­there is a lot more flexibility. How do you know which group you’re in? How carefully do you need to follow the map? Do you need to finalize your research design before going out into the field or is it okay if you hangdog a bit before finalizing your route?

Figure 5.1  How People Think the Research Process Works. In an idealized model, mostly one drawing on the scientific method, research is linear with one stage cleanly following another. Real life does not follow an idealized model.

Figure 5.2  How the Research Process Really Works. In real life, research is messy, iterative, and nonlinear, with each stage intersecting and interacting with the others.

The Virtues of Broad Research Questions and the Dirtbagger Way of Research Design

Your research design will depend entirely on your research question. I mean this in two senses: First, your research design is solely created to help you answer your research question. Second, just how important your research design is depends on how narrow your research question is. In general, the narrower your question, the more important your research design will be. Basically, if you want to use a conventional, narrow research question that involves analyzing

the relationship between two variables, your research design is hugely important. But your research design is less important—­still important, but just less important—­if you use a broad research question. For example, if you want to study how social control operates on a subway, that’s a fairly broad question. You’ll want to make sure you think through the various steps of your research design, but you have a fair amount of flexibility—­the project that ultimately results will be guided by some combination of your interests and what data you come across. If, however, you are interested in the relationship between time, place, and social control—­for example, differences in day/night, busy/slow commuting times, and the “safe”/“dangerous” train lines—­you need to design your study fairly carefully. Your decisions about case selection (what subway you choose to study) and sampling (which pieces of data you actually collect) will matter more, and people will (perhaps justifiably) worry more about selection bias and generalizability. Another way of thinking about this is to ask whether you are interested in discovery (coming up with new theory or theoretical insights) or essentially hypothesis testing (working with and evaluating existing theory). You can absolutely do a combination of the two, but to illustrate my point, it helps to think of them as two separate options. The former (discovery) involves a lot more flexibility—­it’s not a free-­for-­all, but you can wait to narrow things down and add in checks. The latter (hypothesis testing) requires more careful planning at the outset and sticking to the script you set for yourself. In either case, this doesn’t mean that you can’t, over the course of your research, inadvertently discover data you can use for theory testing, of course—­serendipity is totally a thing. It just means you need to be more careful if you are hoping to test hypotheses. I think this difference is why I like the Dirtbagging approach. The questions we use tend to be broader, so there is just a lot more flexibility built in. We don’t often need to do it a particular way. It’s more open-­ended—­you’re out there seeing what you can find, discovering new variables, new pathways, new mechanisms. You can kind of play around, using different frameworks and theoretical lenses, and ask, “When I use this theoretical framework, what do I see?” and then, “When I use that theoretical framework, what do I see?” You can repeat this process until you find something interesting. 7 (And just to be clear, I don’t mean you can use different frameworks until you find evidence for something politically satisfying; if you are interested in learning and discovering new things about some time, place, people, or process, that’s not the same as being out there to prove something.)

Because of this broad, flexible approach, much of the heavy lifting comes in later. After you’ve already collected or even analyzed some of your data, you do additional checks to make sure you didn’t get “off course” (so to speak)—­that is, that you aren’t accidentally and unknowingly making shit up, or setting yourself up to do so. By comparison to other approaches to research, though, the stakes are just lower: If you mess something up early on, you can usually fix it later. It might mean more work, but you probably haven’t ruined your project. Now, just because I’m saying you don’t necessarily have to put the cart before the horse and map out your research design before you start the data collection and analysis phases, this doesn’t mean you can entirely skip research design. Every study has a research design, even if you don’t think of it that way or consciously design it before your study (e.g., data collection) begins. And I’m not saying you can throw this chapter’s concerns out the window. To repeat and foreshadow some important points: You don’t always know at the outset if these issues will or will not be important (you might know they will be super important, but you don’t always know they won’t be important). Either way, these are things you should think about—­at the outset before data collection, during data collection, during data analysis, and when you write up your article or book chapter. (Notice I’m saying “think” about—­if you’re thinking about them at every stage in the process, you’re not going to have them nailed down in the early stages. And if you think you have them nailed down, think again: You should always be evaluating these things.) Why? Because you always want to be aware of your limitations, and the first step to being aware is actually determining whether or not these considerations are limitations. Finally, people evaluating your work will often want to see that you talk about these things—­in your article, book, dissertation, job talk, conference presentation, policy presentation, or policy report. *

*  *  *

Research is a messy process. There is a map, but Dirtbagger that you are, you aren’t really going to stick to it. You might ignore the map and end up designing your research intuitively, basically making decisions as you go without consciously designing your study. Or you might try to follow the map, but you’re also pretty clear that things are going to change in the field, so you’ll get off track and that’s okay. Either way, later on in the process, you’re probably going to go back to think about those decisions you made implicitly or out of necessity and work on making sure they are justifiable. This is particularly true


when people ask you annoying questions about selection bias and generalizability (such questions can be legitimate, but I say annoying here because so often people ask these types of questions as gut reactions to qualitative work) or about your study’s internal validity. The next two chapters talk about the two most important aspects of research design: case selection and sampling. You can read these chapters as a range of options from which to choose, or you can think of them as containing possible scripts for justifying choices you have already made. And if you can’t actually justify your research, these chapters contain fixes you can add in later to remedy the situation.

6

Starting on the Right Foot: Making and Justifying Your Case Selection

Foothold

A foothold is a flat or steady area on which you can put your foot when you're climbing. Part of the trick to rock climbing is learning to find good footholds.

Vocabulary.com (2020)

Finding Your First Foothold

When you are just starting a particular route, both feet are on the ground until you lift one up to a foothold, usually after placing your hands on holds above. If you are relatively new at climbing, or it's a particularly tricky start, you might place your foot on a hold and it might feel pretty good—only to realize, once you try to place most of your body weight on that foot, that it's not going to work. You try to reach for another handhold and see that starting with your left foot makes it harder to reach a particular hold with your right hand. It would have been better, perhaps, to start with your right foot and point your left foot off to the side to give you some balance when you reach with your right hand. It sounds more complicated than it actually is. The more you climb, the more you can figure out your footwork intuitively. Somehow you just know, without thinking about it, to start with one foot or the other. But it's also not that big of a deal if you start with the wrong foot. You just get down (you only have one foot on the wall) and start again. Of course, if you want to flash a route—that is, climb it in one go, without any restarts or falls—then choosing the wrong foot to start on ruins your chances. But you're not going to flash every route, and that's okay.


Case selection—figuring out what specific site or group or event you are studying—is kind of like figuring out the first foothold. It's something you can sometimes figure out just by looking at a route—and sometimes you can't. But it's also something you can start to intuit and get much better at over time. Most importantly, it's something you can restart if necessary; that won't be super efficient, and it could even be embarrassing, but you can do it. In general, though, it's better if you can figure it out correctly the first time. It's also something everyone has to do—even though some climbers don't think about it, because they take that part for granted and are more interested in other parts of the climb. Likewise, some qualitative scholars think case selection isn't that important for their project because they don't do case studies. But really, everyone is using at least one case, whether they think about it that way or not.

*  *  *

This chapter reviews the various considerations that go into case selection. And before I lose you, trust me—­you have (or will have) a case (or at least one). We start with some strategies for figuring out how to select a case if you are in the design phase and don’t know which case(s) to choose. Then we turn to the various types of cases we use in social science; each type comes with its own justifications for why you might choose this case and not that case. If you have already selected a case, you can turn to these justifications to rationalize the case you wanted to study all along. But thinking about these justifications can also remind you about the limits of the type of case you have selected and thus what you can (and can’t) claim with your study. As we’ll see, the type of case you choose will substantially impact what you can do with your project and what type of relationship your study will have with existing theories.

Different Approaches to Case Selection

Different types of scholars have very different approaches to case selection. I find it funny that so much quantitative work spends so little effort justifying its cases beyond a pro forma explanation. As mentioned in Chapter 2, people rarely seem to make a fuss about the fact that their sentencing disparities study came from "just one" state or federal district. But with qualitative work, case studies on "just one" state—or a smaller unit like one neighborhood, hospital, school, corporation, and so on—are seen as problematic and not generalizable and therefore of limited utility. This goes double for "international research" (which usually means anything outside of the United States—and


sometimes outside the UK, Canada, and Australia); basically, if you study the United States, people don’t typically ask you to justify that decision, but if you study another country, you do have to justify it.1 Overall, there tends to be a smaller emphasis on case selection in quantitative research than in qualitative research. Even within qualitative methods, though, we see big differences in approaches to case selection, especially in how carefully people think about it as a specific stage of research design. For example, in their useful guide to fieldwork for ethnographers, John Lofland and coauthors spend very little time on case selection. They assume (rightly from what I can tell) that most ethnographers are going to start with “curiosity about a topic and access to settings and people”—­and this might result from one’s “personal experiences or opportunities that provide access to social settings” (Lofland et al. 2006 [1971], 9). For example, I might want to go out and study social control on the subway after (a) riding the subway every day to work, (b) after getting harassed during my first trip on the subway, or (c) after hearing about a friend getting ticketed on the subway for manspreading—­or any number of other reasons. Lofland et al. do not talk extensively about how to strategically select a site or a case, then, because they expect that a lot of folks have already selected their site based on some deep curiosity and interest in the goings on there. (Ethnographers are definitely Dirtbaggers.) Moreover, in many ethnographic projects, we don’t have to be overly strategic in selecting our site because the types of interests we have are fairly general: We’re not making a survey of subway riders’ demographic characteristics or looking at the correlation between riders’ occupation and frequency of use or something we’d be better off studying with a survey. We’re interested in understanding how social control works on the subway, which we could study in the London Underground, Toronto’s TTC, the New York City Subway, the Chicago L, the DC Metrorail, LA’s Metro, or San Francisco’s BART trains. While the technical details of the study might be different from one place to another, the overall concepts, processes, and mechanisms are going to be the same. Remember my example from the previous chapter: If you are interested in resistance (or what I call friction), it doesn’t really matter if you are looking at contemporary Norway’s immigrant prisoners’ illicitly spicing their food or prisoners in a nineteenth-­century Philadelphia prison illicitly talking to one another, because at the end of the day, you see similar processes and it’s the processes we really care about. So that’s ethnographers (and a lot of other folks, too).


By contrast, qualitative political scientists tend to take the opposite tack.2 A lot of their projects are explicitly designed to engage in a combination of theory testing and theory generation, with an eye to the former. Consequently, they tend to prefer comparative, and especially multi-case, studies (although not exclusively). So they need to pay closer attention to case selection to ensure they have the right case(s) to test their theory of interest. Since these sorts of projects tend to start out with a narrower research question related to a specific theory, they need to be more careful at the outset about their case selection. Additionally, since they do have a very specific (rather than an open-ended) goal with this type of research, they want to be as efficient as possible; it would be pretty awful to collect all their data and realize that they really should have used a different case to be able to make the types of claims they were hoping to make. Now, as a Dirtbagging social scientist, that's not really how I roll. I'm more like the ethnographers who choose something because of some inarticulable interest, and I frequently don't have a specific goal when I set out. However, I do find it helpful to think about the standard ways people go about selecting their cases. Also, sometimes I do want to use a narrower research question—I'm not required to always follow the same model of research for every project—so it's especially useful for those projects to set up a tighter research design and to think more carefully about my case selection. What if you don't even have a case because you're not doing a case study? Maybe you are interviewing 100 people scattered across the country. You still have at least one case. How did you decide which group or groups you would sample those 100 folks to interview from? Why that group and not another group? (For example, why graduate students in elite universities and not the many other schools?) What country did you choose and why? You might also think about your case more conceptually. Maybe you are asking people about their dating lives because you want to better understand the processes underlying romantic pairings. Dating is a case, but you could also look at other types of romantic pairings—here, each of those different types of pairings is, conceptually, a case. So even if you're not focused on a specific place, you can identify the case in your study. This is important to think about because it will affect how you think about your study's contributions and limitations.


intuitively selected a particular case to study because it was interesting; but in thinking about how to sell your study to other people—that is, how to dress it up so it gets invited to the party—you need to be able to explain what type of case it is (What clique are you addressing?) and how or why you selected your case. Is it a bit dishonest to come up with your justification after you have already selected your case? Perhaps. But a lot of people do it. And when it comes to things like getting a tenure-track professor gig after grad school, getting your grant proposal accepted, getting your paper published, or presenting your research to policymakers, people expect you to be able to say why your case should matter and why it's a good case for your research project.

Mapping the Terrain: What Is the Universe of Cases? In making (or later justifying) your case selection, you always need to keep an eye on the larger universe of cases. The universe refers to the full collection of the organizations, groups, states/provinces/countries, neighborhoods, hospitals, prisons, subways, events, or whatever else you are planning to study—­ essentially, all the cases you could potentially choose to study. For example, in my study of Eastern (that unique prison), my universe of cases was all the prisons in the United States. Since this is actually a little more complicated than it sounds, I’ll add that I mean all “modern prisons,” which refers to prisons built between 1821 and 1860 (there are some modern prisons built later, but I put them in a different category—­plus, I was most interested in the period before the US Civil War, which began in 1861). This comes to about 30 prisons.3 So that is my universe. Figuring out your universe depends on your research question and what sorts of claims you want to be able to make. For example, let’s say I’m interested in studying discrimination against Latinx Americans in bail hearings. I might be interested in how they are treated differently or similarly to people from other racial or ethnic groups (e.g., non-­Latinx White Americans, African Americans, Asian Americans) going through the bail hearings. My universe, then, would include all those racial and ethnic groups. But maybe I’m interested in how Latinx people going through bail hearings changed before and after a particular person was elected president or before and after the passage of a particular law that increased the arrest rate of Latinx people. My universe, then, might be Latinx people going through the bail process in all years, including before and after whatever big event I think might have caused a change in their treatment. And maybe my interest is really about how bail proceedings


magnify certain inequalities in ways that I think are less clear in later stages of the criminal justice process. Then my universe might be the various stages of the criminal justice system from arrest to sentencing (sticking to what is sometimes called “front-­end” criminal justice) or even beyond to release from prison and reentry into the community (that is, the “back-­end” of criminal justice). For a lot of qualitative research, place will also be an important consideration regardless of your other conceptual or categorical concerns. In addition to the questions we discussed above, I would have to figure out where I’m going to study discrimination against Latinx people going through bail proceedings. Let’s say I decide I want to study discrimination against Latinx people in Arizona because I have some intuitive sense that Arizona would be a good place to study this. Maybe I know there are some counties with a large Latinx population and some counties with a very small Latinx population, making it great for comparative purposes. Maybe Arizona has some interesting state laws that have negative effects on the Latinx population, and I want to see how some law plays out in practice. Maybe I think Arizona’s unique history is fascinating, and I have a hunch it will make for an interesting study. Will I study this on the state level or at the county level? In one court or several courts? In the court(s) only or in the newspapers and so on? This all depends on my universe, and my universe depends on my research question and my goals. If I’m interested in some very specific things at the outset, such as the role of particular variables or comparisons to other places, that will shift my universe. For example, if I’m interested in how the treatment of Latinx people in Arizona is different or similar to other states, then my universe is all the US states. If I’m interested in the role of some court-­level factors (like the county population, the demographics of judges and lawyers, or if the court is situated in an urban or rural locale), then my universe would be all Arizona courts (in various counties or cities) or perhaps all US courts (depending on how I phrase my research question). Maybe I’m not interested in comparison or specific variables I pre-­specify, and I’m just interested in looking at discrimination in this setting. I could still go either way—­have a universe that is all US states or all Arizona courts/counties/cities. To figure it out, I need to ask, “What was it about Arizona that made me think it would be a good/interesting setting or case to study discrimination against Latinx people?” The answer to that question will help me figure out my universe. Here’s a twist: The answer might be both (that is, all states and all


counties). That’s right. You might actually have two universes. You might need to figure out why Arizona relative to other states and why a particular county or set of counties and not others. Remember what I said previously about sampling happening at many levels? Case selection was part of that discussion. Now, let’s say, Dirtbagger that you are, you have already selected a case intuitively. You know you don’t want to study the whole state of Arizona; you know exactly what court in Arizona you want to study, and you know it’s going to be a doozy. You still want to go through the exercise of thinking about your universe at some point—­the earlier the better, but definitely before presenting your findings in some public forum. Specifically, you want to think about what other cases you might have selected and what the relationship might have been between the case you are interested in—­whether that is Latinx people, bail hearings, or this Arizona courtroom—­and those other cases you might have selected. Finally, there are some situations where you really might think the universe is not important. Maybe it’s not going to have a big impact on how you go about selecting your case. For example, in the social control on the subway project I keep talking about, I’m probably just going to go with my local subway. But, trust me, at some point, you’ll need to justify why your local subway is a good place to study social control (or why subways—­as opposed to, say, public buses or busy sidewalks). And you want to know for yourself, as well as for skeptical audiences, that it’s not completely different—­or if it is (or might be), how. So it’s a good idea to be able to say something about what subways are like across different places and how your subway compares along some general dimensions like frequency of riders, rules and regulations, or whatever things you think are salient.

What Do You Need to Know About That Universe?

Why do I need to know about this larger universe—and what do I need to know? A lot of the case selection strategies we're going to discuss assume some level of knowledge about these other cases to help you figure out which case(s) to select (or how to justify your already completed case selection). For example, before I can say I'm selecting a pretty unique, interesting, or deviant prison, I have to know something about the other prisons—at least enough to know what a "normal" or "typical" prison would look like. Or before I can say I'm studying two very different courts to examine how certain factors affect Latinx


people’s treatment in bail hearings, I have to know if those courts are actually different in the relevant ways for me to make those claims. The type of knowledge you need is somewhat basic, but generally relates to specific variables of interest. Talking about variables in qualitative work can feel a little weird, but here I mean things like the cases’ characteristics or the context in which they are situated. Sticking with my study of Eastern, at the very least, I want information on the variable “prison routine.” All modern prisons in the period I’m looking at followed only one of two routines: the Pennsylvania System (essentially solitary round the clock) or the Auburn System (essentially solitary just at night and factory work during the day). Since people took prison routines really seriously, this is an important variable. I could also look at other variables like prison governance (was the prison run by an appointed warden or an entrepreneur contracted out by the state) or whether the prison was primarily urban or rural or some other things. But, going back to my research question, the dominant variable for my study was the prison routine. You might be thinking at this point, “How am I supposed to know some of this other information when I haven’t even collected my data yet?” Keep in mind that we won’t have fine-­grained detail on the variables of interest—­ otherwise we wouldn’t necessarily need to do our study. And there is a bit of fudging going on: Realistically, prisons followed lots of different routines in practice, but they officially followed only one of the two I mentioned. It’s this sort of rough descriptor that helps give us the context we need. And a lot of that comes from the literature we’ve already read on our topic and field. If you’re coming up with blanks on these sorts of things, it’s time to go back to the literature or do some preliminary research. Let’s say I want to study capital punishment in the United States, and I want to focus on one state (but I don’t yet know which state I should study). It would be helpful if I knew something about the sorts of ways capital punishment varies across states. That would let me figure out, at a minimum, which states are typical and which are deviant. It will also help me determine if some states seem to group together in various ways (e.g., this set of states all currently authorize execution by hanging). But I don’t know any of these details yet, let alone what variables will be interesting or relevant to my interests. To figure out what factors or variables I can (or should) pay attention to, I’m going to need to do a bit of background research. Reading some books or articles on capital punishment would reveal that some big variables are method of execution (lethal injection is the most common method during the last few


decades, but some states still allow other methods), size of death row (some states have huge backlogs of people awaiting execution, and some states have fairly small backlogs), and the frequency of executions (some states execute fairly frequently, but other states have the death penalty but don’t actually execute people due to moratoria or other reasons). While you can find this sort of background information in academic books and articles on your topic, these sources won’t necessarily provide an inventory for each of these variables: For example, you might be able to pick up that California has a really big death row and Texas executes frequently, but you don’t know every state’s status in these categories. For that, you need to do a bit more research. Usually, you don’t have to collect original data on these things because often other people have done this for you—­either government or NGO reports or studies in the literature already map the terrain for us. For example, I can fill in all those variables above by looking up a report, conveniently titled “Capital Punishment” released by the Bureau of Justice Statistics on a semi-­annual basis. I can supplement that by looking at the Innocence Project, the Death Penalty Information Center, and the Legal Defense Fund of the NAACP. In general, federal, state (provincial), city, and nonprofit organizations, as well as the existing academic literature and quantitative surveys can be useful sources of this type of information to help you fill in the details of your universe, at least somewhat superficially.4 Let’s assume for a moment that I really can’t find the information that lets me map the universe. It turns out I’m interested in a variable or set of variables that no NGO or government agency is keeping track of publicly, or they are, but it’s in a dataset rather than a report. In that case, I might want to start with a project that actually lets me map the field—­if it doesn’t take me too far from my interests. For a dissertation project, you might think of this larger terrain-­ mapping project as your “Chapter 2.” Basically, you do the initial legwork—­ collect the necessary data to summarize the relevant variables that the literature thinks would be important for each possible case. Once you map the terrain and get a sense of the variation in the universe, you can use these early analyses to make and justify your case selection(s).

Selecting from a Typology

A standard way to select your cases is to produce a typology that efficiently summarizes the cases in your universe according to the variables the literature says are important. As we discussed in an earlier chapter, typologies (similar


to taxonomies) are an enumeration of all the cases in some universe based on where they fall according to some set of variables. Visually, this might look like a 2x2 table (or 2x3, 3x3, . . . , 4x13, etc.) in which you list the relevant contextual variable(s) as well as the outcomes of interest. Previously, I gave the example of US modern prisons, which I divided based on when they were adopted (before or after a particular year) and various variables I thought might be useful to help explain why they would be adopted earlier or later. I made a different typology for each explanatory variable such as region, the status of slavery, or the status of capital punishment, ultimately finding one variable (proto-prisons) that worked. Typologies are a helpful way of systematically mapping out the universe of cases. You can see, all at once, the variation in your universe according to the major variables or categories of interest. But typologies are also fairly simplistic tools—­2x3 tables work, but when we get to 2x2x3 and beyond, things get messy. If you limit your focus to one or two sets of variables, you can stick with the simpler kinds of typologies. Certainly, you can experiment with multiple typologies and think about which combinations of variables you find most interesting (and why). But there is usually some overarching concern either in your interests or in the literature that can guide your thinking. My typology for my book on the unique prison was pretty basic: There were two approaches to incarceration in modern prisons. In fact, I didn’t actually make a typology because it was clear enough without making it. But if I had made one, it would look like Table 6.1. There were only four prisons that adopted the Pennsylvania System (the unique approach using long-­term solitary

confinement), and only one of them (my prison) retained it for a long time. All other prisons followed the Auburn System (collective factory work during the day and solitary confinement at night). This typology helps to visually illustrate one of the main reasons why I chose that prison as my case.

Table 6.1  Simple Typology for Case Selection. This typology sorts prisons, identified as "o," into whether they adopted the Pennsylvania or Auburn System (columns) and whether they maintained it for a long period or a short period (rows). The lower left cell indicates only one prison retained the Pennsylvania System for a long period, a "deviant case."

              Pennsylvania System    Auburn System
Short Term    ooo
Long Term     o                      ooooooooo ooooooooo oooooo

In practice, though, there are a lot of reasons why you might select a case. Methodologists who specialize in case studies have repeatedly inventoried the reasons for why one might study a particular case and not another. Implicitly or explicitly, a lot of these standard approaches usually assume some sort of typology underlying your case selection. Before reviewing these reasons, we have one more task: figuring out how many cases you will select. That's because these reasons depend on how many cases you want to study.

One Case or More? It’s time to decide whether you are going to do a single-­case study (where you have one case) or whether you are going to do a multi-­case study (where you have two or more cases). You might already have a kind of intuitive feeling one way or another. If not, read through the strategies that follow and see which of them trigger some ideas or feelings like, “Yes, that’s what I’m hoping to do,” or “No, that sounds awful.” One consideration I should mention is that if you are interested in looking over a long period of time, you probably want to stick with a single-­case study or with a small number of comparison cases. Another consideration is that if you are really interested in testing specific theories, you might be better off with two or more cases. But it’s hard to make strong rules about these things because qualitative methods are just so beautifully flexible, as we’ll see. Once you decide if you want a single-­or multi-­case study, then you need to think about what exactly your selection strategy will be—­that is why you are choosing this case (or these cases). The next two sections list the various strategies you might use. Think of this list like a menu: You don’t have to order everything (you could be a vegetarian and not want the meat dishes).5 But it’s also good to be familiar with the various strategies because other people will be familiar with them and might be confused about why you aren’t choosing one of their favorite dishes. In assembling this list, or menu, I’m drawing a lot on other scholars’ lists of standard case selection strategies. Jason Seawright and John Gerring (2008) provide one of the most useful inventories describing the major strategies for case selection. I build on their discussion, and I also draw on a few other


scholars (including George and Bennett [2005]) who have written about case selection. But I also depart from their descriptions at times. That’s because their discussions, like other discussions of standard selection strategies, are a bit too narrow and rigid for Dirtbaggers like me. It’s not a bad thing; it’s just that we have different goals and approaches. Seawright/Gerring and George/ Bennett tend to be primarily interested in causal inference and relating qualitative methods to quantitative methods, finding similarities across the two approaches. Consequently, I sometimes disagree with their interpretation or description of some of these strategies. And I think they would agree. For example, Seawright and Gerring explain case selection following “a fairly narrow definition” of case studies. For them, a case study is “the intensive (qualitative or quantitative) analysis of a single unit or a small number of units (the cases), where the researcher’s goal is to understand a larger class of similar units (a population of cases)” (2008, 296). Of course, as we discussed previously, you might not really care about the population (or what I’ve been calling the universe), or you might wish to speak about more general processes (like resistance or social control) such that your particular case could really be anything where you see that process. Also, I’m using the idea of a case study more broadly than is typically defined to emphasize that we always need to think about the role of selection, at multiple levels, in our studies and what those choices let us do (or not do). Underlying this discussion of selection strategies is that different strategies lend themselves to different purposes. You shouldn’t select a deviant case when you really need a typical case to be able to make an empirically generalizable claim; you shouldn’t use an influential case when you’re better off with an extreme case. The goal is to know what each strategy lets you do and when you should use it, rather than to simply use one of these strategies because it sounds impressive.

Match Your Gear to Your Discipline

Different disciplines of climbing have a lot in common, but they also have some pretty big differences—so big that if you plan to do one discipline of climbing but end up doing a different discipline, you won't be prepared, and the consequences can be devastating. Let's say you are prepared for sport climbing, but you want to take on a big wall. It's not that you don't have the skillset—a lot of the climbing moves will be the same—but you do need different gear, and you need a very different approach. A big wall might take you


days; a sport route might take you a minute. You need a full rack filled with nuts and cams and a lot of rope to help you up a big wall, but with a sport route you don’t. The mismatch you face between your gear and the discipline your route requires is like the mismatch between your case selection and your project goals. If you wanted to do theory testing but you selected a case out of sheer interest, you’re going to have to turn back, and you might not even be able to finish the project you set out to accomplish.

Options for Single-Case Studies

Single-case studies are actually my favorite because they offer so much flexibility and opportunities for discovery—and fun. We have to be a bit more buttoned-up when we do the multi-case studies. There are also just so many more options for single-case studies. In fact, there are several options I'll talk about that the standard lists don't necessarily include. If you're not into single-case studies, you still want to review these options for at least two reasons. First, some of the logic will transfer into the multi-case studies. Second, you might select two or more types of cases from this list—say, a typical case and a deviant case. One more thing. Sometimes people assume single-case studies can't be comparative, so some scholars—who want to emphasize the comparative appeal of single-case studies—have renamed them "within-case" studies. For example, if you are looking at a long period, you are looking at how things within your case change over time. Or if you are looking at different people within your case, you might be comparing their behaviors, statements, and so forth. The point is you can do comparisons with single-case studies (which is also why I don't like the term "comparative case study" as a synonym for a multi-case study—it implies only multi-case studies are comparative). However, I don't use this "within-case" label consistently because you don't have to do a comparison. You have a lot of choices here.

Typical/Representative

A typical or representative case is pretty much what it sounds like. It's a case that is similar to other cases we might select from the universe of cases we could select, or at least that minimizes the differences from those other cases. To put it in the language of variables, if we are interested in studying Y, then we


want to find the most common value of Y and then select a case that has that value. For example, if we wanted to study capital punishment today, and we have particular interest in studying the method of capital punishment, then we might consult the latest report on capital punishment to learn that lethal injection is the most common (although not the only available) method. If we are interested in studying the relationship between X and Y, then we want to find the most common relationship between X and Y, and we want to find a case that has that relationship. Because this case is representative of other cases we might care about, it is also expected to be generalizable to those other cases we aren’t directly studying. By that, I mean if we come up with a new theory from our data on this typical case, that theory can reasonably be expected to be true and observable in many other cases. Additionally, typical cases are often good test cases. That means, if we test some existing theory on our data from our case, and the theory really is true, that should be clear from our case. This case selection strategy is what a lot of people will assume we are going for, perhaps because people think the representative or typical case is the best (or maybe only) type of case we should study. They would be wrong. Indeed, very little research is actually done with the representative case. For example, most US-­based research is based on some site within the United States or a dataset based on Americans. But the United States is not representative of the rest of the world—­it’s not even representative of the western world or North America more specifically. Some US-­based research might seek to be representative of other settings within the United States, but as our colleagues from other countries (or locals who study other countries) point out, we rarely acknowledge this implicit assumption.6 Moreover, what constitutes a typical case can be tricky if you have a heterogeneous population. I think this is harder the bigger the level you are working at. For example, we can probably imagine pretty easily what a representative public hospital might look like or a representative private school. But what exactly is a representative state within the United States? What is a representative country? For example, if we wanted to study capital punishment, in the 1950s or 1970s, what would be a typical state? One that is slowly getting rid of capital punishment, as many states were doing at the time? One that uses the electric chair or one that uses the gas chamber—­since both were pretty popular? One that has a (relatively) big death row or a small death row? Since there were a few states that fit any combination of these factors, we could select whatever combination actually


is most common. For example, if more states use the electric chair than any other method, most states have a small death row, and most states are slowly reducing their number of executions, then we could try to find one that fits all these criteria. If no such state exists, then find a state that fits most of these criteria. Or we might decide that we really need more than one case. But that requires a different, multi-­case strategy. The point is typical cases are sometimes hard to find.

Extreme Moving in the opposite direction from the typical or representative case, the extreme case is “unusual.” Specifically, it is unusual because it has an “extreme value” of the X or the Y variable of interest (Seawright and Gerring 2008, 301). Sticking with the example of wanting to study capital punishment (say, in the 2000s this time), we might be interested in a state that executes a very large number of people (e.g., Texas, but for years one might have also looked at Virginia or Oklahoma) or a state that has the death penalty but rarely uses it (Oregon, Pennsylvania, Kentucky, Idaho, to take a few examples). We might also be interested in death rows, so maybe we want to look at states with very big death rows (California, Texas) or very small death rows (Washington, Colorado, Maryland, South Dakota, and several more). Ideally, we select an extreme case that has the most of whatever we are interested in so we can study it. For example, if I want to study the death penalty in practice, I should study a place where there is a lot of it, not a place where there is very little of it. If, however, I’m interested in what drives execution rates, I might be interested in studying a state with a very low execution rate to see what is keeping that state’s rates so low. I might actually want to study several states with low execution rates because there are probably different mechanisms working in different states: a governor’s moratorium; a legal prohibition from a high court; a buildup of appeals that need to be reviewed before executions can proceed; a limited drug supply for lethal injection; a low appetite for punitiveness among jurors, prosecutors, or judges, etc. I could also repeatedly perform case studies and compile a list of these mechanisms, stopping when I stop finding new mechanisms—­or saturation. As Seawright and Gerring (2008, 301) note, an extreme case would violate the advice to not sample on your dependent variable (a critique we’ll discuss in Chapter 11). But that’s fine because that advice does not apply here. As this sampling strategy demonstrates, there are legitimate reasons to sample on the dependent variable when performing a qualitative study.


Deviant A deviant case is kind of like an extreme case—­in fact, there are some examples of cases that are both deviant and extreme—­but, more specifically, it’s a case that is considered “surprising” or “anomalous,” especially in light of the empirical or theoretical literature (Seawright and Gerring 2008, 302). Here’s an example of one that is both extreme and deviant: If we wanted to study capital punishment worldwide, the United States would be an extreme and deviant case. On the worldwide scale, the United States executes a large number of people and usually only follows certain Middle Eastern countries and China. If we limit our scope to those countries deemed first world, western, or part of the Global North, or if we pay attention to certain treaty groupings, the United States is again an outlier either simply by retaining the death penalty (most such countries have abolished it, retain it for a very limited number of offenses, or have an active moratorium) or in terms of its execution rate (Japan, for example, retains the death penalty but uses it rarely by comparison). Thus, whether we are interested in an extreme case based on the sheer values of Y (if Y is retention/abolition, retention is an extreme value; if Y is the execution rate, a high execution rate is an extreme value) or a deviant case based on our theoretical expectations (capital punishment is currently surprising in a Global North country), the United States would be our case.7 I like another dimension of deviant cases, which is the way in which they successfully reject norms. Norms exist in all sorts of settings—­from kids at school to organizations in a particular line of work to expectations of how politicians are expected to run for office. If norms are strong, how do we account for an aberrant policy, group, or organization that defies those norms, especially if they do so for a lengthy period of time? This is still a deviant case in the sense of theoretically unexpected, but it’s also a nice use of the term “deviance” in the way that term is typically understood—­it’s a weird case (e.g., Becker 1963). And of course, this is the method I use in my book aptly titled, The Deviant Prison: My prison of interest was aberrant, exceptional, heavily criticized, and I seek to explain why it kept doing what caused it to be so different and criticized.

Influential

There are two types of influential cases. First, a case can be influential in a statistical sense, which is what Seawright and Gerring (2008) mean. If we were to run a correlation or regression analysis, the extreme values of this influential


case would shape the correlation statistic or regression coefficient; if we took this influential case out, we would get a very different correlation statistic or regression coefficient. (By contrast, you can remove an average case with little impact.) In the incarceration context, California would be that influential case. Recently, the overall US incarceration rate started to come down, but mostly because California’s incarceration rate came down (at least initially). That’s right: California incarcerates so many people that its trend is largely responsible for a national decline. In reality, there was no big trend away from incarcerating—­states were pretty split (Turner et al. 2015). So that’s a statistically influential case: It’s either an extreme value or a very large part of the population, and either of those can shift the overall findings. But most of us don’t think that way, so instead I’m going to focus on a second type of influence—­the kind that usually comes to mind: the bellwether or trendsetter. We Dirtbaggers (and others) love to study influential cases of this kind. In my field, there are lots of books written on the state that leads the way in some big kind of penal change. For example, there are lots of books on Pennsylvania and New York in the late-­eighteenth and nineteenth centuries because these were bellwether states in those years (first Pennsylvania, then New York). In the twentieth and twenty-­first centuries, our focus shifted to California and other sunbelt states like Texas, Florida, and Arizona (e.g., Cummins 1994; Irwin 1970, 2005; Lynch 2010; Perkinson 2008; Reiter 2016; Schoenfeld 2018; Simon 2014). Keep in mind that influential cases might have a different causal mechanism than non-­influential cases. In fact, diffusion research and neo-­institutional theory have this assumption built in: They distinguish between the innovators or early adopters (basically the trendsetters and their immediate acolytes) and the late adopters or laggards (the ones that are slow to follow trends or buck the trend). In some of my work, I’ve drawn on this theory to criticize our disproportionate attention to bellwethers. Focusing on bellwethers or influencers is problematic because, this theory suggests, there is a really big difference in why innovators (a kind of trendsetter or influencer) do something and why everyone else copies them. Basically, there are the innovators who come up with a solution to a problem; they are copied by some other early adopters who also have that problem, and they want to try out the solution. But over time, enough states (or other jurisdictions) adopt the proposed solution—­which, it usually turns out, is not super effective at solving the problem. Those states that haven’t adopted the solution look like they are lagging behind and can be


criticized, so they adopt it just to avoid that criticism, even though they don’t need the solution. Additionally, there is a difference in how different states use the solution—­if they really needed it, they use it; if they just adopted it for show, they use it superficially (DiMaggio and Powell 1983; Meyer and Rowan 1977; Rogers 2003; Rubin 2015a, 2019; Tolbert and Zucker 1983; Willis et al. 2007). In fact, in my field of punishment studies, it’s kind of a running joke that about half the literature (okay, probably not actually half, but a lot) is on California. California is not just an extreme case—­it has a very large prison population, and it tends to pass some extreme policies—­but it’s also an influential case and not just statistically speaking—­it’s often seen as a major trendsetter throughout the twentieth and twenty-­first centuries. The problem is California is far from typical, and what we see happening in California, even if it’s influential, might not actually tell us much about what’s happening in other states. This isn’t an absolute problem—­it’s just a problem now. I mean, the first few studies on California were really well positioned (and, honestly, the other studies continue to be well positioned). The issue is that, as a field, we’re not studying enough of the other types of cases to get a good, rounded understanding of penal dynamics. It’s a bit of a collective problem rather than a criticism of any particular study.

Theoretically Prominent/Prominent in the Literature

This method involves selecting a case not because it is the best case in which to analyze a particular phenomenon, but because it is particularly well known within the literature. Selecting this type of case is not a methodologically driven choice, so Seawright and Gerring (2008, 296) do not include this type in their list of standard strategies. In fact, selecting a theoretically prominent case is not a particularly common strategy—at least not explicitly. (It is actually pretty common; it's just that people don't usually explicitly say this is why they studied a case.) Nor is it often used as the primary justification for case selection, but it can be a consideration. There are two reasons why you might want to do it. One reason is to "revisit" that case. This is a strategy common in anthropology where scholars see if, upon further investigation, the original findings still hold (Burawoy 2003). The other reason to select a case from the literature is just because people tend to get excited when you study something that is familiar but say something new about it. So even if that's not a methodological reason to select a case, it does position your study well.


For example, sociolegal scholar and sociologist Katie Young (2014) examined the legal consciousness of Hawaiian participants in a recurring cockfighting game. Her award-­winning article examines how these participants’ understandings or conceptions of law are shaped by those beliefs held by members of their group or community—­what she calls “second-­order legal consciousness”—­an important contribution to the legal consciousness literature. But one of the things that makes this article particularly exciting is she is examining an important concept (legal consciousness) in a canonical context—­ the cockfight, enshrined for sociologists, anthropologists, and socio-­legal scholars by Clifford Geertz’s description of a Balinese cockfight that opens his classic book on culture (1973). In fact, although Young is not testing Geertz’s claims, she does explain that the cockfights she studies are fairly different from how Geertz described them—­which is not terribly surprising given Geertz’s theory about how games reflect the power relations of their overall society (Young 2014, 500). Young could have come up with her concept of second-­ order legal consciousness from any number of settings—­which is part of what makes it such a strong concept—­but her piece is extra enjoyable because it is so similar to a key example within a classic work. While I personally like this strategy,8 do make sure that people aren’t bored with the case because there is just so much done on it already. People aren’t always stoked to read another study about a case that they recognize from the literature. Remember all those people studying California punishment? Well, some readers are getting a bit bored (and a little mean) and express their opposition to seeing “yet another study” on California. I’m not one of them, but I’m from California, so . . . my view is not representative.

Random

This is kind of a weird selection strategy for qualitative methods. In fact, Seawright and Gerring (2008, 295) do not include this type in their list of standard strategies; but they do explain why one typically wouldn't use randomly selected cases. For mathematical reasons like those enshrined in the Law of Large Numbers and the Central Limit Theorem, random sampling works to produce samples representative of a larger population, but only with big enough sample sizes. If we randomly sampled one, two, five, or even ten cases, we would not end up with a representative sample. Random sampling is not magic. That said, there are times when we don't really care if we use a representative case, and choosing randomly is probably fine (but boring). For example, if I want


to study prison labor, and I know prisoners are put to work in all prisons (for the sake of illustration, let’s say that’s true—­in reality, there are some important exceptions), I might go ahead and pick a state or a prison at random. I might end up picking a totally ordinary prison (to the extent there is such a thing), in which some prisoners work in a factory-­like setting but more work around the prison in the kitchen, laundry, and so on. Or I might end up picking a really atypical prison like Angola, aka Louisiana State Penitentiary, which is housed on a former slave plantation, is home to a largely African American prisoner population (most of whom are serving life sentences and will die in prison), and primarily employs prisoners in fieldwork, for which they get paid a few cents per hour. Now, my analysis might differ substantially between these two settings. Most obviously, race might play a much bigger role at Angola, and the extent to which I consider the overtones of slavery would differ. These differences indicate the range of variability I could expect from randomly selecting my case from a highly diverse universe of cases. But some things might be similar across the two prisons: for example, the extent to which work helps to structure the day, creates a change of scenery, creates hierarchy among the prisoners, creates opportunities for resistance and the exercise of agency, and so on. If those are the things I care about, I could go ahead and randomly select a prison to study because it doesn’t really matter which prison I study. Of course, it’s probably better to select a typical prison—­figure out what is most common and select one of the many prisons that fits that description. But I could randomly select a prison if I wanted to; I just couldn’t claim any of the benefits of random sampling that one can claim when doing quantitative data collection.
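If you want to see that "random sampling is not magic" point for yourself, here is a minimal sketch in Python (purely illustrative: the universe of 50 cases and its values are made up, not real data) showing how far off a random draw of one, two, five, or ten cases can be:

```python
import random
import statistics

random.seed(42)

# A made-up universe of 50 cases that vary wildly on some variable
# (say, the size of a state's death row). These numbers are not real data.
universe = [random.randint(0, 700) for _ in range(50)]
universe_mean = statistics.mean(universe)

# Tiny random samples rarely land near the universe mean; random selection
# only earns its "representative" reputation at much larger sample sizes.
for n in (1, 2, 5, 10, 30):
    sample = random.sample(universe, n)
    print(f"n={n:>2}: sample mean = {statistics.mean(sample):6.1f} "
          f"(universe mean = {universe_mean:.1f})")
```

Nothing here tells you which case to pick; it just shows why drawing one or two cases at random carries none of the statistical guarantees people associate with random sampling.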

Convenient

Good ol' convenient cases—the ones you choose because they make your life easier, saving yourself time, money, or the need to be away from your home. People have strong (often negative) feelings about convenient cases. Indeed, Seawright and Gerring (2008) do not include this type of case in their list of standard choices either—because this too is not a methodologically driven choice. Nevertheless, it is a viable option. When should you use a convenient case? Your reasoning might be similar to when to use random selection—if it doesn't actually matter which case you study (because the phenomenon you are interested in is pretty common), why not save time and money by studying it close to home or wherever you would rather study it (maybe you'd rather do some fieldwork in a nicer place?).


Snow and Anderson illustrate this point in their study of homeless people in Austin, perhaps revealing a dash of frustration in having to justify their case selection when studying such a general phenomenon:

Research proposals may make a case for why homelessness should be studied in one city or another, but in fact almost all academic researchers have chosen to study the homeless in cities where they have academic or research affiliations. Our case is no different. . . . Other rationales for the study could be provided—for example, most prior research on homelessness had been conducted in large, northern cities, so conducting a study in Austin offered an opportunity to learn about the problem of homelessness in a moderate-sized, sunbelt city—but they would be pure embellishment. The problem was at our doorsteps, it was regarded as a serious one, and our interest in the daily routines and survival strategies of the homeless could be explored as well in Austin as anywhere else: all reason enough for our study. (Snow and Anderson 1993, xii)

To further illustrate this point, however, Snow and Anderson explain after their opening vignette—­a rich panoramic snapshot of life at the homeless shelter within Austin’s Salvation Army—­it “could have occurred in any of America’s urban centers during the 1980s” (1993, 6). These researchers are right. Sure, the details of these people’s struggles and survival strategies may vary from city to city—­Austin’s homeless residents may deal with a greater combination of sunburn, heatstroke, and rain whereas a different city might have a greater combination of snow, sleet, and hail, for example. Likewise, the exact composition of the population may vary—­Snow and Anderson describe seven different types of homeless people, based on factors like how long they’ve been homeless and what led to their homelessness—­and perhaps a different city would reveal an eighth or ninth category and omit one of the original seven. But the general tendencies of the research (for example, one of their central points that homelessness is actually a complicated phenomenon with many causes and does not represent a monolithic population) are expected to be largely the same across other sites. So even if the details vary, we really don’t care, because that’s not the important part of this research. People will sometimes assume that if you study something in your backyard, you did so purely for convenience and irresponsibly so. (The concern here is that your convenient case isn’t a “good” case—­that is, one that is “generalizable.”) If you haven’t chosen a case out of convenience but it looks convenient,


be sure to explain your motivations. And if you did select it out of convenience, be sure to address why this is not a problem—­or what possible biases, if any, it introduces. Beyond choosing something conveniently located in your backyard, you might choose a case that is convenient in the sense that there is a lot of data. In fact, George and Bennett (2005, 69) explicitly say that this is not a valid reason for selecting a case. Again, this will depend on your project goals: If you want to test a theory, you should select a case that is the best available test case, not a case that has a lot of available data but is a bad test case. But if you are doing a new study on a concept, mechanism, or process that is fairly common, you have flexibility in what cases you can choose, so why not study the one that has an amazing amount of available data (especially if those data are nearby, such as online or in a local archive).

Interesting

As with the convenient case, Seawright and Gerring (2008) do not include this type in their list of standard choices. George and Bennett (2005, 69) also say this is a bad reason to select a case. But again, really, where's the fun in that? And research is supposed to be fun. First, it's worth pointing out that many of the options we've already discussed can double as interesting cases—deviant, extreme, influential, and common in the literature are going to spark a fair amount of interest, and that's a good thing. (Remember the Theory–Policy Matrix—if you have a case that people are intrinsically interested in or captivated by, that's going to be a better project than one on an obscure or boring case.) Second, if you are doing the type of research that could easily be conducted on a random case (not because random sampling is good but because it really doesn't matter what case you use if your underlying interest is so general) or on a convenient case, why not go for the interesting case? There are two caveats to keep in mind: First, if you select a case that is interesting to you and only you, that's dangerous territory to be in (unless you are already selecting a case for another reason and it also happens to be really interesting to you and only you). It's very easy to get lost in the trees and forget about the forest, or to think that your trees are the most interesting trees to ever grow on this planet, but to everyone else they are just trees. Second, if you have selected a case only because it is a really interesting case, but your goal is to test a specific theory, that's also a problem. Your really interesting case might not be the best case for that purpose; remember that case selection for theory testing has its own set of rules. The point is that selecting an interesting case is not an invalid approach, but it's one you should use for those broader research questions and not the narrow theory-testing questions.

Options for Multi-Case Studies

The following strategies are used only for multi-case studies or what are more often called comparative case studies. They also tend to be more explicitly motivated by some sort of theoretical concern and causal inference, including a previously theorized relationship between two or more variables—especially, the role of one variable (often identified as X) in shaping, influencing, or causing some outcome (often identified as Y). These case selection techniques are particularly helpful for theory testing—something people don't usually associate with qualitative research. However, these methods can be used for both theory testing and theory generating—and, as is often true, generally involve a bit of each.

Diverse

A diverse case study will have two or more cases that represent very different values (or maybe all the different values) of some variable or variables of interest. For example, if we want to study schools, we might make sure we include a public, a private, and a charter school. If we are interested in gender interactions, we might want an all-boys, an all-girls, and a coed school. If we are interested in race and ethnicity, we might want a school that is mostly White (or mostly White, Asian, and Middle Eastern), a school that is mostly Black (or mostly Latinx or mostly Black and Latinx), a school that is fairly diverse, and any other common combination. Notably, the more variables of interest, the more cases we will need. For example, if we care about gender and race, we would want an all-girls mostly White school and an all-girls mostly Black school, an all-boys mostly White school and an all-boys mostly Black school, and so on. (In some cases, we might not find a case that matches the variable combination we are looking for.) This case selection method provides a somewhat representative sample because we are taking into consideration the variation along key dimensions of interest. In fact, Seawright and Gerring (2008, 301) note, "[T]he diverse case method probably has stronger claims to representativeness than any other small-N sample (including the typical case)."9 Claiming representativeness is great for general theory testing as well as theory generating research. But this method also lets you do something specific for causal inference: It lets you see how your outcome variable varies across levels of your variable of interest.

Here's an example from a study by Kitty Calavita and Valerie Jenness (2013) of California prisoners' legal consciousness. They studied prisoners from three separate prisons in California. In order to make sure they interviewed a range of prisoners experiencing a range of conditions, they selected a low-, medium-, and high-security prison. Selecting these cases not only provided variation in the range of prisoners, but it also enabled the authors to see how prisoners' legal consciousness varied across security levels. They found that, generally, prisoners in higher security prisons had a stronger legal consciousness, which the authors attributed to the way in which high-security prisons are even more saturated with law (their theorized mechanism behind prisoners' robust legal consciousness) than low- and medium-security prisons. To make their study even stronger, they might have added a fourth prison—a women's prison. However, given the small (but growing) size of the female incarcerated population, how few women's prisons there are (and thus a lack of variability in security levels), and that their main concern was the relationship between security level and legal consciousness, it's understandable that they didn't go that route. Instead, it would make for a perfect follow-up study that someone else could do—see if the same relationship holds by interviewing incarcerated women from three different security levels (which may involve three separate prisons or may just focus on different parts of the same prison, if we can't find three distinct women's prisons with different security levels). This would be a useful way to learn if the relationship holds outside of the original sample, particularly for a different sex.
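To see how quickly the number of needed case types grows as you add variables of interest (the earlier point about gender composition and racial composition), here is a tiny, purely illustrative Python sketch that just enumerates the combinations; the categories are made up and nothing here is required by the method itself.

    import itertools

    # Hypothetical dimensions you care about; each added dimension multiplies
    # the number of case types you would need to cover.
    gender_mix = ["all-girls", "all-boys", "coed"]
    racial_mix = ["mostly White", "mostly Black", "mostly Latinx", "diverse"]

    case_types = list(itertools.product(gender_mix, racial_mix))
    print(len(case_types))   # 3 x 4 = 12 case types
    for combo in case_types:
        print(combo)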

Most Different

Selecting diverse cases is a similar technique to selecting the most-different cases. The most-different selection technique is one of two selection methods aimed at causal inference provided by John Stuart Mill (1872, ch. 8). The most-different (or least-similar) method—sometimes called Mill's Method of Agreement—involves comparing two cases (maybe more, but usually two) that are very different from each other except that they are similar when it comes to one variable of interest and the outcome variable. It's a bit easier to show this visually, so check out Table 6.2. Cases 1 and 2 have the same values of X3 (the independent variable we care about) and Y (the outcome variable) but differing


Table 6.2  Most Different. The two cases have the same outcome (Y = 1 for Case 1 and Case 2) and the same value of the independent variable of interest (X3 = 1 for Case 1 and Case 2), but they are different in all other respects (X1 and X2 have different values across Case 1 and Case 2).

Variable   Case 1   Case 2
X1         1        0
X2         1        0
X3         1        1
Y          1        1

values of X1 and X2 (other potentially important independent variables). According to Mill and others who use this method, we should expect that X3 is causing Y. But, of course, as qualitative scholars, we don’t just look at correlations and assume the outcome is based on that correlation, even in a tightly controlled comparison. Instead, we investigate and look to see if X3 is actually playing the role we think it is—­and not just the role it looks like it’s playing. To think through the intuition of this strategy, imagine two very different graduate students on the academic job market. They have different interests, different personalities, different publication records, but both get good jobs; they also share the same advisor. We might want to know what role the advisor played in them getting jobs. We’re not assuming that it was the advisor, or even all the advisor. Nor are we saying anything about the possible mechanism: Does the advisor have a special formula she shared with her students about how to write extremely good cover letters for job applications? Was it the advisor’s name or letter of recommendation that was seen as sufficient? Did the advisor just train the students really well? Does the advisor have a knack for picking really good students? We don’t know any of these things, but we’re going to investigate; basically, we’re saying there is smoke over that valley; let’s see if there is fire there or if that’s smoke that blew in from elsewhere.

Most Similar

The most-similar method (sometimes called Mill's Method of Difference) involves comparing two cases (maybe more, but usually two) that are very similar to each other but differ according to one variable of interest and the outcome


Table 6.3  Most Similar. The two cases have different outcomes (Y = 1 for Case 1 but Y = 0 for Case 2) and have different values of the independent variable of interest (X3 = 1 for Case 1 but X3 = 0 for Case 2), but they are the same in all other respects (X1 and X2 have the same values for Case 1 and Case 2).

Variable   Case 1   Case 2
X1         0        0
X2         0        0
X3         1        0
Y          1        0

variable. Now, check out Table 6.3. Cases 1 and 2 have differing values of X3 (the independent variable we care about) and Y (the outcome variable), but they have the same values of X1 and X2 (other potentially important independent variables). According to Mill and others who use this method, we should again expect that X3 is causing Y.10 But, as qualitative scholars, we investigate whether that’s actually the case, rather than just taking the correlation as sufficient evidence. So that’s what this method lets us do: select cases that let us see if there is anything to that correlation. Let’s say I want to study nineteenth-­century prisons in the United States. Scholars have said things like the economy, race relations, and capital punishment are important for explaining the early prisons. Say I’m specifically interested in examining what led states to adopt these prisons at a given time: why a state is an early adopter (i.e., adopted a prison before 1835) or a later adopter of prison (i.e., after 1835). That’s going to be my Y. I find two states that have very similar economies, very similar race relations, but different approaches to capital punishment. One state severely restricted their use of capital punishment really early on; the other state still used it quite extensively in the 1850s. The state that relies very little on capital punishment also was an early adopter of the prison; the state that still uses a lot of capital punishment is a late adopter. So I think these two things are related: It’s consistent with the reigning theory that prisons basically replaced capital punishment. This is a great case selection setup for testing whether this theory is accurate.11 Realistically, though, it is really difficult to find such cases. Particularly when we’re working on bigger levels like states and countries, important


differences are going to correlate, making it difficult to select for cases that are truly most similar. But even working on a more local level, where we have lots of options—­such as when studying neighborhoods, schools, hospitals, prisons, or Fortune 500 companies—­we can probably find pretty similar organizations or settings to compare, but they might be in very different places, which introduces more factors that might explain a different outcome and not tell us as much about the variable we actually care about. If you can find one, that’s great; if not, there is one other strategy to try.
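That said, if you happen to have a small table of candidate cases and rough codings of the relevant variables, you can at least scan it for near-misses before giving up. Here is a minimal Python sketch of that search (the states and the 0/1 codings are invented); it only flags candidate pairs, and the qualitative investigation described above still has to do the real work.

    # Hypothetical codings: 0/1 values for two control variables (X1, X2),
    # the variable of interest (X3), and the outcome (Y).
    states = {
        "State A": {"X1": 1, "X2": 0, "X3": 1, "Y": 1},
        "State B": {"X1": 1, "X2": 0, "X3": 0, "Y": 0},
        "State C": {"X1": 0, "X2": 1, "X3": 1, "Y": 0},
        "State D": {"X1": 1, "X2": 1, "X3": 0, "Y": 1},
    }

    # A "most similar" pair matches on the controls but differs on X3 and Y.
    controls = ["X1", "X2"]
    for a in states:
        for b in states:
            if a >= b:  # skip duplicate and self-pairings
                continue
            same_controls = all(states[a][c] == states[b][c] for c in controls)
            differ_x3 = states[a]["X3"] != states[b]["X3"]
            differ_y = states[a]["Y"] != states[b]["Y"]
            if same_controls and differ_x3 and differ_y:
                print("Candidate most-similar pair:", a, b)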

Before and After

Okay, this strategy is actually just a variant of the most-similar strategy, but it does not actually involve two separate cases—just the same case over time (a clear "within-case" comparative study). This is a fairly good, but not foolproof, way to keep a lot of the key variables the same while only changing one variable of interest as well as some outcome variable. I really like this research design because, in theory, it produces a clean comparison. In practice, it works best if your variable of interest—the thing that you are interested in because you think it impacts the outcome variable—is some sort of exogenous shock. By "exogenous," I mean outside of the people or place you are studying. It's something they didn't have a hand in but were kind of forced into. By "shock," I mean something that's pretty sudden, at least from the perspective of whomever you are studying—for example, the advent of a recession or depression, the declaration of war (unless you are studying the people who declare war), the passage of a national law (assuming the locale you are studying had no say in that law and the law doesn't reflect some major cultural shift your locale also experienced), or the discovery of a mayor's embezzlement (unless you are studying the watchdog organization that called the mayor out).12 If you are studying policing practices or community volunteer work at a local level, any of these exogenous shocks I just mentioned could have pretty important consequences. The downside is that usually we're interested in other variables of interest that aren't easily associated with exogenous shocks. The reason why I emphasize the utility of exogenous shocks is that before and after cases don't work as well if the key variable of interest changed because some other variables changed—having lots of things changing at the same time makes causal inference really difficult. It is no longer a clean research design. Example time! Let's say you are interested in looking at the effects of California's


three strikes law, passed in 1994. Maybe you are interested in overall punitiveness, and you want to see how the new law shaped people's ideas about punishment, so you compare news reports, op-eds, or community meeting minutes for signs of punitiveness before and after the law passed. Or maybe you are interested in what prison life is like, or how court cases are processed, before and after the three strikes law. In both cases, the before and after approach makes sense. Sure, it would be great to be able to compare California's situation to another state, but—come on—what state is comparable to California? Moreover, California's three strikes law is particularly interesting—and therefore incomparable—because it was the most extreme version in the country (Zimring et al. 2001). So any comparison to another state will be tricky because (1) California (enough said) and (2) the magnitude of the law. But, for the second example (looking at how life behind bars changed), the before and after approach is probably going to be more effective than it is for the punitiveness study. Here's the (not necessarily insurmountable but still tricky) challenge: It's possible that other things changed around the same time as the three strikes law's passage. Most importantly, it's possible those other things caused any observed shift in punitiveness (or maybe the shift in punitiveness and the law's passage), rather than the law causing that shift. In fact, it's often the case that something that at first seems like an exogenous shock is actually the product of a lengthy process that several people or groups saw coming (and consequently started to change their behavior).13 More insidious, it's often the case that something else caused the so-called exogenous shock and that something else also had consequences—so it's actually the effects of that something else that we're seeing, not the so-called exogenous shock. The really tricky part of all this is that if the true mechanism of the change is subtle or hidden in some way, even narrowly focused but good qualitative research might not be able to identify it. Let's make this discussion concrete again. The three strikes law's passage was unlikely until a little girl, Polly Klaas, was kidnapped and murdered by a repeat offender. Polly's murder rocked California—she was kidnapped out of her bedroom in a White suburban neighborhood, and the media frenzy was epic. The public often becomes more punitive after high-profile crimes like these. So, one challenge is figuring out whether California was the same place before and after Polly was kidnapped and murdered. Having lived through that transition, not far from where Polly lived and died, I can say it wasn't, at least


for me and people around me. That's a problem for our study: That transition happened really close in time to the three strikes law's passage, so what we think of as the pre–three strikes period is really also the pre–Polly's murder period. If I'm studying public punitiveness, it's going to be difficult (but not impossible) to be able to say whether it was the three strikes law or Polly's murder that caused a shift in public punitiveness. As a qualitative researcher, you're so much better off than a quantitative researcher studying this question with survey data alone. You aren't just looking at correlations.14 But you also can't just look at whether the tone changed in whatever texts you're reading: You need to look at how people talked about things, what references they used, and so on. For example, you would need to see how much Polly comes up and what role her murder played, or the fear of similar future cases, in people's punitiveness. You just have to be really careful in order to address this challenge as you collect and analyze your data. And remember you may face critiques from some people who are attuned to more quantitative approaches: They will be skeptical of your potentially "confounded" approach, but in reality you have the tools to deal with what looks to them like confounding. The added work of addressing potential confounders is one drawback of the before and after approach, but confounders aren't a major factor in all such projects. If we are interested in the other example—how prison life changed before and after the three strikes law passed—the confounding of Polly's murder might not matter as much; it might have been less salient than the influx into California prisons of long-termers and lifers after the passage of the three strikes law. So think about what other things are changing and how important they'll be for your ability to compare the periods before and after the change you have chosen. There is one other problem with before and after cases: They work best with some methods, and they are difficult to do with others. For example, unless you luck out and live through some important exogenous shock while doing your fieldwork without losing access to your case(s), it's hard to do this type of project as an ethnography.15 (For example, the three strikes law was passed in 1994. There's no way I can go back and do an ethnography of the before-passage period.) Maybe there is some important change happening right now, and you want to study its effects. You could still do an ethnography to explore the after period, but you would have to use different methods to explore the before period, which might introduce some biases in what you are able to


observe. This isn't to say you can't do it; you just have to be really careful about what you claim and be really aware of your limitations. Indeed, you might be better off using textual/archival research and interviews focusing on retrospective accounts and seeing how those documents change over time or how people reflect on the differences as they remember them. In general, if you need to use different methods, it's best if you can have at least one method that lets you do data collection in both the before and after periods.

* * *

This chapter has discussed the considerations and strategies for case selection. Whether you think of your project as a case study or not, the questions in this chapter are helpful for setting up (and later defending) your study. Sometimes—­like when you have very specific, somewhat narrow goals for your study (like some projects focused on causal inference and hypothesis testing)—­ you need to put a lot of time and attention into these considerations, especially before you dig in and collect your data. Other times, these are things you can think about before you collect your data, or they are things you return to when you are presenting your work and explaining why your case—­that you just really wanted to study—­is legit. These considerations are also helpful reminders about what sorts of claims you can hope to make with your study and when you might need to be more cautious or limited in what claims you can make. In the end, if you realize you really wanted to make some other claims than the ones you can reasonably make with the case you have already chosen, you can always just add another case and collect more data. And now we turn to figuring out how to collect your data!

7

Flaking out the Rope: How to Check Your Sample

Flake  A method of untangling a rope in which the rope is run through the climber's hands and allowed to fall into a pile on the ground. Useful when preparing a rope for coiling, or before starting a lead climb, to ensure the rope is fed cleanly and without twists. Often called "flaking out" a rope.

Wikipedia (2020), "Glossary of Climbing Terms"

Sometimes, You Don't Screw Around

So much of climbing is about doing your own thing, doing what feels right, and not really giving a shit about what other people think, especially people outside of the climbing community. Particularly when there are seemingly stupid rules standing between you and your climbing, it might make sense to break those rules—or at least bend them as much as possible if you're a bit more risk averse.1 But there are certain situations when you have to follow certain rules so you don't die. For example, if you are tying a knot, you tie it in a particular way; otherwise it can come loose while you're falling and that can go really, really bad. Likewise, if you are belaying someone (that is, holding onto the rope while they climb so you can catch them if they fall), you keep your brake hand on the brake line so that you don't accidentally let the rope slip through your belay device while your partner falls, thus failing to stop their fall, which is also really, really bad. And before you trust your life to that rope, you flake it out to make sure there are no misshapen or ragged parts. Signs of wear like that could indicate that your rope is not safe to use and that it could, conceivably, snap, which, again, would be really, really bad. And you always, always tie a knot at


the end of the rope so it doesn't just slip through the belay device, thus severing your ability to not fall. How can you tell these actions apart from others where it's okay to do your own thing? The key is to distinguish between style and safety: When it comes to style, you can tell people to go fuck off if they say you're doing it wrong; when it comes to safety, you don't screw around. Likewise, there are some aspects of research design where you need to follow certain rules or do things in a particular way. Certainly, as we have already seen, when it comes to case selection, there are some types of research questions where you need to follow the rules more—but most Dirtbaggers aren't asking those questions, so the rules are kind of just suggestions. However, even the most hardcore Dirtbagger needs to follow some rules when it comes to sampling. You want to be really deliberate about questions like to whom you will talk, what times of day you will watch certain spaces, or what documents you will read. You want to be deliberate about these questions because you want to be super aware of what or who is getting left out of your observations. Sometimes, you won't care about the things getting left out, and other times you will care a lot. But you don't know how much to care until you start to think about it and do some checks. That's why you want to be deliberate and not screw around when it comes to sampling.

* * *

This chapter examines the issues you need to think about pretty carefully when it comes to your data collection. For starters, we’re going to discuss how you decide what data to actually collect. Next, we return to one of the banes of a qualitative scholar’s existence: the question of how much data are enough. But rather than worrying about what other people think is the answer to this question, we will answer the question on our own terms. Finally, we talk about what you can do to really think through the limitations of your data and how to make your project stronger. Skipping these steps can (justifiably) open you up to criticism, but doing them carefully will protect you against some bad falls.

Identify and Justify Your Sampling Strategy

In every qualitative study, you need to decide on the strategy by which you are going to collect your data—specifically, what criteria you will use to include some data in your study and leave other data behind. I say collect your data,


but we are really talking about sampling—­even though sampling sometimes brings to mind things like random sampling. (If you’re thinking now, “This isn’t for me; I don’t sample,” jump ahead to the next section and come back.) Every study has at least one sampling strategy, whether it’s called that or not. I’m also going to use the term “population.” Just like when you select a case from some universe, your sample will come from some population. The term “population” can be a bit confusing because your population may or may not be a collection of people (what we usually think of when we think of populations). The population from which you sample might consist of people (some subset of which we plan to interview), or it might consist of physical documents in boxes or digital text like tweets and blog posts (some subset of which we’re going to read), or times of the day or days of the week (during some subset of which we will observe a place). So as part of your decision of how you should sample, you first want to identify your population. You might actually have multiple samples from different populations, but most of the time, we focus on one or more samples from one population.

Common Strategies

There are many strategies one can use to create a sample (that is, the collection of people, places, or documents you are analyzing in your study) from some larger population (all the people, places, or documents you could have analyzed in your study). Each of these strategies is a legitimate approach to sampling, but their utility ultimately depends on your research question and the characteristics of your population of interest. Random sampling means using a random-number generator to select from some population (whether we mean people, facilities, states, etc.). Stratified random means randomly sampling within various bands—for example, if you want to make sure you have enough people from each state in your national sample, you would stratify your sampling by state (i.e., randomly sample within each state). These two versions are easiest to do when you know your full population—you have a list of numbers to call in a phonebook (if those still exist), or you have the names of all of the students in a school. A random sample of a sufficiently large size is often thought to be statistically representative of the underlying population. If your study does not allow you to know the full population (such as homeless people, for example), other methods are necessary. Ethnographers and people doing interviews often use snowball sampling (sometimes called chain-referral


sampling): They start with one contact, interview them, and then ask for other people that contact would recommend the researcher should also interview; the process repeats, as each subsequent interviewee recommends other names. (Ideally, the researcher starts with more than one initial contact to ensure a broad range of people, not just friends of friends, which can provide a somewhat narrow sample unless the population of interest is already small and fairly homogeneous.) Carefully done, snowball sampling can produce a diverse dataset that works well for conceptual generalizability. Less careful versions can produce an unexpectedly homogeneous sample (even at an unknown level of homogeneity) that can be less useful for either statistical representativeness or conceptual generalizability. A purposive sample is like the non-­random version of stratified random sampling: It refers to selecting people (or things) according to some pre-­specified criteria, such that the researcher has some reason for including someone or something in their sample. Reasons might include a particular type of diversity, nodal points in networks (people who seem to have a lot of connections to other groups), or particularly important actors. Similar caveats to snowball sampling apply here. A convenience sample often refers to interviewing (or observing or analyzing) whoever (or whatever) is present at a particular place at a particular time. (Rightly or wrongly, it’s often used to describe any form of sampling that isn’t random, stratified random, or similarly formulaic.) Convenience sampling can occur at different levels, from the site(s) one studies to the people or documents one studies. Again, there are better and worse versions of this sampling technique: Interviewing the first five people you see might make sense under some circumstances and not under others. Finally, you can take a comprehensive sample that includes the whole population. But that will require more explanation from me.
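For readers who want to see the mechanics of the first two strategies, here is a minimal sketch in Python (the book itself assumes no programming; the roster, states, and sample sizes are invented for illustration) of drawing a simple random sample versus a stratified random sample from a known sampling frame.

    import random

    # Hypothetical sampling frame: every student, with the stratum (state) noted.
    roster = [
        {"name": "Student A", "state": "HI"},
        {"name": "Student B", "state": "HI"},
        {"name": "Student C", "state": "CA"},
        {"name": "Student D", "state": "CA"},
        {"name": "Student E", "state": "NY"},
        {"name": "Student F", "state": "NY"},
    ]

    # Simple random sample: every member of the frame has an equal chance.
    simple_sample = random.sample(roster, k=3)

    # Stratified random sample: randomly sample within each band (here, state)
    # so that every state is guaranteed to appear in the sample.
    strata = {}
    for person in roster:
        strata.setdefault(person["state"], []).append(person)
    stratified_sample = [random.choice(members) for members in strata.values()]

    print(simple_sample)
    print(stratified_sample)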

But I Don't Sample! (Yes, You Do!)

Your method of data collection might not feel like sampling at all because you are collecting all of the data you can possibly find (as you might at an archive or in ethnographic observations), which, technically, is not sampling because sampling means taking a subset of some larger collection of data. In fact, a lot of Dirtbaggers don't "technically" engage in sampling because they tend to work with the full population. But you need to be able to articulate and justify why you analyzed the data you did, which is why it's still helpful to think about your data collection as sampling.


For example, you might be reading and analyzing all the newspaper articles about sexual assault published in the New York Times between 2000 and 2010, all the documents in relevant collections at an archive, all the posts on a particular forum since it began, all the statutes authorizing prisons in the Early Republic. I'd still call this sampling—you're just using a more comprehensive sampling technique. In these cases, the key sampling question is not whether to use random, snowball, purposive, and so on, but about scope conditions; that is, what boundaries or constraints are you imposing on your data collection? It's pretty rare that we actually look at the full population because there is always some choice involved about who, what, where, or when. For example, if you are looking at some phenomenon over some period or within some jurisdiction, then you have to justify the period or jurisdiction. Why the period between 2000 and 2010? If you are interested in all reports on sexual assault across the United States, why are you looking at reports in the New York Times and not the Washington Post, the San Francisco Chronicle, or the Toronto Star? What are you gaining or losing by focusing on this newspaper and not another? How are you defining "sexual assaults"? Legal and popular definitions can vary substantially across time, place, and groups. Why look at prison authorization statutes from the Early Republic (and not some other period), and how are you defining that period (people use different years for certain historical periods—the Progressive Era, for example, can be defined in about five different ways)? And sometimes we don't get to make the choices about the who, what, where, or when ourselves. Particularly when working with an archival collection, that collection is actually a sample that someone else has assembled from some larger population—and we don't always know what choices they made to create that sample. Here's another example. People don't often associate ethnography with sampling, but ethnographers are, in fact, sampling. In most cases, ethnographers aren't observing all times and all places that they could; instead, they are observing smaller portions of what they could, in theory (if they were superhuman), observe. At the very least, they have to decide not only what places to observe but also when—what times of day, days of the week, and months or seasons of the year. Let's use my social control on the subway example. Here, I'd want to look at a map of the subway station as well as a train schedule. I'd also want to collect any city reports I can find about the subway, particularly if they have information on the demographics of the users and frequency of use at different times


for different stations or lines. I’d want to map out the different possibilities of when I might go out and do data collection—­times of day, days of week, which trains/lines/stations. What am I interested in and when am I likely to see it? I don’t want to go only when I’m likely to see it, but I do want to make sure I go when I think I will see it. Just like when we mapped out the universe of cases, I want to map out all the times and places I might possibly observe people in the subway.
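To make that mapping concrete, here is a minimal sketch in Python (purely illustrative; the stations, days, and time blocks are invented) that enumerates every possible observation slot and then draws a spread of them, which makes it easy to see which combinations a fieldwork plan covers and which it leaves out.

    import itertools
    import random

    # Hypothetical dimensions of the observation "population."
    stations = ["Downtown", "University", "Stadium"]
    days = ["weekday", "weekend"]
    time_blocks = ["morning rush", "midday", "evening rush", "late night"]

    # Every possible observation slot: 3 x 2 x 4 = 24 combinations.
    all_slots = list(itertools.product(stations, days, time_blocks))

    # One deliberate strategy: draw a random spread of slots, then check
    # (by eye or by counting) which stations, days, or times are underrepresented.
    planned = random.sample(all_slots, k=10)
    for slot in sorted(planned):
        print(slot)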

Whatever You Do, Be Systematic—and Assess Your Limitations

One key consideration for deciding how to sample is whether we are engaging in some sort of systematic sampling. Remember our definition of empirical? Systematic observation and analysis are what make qualitative work empirical, useful, and (hopefully) believable. Whether you use one of the standard methods of systematic sampling—random, purposive, snowball, convenience, or comprehensive—or create a new sampling technique, you need to be systematic. Basically, we can't just go milling about collecting something here and something there; we need to be conscientious about our strategy. For that, we need a plan, even if it's a broad or somewhat vague plan. Here's why you need a specific sampling strategy that lets you be systematic: You always need to be aware of your limitations and do all you can to check their impact on your ability to make the claims you want to be able to make. If you can't say what criteria you used to collect your data, it can be tricky to assess any potential weaknesses in your data collection. It's worth noting that these may or may not be real weaknesses—that is, you might turn out extremely useful insights—but if you don't go through these steps, people can be pretty critical, and then they won't appreciate (publish, listen to, follow) your insights.

Why There Are So Few Free Soloing Deaths

Everyone knows that free soloing is pretty risky: If you fall, you die. But very few people have died while free soloing (e.g., Honnold and Roberts 2016). Why don't more people die free soloing? The answer is people have a really good awareness of their limitations. Some people don't, but as soon as they start to climb up any meaningful height, they realize, "Oh, shit, this is really scary." Then they climb back down. (If they're too nervous, they might fall and hurt themselves, but they usually don't get high enough to actually kill themselves if they do fall.) Really practiced free soloists have a much better sense


of their limitations: They don’t want to die, so they don’t free solo beyond their limit. If they don’t feel ready—if they don’t have confidence that they are really climbing well below their limit—they don’t do the climb. Being aware of your limitations is a matter of life and death when climbing, but when it comes to social science research, it’s also incredibly important.

The key question for assessing the limitations of your sample is: What is the relationship between your sample and your population? This is also where you think about things like representativeness and generalizability. Maybe your population is a unique group in some larger population that you don’t really care about. That’s fine! Just make sure your sample is representative of the population you do care about. With a lot of qualitative work, though, getting a representative sample will be tricky. This consideration should be built into your sampling decision—­and, ideally, how everyone evaluates sampling decisions. For example, one important consideration when it comes to sampling is how difficult is it to find your population of interest? If you are studying supermax ex-­mates—­that is, people who were formerly incarcerated in supermax prisons—­or homeless people, you are working with a pretty difficult-­to-­find population. On the one hand, you can be forgiven2 if you don’t have a perfectly representative sample of your larger population—­in part because no one is likely to ever know how representative your sample is because we don’t have systematic data on that larger population. There is no database of supermax ex-­mates or of homeless people. But while some people recognize the difficulty of this research, a lot of other people can be a bit persnickety about how representative (and therefore believable) it’s going to be. If you know your sample isn’t perfectly representative even of the population you are studying, spend some time thinking about the limitations of your sample. Specifically, think about what ways your potential sample might be different from the larger population of interest and whether you actually care about these differences. Then, take steps to reduce those differences you care about. Let’s say you plan to sample from one location. A potential concern would be that you might end up with a pretty homogeneous, possibly unique group who, when analyzed, yield a unique answer to your research question when you were hoping to get a more representative answer reflecting the larger, heterogeneous population. If that’s the case, then one modification would be to


sample from multiple, very different sources or locations. You might not get a representative sample exactly, but you can at least ensure that you get a more diverse sample. For example, in his study of formerly incarcerated people trying to get work, Phil Goodman (2020) sampled from four, very different reentry service providers in order to collect a diverse sample. The empirically verified assumption here is that people with different personalities, criminal histories, and demographics might gravitate more toward one type of reentry service facility than another, so sampling from multiple service providers captures that variation. The people he interviewed might not be perfectly representative, but they’re going to provide a fair amount of the variation you would see in that population (Figure 7.1). You can check this outcome in your own sample by comparing the demographics (or other variables of interest) across your recruitment sites: There will be some overlap, but getting somewhat different samples will reassure you that you’re getting a fairly diverse sample.

Figure 7.1  Sampling. Having different starting points can give greater coverage of the variation within the population of interest. The box signifies the population of interest and the four circles represent separate (but sometimes overlapping) samples.
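If you keep a simple spreadsheet of who was recruited where, that check can be as basic as a cross-tabulation. Here is a minimal sketch using pandas (the column names and categories are hypothetical, not taken from Goodman's study):

    import pandas as pd

    # Hypothetical recruitment log: one row per participant.
    participants = pd.DataFrame({
        "site": ["Provider 1", "Provider 1", "Provider 2",
                 "Provider 3", "Provider 4", "Provider 2"],
        "age_group": ["18-29", "30-49", "30-49", "50+", "18-29", "50+"],
        "prior_convictions": ["first", "repeat", "repeat", "repeat", "first", "first"],
    })

    # Cross-tabulate a variable of interest by recruitment site.
    # Heavy overlap across sites is fine; identical columns would suggest the
    # different sites are not adding the diversity you hoped for.
    print(pd.crosstab(participants["site"], participants["age_group"]))
    print(pd.crosstab(participants["site"], participants["prior_convictions"]))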


Now, let’s say you are actually interested in a fairly narrow or homogeneous population, such as homeless women who use public and NGO services. In that case, you can limit your sampling to public and NGO service centers and focus on making sure you get a representative sample from these centers by using diverse strategies to include people in your study (e.g., using fliers, referrals from staff, and asking clients to recommend you to their fellow clients). And if you’re really lucky, you might even be able to use the full population from these centers (if the service centers cooperate, and you come up with a population inventory that your institutional review board approves). But unless you are comparing these groups to others—­such as homeless women who don’t go through these services but find other ways of making do—­you don’t need to try to work on sampling from other locations. It always comes down to the goals associated with your research question.

It's (Usually) Okay If You Address These Limitations as You Go

To be honest, you won't necessarily start off this way—that is, make all these decisions at the very beginning of your study in your "research design phase," before collecting any data. As I've indicated before, maybe it's better if you do, but you don't really have to. You might actually start off by going to the subway on your way home from school/work, and maybe you notice some interesting things. So you go to the subway again during some free time just to see if you notice those things some more. Now you think there might actually be something of interest so you plan to do your project. This is when you do the more systematic approach and think about your sampling strategy. This is the nice thing with qualitative data—you can kind of play around a bit before you get serious so you can have a better sense of things before committing yourself to a project. We call these early forays into the field "pilot studies," sometimes jokingly (if we weren't careful) and sometimes seriously (if we treated it as a small-scale preliminary study). Some ethnographers (e.g., Goffman 2001) argue you shouldn't do it this way because your senses will be dulled when you actually start recording your fieldnotes—you will no longer notice things you noticed your first time going to this place. I think that, as long as you are systematic in your field notes (see Chapter 8), both during your pilot study phase and later, this is less of a problem. Indeed, lots of people do research on settings familiar to them. As with


all possible critiques, though, it’s one to be aware of, so reflect on how it might affect your study and take steps to minimize its negative impact. There is an important exception to this idea that you can be fairly flexible and fix mistakes (if we want to call them that) later: anytime you have some sort of data that you can potentially taint. It’s really hard to taint your data by reading documents or unobtrusive observing, but it’s really easy to do it with interviewing. With interviews, you want to settle on as much of your research design (including the questions you use) before you start to interview people. While you can always change your questions and the manner of questioning later, you don’t want to do your pilot study with really important participants (either because you might only be able to interview them once or because the questions you asked tipped them off to a particular way of thinking, and you can’t unring that bell). So if you are doing interviews, or using any other type of data you can taint, plan ahead and do a pilot study on friends and family members or a small sample of possible participants rather than the people you really want to include in your final study.

Alternative Measures of Enough: Thick Description and Saturation

Perhaps one of the most common challenges for researchers of all kinds is the question of how much or how many. How many more articles do I need to read in my literature review? How many cases do I need? How many interviews, hours in the field, boxes of documents do I need? Particularly given our preoccupation with size (especially quantity), this can be a pressing problem. A lot of people think more (bigger) is always better, as we saw in Chapter 2. However, our ideas about the requisite quantity of data can also be warped by things like procrastination and general anxiety. What do I mean by that? Sometimes, people try to do more legwork—read more articles, collect more data, keep editing their manuscript—out of fear of moving onto the next step in the process. Once you start any of these tasks, it's pretty easy to keep going. Inertia is fantastic. You just do today what you did yesterday. It's comfortable and comforting. But it must come to an end. So how do you know when you have enough (in this case, enough data)? While people sometimes give numerical answers—the actual number of interviews or documents you need to analyze—I'm going to borrow a different approach. It's an approach that has been normalized for ethnographers and (implicitly) for historians but is by no means limited to them—lots of


interviewers follow this approach, too. Importantly, this approach can conflict with the general sentiment that more is usually better and the implicit or explicit assumption that you will (or can) specify how many people you plan to interview or documents you plan to analyze before you start your data collection. Instead, this approach holds that you basically collect enough data to figure out what is going on. So how do you know how much data that will be? This is where two concepts from ethnography (such a helpful field) come in: thick description and saturation. One of the beautiful things about qualitative research is that it puts you in a position to offer thick description. “Thick description” is a term popularized by the classic anthropologist Clifford Geertz. I’ll be honest: For the longest time, I thought thick description meant what it sounds like—­rich description (you know . . . instead of a concise summary of something that happened, you get an account of what happened with lots of details). But that’s not actually what Geertz meant. Instead, he’s interested not just in the factual account of what happened or what something was like but the underlying meanings—­basically, an account needs to include some amount of interpretation so the reader can truly understand what was happening. Geertz uses the example of winking: You need to know something of the context, and you need to be able to make some inferences to determine if this is “an involuntary twitch” or “a conspiratorial signal to a friend.” So while you could simply describe what happened as “two boys rapidly contracting the eyelids of their right eyes,” doing so would lose the meaning behind it—­that the contracting eyelids contained some message between the two boys (Geertz 1973, 6). Thick description occurs when you explain both what was happening and its meaning or significance. Using thick description as your goal transforms the question of “How much?” into the more helpful question of “How much data do you need to produce a thick description of something interesting?” Obviously, you could probably go into the field on the first day and collect enough data to write a really rich description of something. For the sake of illustration, let’s say you witnessed a fight in the school you are studying, and you could write a beautiful description of that fight. But you want a thick description—­not just a beautiful description, but also the context, embedded meanings, and significance. Realistically, you probably can’t produce a thick description after one day in the field (and certainly not enough to support an article you can get published in a good journal). I don’t mean you can’t use the data from your first day in the field, just that you don’t want to stop there. In fact, there are some important


questions you want to answer about whatever you saw on your first day: Is it normal? Why did it happen? What were its consequences? If you spent more time in the field, talking to teachers and students, and maybe looked at some of the school records, you might learn that fights are incredibly uncommon at this school. You also might learn that the fight emerged because of growing tensions between two students. If you stay long enough, you might see that it had downstream consequences a few weeks later. Writing all that up provides a much more interesting and potentially useful discussion of this one fight. There’s more though. Thick description isn’t just about a beautiful description with some context. It also needs to be related to something else of interest, especially theory. If you simply witness a fight one day, you might not have enough information to see how that fight actually illustrates an important mechanism by which students establish their dominance. Fighting might not be the most common mechanism by which that happens, but the episode might encapsulate a lot of the themes you see more often in smaller, less dramatic episodes of gossip, verbal throwdowns, and maneuvering. For each, maybe gossip illustrates one theme really well, and verbal throwdowns illustrate another, and so on, but the fight actually has all three themes present, making it a particularly rich example, even though it’s not all that common. That’s good thick description connected to a larger theoretical understanding of social dominance in schools. Let’s go back to the question of how much/many—­how much time in the field or how many documents to analyze. I propose the answer to that question is: However much is enough to produce thick description. Sometimes that can be a few weeks, sometimes a few years. Or sometimes it’s three documents, and other times it’s three series of documents. In my first book (building on my dissertation), there was an ongoing tension between some of the prison administrators I studied. It took multiple document series—­in particular, the warden’s log, the meeting minutes of the board of inspectors, and the diary of a local penal reformer—­to be able to piece together the underlying tension and be able to tell a coherent story. This story ultimately involved interesting episodes that punctuated the tension, illustrated its causes and consequences, and related it to an important point I was making in my book about internal disagreements over how the prison should be managed. Hopefully this discussion illustrates that numbers are not a guarantee, nor a requirement, for good qualitative work. You might interview 100 people and still not have enough to offer a thick description. Maybe your interviewees are


really reticent about the topic you care about. Or maybe your interests and research question have changed, and you don’t have as much information about this new question as you need. On the other hand, you might read three documents from different people, each with a different view (different biases, different goals, different backgrounds) and have enough to write a thick description of an event. Sometimes I’m pleasantly surprised by how few documents I had to read to get a good understanding of something, and other times I’m still lost after reading a whole series of documents, plus some other sources. (I should add the caveat that even if I have a good understanding of something like an ongoing dispute between two people or a particular event, that understanding still isn’t enough to answer to my overall research question.) But in general, something will only make sense when you have a greater familiarity of the context, lingo, routines, and practices of the people involved—­for that, you usually need to collect a lot of data. So what is the measure we use, if not numbers, for figuring out how much is enough? Let’s say you can write multiple thick descriptions—­is that enough? How do you know, really know, when to stop? The term that ethnographers and some interviewers will use is “saturation.” The idea here is, after a while, you keep getting the same answers from your interviewees, or you keep seeing the same refrain in your documents, or you keep observing the same patterns in your setting. Basically, you stop deriving new insights. Let’s say you’ve interviewed a fairly diverse group of 20 people. We’ll assume your sampling procedures make sense for whatever your project is. Even though it’s a diverse group, most people are answering a particular question—­ one that you really care about—­pretty much the same way. Yes, there is some variation in the details, but really, the consistency is pretty clear. If you’re not overly interested in the variation, you might have enough data. Now, with a historical example—­or anything where you are looking over time—­or where you are generally more interested in comparison, this can be a bit tricky. With historical work, things might change over time, and you want to be sure you are attuned to that. But you can still let the idea of saturation guide you. For my book, I read and took detailed notes on roughly 50 years of meeting minutes for the local penal reform society that was very active in getting Eastern (my prison) established and (later) was active in fighting with that prison’s administrators for control over the prison. There were thousands of pages of documents, a fair amount of which was not super crucial to what I was interested in. However, I also know that my interests change over time, and I


don’t always know ahead of time what will be important. While I knew I could always return to the archives (funding permitting . . .), I wanted to make sure I recorded enough to get a good sense of what was happening without giving myself carpal tunnel or ending up with overly copious notes that would be nearly impossible to read repeatedly during the analysis stage. A lot of the meeting minutes were formulaic with the same things happening every meeting. I would usually record the first several instances of something that showed up. I would also record when something changed (or when there was a particularly detailed or otherwise different version of something I’d seen before). But, after recording these types of entries, I would stop recording more unless it was central to my interests. This gave me several examples of everything that happened regularly (including some examples with really good detail), as well as some extraordinary events, plus a good sense of how things changed over time. Obviously, using this method meant I could not quantify my results. I can’t say how many times the penal reformers reported, for example, that the prisoners at Eastern were too cold (a common complaint). At least for my purposes, quantification would be unnecessary, plus the archival data I was working with was not the type of data that should be quantified. (While some things, like who showed up to the meetings, was regularly and probably fairly accurately recorded in the records, other things like what happened at the prison according to the reformers can only be seen as a series of snapshots—­not a perfect record of what really happened there.)3 But it was enough for my purposes. I didn’t use this strategy for everything. I recorded every instance of things that I was particularly interested in. For example, because it was central to one of my interests, I recorded every time the reformers reported that prisoners were out of their cells. I wanted to be sure that I could get a general estimate of how common this was. I don’t mean I wanted to be able to quantify it and present seemingly authoritative counts of it (for reasons I mentioned above); instead, I wanted to have a general sense of it—­if it showed up several times a year or even once a year, that indicates it wasn’t a one-­time thing, but an ongoing if not quite routine practice, at least from this dataset (I could compare with my other datasets). More importantly, I wanted to know how attuned the reformers were to this practice and how they talked about it—­especially how angry they seemed about it. From other documents, I know it happened regularly, but the reformers only seemed to fuss about it in some periods while they ignored it in other periods.
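Saturation is ultimately a judgment call, but if you want a rough, auditable record of that judgment, one option (an illustration of the general idea, not a procedure this book prescribes) is to track how many genuinely new codes or themes each successive interview or document adds, and to notice when several sources in a row add nothing new. A minimal Python sketch with invented codes:

    # Hypothetical codes assigned to each successive interview, in the order conducted.
    codes_per_interview = [
        {"housing", "stigma"},
        {"stigma", "family"},
        {"family", "work"},
        {"work", "stigma"},
        {"housing", "family"},
        {"stigma"},
    ]

    seen = set()
    new_counts = []
    for codes in codes_per_interview:
        new = codes - seen          # codes not seen in any earlier interview
        new_counts.append(len(new))
        seen |= new

    print(new_counts)  # here: [2, 1, 1, 0, 0, 0] -- a long run of zeros suggests saturation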


Had it turned out that this thing I cared a lot about was really common in the data, I probably would have followed another strategy. I might have just recorded the first few instances in their entirety, and then, thereafter, I might have made a shorthand reference (e.g., OOC for “out of cell”) each time it came up. Then, I would record any examples that stood out or any changes. But endlessly recording every example would have been inefficient, as the information would be redundant. These strategies can seem odd—why leave data uncollected? But we don’t really need to collect every instance of something. We need to collect just enough to understand it so we can produce an accurate thick description. Indeed, when it came to my write-up, most of those examples that I had recorded were unnecessary. I needed to know when this practice first started happening (that is, when the reformers first recorded it themselves), roughly how often it happened (I don’t mean an exact number, but I knew that it showed up often enough to be considered common and eventually routine), and what, in general, was said about it (not much, typically). I recorded so many of these instances because I thought I might want to use them for another project down the line. I also thought there might be some change later in the record, and it would be useful to have every instance so I could compare how things changed—not only in frequency but in substance. But there wasn’t really such a change. Basically, after seeing this happen, say, ten times in the records, I’d already reached saturation—I didn’t really need to keep recording this practice except for any particularly unique examples of it. Here’s another reason, grounded in ethics rather than empirics, to follow this saturation approach. Let’s say you are interviewing people who have experienced some form of trauma. Now you’re asking this person to talk with you for an hour—maybe more, maybe less. You’re asking them to relive or to think again about that trauma. For some people, that might be helpful, even cathartic. But that won’t be the case for everyone. Let’s say you set out to conduct 400 interviews, because you want to make a policy impact, and you know that policy gatekeepers are skeptical of qualitative research so you want the force of numbers. But it turns out that by your 60th interview, you’ve reached saturation. The remaining 340 interviews are unnecessary. Why take up more people’s time and emotional energy to collect more data that would basically only replicate what you already have? What if about 5% of the additional data you would collect is not just variation in detail but actually new information? Let’s also say it’s not super important information that’s going to change your findings


or conclusions. What was the human cost of that new but not super important information? In fact, these days, some university IRBs/REBs will reject an application to conduct such a project for precisely this reason—­we have an obligation to our research participants not to abuse their willingness to help us out and to ensure that we subject them to no more harm than is necessary for the study. Interviewing people beyond the point of saturation is an example of exposing people to more harm than is necessary and a good reminder that more is not always better.4
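If you code your interviews or documents as you go, you can also make that saturation judgment visible to yourself (and, later, to skeptical reviewers). The sketch below uses invented interview labels and theme codes; the logic is just to track how many themes each new interview adds that you have not seen before, and to notice when that number flatlines at zero.

```python
# A hedged sketch with invented data: each set holds the theme codes assigned
# to one interview, in the order the interviews were conducted.
codes_per_interview = [
    {"fairness", "delays"},        # interview 1
    {"fairness", "retaliation"},   # interview 2
    {"delays", "paperwork"},       # interview 3
    {"fairness", "delays"},        # interview 4: nothing new
    {"retaliation", "paperwork"},  # interview 5: nothing new
]

seen = set()
for i, codes in enumerate(codes_per_interview, start=1):
    new_codes = codes - seen
    seen |= codes
    print(f"Interview {i}: {len(new_codes)} new theme(s) {sorted(new_codes)}")
# A long run of zeros toward the end is one (imperfect) signal of saturation.
```

A tally like this does not replace your judgment; new themes can still surface late, especially if you change who or what you are sampling. But it does leave a record of why you stopped when you did.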

Building in Checks: Confronting Selection Bias in Your Data

Some Parts of Your Research Design Are That Important

When we discussed case selection, I emphasized that for most Dirtbagger projects, you have a lot of flexibility about what case you select. Unless you plan to do a more deductive, theory-testing, or comparative project driven by a relatively narrow research question, you can pretty much choose the case that speaks to you—as long as you can defend it later, of course. But when it comes to how you collect your data within that case, you need to think carefully (before, during, and after your data collection) about what data you have access to. Let’s go back to my example of prisoner behavior that is often described as resistance, but that I like to call friction. Recall that this is a study of a nineteenth-century Pennsylvanian prison. My data’s characteristics—being historical, coming from Pennsylvania, and taking place in a prison—don’t really affect (at least, not negatively) my ability to contribute to the larger conversation about resistance and how people behave in extreme circumstances. However, the fact that my data are historical and come from a prison does impact what behaviors I’m actually observing. My data sources for this project were all text-based: records from the warden’s log, some letters from the prisoners, a diary from a local penal reformer, the meeting minutes of a local reform society, and the testimony and final reports for a legislative investigation into official misconduct at the prison.5 Let’s think about these records more carefully. Who made these documents? Why did they make them? What voices am I not hearing? What similar documents did I not get to read? Basically, what is getting included in my sample and what is not getting included? These considerations were driven home to me when I was doing my fieldwork at the American Philosophical Society (APS), an archive in Philadelphia.


I was handling the penal reformer’s diary (his name was William P. Foulke), which was a little bound book, roughly the size of my hand, filled with tiny writing (I sometimes used a magnifying glass to read it). What stood out about this book was that the edges of its pages were charred. I found out from the archivists that this book had survived a fire before making its way to the APS. I sat for a minute and reflected on this. In order for this book to sit before me, several things had to happen. First, an educated middle-class man had to be moved to write down his thoughts regarding his visits to the prison—a privilege his social standing afforded him, by the way.6 Second, he or his family members had to decide to keep the book in which he recorded his thoughts. Probably, it ended up in a trunk in an attic for a long time. Third, it had to survive a fire. (Were there other diaries he kept that didn’t survive the fire?) Fourth, his descendants (or whoever ended up with the property in the proverbial attic) had to decide to turn over their stuff to the APS, which also required them to realize, or at least think, that their ancestor might be of interest to someone. Fifth, the APS had to do what it does behind the scenes to make sure the diary could be viewed—some archival objects are deemed too fragile to handle. Sixth, I had to show up to that archive, which was a combination of finding out that the APS had a collection on “my” prison and my ability to get funds to stay there (not at all guaranteed). Thinking about it like that is really humbling, but prepare yourself because it gets more intense. The prisoner letters I used also came from the APS. The main collection of letters was a correspondence between two prisoners—a 17-year-old White woman, Elizabeth Velora Elwell, and a 20-year-old Black man, Albert Green Jackson. (Many nineteenth-century prisons held both men and women, although usually the two populations were kept separate.) Most White women and most Black men in the nineteenth century were illiterate. Most people who ended up in the prison, regardless of gender or race, were illiterate (the prison kept records on incoming prisoners’ ability to read or write). For this series of letters to exist, this couple had to know how to read and write. Part of the reason this correspondence was so amazing, though, was that prisoners were prohibited from seeing one another, let alone speaking with one another. In reality, prisoners worked around the prison, which frequently led to socialization, as the letters helped me demonstrate (I found this in other sources as well). But then the question becomes: How the hell did these letters survive? Wouldn’t the prisoners have tried to destroy the letters? Apart from one missive from Albert, only Elizabeth’s letters survived.


I never found out how the prohibited letters made their way into the archive. I suspect the lovers were discovered at some point (they did get a warning from one of the guards), or at least the letters were discovered, which probably made their way into a file in the warden’s office. Since the people who ran the prison were extremely diligent about recordkeeping, the letters must have been kept with other documents or maybe mixed with other records and then turned over to the APS. Another possibility is the letters were discovered by or given to a penal reformer or one of the other prison staff whose documents went to the APS a century-­plus later. At this point, you might be thinking, sure, cool story, but I don’t study history and I don’t go to archives, so how does this relate to my research? (Framing!) First of all, you can run through these questions about what data you have access to, and what data you do not, for any type of data collection. What physical spaces and audible conversations do you have access to? Who is willing to let you interview them? To what extent are the people you are watching or interviewing putting up a front of some kind, at least initially, if not for the full duration of your data collection? In other words, what types of selection bias are you dealing with in terms of what you even have access to (as opposed to what you chose to examine)? I like these examples because they highlight a general limitation we all have to reconcile: We can only analyze the data we have access to, and that is a form of selection bias. Studying prisoners reminds you about the layers of power, control, and privilege that go into a document that gets retained. Studying them in historical context illustrates the remnants of multiple power hierarchies and the layers of serendipity required for documents to survive in a form that someone like me can analyze today. Think of historical prison documents as an extreme case that highlights shared challenges across types of data collection. There’s an even more extreme case, however, that helps us illuminate the underlying factors that determine what data we have access to. Historians and African American Studies scholars who study slavery and enslaved people have made the most progress in thinking about these questions of built-­in selection biases. Society’s power hierarchies don’t just determine who has a voice but also what is said about those whose voices are not recorded. As Saidiya Hartman (2008) has illustrated, the groups of people not given voice are frequently discussed only in their most traumatic moments: For enslaved girls, “The archive is . . . a death sentence, a tomb, a display of the violated body, an inventory of property, a medical treatise on gonorrhea,


a few lines about a whore’s life, an asterisk in the grand narrative of history” (Hartman 2008, 2). Those of us who want to know more about such disenfranchised groups are often left with the records maintained by the powerful—­ records that, Hartman notes, are made possible by violence. Moving beyond those records is exceedingly difficult. Using traditional empirical methods, our choices are to stick to the existing data and acknowledge their many holes and challenges or simply avoid the subject because the data are so limited. In either case, the result is unsatisfactory and “replicates the very order of violence” we seek to describe and analyze (Hartman 2008, 14). It is important to be aware of, and reflect on, these power imbalances that shape the data we have access to. In addition to the normative problems they potentially raise, they are also empirically problematic. Power imbalances present one of the major sources of selection bias in a large part of the data out there. And remember, this challenge is not limited to qualitative data. You know all those state-­collected datasets that limit racial identification to Black and White (as many datasets did for decades), only provide for one racial category per person (as many datasets still do), and only include two options for sex or gender? (Don’t get me started on how we define and collect data on certain crimes.) There are rarely easy fixes to these biases; the most important thing we can do is be aware of them and take steps to mitigate them wherever possible. Importantly, reflecting on the various biases in your data is something you should do while you are designing your project, while you are collecting and analyzing your data, and while you are writing up your findings. (In fact, the most rigorous checks might happen after you’ve finished the main portion of your data collection.) One way of addressing possible selection bias in your data is to strategically collect data from other sources to supplement your primary dataset or as part of your primary dataset.

Multiple Sources of Data

Either as part of your initial research design or as a subsequent check on your original dataset, you should collect data from a variety of sources, perhaps using different sampling strategies. There are three main reasons to collect this additional data: triangulation, counterfactuals, and testing competing explanations.

• Triangulation: You might want to draw from other sources of data that shine light on the things you are interested in but in different ways (e.g.,


interviews and documents, observation and interviews, public documents and private documents). You’ll want to use the intuition of case selection and sampling for each of these data sources.

• Counterfactuals: As you start to develop ideas about what’s going on in your case, you might want to collect more data on other similar cases so you can see if the explanations for your case hold up. You don’t have to do a formal comparative study, but you do want to check out these other, similar cases to see whether what you find in your case also shows up there or plays out differently (sometimes you hope they are similar, and sometimes you hope they are different, and sometimes the results put you on a new path entirely). Studying these other cases will require more sampling from other data sources or populations. For more on counterfactuals, see Chapter 10.

• Alternative explanations: You also might want to collect more data about your case so you can test other, competing explanations for what you are seeing. Basically, if you think a particular series of events is responsible for something you are interested in, think about what other explanations there might be—or what other people think explains it or what the academic or policy literature says explains it. Since you might not have set out to study these things—maybe your research question changed after you were in the field, as we’ve discussed is common—you now need to think through how you can find data on these other explanations so you can say whether you see support for them or not. (If you can see support for other explanations as well as your own, that can make it tricky for people to believe your explanation is more important or even viable.) For more on alternative explanations, see Chapter 10.

Each of these strategies requires more data collection, which in turn means more sampling. Even if it’s not your primary focus, you still need to collect data in a thoughtful, systematic manner. Now, let’s turn to the strategy that you are most likely to use and that is central to sampling considerations, especially when critics want to ask about selection bias in your study.

Triangulation

Triangulation can mean several related things, but each involves searching for answers using multiple “probes.” In a research context, probes—sources of data—can mean different types of data collection (immersive ethnography,


interviews, document-­based research) or different data sources (three sites for observing people; three sites for recruiting interviewees; three sets of newspapers or three sets of documents like legislation, court cases, and newspapers). The idea is each probe (data source or type of data collection) lets us see one side of things but leaves other stuff hidden; to see that hidden stuff, we need another probe or two (Figure 7.2). We need to triangulate. (I keep using “three” because it’s triangulation, but any number greater than one works.) To help build intuition about triangulation, let’s start with an example that will be most familiar to you from shows about the military and law enforcement. If you’ve ever watched a crime investigation–­type show, you know that investigators will often try to track down a suspect or victim by “triangulating” their cell phone signal. This involves calling the number (or just seeing where it was last active) and finding the three nearest cell phone towers; these towers form a triangle, which represents the suspect’s or victim’s general location—­not the exact location, but a smaller range than everywhere.

Figure 7.2  Triangulation. By combining three different probes, we identify different elements that would otherwise be hidden if we looked at only one probe. Additionally, some of the data will duplicate, which increases our confidence in those findings and helps us with thick description.
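If you like to see the overlap in Figure 7.2 in your own material, set arithmetic makes it concrete. Everything in the sketch below is invented (the probe names and the themes); the point is only that intersections show what multiple probes confirm, while set differences show what only one probe reveals.

```python
# Purely illustrative: themes observed through three hypothetical probes.
probes = {
    "interviews":   {"overcrowding", "guard discretion", "informal favors"},
    "observations": {"guard discretion", "informal favors", "contraband trade"},
    "documents":    {"overcrowding", "guard discretion", "official denials"},
}

confirmed_by_all = set.intersection(*probes.values())
print("Seen through every probe:", sorted(confirmed_by_all))

for name, themes in probes.items():
    other_themes = set.union(*(t for n, t in probes.items() if n != name))
    print(f"Only visible via {name}:", sorted(themes - other_themes))
```

Themes that every probe picks up earn extra confidence; themes only one probe reveals are where you go collect more data, or where you need an explanation for the silence elsewhere.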


In the research context, triangulation can strengthen a single study by looking for different sources of data or different perspectives for analyzing whatever it is we are interested in. If this sounds familiar, that’s because we’re using the same intuition when we use snowball sampling on a diverse population: We want to have several starting points that come from diverse areas to ensure we don’t just stick to one person’s network but get a more diverse sample. (Here, the starting points are the different probes.) Likewise, if we are looking at documents, we might have reason to rely heavily on one type of document or document series, but we typically want to supplement that with other types of documents. (Now, each type of document is a probe.) The goal is to avoid gathering a one-­sided (or single-­probe) view of reality. No matter whether we’re using documents, people, or observations, different probes may tell us conflicting stories, which could mean that there are inaccuracies in one of the probes (people frequently say they do things, but then it turns out they didn’t do them), or it might mean there is a misunderstanding or a difference in perceptions (which might not be a lie but an important finding about how people interpret the world). Sometimes the conflict is just additional information that was missing from another probe. Using multiple probes lets us write fuller accounts, richer narratives, and thicker descriptions. For example, in my book on Eastern State Penitentiary, I relied on a diversity of sources, including both public and private documents. By public documents, I mean documents that people outside of the prison could access, like the annual reports that came from the prison and from the various penal reform groups active at the time. By private documents, I mean documents that were difficult or impossible for outsiders to get their hands on—­things like labor records, meeting minutes from the prison’s board of inspectors and from the local penal reform society, the warden’s journal, a penal reformer’s diary, and letters illicitly exchanged between prisoners. In addition, I used other public documents that were more national in scope, like newspapers and a popular literary magazine. This variation across public and private documents is important because some documents are written with a particular agenda in mind: The annual reports were basically public relations pieces for the prison, and they wouldn’t air the prison’s dirty laundry—­information I could find in the private documents not intended for circulation. In fact, what people said in public was often different from what they said in private; but it was also helpful to see when they said things in private that they also said in public. For example, prison


administrators sometimes privately recorded their hopes that the prison would do well, suggesting that they genuinely wanted that outcome—it wasn’t just something they said publicly to look good. Also, people frequently have different access to reality and react accordingly: For example, prisoners might report things to the penal reformers that they wouldn’t say to the prison administrators. With the help of other documents, like those written by the prison guards, I could get a better sense of which statements—what prisoners said to the administrators or reformers—were accurate. While you can do research with one type of data or one source of data, usually you need a good reason for that. For example, there are some historians who will write a deep analysis of a historical person’s diary. Likewise, people have certainly analyzed articles from one newspaper in a particular period (contemporary or historical). These types of studies, though, are usually interested in these sources not so much as windows into reality, but for what they represent. For example, there is a lot of research that looks at how crimes, accused criminals, and victims are depicted in the news. This research (rightly) doesn’t expect that these depictions are accurate; instead, the research is about things like when a particular crime is covered (not all crimes are covered in the news), how the crime is covered (what language is used?), and what victim or perpetrator characteristics are mentioned (is an accused criminal’s race mentioned? is a rape victim’s clothing or occupation described?). While such analyses can rely on media coverage alone, it can be useful to compare that coverage against documented crime rates and trends; it helps us see what journalists or media sources consciously or subconsciously thought was important—and what sorts of biases these preferences reveal. But triangulation isn’t just about checking for inaccuracies or differences of opinion. A lot of times, qualitative researchers want to create rich (and thick) descriptions of a setting, a population, or a concept, and for that purpose, one dataset or data source is not sufficient. For example, in their book on prisoners’ legal consciousness and on the formal complaint system available in California’s prisons, Kitty Calavita and Valerie Jenness (2015) rely on this strategy of triangulation—they even use mixed methods (that is, they are using different sources of data and different data collection methods). They interviewed 120 prisoners from three prisons as well as 23 prison staff of different ranks. They also analyzed all the complaints filed by 292 prisoners (some prisoners filed more than one) in a particular year (Calavita and Jenness 2015, 8–12). The combination of these different sources of data gave them the prisoners’ view of


how the complaint system worked, the staff’s view of how it worked, and the official written record of how it worked. Part of their work focused on perceptions and how people feel about the complaint process (is it fair? can it be improved?). Keep in mind that, sometimes, we don’t care if someone’s perception is factually accurate; instead, we care more about why someone thinks something (also, people can easily disagree over questions of fairness). Indeed, this question of perception was central to the study by Calavita and Jenness. But part of their research also dealt with unearthing reality, and that’s when triangulation shines. For example, if the prisoners say prisoners never succeed on their complaints, and the staff say prisoners succeed all the time, the documents can show who is right (the prisoners, it turns out). Triangulation let the researchers give a rich account of this complaint system—both how it is perceived and how it is actually working.
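If, like Calavita and Jenness, you have an official record sitting alongside your interviews, even a tiny script can adjudicate a factual disagreement like the one above. The sketch below is hypothetical throughout: the file name, the column name, and the outcome labels are invented, so treat it as an illustration of the logic rather than a description of their data.

```python
# Hypothetical sketch: tally outcomes from an official record of complaints
# so they can be compared against what interviewees claimed.
import csv
from collections import Counter

outcomes = Counter()
with open("complaints.csv", newline="", encoding="utf-8") as f:  # made-up file name
    for row in csv.DictReader(f):
        outcomes[row["outcome"].strip().lower()] += 1  # made-up column name

total = sum(outcomes.values())
if total:
    granted = outcomes.get("granted", 0)  # made-up outcome label
    print(f"{granted} of {total} complaints granted ({granted / total:.0%})")
else:
    print("No records found.")
```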

It’s Iterative!

As this section has hopefully made clear, it’s really difficult to specify all of your sampling procedures at the outset of your project. Instead, this element of research design is an ongoing part of your work. You want to keep certain questions in your mind as you go. Ask yourself how your data collection efforts—what data you choose to focus on—are shaping your ability to understand what’s going on. Ask yourself if you have enough data to give a thick description or to reach saturation, or if you are collecting more data just for the sake of “getting your numbers up.” Think about the ways in which your data are giving you a sufficiently complete picture, where their blind spots are (all datasets have blind spots), and how you can supplement your data with other data sources. The way in which you will keep coming back to certain questions is why we refer to this type of research as iterative—and it’s why research design is not a one-time thing that you decide at the beginning and never revisit. In fact, as part of the iterative nature of qualitative research, it’s helpful to keep in mind that you will probably collect multiple sources of data—and you won’t do it all at once. This is another reason why some of the concerns about selection bias and generalizability that we discussed in Chapter 5 are so often overblown. You might have a particular sample that is your primary data, and it might have selection bias problems or generalizability issues, but that sample is not your only data. Using triangulation (or adding in counterfactuals and testing out alternative explanations) is part of your research design, but these additions usually come up after you’ve set up your initial research design. Or in some cases,


you might have known you wanted to use these techniques, but you didn’t have enough background information to even plan the details of how you would collect data in these ways. So whether you set out to collect data in these ways or not, you’ll need to adapt your initial research design as your project progresses. The iterative nature of your research design sometimes gets left out because when people talk about research design, they usually mean your original plan for your project (the one you put in your grant proposal or dissertation prospectus or just the one you initially wrote down for yourself and yourself alone). In practice, research design usually means the plan that you pretend you initially set out to do, but that you actually put together after stuff went wrong in the field. To be clear, that’s what a lot of people do—­and that’s not wrong in most cases. We don’t talk about these iterations because the conventional approach is to perfect your research design prior to completing any research, except perhaps a pilot study. That is, you have a plan, then you collect some data to see if it looks like your research design and data will let you answer the questions you want to answer. If they do, then you go out and collect more data more systematically. If they don’t, then you revise your research design, field another pilot study, and repeat the process until you feel like your research design will work on a larger scale. The fact that this additional work (collecting and then analyzing data for triangulation, counterfactuals, and alternative explanations and generally revising your research design) gets left out also contributes to the misconceptions of qualitative research as small-­n (or small sample size). Indeed, it’s another one of the reasons why I get annoyed when people dismiss case studies as only having one case. As Seawright and Gerring have pointed out, even single-­case studies usually involve other cases: “[B]ackground cases often play a key role in case study analysis. They are not cases per se, but they are nonetheless integrated into the analysis in an informal manner” (2008, 294). This has other implications, including “that the distinction between the case and the population that surrounds it is never as clear in case study work as it is in the typical large-­N cross-­case study” (Seawright and Gerring 2008, 294). As we’ve already seen, my book on Eastern State Penitentiary was a case study. But in the course of my research, I also collected data on the full population of prisons (all 30-­plus prisons in existence by 1860), and extra data on all three prisons that adopted (and later rejected) the unique form of long-­term solitary confinement. My research design was not explicitly comparative, nor


was it explicitly a full-population study—it was a case study—but I did have data on other cases because that helped me to better understand my case and to understand why it was different.

*  *  *

Qualitative methods allow for a certain amount of flexibility. That said, there are certain things you need to do to make sure your research does what you want it to do. This chapter has focused on sampling. You might think about this issue a fair amount and then go off into the field to collect your data, or you might go into the field, collect some data, and then adjust your research design in an iterative process. As we have seen, you won’t have answers to some of these questions—­for example, how much data is enough—­until you are knee-­ deep in data in the field. We will continue to talk about the ways in which research design, data collection, and analysis are going to be tangled up in the following chapters because, remember, it’s iterative.

8

Bivvy Time
The Fieldwork Model of Data Collection

Bivouac Also bivy or bivvy. A camp, or the act of camping, overnight while still on a climbing route off the ground. May involve nothing more than lying down or sitting on a rock ledge without any sleeping gear. When there is no rock ledge available, such as on a sheer vertical wall, a portaledge that hangs from anchors on the wall can be used.
Wikipedia (2020), “Glossary of Climbing Terms”

Roughing It?

My first experience with real fieldwork—traveling to an archive in Philadelphia—was a fairly luxurious affair. I had a residential fellowship for one month at the archives of the American Philosophical Society—a scientific organization, formed before the American Revolution, that was housed in a series of historic (and extremely stately) buildings next to Independence Hall (where the Continental Congress met and signed the Declaration of Independence and other important stuff). By contrast, most archives are in the middle of nowhere inside fairly shabby, uncomfortable public buildings. If you have any dietary restrictions—say, you’re a vegetarian—doing fieldwork by traveling to an archive is kind of challenging because small towns don’t have the widest food selections, at least if you are used to a university town or living in a big city where being picky about your food is pretty common (I don’t mean that judgmentally—I’m super picky about my food). But I lucked out. Monday through Thursday, I showed up at 9 am when the archive opened and worked in a wood-paneled room lined with bookcases, white crown


molding, and brass chandeliers, all encased on one side by a giant glass wall—truly mixing the historic look with a modern security apparatus. It was a beautiful setting. At lunch, I went to pick up delicious takeout food and brought it back to the break room—similarly well-appointed with a giant wooden conference table and large (almost floor-to-ceiling) windows common to the eighteenth-century architecture that characterized the APS library and many surrounding buildings. On Fridays, I visited other archives within walking distance. At one point, my partner flew out from California to visit me—before this trip, we had never spent more than 24 hours apart, so it was a big deal to spend a whole month on opposite sides of the country. Since I had been renting a room in a local woman’s home, we decided to stay in a hotel. We ended up staying the weekend at the Four Seasons—while this was something of a splurge for two grad students, it was during the Great Recession so, somehow, it was one of the cheaper hotels with vacancies. My month at the archives was the opposite of “roughing it,” which is how fieldwork more often feels.

*  *  *

This chapter discusses the process of collecting data in “the field,” which I define broadly to include any place you collect your data. I have adopted this language of “fieldwork” and “fieldsite,” terms frequently associated with ethnographic work, because I have found ethnographic research to be a broadly useful model for all qualitative research, even for those of us doing online or archival research. In this chapter, I review specific fieldwork strategies that I have found useful. Some readers who have never conducted ethnographies will recognize these strategies, because these techniques are not actually unique to ethnographers. However, most of the non-ethnographic methods texts I have come across have not said much about the mundane realities of data collection (while this is something ethnographers excel at).

Opening up Fieldwork

Since I’m using a non-traditional definition here, it will be useful to first explain what most people mean by “fieldwork.” In many disciplines, fieldwork generally refers to collecting data “out in the field”—as in, out in the real world that we are trying to study. As an example, Wikipedia (increasingly our go-to


source for quick and dirty information) currently defines fieldwork explicitly as outside one’s normal academic space—­“the collection of raw data outside a laboratory, library, or workplace setting.” This definition seems fairly common. Notice that libraries—­and we could add archives—­are explicitly not one of the recognized fieldsites. In the social sciences, the term “fieldwork” usually describes research with living humans—­not ancient papers and books. Despite the otherwise broad definition, social science fieldwork primarily refers to ethnography (typically defined as including some sort of participant observation and long-­term embeddedness) or possibly interviewing. For example, the collection of essays in Robert Emerson’s (2001) Contemporary Field Research is specifically focused on “ethnographic field research.” At least this label implies there can be other types of fieldwork than ethnographic alone, but people don’t usually call archival work fieldwork, at least not in my circles—­ although some people certainly have (e.g., Harris 2001). But it’s more common for people to think about archival work and fieldwork as distinct endeavors that can be fruitfully combined (e.g., Gracy 2004; Lorimer 2009). Second, and relatedly, when people talk of fieldwork, they often think of classic or contemporary ethnographies in which a researcher travels to an unfamiliar place. Their fieldwork takes place in a foreign setting—­either literally foreign because it is a different country or figuratively foreign because, although part of the same society, it represents a place apart: an extremely privileged walled-­off world of Fortune 500 executives (Morrill 1995); an inner-­ city neighborhood just a few minutes’ drive from downtown but still a world apart (Contreras 2012; Rios 2011); a school classroom so unfamiliar to most adults (Flores 2016; Irwin and Umemoto 2016; Morrill and Musheno 2018); or a public restroom in which men seek out and have anonymous sex (Humphreys 1975). These places are often viewed as exotic—­sometimes with particularly problematic colonialist overtones of treating people from other societies or certain marginalized groups as an “other.” Certainly, ethnographers have also stayed close to home, studying a familiar place, either familiar to mainstream society (usually read as White and middle class) or to them personally (in what is often called “insider” research, somewhat problematically defined as researchers studying “people like them”—­which is fairly typically only recognized as such when the researcher and their participants belong to a category of visible minority and then it’s derisively called “mesearch”).1 Third, fieldwork often involves a question of access. In many cases, you need permission, or at the very least some measure of good will, to be able to


collect data in your fieldsite. Within organizations—businesses, prisons, hospitals, schools—the researcher needs formal permission from organizational officials (and sometimes their lawyers) and sometimes, especially for prisons and hospitals, from the organization’s research ethics boards (e.g., Calavita and Jenness 2015; Flores 2016; Morrill 1995). Within communities large or small, you need good will and semi-informal permission: No one can really stop you from hanging around a public space (although there are sometimes laws about this that can be enforced differentially depending on what you look like), but you’ll have an easier time making inroads if community leaders give their okay and even introduce you around to show you can be trusted (e.g., Eason 2017). In smaller, more intimate settings—like places where people hang out or when you are following around a group of friends—access also needs to be granted, again to show that you can be trusted; this too is enabled by a particular “informant” or member of the group who introduces and signs off on you, sometimes explaining what you are doing there (e.g., Contreras 2012; Goffman 2014; Lee 2016; Rios 2011). By contrast, with the type of fieldwork I primarily do—archival work or computer work—you don’t really need someone’s permission. Some private archives are closed to the public, but you can apply for access; in many ways, this is easier than getting a stranger to trust you. Finally, fieldwork, however it is defined, often includes a fair amount of discomfort. Usually, this discomfort involves more than just being separated from one’s (furry or human) family and home, familiar food, and control over one’s schedule. In the extreme, it can involve dangerous conditions like getting detained, handcuffed, held down, or kicked by police (Goffman 2014; Rios 2011); worrying occasionally that some of your participants might try to mug you, especially in the early days of building up trust (Contreras 2012); going on ride-alongs with law enforcement officers to parolees’ homes, especially when the officers have guns drawn (Rudes 2008); spending time (voluntarily) in a prison (involving a standard consent that you are on your own in the case of riot—although chances of that are pretty low); being forced to walk back to town several miles away in southern summer heat, made even more dangerous if you happen to be a Black man whose mere presence walking away from a prison breeds suspicion among some locals (Eason 2017); or being horribly injured and assaulted by one of your participants (Huang 2016). Obviously, discomfort is not a necessary condition of fieldwork. Plenty of people have studied spaces of privilege or beautiful settings, or they have gone into “the field” during the day and were able to sleep in their own bed


at night. But compared to stories of violence and danger so common in the ethnographic literature, other kinds of discomfort from less traditional forms of fieldwork might seem downright simple: back pain, eye strain, and emerging carpal tunnel from sitting in front of the computer for ten hours a day. (Although, one of my colleagues, I won’t say who, was shot at when they tried to use an archive in France. Don’t worry: They were unharmed, if a little shaken.) Despite these potential differences between conventional versions of fieldwork and whatever method you are using, it is helpful to think of your data collection as fieldwork. When I say fieldwork, I mean any time you are collecting data, whether that is in a fieldsite of the kind used in ethnographies (e.g., a neighborhood, a faraway village, a school, a hospital, a business), a café in which you interview your participants, an archive in your own city or in another country, or even sitting in your pajamas in front of a computer as you download documents—­current or historic newspapers, official documents, primary sources from an online database, pages and pages of comments in an online forum or in response to some news article. While each type of fieldsite will have its own challenges, discomforts, and possibilities for data collection, they all have the act of data collection in common. It is during this act that I find thinking like an ethnographer to be particularly useful, even if you aren’t putting yourself through the same difficulties that come from literally leaving your regular life behind for one or two years. And that is the fieldwork model that I will lay out in this chapter.

How Do Fieldworkers Do It?

Ethnographers have fieldwork down to a science (or an art, depending on your point of view). I’m going to highlight two particularly useful strategies they use to construct what I’m calling the fieldwork model of research:2 first, collecting your data reflexively, and second, taking fieldnotes. (I would actually add triangulation, but we already talked about it in the last chapter!) These strategies also help you to think through how to be systematic when collecting data, even when it kind of feels like you’re just going out there and running through a forest without a trail or a map.

Reflexivity

One key technique or characteristic of ethnographic fieldwork that I think is broadly applicable is ethnographers’ careful attention to reflexivity. “Reflexivity” can mean different things, and it is sometimes used in connection or


interchangeably with “positionality.” Positionality is similar to reflexivity, but I tend to use positionality—and I think others do as well—to mean what your social position is, as determined by such things as your gender, race or ethnicity, education, and status as a researcher. But when I (and most folks) say reflexivity, I mean deep awareness of how you are the instrument of your data collection, and therefore you cannot be separated from the process of data collection. You can be reflexive about your positionality—that is, about how your position affects your data collection—but also about many other things unrelated to positionality. Ethnographers and interviewers have been particularly thoughtful about how their own perceptions, histories, personalities, and training shape their data collection and analyses (as we’ll discuss and embrace in the next chapter). And it kind of makes sense that they would be the leaders: When you are the instrument of data collection, you need to pay attention to what work you as the instrument are doing. So, for example, ethnographers and interviewers have been particularly attuned to the ways in which how they look, behave, and dress—and more generally how they are perceived—will shape what information they may access. That is, someone might say something different to me than they would say to you, or they might let me see something they won’t let you see. In either case, epic discussions abound about how your identity impacts your data collection. Indeed, these aren’t always simplistic discussions: Just because you look like someone (e.g., have the same gender, race, or ethnicity) doesn’t mean they will share things with you; there might be important cultural differences or other status differences, and you the researcher might not be entirely aware of them.3 This emphasis on reflexivity comes out of some streams of research variously called critical, post-modern, feminist, and decolonializing—intellectual trends that really gained ground beginning in the 1960s and again in the 1980s but remain on the fringes in some fields in the United States (less so elsewhere—they are pretty popular in Canada, France, and many Latin American countries, for example).4 Part of the goal is to recognize that claims to impartiality, objectivity, or neutrality—ostensibly key characteristics of science and, more generally, what is found under the banner of positivism—are always problematic. As an example, in anthropology, there is a long tradition of some ethnographers adopting colonialist ideology and being fairly judgmental of what they see as “primitive” practices of people beyond what they recognize as “civilized” by Western standards. Reflexivity emerged in part to reconcile with


this history and think through how to continue to do anthropology, but without being so colonialist about it. While some of this more reflexive research has been critical in a somewhat normative sense (i.e., by explicitly or implicitly disapproving of the scholars who made these colonialist statements), there is also an underlying empirical point. Traditionally, we have treated some research as unquestionably accurate, especially research conducted by members of a dominant (often White and male) group, but that research usually reflects the dominant group’s biases and norms and, consequently, is frequently inaccurate. Feminist criminologists, for example, have pointed out that, to the extent that most theories of what causes crime are accurate, they are only really accurate for male criminals, because the researchers who created these theories studied only male criminals most of the time and just extrapolated their findings to women. This habit is one of several longstanding male biases in criminological research that has traditionally been (and remains) dominated by male researchers (e.g., Chesney-­Lind and Chagnon 2016; Lowe and Fagan 2019). This is not a bash on criminology; it’s a problem in many fields (and this is just gender; in every field, we could also add in race and other status differentials).5 Interestingly, this critique—­that theories generated from male-­only data are faulty—­is considered a critical or feminist point; therefore, it is usually discussed as a normative critique of positivism. However, a positivist could easily use this reasoning to call the original research faulty (see, e.g., Harding 2009; Intemann 2010). I mean, if your sample only includes men, it’s kind of silly to think your findings can extrapolate to women in most contexts, particularly when there are some important differences in your variables of interest. It’s like trying to say your quantitative survey on California will generalize to Rhode Island! Despite the normative origins of the critique, it has important empirical or positivist ramifications. Coming from the other side of the normative or epistemological spectrum, some people have used this concept of reflexivity to criticize qualitative research from a positivist point of view as invariably flawed. If you and I can go to the same fieldsite and, using the same research question and methodology, collect different data, how is that reliable data? I think of it this way: We’re each measuring something, or maybe different versions of the same thing (kind of like the overlapping probes used for triangulation). We will necessarily see overlap and differences. With qualitative research, it doesn’t mean that my account or your account is wrong—­they might both be correct. Unlike some quantitative


researchers, we’re not trying to say this is the only answer, but instead there are multiple answers.6 We might both be measuring some important social process, but I identify one important mechanism by which it takes place, and you identify another—both can be correct. This is a fairly post-modern point in that it recognizes that there can be multiple truths and that different people are going to have different experiences of the same thing, not in the (problematic) sense that two eyewitnesses might disagree about whether a suspect was wearing a red shirt but in the (reasonable) sense that we can both be present and have very different experiences or reactions to what we saw, and both are valid and accurate. What I find interesting about this critique—recognizing how different researchers can present different research but seeing this as a deep flaw in the research—is that it disproportionately falls on qualitative research. As the feminist criminologists’ critique points out, non-critical or “traditional” research is very much prey to these same issues; its practitioners just tend to be less reflexive about it. In fact, it’s a major problem in quantitative research. In survey design or state- or organization-curated datasets, for example, the variables produced often reflect the dominant group’s concerns. For example, for a very long time, criminal justice data only included two race categories (White and Black) and no ethnicity or nationality information. What about Asian people, or Latinx people, or people who identify as having multiple racial or ethnic identities, or who identify themselves by their national heritage rather than by a racial category?7 Also keep in mind that those determining who was Black or White were usually not the people being categorized. Moreover, this designation usually followed the “one-drop rule”—in the United States and Canada, this means if you have “one drop” of African blood in you, you are Black (Jordan 1974). (Just to illustrate, there are famous court cases from the 1940s and 1960s about self-identified White people who had one Black great-grandparent, got married to another White person, and were charged with violating anti-miscegenation laws.) In some Latin American countries, it’s the opposite: If you have one drop of Caucasian blood in you, you are White (Davis 1991). The point is, if these surveys were created and fielded by a more diverse group, we would see different data collected, something more sensitive to the diversity in the country and perhaps to the diversity of how people designate themselves.8 Ultimately, in all research, our worldviews and experiences, as well as the unstated and often deeply embedded assumptions that go into our research, shape how we collect and analyze data (e.g., Boroditsky 2017).


The reflexivity that ethnographers have been so careful to engage in is a good habit for all researchers to practice. To strengthen the empirical value of our work, we should recognize how our own preferences, biases, and bodies shape how we collect our data—and later on, how we analyze it. But we should also embrace the way in which our unique approach to our data collection allows for new insights that others might have overlooked.

Letting Your Own Style Shine Through

There are a lot of variations in how climbers climb. Climbers use different styles on different routes, but they also tend to gravitate to one or the other depending on their body type, strength, and personality. At age 19, pro-climber Margo Hayes became the first woman to climb a 5.15a route (La Rambla, which only about a dozen men had climbed before because 5.15a is ridiculously hard—holds are very shallow and really far apart, and the climbing often requires crazy body positions to maintain balance). Male climbers could use a combination of pure strength and height or wingspan advantages (when technique was not enough). Since she is relatively short (5’3”) and petite, with different musculature than most male climbers, Hayes used a different style of sending the route. Drawing on her excellent flexibility and lengthy experience with gymnastics, she followed a much more gymnastic approach, doing moves her male counterparts couldn’t do. (The video of her send is really quite incredible. Seriously, google it.) This creativity and physical ability helped her send La Rambla, putting her in the pantheon of extremely elite climbers (Reel Rock 12: Break on Through 2017). Rather than forcing her climbing style to mimic others, she used her own approach and was amazing.

Fieldnotes

What might perhaps be ethnographers’ most useful strategy (from my perspective anyway) is their habit of routinely taking detailed notes about what they see, feel, or generally experience in the field. These are called fieldnotes—basically notes to yourself about what happened that day or a particular moment you want to remember and come back to.9 For ethnographers, fieldnotes are a basic tool of the trade. Interviewers also use fieldnotes, but theirs include transcriptions of parts of their interviewees’ responses (even if the main


responses are being audio-­recorded) in addition to notes about what the interview setting was like and other ideas that came up during or after the interview. But even if you are not having face-­to-­face contact with the people you are researching—­because you are reading their interactions online, in newspapers, or through archival documents—­you should still take fieldnotes. There are several key components or rules of thumb for writing fieldnotes: First, record the stuff you observed—­make notes about your interview, the documents you read, what you see people doing. This is your data. Ideally, you take lots of notes while you are in the field collecting data and then, later in the day, annotate and update those notes. You also might annotate these notes when you are reviewing your data—­for example, transcribing photos of documents or listening to your audio recording of your interview. For example, you realize someone’s tone of voice changed, and you want to note that in the transcript. Or you remembered some detail that you didn’t record at the time. I like to keep these annotations in a separate category in the same note—­stuff I recorded in situ (that is, while I was there and actively observing my data) is separate from the stuff I thought of later, just to be careful about mixing in stuff from memory versus stuff that I took down originally, in case there are any conflicts. In the case of archival work, I’ll summarize or transcribe documents (depending on how important I deem their substance), but I’ll also have space in my fieldnotes for ideas that came to me as I was reading these documents, or ideas that I thought of after I physically left the archive but that were inspired by the documents I read that day. Second, record your data collection techniques. How did you collect your data? For ethnographers, this might be things like questions they asked of people in a place, but it also might be things like where they were standing and for how long. It might also include things like what they were wearing—­because, remember, that might affect data collection. For people doing archival or internet-­based research, recording your data collection techniques is definitely important: What boxes of data did you use? What websites did you visit? What search terms did you use? What document series were you using? How far did you get that day? Are there photos, screenshots, or downloads that you took? Since I often take pictures at archives (at least where cameras are allowed), I’ll write down the range of photos or how many I took so I can match up my notes to the photos. Why do you want to include notes on these things? These notes are essential for keeping track of your methods and your reasons for using certain methods, looking at a particular source or web page, talking to certain people, and so


on. Projects can go on for a long time. We might spend months or years collecting data. But even if you’re pretty quick about it, there is still the publishing process, which can easily take two years and usually more. You might get a reviewer (someone reading your work and giving you feedback and deciding whether your research should be published) who asks for more information about your interactions with the people you asked to keep diaries or about your specific instructions about what to include in those diaries. If you rely on memory alone, you risk losing that data (it’s so easy to forget things) or remembering incorrectly (I can’t tell you how often that happens for me). So having notes on this sort of thing is really important. And it might not end there. You might decide you want to return to the project and supplement this earlier data collection with new data. Maybe you want to do comparative work; it would be important to remember exactly how you did it the first time, so you can follow the same procedures and not introduce new bias into your data collection. Remember, comparative work requires a bit more rigidity in research design. Finally, record your own reactions to your data and setting. This goes back to our discussion of reflexivity and the awareness that you are the instrument of data collection. Fieldnotes are a great medium in which to pay attention to your own response to the data you observe, even when these are emotions or confusion. Your emotional reactions can cue you in to things that are important, even if you don’t yet know how or why. They can also alert you to particular examples in your data that can resonate well with your readers—if you had a particular reaction to an example, other people may as well. Emotional reactions are great for sharing understandings: Visceral reactions help memories stick in your mind. This is partly why TED talks are so popular and why so many of them are personalized; there’s actually science behind it that suggests these things help the speaker’s and listeners’ brain waves sync up (Gallo 2014; Hasson 2016). Something similar must be going on when you read an emotionally evocative story. So record those reactions—and, very importantly, whatever triggered that reaction, too. You want to record all of this stuff as soon as possible—while it’s still fresh and new, so during or immediately after your data collection. Now a caveat. I said you want to record it while it’s fresh. Yes. Good. But. Sometimes, recording your notes as you go, or immediately after you collected your data, is just way too hard or even impossible. There are some settings in which you don’t have access to any note-taking device—not your phone on which to jot a text message to yourself, a notepad, an audio-recording device,


a laptop, or even a pen and scrap paper. Alternatively, maybe you took your fieldnotes but want to add in your annotations while they are fresh, but you just drove for two hours to get to your site, interviewed people for six hours, and drove back another two hours, and you are wiped out. It’s okay to go to bed and do it in the morning. Some people will judge you for that but fuck them—­sleep comes first. Each of these strategies—­recording your data, recording your data collection techniques, and recording your reactions to your data and settings—­will be helpful regardless of which method you are using. Sure, ethnographers have perfected the fieldnotes technique and written most of the guidebooks out there about taking fieldnotes (see, most especially, Emerson et al. 1995), but that doesn’t mean the rest of us can’t use them—­or borrow ethnographers’ tricks to streamline whatever note-­taking we are already in the habit of doing. What does this look like in practice? It can vary from project to project. When I went to the archives in Philadelphia the second time, I had this approach pretty nailed down. I used one document as a running list of my fieldnotes (Figure 8.1). I began the document with the collections I planned to use during my visit as well as an approximate order of most important to least (in case I ran out of time). Then each day was clearly marked off in my notes, and I included what I looked at that day and how long I stayed at the archive. Because I was also photographing the documents for later transcription, I listed how many photographs I had taken that day so I could make sure my records matched up later. I also included brief notes for every document that I photographed (designated with “photo”); sometimes, if I was really in a hurry, it would just be the date and the fact that a photo was taken. In the evenings, after leaving the archives for the day and after dinner, I would check my data. Mostly, I’d upload my photos and make sure I had them all and sometimes extend my transcriptions for those photos I didn’t fully transcribe at the archive. When doing these checks, I would add other notes, sometimes about the data themselves and sometimes about my data collection. For example, one evening (included in the fieldnote shown in Figure 8.1) I realized that I did not have a photograph of the date of one of the documents (one of the entries) so I added, “I think I forgot to photograph the date.” I also recorded how things changed over time. On my first day, I recorded some notes about how I was approaching the data collection—­always photographing up to the next line after whatever I had wanted to collect to ensure that I had not accidentally clipped an image, for example. As my trip was


Figure 8.1  My Fieldnotes from the Archives. This fieldnote contains information about how I collected the data that day as well as my preliminary transcript from the materials I was using. It also indicates what I photographed.

coming to an end, my method changed because now I was scrambling to make the most of my remaining time. I made a note of this change because I wanted to know that any differences in the data I collected were due to choices I made, not because of the original data. Fieldnotes come in handy with other types of data collection. When I was recording the warden’s log for Eastern State Penitentiary, I had a digital copy of the archival records that I had scanned from microfilm and then transcribed into a Word document. The document I was using to transcribe the archival records was my fieldnote. It was more than the transcript—­it also contained


my notes. My notes included anything about the records that would not be immediately clear in my transcript. For example, the handwriting would often change as new wardens came and went, or the current warden was out sick or traveling and someone else temporarily maintained the diary. When this would happen, I recorded the handwriting change in my notes so it was clear that someone else was maintaining the log. Sometimes, particularly if the warden was out sick or traveling, the pages might get out of order, so I would also record that in my notes (Figure 8.2). Sometimes, taking fieldnotes is tedious and boring, but it’s still necessary. Beyond providing you with a record of your data, allowing you to restate (or recreate) your methods, or creating early cues to what’s important, your fieldnotes, however inconsequential they may seem at the time, can be extremely

Figure 8.2  My Fieldnotes for an Archival Transcript. The original image of the warden’s log and my fieldnotes, containing the transcription of that same text along with my embedded notes.


useful. There might be something in them that you don’t know is important until later. Maybe something happens several months down the line that shines a new light on some interaction you had or article you read that you didn’t think was significant. Maybe something happens that seems familiar, but you can’t remember why—­until you reread your earlier fieldnotes and start to see a pattern happening over time. Whatever it is, these fieldnotes themselves are a form of data. *

*

*

This chapter has borrowed from prototypical fieldworkers the mentality and tools that sharpen your qualitative data collection. Specifically, what I’m calling the fieldwork model highlights the importance of reflexivity and fieldnotes as key parts of the data collection process. Whether you see yourself as a fieldworker (because you are doing an ethnography or conducting interviews) or not (because you are working from a computer or in an archive), the fieldwork model can be really useful. If you started to notice that data collection in the fieldwork model already bleeds into analysis, and that there isn’t a sharp distinction, you are correct! For the qualitative social scientist, data collection and data analysis pretty much go hand in hand.

9

The Crux
Content Analysis, Analytic Memos, and Other Tricks

Crux: The most difficult portion of a climb.
Wikipedia (2020), “Glossary of Climbing Terms”

But Will It Go?
“It goes, boys!” That’s the phrase Lynn Hill famously said when she reached the top of The Nose, El Capitan’s most iconic route, in 1993. The Nose is one of the most difficult climbing routes in Yosemite Valley—so difficult, in fact, that for decades after its first ascent (via aid climbing), no one believed it was possible to free climb it—that is, climb the rock itself rather than using pitons and nylon loops as in aid climbing. Free climbing on a route like this is super hard because one has to hold onto shallow or sloping handholds and smear one’s foot on ripples in the rock rather than stepping on good holds—meaning, it takes excellent technique, balance, and mental focus. Hill was the first person to free The Nose at a time when few thought it was even possible. In climbing jargon, the phrase “it goes” means a route is, in fact, climbable. Often when someone is working on a first ascent, people will ask them, “Do you think it will go?” meaning, “Do you think it’s possible?” That sense of doubt about a project will be eminently relatable for many researchers, but qualitative scholars especially. Going into the field—however you define the field—can be both exhilarating and terrifying: exhilarating to think about what you may find, and terrifying to think you might find nothing at all. For quantitative scholars, this moment may pass quickly, in the time it takes for a regression to run (anywhere


from a few seconds to several days with really big datasets), to find out whether the results are interesting or not, significant or not. For qualitative scholars, it tends to drag on. Whether interviewing people, observing situations, or reviewing documents, one can collect and analyze copious amounts of data and not yet see an exciting pattern, an answer to the question, or something that jumps out at them as important. Sometimes, something is interesting, but it doesn’t feel germane to the underlying research question. Sometimes, you read a giant report thinking it will be useful, but it doesn’t provide the types of insights you thought it would—­it talks about this but not that. Or you interview someone, maybe many someones, but there are only a few quotations that felt useful. Or you go to an archive that contains multiple volumes or boxes of what sounded like a really useful collection of documents and instead realize they contain only fairly superficial descriptions of the thing you are studying. In situations like these, it’s entirely natural (and common) to think the worst: “Oh, no, I’m not going to have enough data.” “I’m not going to get good enough insights.” “My project is going to fall short.” All of the anxieties and insecurities right up to “I’m not going to finish my dissertation/get a job/get tenure” or “I’ll never get published in that journal I really want to be published in” or “The press will pull my contract!” In my experience (and many other people’s), this is part of the research process. I don’t think I’ve done a single project that didn’t include at least a moment (and more like hours, days, weeks, and more) where I regularly thought, “Oh, shit, this was a bad idea. What have I done?” This experience is common for two reasons, I’d say. First, because anxiety and insecurity are simply rampant throughout academia, we tend to be particularly sensitive to any additional stressor. Second, qualitative research lends itself particularly well to creating the conditions of insecurity and uncertainty. There is so much you don’t know at the outset. It often takes so long to collect your data. There is such a lengthy period between designing your study (if you go that route) and really getting to analyze your data. Data analysis is not particularly straightforward and formulaic—­even with a set of tools, there is still a fair bit of flexibility in how you use them, and judgment calls are necessary. Eventually that doubt goes away, but at the outset, when you don’t yet know enough about your project and there are too many unknowns, you might lack the confidence to counteract the doubt. The longer one works a project, the more doubt may arise—­until eventually it falls back on itself when your knowledge of the project gives you enough confidence to counterbalance the doubt.


Doubt on the Dawn Wall When Tommy Caldwell set out to climb El Capitan’s Dawn Wall, he had to envision and then execute an even harder route than The Nose. It ultimately took seven years, but he completed the climb in 2015 with Kevin Jorgeson. In his book The Push, Caldwell describes many moments of doubt about whether it was even possible to complete the route he had envisioned. This doubt was palpable when he was interviewed for a documentary called Progression about the obstacles and training that shape various projects— essentially the hard work that goes into major accomplishments. In that documentary, Caldwell says that the Dawn Wall will be a great project for the next generation of climbers and that he was happy to have laid out the route—indicating that he didn’t think he himself would be able to send it. The interview took place not long after another unsuccessful season of trying to send the route and a particularly low moment in Caldwell’s personal life. But as he kept at it, and he learned more about the route and developed new techniques for this unprecedented difficulty of climbing, his confidence grew. Eventually, and despite repeated failures but also slowly accruing successes—for example, falling off the same pitch dozens of times but finally sending it and moving on to the next pitch—he increasingly believed it was not only possible, but that he and Jorgeson would be the ones to complete it. By the end, others were sure, too, and a documentary film crew, as well as dozens of reporters, showed up for the final push that lasted several weeks, from late December to early January.

This feeling of anxiety is why I’m calling this chapter—­or rather, the analysis stage it describes—­the crux. It’s not that analysis is the most difficult part of the research project, but it is the central hurdle one must overcome, and it’s usually the source of most of our trepidation about a project. Additionally, researchers sometimes do things that exacerbate the challenges of analysis: Either they treat data collection and data analysis as two distinct steps that take place consecutively (first you collect your data, then you analyze it—­and you can’t analyze the data until you’ve collected all of it!) or they don’t really analyze their data systematically (they read their fieldnotes, transcripts, official documents, etc., and then think, shit, there is nothing here). This chapter is about correcting this mindset—­one might even say these “mistakes”—­in order to produce better research and make the whole experience less stressful. Basically, the


sooner you stop just looking at the cliff face and get your hands dirty (analyzing your data), the better the rest of the climb (project) will feel. *

*

*

This chapter discusses the central tools you will need as a qualitative social scientist to analyze your data. While there are certainly more advanced analysis tools, content analysis (open and focused coding) and analytic memos (notes to yourself with varying degrees of analysis) will get you through most projects. Designed and perfected by ethnographers, these tools are once again broadly applicable, whether you are conducting formal interviews, using archival data, or reviewing websites and online documents. They allow you to systematically review your data and, as you do so, keep track of the many insights your mind will be swimming with.

Qualitative Data Collection Is Qualitative Data Analysis Fieldwork is fun: You’re collecting data, and that’s a really good feeling. There is so much potential, so much hope. (The doubt seems to hit most before you start collecting your data or right after you’re done collecting, and it’s time to do a more concentrated analysis of that data.) You also have moments or even whole days where you are elated because you found a really good nugget of insight into a problem—­you found a rich document or you had a really deep interview—­or maybe you just made a connection while writing up or annotating your fieldnotes for the day. But there is the looming cloud that is the analysis you have yet to do—­at some point, you will return from the archive, finish your marathon of back-­to-­back interviews, or exit your fieldsite and have to sift through all that data and say something smart. As I said, this is the crux of a research project. The trick is, however, not to actually make it (or think of it as) a distinct part of your project, and this is the absolute beauty of qualitative methods. I’ve just described data collection and data analysis as though they are two separate steps, but they aren’t. Already, while you were in the field—­at the archive, taking an interview, reading the transcript of an internet forum, observing a neighborhood park—­you were analyzing your data when you wrote your fieldnotes. You were already writing down insights—­and that, too, is analysis. Moreover, in a lot of cases, you will go back and forth between collecting one type of data and analyzing another. While writing my dissertation, I first systematically analyzed a series of annual reports for the prison I was studying and


then went to the archive and collected more data that I analyzed both there and repeatedly after leaving the archive. I wrote short notes in my main fieldnote document while reading and transcribing the archival documents, but I also reread the material the next morning or in the evenings and weekends and wrote longer memos about themes or other insights that struck me while reading. In all sorts of ways, in practice, data collection and data analysis are (or can be) blurred together. By integrating data collection and data analysis, instead of breaking them up into two distinct steps, we both enrich our insights and stave off anxiety. We all know the anxiety that builds when we put something off—­the longer we put it off, the more anxious we get. If we treat data collection as this mass of work we must do before we can get started on the even bigger mass of work that is analysis, we set ourselves up for massive anxiety. We might even convince ourselves that we have to collect more data, essentially making more work for ourselves just to put off what is now the really scary task of data analysis. So don’t do that. Doing the steps described in this chapter will let you start analyzing your data immediately, which will let you start to recognize possible insights—­it won’t wipe out the doubt, but it will mitigate it much faster than if you don’t use these techniques.

You Need to Do More Than Just Re-­Read Your Data A grad student once came to me totally disheartened about their dissertation. It was a great project, but they were stuck on a central chapter. They’d done these awesome interviews supplemented by collecting some archival documents. When they recapped what they’d found, I was really impressed and quite stoked for them. Why, I asked, were they bummed? “I don’t have any findings!” they said. “What do you mean? You just told me you have awesome findings,” I replied. After some more back and forth, I realized the problem: They hadn’t actually analyzed their data—­at least not systematically. They’d read over their data, but they hadn’t done anything beyond that. For that reason, it wasn’t clear to them that they had findings. It was a bit like throwing a bunch of veggies into a pot and expecting it to immediately materialize into soup—­you have to cook it first for it to look and taste like soup. It didn’t feel right to this student because they hadn’t actually done the real work of analysis. I suggested they go ahead and do the things I discuss in this chapter and then see how they felt about their findings.


Here’s the thing: Let’s pretend this student hadn’t mentioned those nuggets of insight that I called awesome findings. Let’s pretend that they did their interviews and archival data collection and then were disheartened because nothing stuck out in their data. Actual analysis of the kind I describe in this chapter would help unearth the nuggets those data actually have to offer. My view of qualitative research is that, unlike quantitative research, as long as you’ve set up your project well or you build in some checks later, you will always get some set of publishable findings.1 Your findings may not be as exciting as you wanted, but there will be something there that you can analyze and write about. The challenge is just finding it and writing it up in a way that is interesting to others. Analyzing your data in the way I describe in this chapter virtually guarantees that you will have something to write about.

Content Analysis: Open and Focused Coding Content analysis means different things to different people. For some, it means looking at the frequency of words or phrases in texts (whether official documents or interview transcripts). That’s not how I’m using it. Instead, I mean coding: assigning a label, tag, or “code” to some piece of text. (This is sometimes called analytic coding or ethnographic coding.) This type of analysis is often associated with Grounded Theory (Charmaz 1983; Glaser and Strauss 1967), a technique by which researchers let the data guide them completely in their analysis and theory generation without getting polluted by the expectations of existing theories. Basically, a scholar using a Grounded Theory approach will read their interview transcripts, ethnographic fieldnotes, or archival documents and, rather than imposing categories from existing theoretical frameworks or the literature on their topic, they let the data themselves suggest relevant categories. However, I’m not going to use Grounded Theory as my model here because, as Robert Emerson et al. (1995, 144) point out, you can use content analysis with or without existing theory. And, as they have also pointed out, it’s really hard to forget theories once you learn them.2 So, instead, following Emerson et al., I’ll talk about performing content analysis both with and without existing theory. Basically, you can use this chapter whether or not you like Grounded Theory.3

The Intuition Content analysis should be particularly intuitive for academics—­we already do a fair amount of coding when we read the literature. For example, when we


take and organize our reading notes, we often sort the information according to tags or categories: I have folders in my computer containing articles from the different literatures that interest me. Likewise, I sometimes write in the margins of a book or article a term that signifies that the stuff I just read is related to something—­for example, I’ll commonly write “Foucault” next to anything relating to the transition from punishment focused on the body to punishment focused on the soul. Content analysis should be doubly intuitive for Millennial and Zennial academics. These generations have grown up with the idea of tags (in blogging) and hashtags (with Twitter and Instagram—­and whatever comes next). Sometimes I’ll even use hashtags when I’m talking or lecturing; while living in Canada, I would say something and then add “hashtag American privilege,” essentially making explicit some subtext about how, as an American, I take certain things for granted that folks in Canada can’t or don’t. I’m sure some people find this obnoxious, but it’s a way of making a somewhat serious point somewhat humorously (or humorously to me, anyway). Whether we are tagging something in Instagram or in our speech, we are relating some image, some statement, some story to a larger theme or idea. Every image or statement can have multiple hashtags (or themes); it’s not like a check-­the-­box mentality where something must fit in one box alone, and you can only have one label to describe it. Tagging is much more . . . flexible (notice the theme?). Also, if you don’t like the labels you have available or that others use, you just make/add your own. Content analysis is just like that. What’s the goal here? Why are we doing this? The goal is to end up with a list of quotations from our documents—­whether interview transcripts, our fieldnotes, or official reports—­organized by a particular topic, theme, or other idea. For example, I had a list of all the escapes (both successful and attempted) from Eastern in a 40-­year period reported in the warden’s log. I also had a list of every quotation in the prison’s annual reports where the prison administrators defended their prison (this was a very long list of quotes). Figure 9.1 shows a small excerpt from one of my lists of quotations in which the administrators defended themselves by talking about how humane they all were. I had a lot of lists. I then reviewed these lists to better understand what was going on. For example, I could read these lists to figure out changes over time, such as how the prison administrators used different strategies to defend the prison against criticism over time (Rubin 2021). Or I could look at my list of all the escapes and what was recorded about them and find some commonalities, like the way


Figure 9.1  The Goal of Content Analysis. This excerpt from a list of quotations illustrates what we’re working toward when we do content analysis.

in which a lot of the escapes were made possible by the prison itself—­literally, its architecture—­or by the prison regime—­such as by providing work tools that the prisoners then used to enable their escape (Rubin 2017b). Additionally, these lists of quotations from our documents are extremely useful when it comes to writing things up, first in analytic memos (as I discuss below) and for later publication.

Some Terminology Before explaining the steps you should follow when performing content analysis, I want to first be really clear about what coding actually means. You are essentially highlighting, flagging, or otherwise identifying some set of text and associating it with some concept or theme—­and sometimes a person, period, or place. Although you don’t have to, I’m going to use hashtags to distinguish codes. (Some people prefer to use words in all caps, which I’ve certainly done sometimes. But I’ve found hashtags are helpful because they are easier to search for.) There is a lot of variety in what you can code—­and what you can name your individual codes. For example, when I was coding the meeting minutes of a group of nineteenth-­century Philadelphia-­based penal reformers, I used the code #flaky to mean “failing to show up, produce report, taking a long time” (this was a super common theme for these volunteers who led busy lives). I’ve also used other codes that are probably self-­explanatory: #prisonersex,


#guardmisbehavior, #escape, #defensive, #claimingexpertise. Some of these were descriptions of particular events or episodes, and sometimes they were my gloss on something that was being said, especially how it was being said or what the subtext seemed to be. Sometimes I used a placeholder code for really exceptional instances—­I called these #gold because they seemed to be really valuable analytically. As you might already have guessed, some of these codes are purely descriptive and have nothing to do with existing theory; other codes, however, can be informed by theory. Depending on your proclivities, you might have somewhat theory-­relevant codes, but it’s okay if all or almost all of your codes are driven entirely by your data. That’s how the Grounded Theorists do it. Each of these codes gets associated with various pieces of text. Sometimes, the text might be a single word; that’s not always very useful, so more often it’s a phrase, a sentence, a paragraph, or more. It can be a bit unwieldy to label an entire page or document with a single code, but you can and you might. Sometimes people even code images. Notably, you can use more than one code at a time. For example, if I’m reading a document that includes a story about a guard helping a prisoner escape, then I could include both #guardmisbehavior and #escape to label or code that story. Alternatively, if I’m coding a reformer pamphlet, I could code the entire thing “pro-­PS” or “anti-­PS” (for whether they supported or opposed the Pennsylvania System that Eastern was famous for and that I was interested in explaining). You can group your codes into what are sometimes called “coding families” (although different software uses different terminology, so you might end up using the terminology of whatever software you prefer). For example, I used a coding family that I called #purpose, which I defined as: Statements suggesting the purpose or goal of punishment, prison, or of a particular activity within prison (including the “benefit” of a particular practice, change, etc.). These may be explicit statements that the purpose of punishment is X or more implicit statements (e.g., practice Z helps us to achieve purpose X), or discussions about what should be the case and what shouldn’t be the case.

This was a pretty big family because prison administrators and penal reformers listed so many different purposes of punishment in the documents I read—­that in itself became a finding, by the way. Figure 9.2 is a list of just some of those


Figure 9.2  An Example of Codes. These terms and their descriptions are examples of the codes I created and compiled in a codebook.

reasons that they recited, according to my coding scheme. These codes and their definitions are taken directly from the codebook I used during my dissertation research. So now I should explain what a codebook is. The concept of a codebook will be familiar to anyone who has ever used quantitative data. In statistical datasets, a codebook includes the variables—­ their short-­hand names as well as their meanings—­and the values associated with them. For example, HPY might be “Happy” and it has values of 1–­6 (where 1 is low, 5 is high, and 6 is NA or not applicable). Qualitative scholars use codebooks, too, and they are pretty similar; in fact, the major difference is the qualitative codebook is unlikely to have a bunch of numbers in it. A codebook for qualitative researchers is a document that lists all your codes (by the names you use for them) along with their definitions for a particular content analysis. You might also include some examples in there, but some people argue that’s a bad idea because it can limit your focus to only those types of things and foreclose different examples of the same code. I’d say that


can be a problem early in the coding process, but it’s not necessarily a problem later when the codes are pretty well settled. So what is that process?

The Coding Process
The basic process behind content analysis can be summarized in five steps, although you might vary these steps a bit depending on your project. I’m going to start by assuming you have all your data collected already, but really you can do this as you are still collecting.
1. Identify a Representative (or a Diverse) Sample Set of Documents. Whether you are using interview transcripts, ethnographic fieldnotes, archival documents, or internet text, these are what I mean by your documents. If you are using more than one type—ethnographic fieldnotes and interviews, for example, or different sets of archival documents (in my case, the annual reports and the warden’s log)—you want to do this separately for each type of document.4 Within that document type—your interviews or my annual reports—give yourself a diverse set of documents to start with. When I had a collection of about 80 annual reports to read, I chose one from each decade to start.
2. Code Everything: Open Coding. On your first read through each of these documents in your (maybe representative, maybe not) sample set, start coding everything. When you are doing open coding, you are basically “open” to everything—every possibility, every element in your data. So you code everything that you can. Because you are coding everything, you are going to end up with a gigantic list of codes. Don’t get too attached to them.
3. Pare Down Your Coding List. That giant list of codes is your preliminary coding list. This list will be too big to work with. My preliminary coding list, in which I also included working definitions for each code, ran to 13 pages and approximately 150 codes.5 Way too many codes to keep track of.6 Also, a lot of them turned out to be not very important, and I still was going to add a lot of other codes. So at this stage you want to go through your list and think about what is really interesting and what codes are actually related to your interests. This is where you pare down your list as needed. Remember, you are a Dirtbagger, so you might come


back to your dataset and use some of those codes for a different project, but for now, which codes really stand out to you?
3a. Optional Rinse and Repeat. If there is a lot of variability in your documents, you might do some open coding on another sample and see if you are coming up with the same codes or new ones. It’s likely you’ll come up with new codes even if your documents aren’t that different.
4. Zero in on What’s Important: Closed Coding. With your pared down list, go through and code your full sample of documents. This is where you systematically use your codebook to code the documents that you have. You are focused only on the codes you have decided are important. You are now “closed” off to other possibilities or elements in your data. Keep your codebook handy during this process. Before I start coding each time, I like to reread my entire codebook so the codes are fresh in my head. You will miss things, but that’s okay because of the next step.
5. Rinse and Repeat. You always want to code your documents more than once. Depending on the number of codes and your use of coding families, one strategy you might use is to read each document looking just for one coding family, or some subset of your codes, at a time. This is helpful for not getting overwhelmed by how many codes you have. Additionally, as you code new documents, you are likely going to add codes or split them up. Every time you do that, you need to go back and recode your earlier documents, looking for instances of those new codes. Your coding won’t be perfect, but it’s good to try to be as consistent as possible across documents.
Now a warning: Coding takes a long time, especially when done well. All these recodings I’m saying you need to do . . . they add up. Plan in advance.
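Since I just told you to keep your codebook handy and reread it constantly, here is one small, strictly optional trick: if you are comfortable with a little scripting, you can keep the codebook in a structured form (a two-column spreadsheet, a CSV, or even a Python dictionary) so it is easy to reprint, search, or hand off to a research assistant later. The sketch below is only an illustration of the idea; the definitions are abbreviated paraphrases of codes I mention in this chapter, and your codebook, in whatever document or software you prefer, works just as well.

# A codebook is just code names mapped to definitions, so any structured
# format will do. This sketch uses a plain Python dictionary; a two-column
# spreadsheet or CSV would serve the same purpose.
codebook = {
    "#flaky": "Failing to show up, produce a report, or taking a long time.",
    "#escape": "Successful or attempted escapes from the prison.",
    "#guardmisbehavior": "Guards breaking rules or otherwise misbehaving.",
    "#defensive": "Administrators defending their prison against criticism.",
}

# Reprint the codebook before each coding session so the codes stay fresh.
for code, definition in sorted(codebook.items()):
    print(f"{code}: {definition}")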

Practice Time! To distinguish between open and focused coding, let’s practice with an example. The following is an excerpt that I transcribed from the warden’s daily log at Eastern. Use this excerpt to practice open coding for a moment. Basically, code everything that you see here by writing some hashtags in the margins. You might add brackets or draw some bubbles around some of the text, or you could use different-­colored highlighters or pens (but keep in mind that you usually end up with more codes than available colors). Get ready, though. It’s going to get messy.


1831
2nd month, 8, 1831: “Went to Harrisburg on business of the Institution and returned on the 17th”
3rd month, 15, 1831: “Proceeded to Harrisburgh, and obtained the passage of two Laws for the enlargement of this and the building of a new County Prison, and returned on 29th.”
4th month, 6, 1831: “Irwin Lean [a guard] received two stabs from one of the prisoners (No. 10) not dangerous done with his shoemaking knife. Dr Bache [the prison’s staff physician] visited”
9th month, 2, 1831: “Mem Bacon, Cox, Bradford, & Hood [members of the prison’s Board of Inspectors] visited prisoners. The French Commissioners, De Beaumont & De Tocqueville, also here to see the prison”
10th month, 13, 1831: “French Commissioners visited.” [AR: I stopped spelling out the month here]
10.15.1831: “The French Comm. Here examining the Institution, in company with the Inspectors.”
10.17: “One of the French Comm. here”
10.18: “De Tocqueville and de Beaumont French Commissioners visited.”
10.19: “The French Commissioners here.”
10.20: “The French Comm. visited.”
10.21: “Judge Cox and French Comm. here”
12.3: “The Board of Inspectors met except Judge Coxe all present, passed resolution directing me to appoint a nurse a matron and directing me to receive all prisoners from the county for Larceny and that the convictions should not be discharged on first day [i.e., Sunday].”

Have you done it? Maybe read it a second time to see if you missed anything. Okay. There are a lot of different ways you can code this—­and remember, there’s #NoOneRightWay! The first step to discuss is simply which codes you decided to use. For example, I would use codes like #wardenstravel, #visitors (maybe split this up between #inspectorvisits and #guestsvisit to distinguish between the recurring if


periodic visits of the prison’s inspectors and the week of visits from De Tocqueville and De Beaumont), #boardwork, #violence, #inmatemisconduct, and #NewLaws/Rules/Regulations (or maybe split this into #laws and #internalrulesregs to distinguish between the 3.15 entry and the 12.3 entry). You might have come up with different terms for these things or focused on other things entirely. You might also add codes for specific officials’ names (e.g., Bache, Bacon, Cox, Bradford, Hood, Beaumont, Tocqueville). When I’ve given this excerpt to students to code, no one ever comes up with a code for misspellings (for example, Harrisburg is spelled two different ways in the excerpt, as is Cox), but if one was interested in non-standardized English, that might be a relevant code. The next question is to determine which pieces of text you would attach each of these codes to. Here’s how I would do it (but again, yours might look different):
1831
2nd month, 8, 1831: “Went to Harrisburg on business of the Institution and returned on the 17th” #wardenstravel
3rd month, 15, 1831: “Proceeded to Harrisburgh, and obtained the passage of two Laws for the enlargement of this and the building of a new County Prison, and returned on 29th.” #wardenstravel #laws
4th month, 6, 1831: “Irwin Lean [a guard] received two stabs from one of the prisoners (No. 10) not dangerous done with his shoemaking knife. Dr Bache [the prison’s staff physician] visited” #violence #inmatemisconduct
9th month, 2, 1831: “Mem Bacon, Cox, Bradford, & Hood [members of the prison’s Board of Inspectors] visited prisoners. The French Commissioners, De Beaumont & De Tocqueville, also here to see the prison” #inspectorvisits #guestsvisit

10th month, 13, 1831: “French Commissioners visited.” #guestsvisit [AR: I stopped spelling out the month here]
10.15.1831: “The French Comm. Here examining the Institution, in company with the Inspectors.” #inspectorvisits #guestsvisit
10.17: “One of the French Comm. here” #guestsvisit
10.18: “De Tocqueville and de Beaumont French Commissioners visited.” #guestsvisit


10.19: “The French Commissioners here.” #guestsvisit
10.20: “The French Comm. visited.” #guestsvisit
10.21: “Judge Cox and French Comm. here” #inspectorvisits #guestsvisit
12.3: “The Board of Inspectors met except Judge Coxe all present, passed resolution directing me to appoint a nurse a matron and directing me to receive all prisoners from the county for Larceny and that the convictions should not be discharged on first day [i.e., Sunday].” #inspectorvisits #boardwork #internalrulesregs

Obviously, a lot of this is minutiae and might not be super important. We won’t really know until later what is important, but we can start to narrow it down based on our interests. If I’m doing a project on prisoner resistance and rule violations, the only code I might want to keep is #inmatemisconduct. If I’m doing a project on the internal regulation of the prison, all except #laws might be relevant. If I really don’t have a strong sense of what my project is focused on, I might code some more documents as part of my preliminary sample to see what turns up. But keep in mind: You can always go back and recode when you have a better sense of where your interests lie. Again, not super efficient, but it will get the job done.
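By the way, once your entries are tagged like this, assembling the Figure 9.1-style lists of quotations for each code is mostly mechanical. You can do it with copy and paste, with the software I discuss below, or, if your transcripts live in plain-text files, with a few lines of scripting. Here is a minimal sketch in Python, assuming one entry per line with its hashtags typed at the end (as in the coded excerpt above); the file name is made up for illustration.

import re
from collections import defaultdict

# Minimal sketch: group hashtag-coded entries into per-code lists of
# quotations, assuming one entry per line with its codes typed at the end
# (as in the coded warden's-log excerpt above). The file name is hypothetical.
quotes_by_code = defaultdict(list)

with open("coded_wardens_log.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        codes = re.findall(r"#\w+", line)              # e.g., ["#guestsvisit"]
        entry = re.sub(r"\s*#\w+", "", line).strip()   # the entry minus its codes
        for code in codes:
            quotes_by_code[code].append(entry)

# Print a Figure 9.1-style list for one code of interest.
for quote in quotes_by_code["#guestsvisit"]:
    print("-", quote)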

Variations So now that we’ve discussed the steps and practiced coding, let’s talk about variations in the process. You don’t have to follow the steps that I laid out exactly—­we’re Dirtbaggers after all. In fact, the steps you might follow are pretty similar, but slightly different, if you are starting your analysis while you are still collecting data. Let’s say you are coding your first three interview transcripts, but you have more interviews to do. This is a common strategy and not a bad idea, but it means you don’t necessarily have a good-­sized representative sample to start with. That’s fine—­good even! Instead, take the documents you have and start with those for step 1. You might continue to do open coding on each new interview transcript as it comes in and wait until after you’ve finished your interviews and finished open coding each interview transcript before moving on to step 3 (assembling your pared down coding list). Alternatively, you might go back and forth between open and focused coding: open coding what comes in, tweaking your coding list, and doing focused coding for each interview. This can get a bit messy, and it might mean constantly adjusting your coding list, but


that’s going to happen anyway, truth be told, up until your last round or two of focused coding. For example, when I was doing my focused coding, I both split and combined different codes over time. What I had been separately labeling #humanitarian, #benevolent, #progressive, and #mild, I ended up combining into #positivecharacterizations. What I had been coding as #defensive (my term for anytime it just seemed like the administrators were a bit defensive or were specifically defending their prison), I split up into #scapegoating (blaming someone for a bad outcome), #excuses (blaming some thing or situation for a bad outcome), and #statistics (using statistics to claim that things weren’t so bad), which referred to the dominant ways they defended their prison. This was really necessary because, honestly, my codes were sometimes kind of ambiguous. I didn’t always use clear labels, and sometimes something that seemed crystal clear when I first identified it was a bit fuzzy when I encountered it a few weeks later. (And it turns out this is a pretty common occurrence among students and colleagues of mine.) This is why recoding is really helpful: You can clarify your thinking with ongoing exposure to your data. As this discussion suggests, coding is, to some extent, a subjective process. Different people will likely code things differently. By different people, I’m including you today and you one month from now. As you read different texts, gain new experiences, talk to different people—­either in your normal life outside of research, as part of your academic training, or in your fieldwork—­your interests can change, what you find salient can change, and what you are simply attuned to noticing can change. But there are also ways to make sure you are being consistent. Let’s say you coded your interview transcripts, ethnographic fieldnotes, or focal texts in temporal order (that is, the order in which you did the interviews, took the ethnographic fieldnotes, or the texts were generated). You might have coded the last half differently than the first. That’s okay—­but code them again. Especially if you are interested in how things might change across time, place, or groups, you might want to recode things in a different order, just to make sure any trends you see related to any of those different times/places/groups really is related to those differences and not how you felt while coding or what you were attuned to at that time. I like to read documents in temporal order so I can trace their development but then recode the documents in randomized order just to get rid of any change over time in how I code things; if I do continue to change how I code, it at least won’t be correlated with the date of


the documents and order of events. (Remember our discussion of threats to internal validity? This is a less talked about version of those, but recoding checks like this help us avoid such problems.) In general, though, inconsistent coding is less of a problem as long as you have really clear coding categories, and you code each document several times. That will help you to code consistently. In fact, that’s how people use content analysis for quantitative analyses. They’ll have teams of coders that they train to code, and then have more than one person code the same document so they can compare the coding across coders. The rate at which different coders agree is called “intercoder reliability.” The higher the agreement the better—­because quantitative analyses are about the relationship between two or more variables, and you want to make sure the observed relationships are accurate and not just a reflection of who did the coding or when. If you aren’t doing any sort of quantitative analysis, you don’t need to do something like that. However, if you are nervous about how reliable your coding is—­if even after you recoded you feel like you still are systematically missing or overcoding something—­it doesn’t hurt to hire an undergrad or two to go through a sample of documents and code according to your codebook to see if they turn up results that are similar to yours.7 This can be a source of peace of mind or an indicator that your coding is off. But don’t feel obligated to do something like that—­following the five steps of the coding process should be sufficient.
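If you do hand a sample of documents to someone else, checking your agreement does not require anything fancy. For each entry (or paragraph, or document), record whether each of you applied a given code, and then compute how often you agreed. Below is a minimal sketch with made-up 0/1 coding decisions; Cohen's kappa, a common chance-corrected version of agreement, is included just to show the arithmetic, not because you are obligated to report it.

# Minimal sketch of an intercoder reliability check for a single code.
# Each list records, entry by entry, whether a coder applied the code (1) or
# not (0); the decisions below are invented for illustration.
me    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
my_ra = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

n = len(me)
agreement = sum(a == b for a, b in zip(me, my_ra)) / n

# Cohen's kappa corrects raw agreement for agreement expected by chance.
p_me, p_ra = sum(me) / n, sum(my_ra) / n
expected = p_me * p_ra + (1 - p_me) * (1 - p_ra)
kappa = (agreement - expected) / (1 - expected)

print(f"Percent agreement: {agreement:.0%}")   # 80%
print(f"Cohen's kappa: {kappa:.2f}")           # 0.58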

Technologies When you actually start coding, you wouldn’t do what I did above and just add a hashtag (or your preferred method of denoting a code) next to a sentence. (Okay, I do that sometimes when I’m in the archive and just want to flag something, but I don’t do that as part of my systematic coding process.) You want some technology that will help you to do this more easily. A lot of people who do content analysis use computer software, sometimes referred to as CAQDAS (computer-­assisted qualitative data analysis software); programs include ATLAS.ti and NVivo, among others. These programs basically make it much easier to do the really messy work of coding. One thing I should clarify is that while these programs have an “autocoding” function—­in which they will automatically code a word, sentence, or paragraph with a particular word or phrase in it—­they won’t actually do the coding for you. (And generally, you want to be really careful about autocoding. It’s a blunt tool and not really what content analysis of the kind I’m describing is about.)


While these programs can make your life easier, some of them are really expensive. Some have trial copies you can use (for a short period or indefinitely with a lower functionality), some have academic licensing options that are a bit cheaper, and sometimes you can get the software for free or at a discount if your university has a licensing agreement. Computers in your college library or computer lab also might have one or another of these programs installed. Some of these programs are pretty awesome. But, if the cost is a barrier, if you are intimidated by computer programs, or if you just don’t like any of these programs (some of us are certainly frustrated by the limited functionality of certain programs—I used one that made my computer crash repeatedly), there are alternatives. The most low-tech option is to create a document (in, say, Word or just a text document in your computer’s built-in text editor) and then use copy and paste. You might start with your codebook, make each code into a heading, and then copy and paste every piece of text into the appropriate spot (under the relevant heading). That might mean pasting the same piece of text in multiple places when there are multiple codes of interest. The output will actually be pretty similar to what the output would look like if you are using the software. (See Figure 9.1 again for an example of my list of quotations for a specific code I had used.) If you have a lot of data, this approach can get out of hand quickly. One alternative, then, instead of listing the quotations under each heading (for each code), is to put the quotations in footnotes or endnotes in your document, where the footnote or endnote is linked to a brief description of the code. This lets you see each code and then skip ahead to the associated quotations (Figure 9.3). That can still get messy, so another version is to just have a different document for each code. Keep them all in one folder on your computer so you can keep track of them and also search the folder. This way makes each document more manageable. A different approach that I like is an online blog.8 I use Blogger (which is free), but others will do. Very important: Change the settings so that it’s private—you don’t want someone stumbling across it. When using a blog for coding, you have different options. For one, you can create a different blog post for each document. First, copy the document’s whole text into a blog post and then code the text by adding hashtags in the text. Then you can use the search feature in the blog looking for all instances of a particular hashtag.


Figure 9.3  Coding Using Footnotes. One low-cost way of generating a list of content produced by codes of interest (in this case, my list of quotations) is to paste each relevant quotation into a footnote associated with the code in a new coding document.

The advantage here of using hashtags instead of the blog format’s built-in tagging option is (a) it keeps the hashtag with the specific quotation you intended, and (b) while you can’t search for two of the built-in tags at the same time, you can search for two or more hashtags at the same time. One downside is you don’t end up with a list of all your content associated with each code (in our example, the list of quotations); instead, this setup will pull up every document (blog post) with that hashtag or combination of hashtags, and you will have to go through each document to see the relevant quotations. Inefficient and clunky, but ultimately effective. That said, this method really works best for shorter documents or segments of text. You could also break up your documents into smaller segments—for example, split up your interviews by interview question (and name the blog posts something like “Interview with A: Question 25”) or your big text into smaller text chunks (e.g., “Warden’s Log–Wood, 1831”). Another downside is that you need internet to access your data analysis.
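If that internet requirement is a dealbreaker (or your fieldsite has no signal), the same hashtag search works offline over a folder of plain-text files. Here is a minimal sketch, assuming each coded document is its own .txt file in a folder I am calling "coded" (the folder name and layout are hypothetical); it prints every paragraph in which all of the hashtags you ask for appear together, which mimics the multi-hashtag search I just described.

from pathlib import Path

# Minimal sketch: an offline stand-in for the blog's hashtag search.
# Assumes each coded document is a plain-text file in a "coded" folder
# (a hypothetical setup), with hashtags typed directly into the text.
def find(folder, *hashtags):
    for path in sorted(Path(folder).glob("*.txt")):
        for para in path.read_text(encoding="utf-8").split("\n\n"):
            if all(tag in para for tag in hashtags):  # all tags must co-occur
                print(f"--- {path.name} ---")
                print(para.strip(), "\n")

# Like searching the blog for two hashtags at once.
find("coded", "#inspectorvisits", "#guestsvisit")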


Other people use Excel, and I have too on occasion. As you read a document, if you find content (in our example, a quotation) you want to code, you can copy and paste it into a line in a spreadsheet. Next, put an x in the column(s) that has the relevant code. See Figure 9.4 for an example of what this looks like, using the portion of the transcribed warden’s log we used for the practice coding. As you might imagine, this approach can get pretty unwieldy if you have a lot of codes; it’s hard to see more than ten codes at a time with this approach. Instead, you might create a different spreadsheet for each coding family, which means the same quotation might appear in different spreadsheets. In that case, just title the spreadsheets by the coding family’s name. A final option that I’ve used is Google Forms. This free Google feature lets you create surveys and such, which are then linked to Google Sheets (meaning you can see the answers to your surveys in a linked spreadsheet). You can set up a form with a question for citation information (what document and page number a quotation comes from, for example) and then add in options for each of the codes, ideally sorted by coding family. This can be helpful for your focused coding because you can review each code before moving on (to make sure you didn’t forget any codes, which can happen—­especially when you have a lot of codes in your codebook). For every entry, you can check as many boxes (signifying the different codes) as you want. See Figure 9.5 to see what one of my forms looked like when I experimented with this approach. In this example, I had set up the form so my undergraduate RA could code a subset of documents to see if the trend I’d been seeing in the data was recognizable to someone else. The point is that there are a lot of technological options, some of which are free, for making the coding process easier. Each approach, including some of the expensive software, can be a bit clunky. Use the technology you like best. For some people, that’s printing things out and using pen and paper and scissors, but that seems to be less popular now with all the computer technologies available and our growing preference for going paperless. Keep in mind that you might use different technologies for different projects. Each one can be more or less useful depending on the amount of data you have, the quality of your data, the type of data you are using, and whether you are working with coauthors or research assistants. So don’t feel as though you need to stick to the same technology.
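Whichever technology you pick, the endgame is the same: being able to pull up everything tagged with a given code. With the spreadsheet versions (Excel or the Google Sheet behind a Google Form), you can do that by filtering columns by hand or, if the file gets big, by reading it into a short script. Here is a minimal sketch with pandas, assuming a hypothetical spreadsheet laid out like Figure 9.4: one quotation per row, one column per code, and an "x" wherever a code applies; the file and column names are invented for illustration.

import pandas as pd

# Minimal sketch: filter a Figure 9.4-style coding spreadsheet.
# Hypothetical file and column names: one quotation per row, one column per
# code, with an "x" wherever that code applies to that quotation.
df = pd.read_excel("wardens_log_coding.xlsx")

# Everything tagged #guestsvisit, i.e., a Figure 9.1-style list of quotations.
guests = df.loc[df["#guestsvisit"] == "x", "quotation"]
for quote in guests:
    print("-", quote)

# A quick count of how often each code appears across the whole sheet.
code_columns = [col for col in df.columns if col.startswith("#")]
print((df[code_columns] == "x").sum())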

Figure 9.4 Coding Using Excel. Using the transcribed warden’s log, we can code using Excel, with the x’s in the columns next to each entry signifying the presence of the theme or code at the header.



Figure 9.5  Using Google Forms. (A) Excerpt from customized Google Form with citation information and space to copy and paste a quotation from Eastern’s annual reports.


(B) Excerpt from customized Google Form with boxes for each code within the coding family characterizations.


Analytic Memoing While you perform your content analysis, you will also want to write analytic memos. Analytic memos are like fieldnotes in a lot of ways. When I introduced fieldnotes, I said they were like notes to yourself. The same is true of memos. The word “memo” is actually short for “memorandum,” which comes from the Latin “something to be brought to mind” or basically something to remember. That’s really all a memo is—­you are writing down something you want to remember. Your memos are there for you and you alone, so you can put anything you want in them. Moreover, because they are just for you, memos don’t have to be fancy or long—­they can look however you want them to look, as long as they are useful to you. So how are analytic memos different from fieldnotes? I often use the terms “memos” and “fieldnotes” interchangeably, because the difference can be blurry, but I’m using them as distinct ideas here to help with clarity (hopefully). Basically, you might write analytic memos while you are in the field, so in that sense they can be thought of as a kind of fieldnote. But you might be more likely to write them after you are no longer collecting data—­when you are rereading your fieldnotes and other data or when you are doing your content analyses. And you will probably write more of them in that stage. Overall, we can think of analytic memos as having less to do with data collection and more to do with your insights arising from the data.

Short Analytic Memos The idea of an analytic memo might be a bit intimidating—­they sound so impressive, to me anyway—­but they don’t have to be super analytical, especially at first. Analytic memos can be long or short. I’ve written many one-­sentence analytic memos; obviously, they are not particularly impressive, but they were extremely helpful in my analyses. Here are some types of things you might take memos on at the low-­end of the analytical scale: something you want to remember, something you noticed that was new or different, a reaction you had, a suspicion or hunch that you are developing, a pattern you are noticing, and inferences you are starting to draw. As with fieldnotes, you are going to pay attention in your analytic memos to points of confusion and other emotional reactions you have to your data. These reactions might be the beginning of a smaller or new puzzle, and then your memos become more and more analytical as you start to solve this puzzle.


One key piece of advice: Always include some quotation or excerpt from your dataset (your ethnographic fieldnotes, the official report you are reading, a newspaper article, an interview transcript, etc.) that set you off on this idea. It’s happened to me so many times—I’ll have a really strong reaction to a piece of data, write down some insight without the original quotation or context, and then the next day or the next week (or much, much later) come back to my memo and have no idea what I was talking about and have no way of recreating my insight because I can’t remember what piece of data made me think this way. Here are some examples of short memos I’ve written (excluding the original quotations that inspired them—you can see how confusing they can be without those quotations):
Does this mean what I think it means? (10/25/2011)
I think this is the first use of the term “penologist” (01/01/2012)
This is kind of a sad passage. No one is paying attention to their reports and they feel under-appreciated, and that their reports go un-used. It’s so sad. I read your reports! (01/04/2012)
Is this the first time they used the term “probation”? (01/05/2012)
Propensity to theft isn’t mentioned anymore. There is a much greater emphasis on the role of environment. While he acknowledged human depravity, I took this as a religious we-are-all-sinners-descended-from-Adam thing. Instead, education, religious education, and parenting are important factors here. (01/05/2012)

Several of these examples were actually very important later on in my analyses. As you can probably tell, these memos aren’t useful on their own. (They aren’t even comprehensible without knowledge of my project.) In each case, I had to do more work, but they got me started. You take memos to help you interact with your data, get familiar with it, and start to put pieces together in a giant puzzle—­and ultimately to remember the insights you came up with. Indeed, as you get really familiar with your data, you can start to record predictions. When I was reading Eastern’s annual reports chronologically, I was getting a sense for how the administrators behaved and when they felt like they had been backed into a corner. One time, I recorded a brief memo: “The language here is precise. I wonder if the Prison Discipline Society lobbied for this law at all. They haven’t mentioned it, but it is an interesting thought.” Essentially, I


was speculating that the local penal reform society had lobbied for a law that was passed without the administrators' knowledge or consent, which really bugged them. Nothing in what the administrators had said so far mentioned that explicitly—and I hadn't yet read the archival documents kept by the penal reform society—but I'd read enough of the prison's annual reports at this point that I felt like I knew the administrators. The software I was using time stamped this memo at 4:03:37 p.m. At 4:07:50 p.m., I recorded a second memo, "Yes! I called it! The Act was passed because the society did something to get it. This might explain the later distancing between reformers and prison authorities." At this point, I had read an appendix to the report I had been reading where the administrators were explicit about what had happened—exactly what I suspected was going on. This episode turned out to be pretty important in my research (e.g., Rubin 2017a). It was also a cherished memory in my dissertation research because I was so stoked at how well I knew the people I was studying. As this last example might suggest, memos can be kind of fun to write. They can record your personal victories and essentially serve as measures of your familiarity with your data. But you can also use memos for other fun things, as we'll see.

Longer Analytic Memos

As you get more familiar with your data, you can start to write longer memos. These memos can be long thoughts and reflections, working through a chunk of data (e.g., a string of related quotations from a particular code or that relate to a particular event), or answers to earlier questions or puzzlements. They tend to run more toward the synthesis end of things—you've looked at a lot of data or maybe a lot of different data, and you're starting to make bigger connections, or you are aware of your more detailed knowledge of your case. One time, when I was deep into reading the warden's log, I was thinking about how different prisons in the 1830s were from how they are today. I was thinking about how I might explain this to an undergraduate class to help them better understand the time. So, I wrote a memo I titled, "An Introduction to Eastern's Day and Age." I listed all the things that had been jumping out at me—for example, that horse theft was still a common crime and that slavery was still going on (not in Pennsylvania, but in neighboring states and sometimes enslaved people's owners would come to the prison to collect their "property" who had escaped and then been incarcerated for other crimes). I also wrote about the informality of criminal justice—how prisoners often stayed


longer than their prison sentence for a variety of reasons, often at their own request. I wrote about how women and young children were incarcerated in the same facility as men—­something we stopped doing a while ago. Some of these ideas I ended up using for parts of my book—­some for scene setting and others as part of the analysis—­and for an article (Rubin 2017a). Some memos, like the one I just discussed, can be pretty lengthy. Indeed, you don’t have to write them in one sitting; you might come back to them and add more insights or details. I wrote an initial version of my “Day and Age” memo and then added to it as I came across other things I wanted to include and other quotations that illustrated some of the differences. Another reason you might have a really lengthy memo is just that you have more to say, which happens the longer you’re working on a project. As you start to notice more patterns, you can say more about them. Your memos will also get longer over time because you won’t just have more insights but you’ll have multiple examples to illustrate those insights. While your memos should always include whatever quotation or excerpt that got you thinking about something as I’ve already emphasized, your more advanced memos will probably have more data in them. I will sometimes include three to five quotations or excerpts from my data as I start to identify really choice quotations or several quotes that show similar things across different times or people. In the “Day and Age” memo, for example, I included several examples of the warden keeping prisoners past their sentence. I didn’t include every example—­that’s what my coding was for—­but I did include the more interesting or unique examples so I could show the diversity of cases in which this practice occurred. As you get farther along, your memos will reflect insights and quotations from several different datasets. To stick with the example I’ve been using, this would be the warden’s log and the annual reports or maybe also documents from the penal reform society that kept tabs on the prison administrators (#triangulation). I think of getting to this stage like assembling a 3D picture or constructing a gingerbread house. Earlier memos might give me enough for one wall, but these memos get three or even all four walls—­maybe even a roof. I like the building metaphor, but it might be more accurate to use the Indian proverb of the three blind men feeling an elephant. The one who feels the trunk thinks it might be a snake; the one who feels the feet thinks it might be a tree; the one who feels the tail thinks it’s a paint brush. Maybe your earlier memos


aren’t actually that far off, but these later memos are really where you are starting to realize it’s not a snake, tree, or paint brush, but an elephant. As you can probably tell, memos can contain your initial thoughts or an interesting line of thought, but they can be much more than that. In fact, they can actually be the first draft or an outline of some of your writing. You don’t have to think that way when you are putting them together—­you might not even realize that’s what you have—­but it’s good to keep this in mind later when you are drafting up your article or chapter. You don’t have to start writing from scratch if you wrote analytic memos!

Reference Memos

A reference memo is a special type of analytic memo to record important reference information. For example, in historical or chronological projects, you might construct a timeline as a separate document and add important events to that timeline. For my dissertation, I kept an Excel spreadsheet because I wanted different columns representing different types of events like new laws, personnel changes, and so on. In most projects, it's a good idea to keep a list of all the people who come up in your project. These might be people you interact with directly or people whose names come up once, twice, or repeatedly. If you are really organized, you might keep a list of where they come up. In my case, I didn't keep a list of where they came up because people were pretty consistent so I generally knew where to look for them, and it was less important for my purposes. However, I did keep a list of all prison personnel, what their role was, and the dates during which they were employed at the prison, as well as other details that became relevant—their religion, so far as I could tell; whether they were also a member of the local penal reform society; and why they left the prison's employ. These were important factors, and once I figured that out, I diligently kept track of that information (sometimes rereading my documents and doing some additional research to build a full table of information). Finally, you might also keep a "methods log."9 This might be a distinct memo in which you add dated entries keeping track of your methodological choices. I sometimes keep these embedded in my fieldnotes, but it's not a bad idea to have all of your methods notes in one place. In fact, if I'm downloading texts from an online repository, I make a specific memo about my search parameters, the date of my downloading, and other useful information. Such methods logs will come in handy when it comes time to write up your methods section or appendix.
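To make this concrete, here is a minimal sketch of what a reference memo can look like if you keep it as a small table in code rather than in Excel. The column names, people, dates, and details are all invented for illustration; the point is just that each row records whatever you have decided to track.

```python
import csv
from pathlib import Path

# A toy personnel reference memo: one row per person, with the details that
# turned out to matter. All names, dates, and details here are made up.
personnel = [
    {"name": "J. Smith", "role": "warden", "employed": "1829-1840",
     "religion": "unknown", "reform_society_member": "yes", "why_left": "resigned"},
    {"name": "A. Jones", "role": "physician", "employed": "1831-1835",
     "religion": "Quaker", "reform_society_member": "no", "why_left": "dismissed"},
]

out = Path("reference_memos") / "personnel.csv"
out.parent.mkdir(exist_ok=True)

with out.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(personnel[0].keys()))
    writer.writeheader()         # the header row doubles as a reminder of what to track
    writer.writerows(personnel)  # add a row each time a new person surfaces in the documents
```

A plain spreadsheet does the same job; the only design choice here is that the columns are decided once and then filled in consistently as you reread your documents.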


Technologies

Where do you keep your memos? As with content analysis, you have a lot of options. Some CAQDAS have a memo option built into the program. You can also keep Word documents (or your favorite document preparation software) and store them in a folder (which you might call "Memos"). If you do that, I recommend using a brief but descriptive title and date in the document name even though you can sort the documents by date added/created; if you later edit the files or transfer these to a different computer or from your backup, that information can sometimes get lost. Another option is to keep memos on a private blog—again, tags and titles are helpful. Spreadsheets can also be useful. While I haven't generally kept most of my memos in a spreadsheet, they are helpful for the reference memos—my timeline and personnel memos are both spreadsheets. You could also keep your memos in a hard-bound journal or binder, but I strongly recommend against doing this unless you diligently scan or photograph each page (and make sure it's readable) because physical copies are really easy to lose—I'm assuming you back up your computer on a regular basis or automatically with some cloud service.
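Because file metadata is exactly what tends to get lost when you edit, move, or restore files, here is a minimal sketch of the naming convention I am describing, written as a tiny Python helper. The folder name and title format are assumptions for illustration, not a requirement.

```python
from datetime import date
from pathlib import Path

def new_memo(title: str, folder: str = "Memos") -> Path:
    """Create an empty memo file whose name carries today's date and a short title."""
    stamp = date.today().isoformat()        # e.g., "2021-03-15"
    slug = "-".join(title.lower().split())  # "first use of penologist" -> "first-use-of-penologist"
    path = Path(folder) / f"{stamp}_{slug}.txt"
    path.parent.mkdir(exist_ok=True)        # create the Memos folder if it doesn't exist yet
    path.touch()                            # make the (empty) memo file, ready to be written in
    return path

# Usage: new_memo("first use of penologist")
# creates something like Memos/2021-03-15_first-use-of-penologist.txt
```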

*  *  *

Qualitative research can be really stressful—­you put in all this work and don’t know what you are going to find until you start analyzing your data, and then there is so much of it to get through. In that sense, it’s really the hardest part of the process. But the tools that ethnographers have developed, particularly coming out of the Grounded Theory tradition (namely, content analysis and analytic memoing), are useful for getting past the crux. They help you to process your data and develop useful insights. There are, certainly, more advanced analytical tools qualitative researchers can use, but content analysis and memoing can carry you through most projects. Basically, once you get through this stage, it’s fairly easy climbing from here on out. But the anxiety doesn’t exactly go away—­you’re not done yet. So, in the next chapter, we talk about additional steps you can take to feel more comfortable (and be more convincing) when you make claims arising from these tools.

10

Placing Pro
Making Causal Claims with Qualitative Data

Protection  1. The process of setting equipment or anchors for safety.  2. Equipment or anchors used for arresting falls. [Commonly known as pro.]

Wikipedia (2020), "Glossary of Climbing Terms"

But How Do You Know You Are Right?

When I was working on my dissertation, and even later when it was my book manuscript, I was really nervous about whether or not I had gotten it "right." I had argued that the men running this highly criticized prison were deeply committed to, and responsible for retaining, their prison's unique approach to incarceration—the source of the criticism. I also argued that, in contrast to many previous scholars' assumptions, it wasn't just ideological for them, meaning they didn't retain and defend the system simply because they believed it was the right system. Indeed, I could never be exactly sure if they truly believed in their system: I think they did, but I also had some strong evidence that—even assuming they did believe that their system was superior, as they themselves frequently claimed—their support wasn't just ideological. The system that these administrators defended was kind of a pain to administer, so they would do all sorts of things to make their lives easier, violating the system in the process. For example, they justified their prison's use of solitary confinement because they believed prisoners should not have contact with other prisoners or anyone else. In reality, they would put two prisoners in a cell or give them jobs in other parts of the prison outside of their cells where they would run into other prisoners, in contravention of the system's


emphasis on privacy and isolation. At least two prisoners were dating (they would rendezvous in the coal cellar . . .). While such violations of the system, and their consequences, were fairly common, the administrators would never discuss these practices publicly. So, if they weren't even sticking to it in practice, why did the administrators keep their criticized and difficult-to-administer approach to incarceration? The part of my argument that I was always nervous about was the answer to this question. I argued that by retaining this highly criticized approach to incarceration, the administrators were obligated to defend the prison from criticism; that defense actually provided them with an opportunity to describe themselves in glowing terms and basically claim to be better than all the other prisons' administrators (and, often, more knowledgeable than their state's legislators and the local penal reformers). If they got rid of their approach to incarceration, their prison would be like every other prison, and they would be like every other group of prison administrators, with no particular claim to superiority. My argument seems counterintuitive; it is also difficult to prove. And yet I'm pretty sure it's right. On the one hand, every time I read the prison's annual reports, in which the administrators defended the prison and their approach to incarceration and talked about how it was superior, I would think, "Yes! I've hit the nail on the head—I'm totally right." But when I would get ready to explain my argument to someone, I would get nervous. Was my interpretation accurate? How do I really know? No administrator ever came out and said we're keeping this criticized system because it gives us a claim to fame. No one would—that would be kind of weird, and I doubt it was a conscious choice. So this is really just an interpretation. How, then, could I be sure that my interpretation was accurate or even fair?

*  *  *

Interpretation or not, part of what I was doing in my dissertation was making a causal claim. People often forget that qualitative research can allow you to make causal claims. Most qualitative research makes causal claims of some kind. Even if you don’t see yourself doing causal inference because you aren’t, say, testing hypotheses or speaking in terms of variables, you are probably still making causal claims. “Why” questions are causal questions. Explanations are, at base, causal claims. Additionally, a lot of interpretations, for example, involve causal claims. In the case of my dissertation, I was saying that keeping the


criticized system gave the administrators the opportunity to flatter themselves, which caused them to keep the system, despite the criticism. When making causal claims, or explanations, you want to make these claims as accurately as possible—­and you want other people to believe them. You also want the confidence to believe them yourself. If you weren’t a Dirtbagger, this whole process might be a bit easier because there are well-­established (and easily recognizable) methods of showing causality (although, honestly, people push these methods way too far without getting called on it). So we need some methods of showing causality of our own. This chapter discusses the tricks and tools you can use to establish causal claims and, ultimately, to give yourself—­and your audience—­confidence that you aren’t just making shit up. The more of these tricks you use, the more confidence you can have. I think of it like climbers laying down pro—­the more nuts and cams you embed on the wall, the less likely it is that you will plummet to the ground if you miss a handhold and fall. One piece of pro might pop out if it’s poorly placed or there’s some loose rock, but if you have several pieces placed, you’re still safe. Likewise, the more of these extra steps you take, the more confidence you can have—­and if you are wrong, the more you can be forgiven for believing you had it right.

The Challenge of Causal Inference

One of the things I really like about qualitative studies is, in a lot of cases, you don't usually need to worry about research design or the possibility of confounding when it comes to causality because you can literally observe some causal mechanism at play. It's not hidden; it's just there. Remember Katherine Beckett and Steve Herbert's (2010) article on the trespass ordinances forbidding homeless people from using Seattle's public parks? If not, they show how those ordinances create pains for the homeless people similar to the pains of imprisonment—deprivations (e.g., of goods and services, autonomy, safety, etc.) typical of the carceral experience described by Gresham Sykes (1956). Because the sanctioned homeless people couldn't go to the parks where people leave out donated food and clothing, the homeless people were deprived of goods; because they couldn't go to the Social Security office or Veterans Administration, they were deprived of services. We can see the causal link pretty clearly. In contrast, with a quantitative study, we might simply see summary data like person A is on a trespass ordinance and does not get their donated items; person B is not on a trespass ordinance and does get


their donated items. In the quantitative study, we would have to infer the link, but we wouldn’t know for sure. In the qualitative study, we can actually see the reason for the deprivation: The homeless people used to get clothes and food in the park, but now they are prohibited from going to the park. This causal visibility is one really big advantage of qualitative methods. Unfortunately, a lot of other qualitative methods projects, like my dissertation/book, ask questions where we cannot directly observe the causal process. In these sorts of situations, research design and additional analysis checks become more important. In this way, we become more like the quantitative scholars when we have to infer the relationship, and that only works if we’ve set up our research design (at the outset or retroactively with checks) properly. In this situation, we are making causal claims, but we have to engage in causal inference. Basically, we have to rely on logic rather than direct evidence of a causal relationship. This is particularly common when we engage in interpretation.

An Interpretation or Just Making Shit Up?

Part of what made me nervous about the claims in my dissertation/book is I've always been a little uneasy about (and sometimes critical of) interpretations—people can reasonably read a situation differently. For example, in grad school, I would get coffee every morning from the café in the law school that was run by an entirely Latinx staff. I noticed that sometimes some of the law school's non-Latinx faculty, administrators, or staff would come in and order their coffee or food in Spanish. At different times, I had different interpretations of what was going on: "That seems nice and friendly—it's making this a more welcoming space," or "That seems really condescending—are they implying they can't take their orders in English?" I'd go back and forth, sometimes wondering if I should greet the barista and place my order in Spanish as well or if doing so would be insulting. I never really decided which, but I decided to play it safe and just keep doing what I had been doing. Other people might have one or the other of these interpretations, or another one entirely. But as pure observers, without talking with the staff, it would be difficult to know what exactly they are thinking and how receptive they are to this practice. Maybe if we watch closely, we can see them grimace or notice that they are less friendly with these customers, or alternatively maybe they brighten and seem genuinely happy when customers do this. This presumes they will react visibly and that we can accurately read their reactions. But what if they don't really seem any different—what if they are equally friendly with all


the customers, or at least the regulars (as they pretty much seemed to be). How do we know which, if any, is the correct interpretation? In general, I’m skeptical of interpretations based purely on one’s read of a situation without further data. For example, if a researcher simply described these interactions between the barista and customer, and gave interpretation a, but they gave no information about how the barista reacted and they never talked to the barista, I’d want to know how they decided interpretation a was correct and not interpretation b. In that situation, whichever interpretation they choose might say more about the researcher than the people of interest. But if the researcher spoke with the barista, other people who were around, or other people who have experiences with bi-­and multilingualism—­or if they are drawing on established theories or a literature that analyzes such experiences—­ then I’d be more likely to believe them. Basically, there needs to be something else there that can bolster the case. It might not be 100% probative—­many questions can never be answered with 100% certainty, and that’s true in all fields—­but it would move us closer in that direction. Here’s another example. There is a book about prison museums that I don’t particularly like. In this book, the author visited various prison museums and described their interpretation of these museums’ effects on visitors (for example, observing this exhibit, visitors feel somber and overwhelming sadness). The problem is the author is basing these perceptions of somberness and sadness on their own feelings—­they didn’t talk to the other tourists to hear about the range of emotions people felt or didn’t feel. Sometimes the author would say a particular exhibit was intended to produce a particular effect on the viewer—­but, again, they never interviewed the museum staff to find out what they were trying to do with the various exhibits. That type of account seems really problematic to me—­and to others. In fact, it can be contrasted with another very well-­done study, The Culture of Punishment, by Michelle Brown (2009). She also describes prison museums, but she does interview the staff to understand what choices they made and why. There is still room for interpretation, but it’s grounded in the data rather than just coming from the writer’s imagination. Going back to my dissertation/book, I couldn’t exactly interview the prison administrators, and even their private diary entries didn’t directly address the question I was really interested in. So how can I have any confidence in my argument—­how can I be sure I’m not doing exactly the type of thing for which I have so little patience?


I was able to convince myself by building in a variety of checks. 1 In part, I followed the intuition underlying the causal inference literature, especially the literature by qualitative political scientists. They have developed the idea of process tracing—­basically, using qualitative data to trace the variable or mechanism you think is causing an outcome of interest (e.g., Bennett 2008, 2010; see also Brady and Collier 2010). One way of doing this is to trace its impact to see if it behaves in the way you think it does—­essentially creating a series of hypothesis tests. Note that we’re not going out and looking for evidence that confirms our expectations, but rather checking if the evidence we expect to see—­and that should be there if our chosen explanation is correct—­is actually there. That is, we are taking seriously the possibility that our expectation (or our interpretation) is wrong and looking for evidence that tells us if that’s the case or not. For example, we might think that overt racism is responsible for sentencing disparities in a particular court because we’ve seen that Black people routinely get longer sentences there. But after we observe this court, we realize that it’s not overt racism: No one is openly saying racist or racially coded things, and overall defendants of different races seem to be treated the same in interpersonal interactions. While there still might be racial bias, we don’t have any direct evidence of it—­except for the outcomes. By exploring the role of a particular factor or variable—­racial bias—­and looking to see if it’s happening the way we think it is, we are process tracing. In this hypothetical study, we find it’s absent (at least as far as we can see—­it still might be there but hidden) or not functioning the way we expected. That would be a failed test. The good news for our theory is this failed test would set us on a different path to examine a new variable with its own set of relevant hypotheses. So now we dig a little deeper for more subtle forms of racism like structural racism, or the way in which racial disparities are built into rules and social routines. We then realize that the Black defendants disproportionately have public defenders, who seem to try cases differently than private attorneys, leading to longer sentences. This is a form of structural racism—­because in this jurisdiction race and resources are related, it creates inequalities in how the cases are processed in this court even though no one is openly racist. We can update our hypothesis—­it’s not overt racism but structural racism that is causing disparities—­and we can now support it by showing the causal mechanism by which structural racism operates.


Note that we’re not just looking at a correlation (the Black defendants have public defenders). Because we are using qualitative data—­ethnographic observations and maybe some interviews and document analysis—­we can see the cause of the disparity in action (the public defenders tend to try the cases differently than private attorneys in a way that causes racial disparities). If it were just a correlation, we might not actually know what’s going on. We might see public defenders and race and come up with various explanations that we can’t confirm: Public defenders are inferior lawyers (mmmm . . .) and that’s the cause (nope); public defenders have low resources (generally true) and that’s the cause (maybe); the public defender’s office has a policy about how attorneys try their cases (could be true) and that’s the cause (could be); public defenders are overworked and therefore don’t care about their clients (also mmmm . . .) and that’s the cause (probably not). Instead, we can see that public defenders in this court try cases differently than private attorneys who can afford to call in expert witnesses and golf with the judge. (I’m making this study up, of course, but you can imagine something like this—­and in fact there’s a lot of research on this topic showing these types of conditions.) I should mention for a moment that this strategy (process tracing) is basically what we’re almost always doing with qualitative methods. In fact, when I came back from my two-­week bootcamp on qualitative methods taught by political scientists, I was enumerating the new things I’d learned to one of my sociology professors. When I mentioned process tracing, she had heard of it but wanted me to elaborate, so I did. Then she said, “Isn’t that basically what qualitative methods is about?” I responded, “Pretty much.” But the proponents of process tracing have added two things: They have formalized the process by enumerating specific steps to follow, and they’ve added terminology.2 (This formalization helps to demonstrate methodological rigor. Such techniques are especially common among qualitative political scientists, who face a greater-­ than-­average skepticism of qualitative methods from their quantitative peers.) So while some people might intuitively be process tracing already, political scientists have mapped out a range of steps to follow, and these steps were in my head for thinking about how to make sure I was right. The rest of this chapter runs through some typical strategies of qualitative causal inference (including but not limited to process tracing), drawing heavily on the contributions of qualitative political scientists and to some extent econometricians, both of whom think a lot about causal inference.


Take Advantage of Typologies

In Chapter 2, I introduced typologies, and in Chapter 6 we spent a lot of time talking about typologies as a technique for aiding case selection. Typologies can also be a useful analytical tool to describe your data or to make causal claims. There are two big ways you can use typologies descriptively or causally: with correlations and with necessary/sufficient logic. In the first case, we're going to draw on some of the same ideas as you might find in frequentist statistics, even though we're not using statistics or large numbers in the sense quantitative scholars would recognize (even if we have a shit ton of causal process observations!); in the second case, we're relying more heavily on ideas you would come across in logic classes, but also in Bayesian statistics, again, even though we're not using statistics or large numbers.3

Unearthing Correlations

We often think of correlations as relating to quantitative research, but they occur in qualitative research as well. You've probably heard the saying, "Correlation does not mean causation," and that's right. And I keep making the point that qualitative research lets us go beyond correlations because we see the mechanism at work instead of just inferring that it's there. But sometimes, when we can't see the mechanism working, seeing a correlation can strengthen our confidence in our expectations or, when the opposite happens, guide us back in the right direction. One of my first studies as a junior professor—really, a spin-off from my dissertation—examined the rise of modern prisons in the United States during the 1820s–1860s (Rubin 2015a). Essentially, I had collected statutes, references in secondary sources (other histories of the period), and references in primary sources (documents from the period) that gave me the date of each prison's opening (prisons did not always open the same year they were authorized). Basically, I had a table of dates, states, and prisons. (This is an example of one of those studies that uses numbers but isn't really a quantitative study—there was no way I could run a useful regression when my n was less than 50—and less than 35 for most of the period.) Initially, my goal was just to map out the prisons to help illustrate how exceptional my prison was: to be able to say concretely how many prisons there were and how few followed the system of interest. But once I had the dataset compiled, I wanted to do more.


Since I was using neo-­institutional theory (DiMaggio and Powell 1983; Meyer and Rowan 1977) for my larger dissertation project, I had been analyzing my data with this framework in mind. One part of this framework involved examining the diffusion (or spread) of a particular technology or practice. Certain neo-­institutionalists (especially Tolbert and Zucker 1983), drawing on diffusion research, pointed out that we should distinguish between those organizations that were at the forefront of new trends, immediately adopting whatever new technology was out there (“early adopters”) and those that waited to adopt a new technology (“late adopters”—­although the really late adopters are called “laggards”). The distinction was important, neo-­institutionalists argued, because early adopters and late adopters had different reasons for adopting the technology: Early adopters, the theory held, were motivated by a real need for the technology (it would solve a technological problem); late adopters, in contrast, were motivated by a perception that adopting the technology would make them look legitimate in the eyes of various stakeholders (regulators, funders, clients, etc.), so they would adopt it even if they didn’t really need it. Applying this theory to my data, I could see that there certainly were early adopters and late adopters. But in focusing on the early adopters—­the states that adopted a modern prison before 1836—­I realized that they weren’t really the states I would have expected based on some of the leading theories in the prison history literature (neo-­institutional theory, after all, is not that popular in prison history—­as far as I know, I was the first person to combine the two for this period).4 That literature usually focused on and described certain contextual factors that were most likely going to take place in large coastal cities like Philadelphia, New York, and Boston. But the early adopters included states from the coast and the frontier, from the north and the south, agricultural states and industrializing states, states with big cities and states with no cities or only small cities. Just to make sure I wasn’t missing anything—­basically, I wanted to be a bit more systematic in looking for these correlations—­I made a series of 2x2 tables or typologies. They looked something like Table 10.1, where I made the column headers “Early Adopter” and “Late Adopter” and used the rows to distinguish between categories like “North” and “South”; the o’s represent the states, except in the real version I used state abbreviations (and there were more o’s). These tables allowed me to systematically compare the timing of prison adoptions in the north and the south, in states with big populations and small


populations, and so on. When I looked at each of these comparisons, however, there were similar numbers of o’s (states) in each category, so there wasn’t any real pattern—­the cells were roughly equal. If the theories had been accurate, I should’ve been able to see some pattern, like all the o’s were in both the north/ early adopter cell and the south/late adopter cell. I must have used 10 to 15 different variables, which the literature made me expect would be helpful, but to no avail: No patterns emerged, suggesting none of the variables I used were helpful for narrowing down the question of why some states authorized prisons earlier than other states. The standard explanations didn’t hold up. As I was reading through my primary and secondary sources, however, I started to notice a pattern I hadn’t looked for before. The really early adopters usually adopted a modern prison after their proto-­prison (the first generation of state prisons) started to implode as prisoners set fires, escaped, and rioted as described by Rebecca McLennan (2008) and Michael Meranze (1996) in their books on New York and Pennsylvania, respectively. I saw this pattern in other early-­adopting states, like Maryland (a southern state), as well. In fact, in several cases, the new modern prisons—­in early adopting states—­were explicitly designed to replace the older proto-­prison facilities.

Table 10.1  Failed Efforts to Uncover the Difference Between Early and Late Adopters Using 2x2 Tables

This table sorts states, identified as "o," between early and late adopters (columns). The rows indicate three different efforts to further sort these states according to other characteristics of interest; each pair of rows (north and south, big population and small population, urbanized and not urbanized) represents a different effort.

                      Early Adopter    Late Adopter
North                 ooooo            oooo
South                 oooo             ooooo
Big Population        ooooo            oooo
Small Population      oooo             ooooo
Urbanized             ooooo            oooo
Not Urbanized         oooo             ooooo

Note: In my actual table, I used abbreviations of state names instead of o's.
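If you keep your case list in a simple spreadsheet, you can generate this kind of cross-tabulation mechanically instead of tallying by hand. Here is a minimal sketch using pandas; the states and characteristics are invented stand-ins rather than my actual data, and the column names are just one way you might code things.

```python
import pandas as pd

# Toy case list: one row per state, with hand-coded characteristics.
# All of these values are invented for illustration.
states = pd.DataFrame({
    "state":   ["A", "B", "C", "D", "E", "F"],
    "adopter": ["early", "early", "late", "late", "early", "late"],
    "region":  ["north", "south", "north", "south", "north", "south"],
    "urban":   ["yes", "no", "yes", "no", "no", "yes"],
})

# One 2x2 table per characteristic of interest, analogous to the row pairs in Table 10.1.
for characteristic in ["region", "urban"]:
    print(pd.crosstab(states[characteristic], states["adopter"]))
    print()
```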


I made a new 2x2 table, distinguishing between those states that had a proto-­prison (most of which were pretty old by the time the new modern prisons opened) and those that did not (Table 10.2). The pattern held with very few exceptions—­for the most part, late adopters did not have a proto-­prison. What was really interesting is, when I found more data on those states that seemed to buck the trend, they were still pretty consistent with the overall trend but in different ways from the other states (e.g., Louisiana didn’t have a proto-­prison, but the New Orleans jail was pretty similar to a proto-­prison; Indiana did have a proto-­prison, but it adopted it in 1820—­the latest year I had allowed in the category—­so it actually makes sense that they waited so long to adopt a modern prison). This 2x2 table strengthened my supposition, by supplementing my more direct evidence about several cases, that early adopters were motivated by a desire to replace their failing proto-­prisons. Just to be clear, the 2x2 (or really 2x3) table, or typology, is not enough on its own to demonstrate causality—­just as regressions on observational data aren’t enough even when they show significant results. There could have been a lot of reasons why early adopters of the modern prisons also had already adopted proto-­prisons. For example, it might be the case that early adopters just like to have the newest technology available, whether that is the earlier proto-­ prison or the later modern prison. I had to have that additional research to put together that they really did have an underlying reason to adopt their modern

Table 10.2  Final Table from My Modern Prisons Research

Proto-Prison         Modern Prison Authorization Date
(1820 or Before)     1820–1834                                   1835–1860             Later

Yes                  CT, DC, GA, KY, MA, MD, NH, NJ, NY,         IN
                     OH, PA, TN, VA, VT

No                   IL, LA                                      AL, ME, MO, MS, RI    DE, NC, SC

Source: Rubin 2015a.


prisons—­a problem adopting the modern prisons could solve: in this case, that their earlier prisons were failing and needed to be replaced. But making the 2x2 tables helped me both to figure out what was going on with my data and, in presentations, to demonstrate the relationship in a concise and powerful way supplementing my other textual data. While this example involves a relatively small number of states or state prisons, 2x2 tables can be used with different types of observations including interviews, interactions, bits of text, and so on. And keep in mind that you can use 2x2 tables in a variety of ways. They can tell you if you are on the right path by exploring the extent to which the relationships you see bear out: Does your 2x2 table, representing all the cases you can find, match what you expected based on your deep-­dive analysis of a few cases? They can be useful for setting you on the right path: Do you think you know what’s going on, but your 2x2 table says that’s not what’s going on? People sometimes use 2x2 tables as stand-­alone evidence, but here you have to be a bit careful about traditional issues relating to causality like selection bias that we discussed (and generally dismissed in Chapter 5 because Dirtbaggers are unlikely to use this type of analysis alone). If you systematically excluded certain characteristics from your sample, then that can limit your ability to draw inferences from these tables (but it doesn’t have to). Some of that flexibility from earlier—­which I justified by our interest in discovering new variables, mechanisms, and processes—­can hamper our ability to make causal claims here. So just be aware of what your data’s limitations are: If you basically have a full population (as I had with the states or as you may have if you are observing everyone in a classroom setting, for example), you’re probably okay; if you selected a unique sample or group, you have to think through things more carefully (or collect more, and different, data).

Identifying Necessary v. Sufficient Factors

Typologies using 2x2 tables like the one I just demonstrated are commonly used with another strategy for causal analysis: necessary or sufficient factors. What I find really interesting and helpful about these analyses is they use a more complicated version of causation—one that sometimes gets overlooked in quantitative studies but is really familiar in qualitative studies. The necessary/sufficient analysis is like finding correlations, but much more complicated. Correlations are really straightforward: If this happens, then


that happens, or if this happens, then that doesn’t happen, and so on. The logic underlying necessary v. sufficient analyses is sometimes like that, but it is also more like this: If this happens, that might happen, or just because this happens doesn’t mean that won’t happen. To be honest, this style of thinking makes my brain hurt a little bit, so I have to go really slowly when thinking about this—­ you might want to do the same. Additionally, if you get tripped up on the abstract version of this technique, it also helps to work with an example you know well.5 When someone says something (X) is necessary for something else (Y) to happen, we mean but for that something (X) happening that something else (Y) won’t happen. Or another way of saying this is Y only happens if X happens. Here’s an easy example: It’s necessary for an egg and sperm to mingle for a pregnancy to begin. (I’d say it’s necessary to have sex to get pregnant, but with lots of new technology out there, that’s not actually true.) By contrast, when someone says something (X) is sufficient for something else (Y) to happen, we mean that something (X) guarantees that something else (Y) will happen. Or another way of saying this is if X happens, then Y will definitely happen. It’s harder to come up with an example of this one because it seems like there are exceptions to many examples, so I’ll use a lie. Sometimes, parents who don’t want their kids to have sex will tell them if the kids have sex they will automatically get pregnant—­parents then are telling their kids that sex is sufficient for getting pregnant. These two ideas, necessary and sufficient, might sound similar, but they are quite different. They can also interact in a lot of ways—­for example, something can be both necessary and sufficient, or neither necessary nor sufficient, or necessary but not sufficient and vice versa (not necessary but sufficient). There are several of these patterns. Once again, 2x2 tables are helpful ways of figuring out these relationships. To make a 2x2 table, start with a basic square with four squares inside it (Table 10.3). Next, identify the two factors (or variables) you are interested in—­ I’ll use Y and X (by convention, Y is usually an outcome we care about, and X is the thing we think might cause it). Then, review your cases or observations: Just put a mark—­an o, a little x, a checkmark, or a tally mark |—­to denote every case or observation you have and where it falls in the table (I will use o’s in these examples). It helps to start by dividing your cases or data points into X is absent (X = 0) or X is present (X = 1). Next, for each category of those two categories, further subdivide your data points into Y is absent (Y = 0) or Y is

present (Y = 1). (In the pregnancy example, the o's would be people, and you'd divide them into the different cells depending on whether or not they had sex, and then whether or not they got pregnant.)

Table 10.3  Fill in the Blank: A Sample 2x2 Table

         X = 0     X = 1
Y = 0
Y = 1

Once you have your table filled in, you just work through the possibilities. That is, figure out whether X is necessary or sufficient (or not) for Y—by matching up the pattern you see with one of the standard patterns displayed in Table 10.4a–Table 10.4f. These patterns look a lot like correlations, but they are actually less straightforward, more complicated, or just more varied—and less linear—than correlations, so you (or at least I) have to do more work to remember the patterns or work through the logic. For example, there are two patterns that we may not recognize as causal relationships if we are just looking for linear correlations (that is, a diagonal pattern). These unexpectedly causal patterns are where the data fall in all but one cell. For example, when the X = 0, Y = 1 cell is empty (see Table 10.4a)—meaning we have no observations in which Y is present when X is absent, but we do have observations in which Y is present when X is present—this pattern suggests X is necessary for Y. When the X = 1, Y = 0 cell is empty (see Table 10.4b)—meaning we have no observations in which Y is absent when X is present, but we do have observations in which Y is present when X is present—it suggests X is sufficient for Y. Both routes—the necessary


Table 10.4a  Necessary v. Sufficient Logic: X Is Necessary

         X = 0    X = 1
Y = 0    ooo      (ooo)
Y = 1             ooo

Table 10.4b  Necessary v. Sufficient Logic: X Is Sufficient

         X = 0    X = 1
Y = 0    ooo
Y = 1    (ooo)    ooo

Table 10.4c  Necessary v. Sufficient Logic: X Is Necessary and Sufficient

         X = 0    X = 1
Y = 0    ooo
Y = 1             ooo

Table 10.4d  Necessary v. Sufficient Logic: X Is Necessary but not Sufficient

         X = 0    X = 1
Y = 0    ooo      ooo
Y = 1             ooo

Table 10.4e  Necessary v. Sufficient Logic: X Is Sufficient but not Necessary

         X = 0    X = 1
Y = 0    ooo
Y = 1    ooo      ooo

Table 10.4f  Necessary v. Sufficient Logic: X Is Neither Necessary nor Sufficient

         X = 0    X = 1
Y = 0    ooo      ooo
Y = 1    ooo

Note to tables 10.4a–f: In these tables, the o's represent data points or cases—they could be prisons, states, schools, neighborhoods, people, groups, countries, classrooms, hospital wings, etc. In Table 10.4a and 10.4b, the parentheses indicate "optional" observations. That is, in Table 10.4a, the observations in the upper-right cell are irrelevant to the determination of whether X is necessary. These observations may or may not be there; their absence would indicate that X is both necessary and sufficient (see Table 10.4c), and their presence would indicate that X is necessary but not sufficient (see Table 10.4d). Likewise, in Table 10.4b, the observations in the lower-left cell are irrelevant to the determination of whether or not X is sufficient. These observations may or may not be there; their absence would indicate that X is necessary and sufficient (see Table 10.4c), and their presence would indicate that X is sufficient but not necessary (see Table 10.4e).


or sufficient version—­lead to the X causing Y, but in different ways. Both routes also have various caveats to the idea that X causes Y, such as X can, but doesn’t have to, lead to Y (X is necessary), and Y can happen with or without X, but it definitely will happen with X (X is sufficient). But that’s only one set of patterns—­there are others. In fact, some of the patterns you might recognize as correlations: What we think of as positive correlations (if this happens, then that happens—­or if this doesn’t happen, then that won’t happen) is really another way of saying something is necessary and sufficient (see Table 10.4c). Likewise, what we think of as negative correlations (if this happens, then that won’t happen—­or if this doesn’t happen, then that will happen) is another way of saying something is neither necessary nor sufficient (see Table 10.4f). You might have also noticed that we have some other patterns that aren’t correlations; these might be new to you. Indeed, someone looking for correlations in the data might dismiss these as signs that there are no correlations in the data. In fact, these patterns are quite important. When X is necessary but not sufficient (see Table 10.4d), we are saying that we see Y is present when X is present and Y is absent when X is absent, and Y is also absent when X is present; but we have no observations in which Y is present when X is absent. When X is sufficient but not necessary (see Table 10.4e), we mean Y is present in cases where X is present and also where X is absent, and Y is also absent when X is absent; but we have no observations in which Y is absent when X is present. In both cases, X is potentially important in causing Y, but in a more contingent or complicated—­nonlinear—­way. Now that your head probably hurts, let me give some examples. One involves research on prison riots, the vast majority of which is qualitative (prison



riots are too rare to study quantitatively, although scholars have used mixed methods to study general levels of disorder). Some theories hold that bad prison conditions and incompetent administration cause prison riots. On the one hand, this theory is generally right: We know prison conditions play an important role—­lots of prison riots take place in prisons with bad conditions, and lots of prisoners’ demands during the riots (if they get to the negotiating stage) involve requests for better prison conditions. But as Eamonn Carrabine (2004) has pointed out, we have far more examples of prisons with really bad conditions and incompetent administrators than prison riots—­both now and since the prison’s emergence. So this theory doesn’t actually tell us why some prisons with bad conditions—­and not others—­have riots. This flaw with the theory doesn’t mean that the theory is wrong, though. If we were sticking with linear models of causality, we would have to reject the theory: There is only a weak correlation—­too many negative cases (prisons with no riots) with the causal variable of interest (bad conditions). But we have lots of qualitative evidence that prison conditions and administration are important in causing or stopping riots, so we’d be incorrectly rejecting the theory. Instead, using a less linear form of logic, we can say that bad prison conditions are perhaps necessary conditions for a prison riot, but they are not sufficient. This allows us to recognize that they play a causal role but within certain limits. Basically, it tells us we need some other factors to explain why prison conditions sometimes lead to a prison riot and sometimes don’t. And that’s what Carrabine does in his book. Here’s another, more familiar example. Say we are interested in the Quaker religion’s role in influencing states to adopt a particular style of prisons—­those using the Pennsylvania System of long-­term solitary confinement—­in the 1820s and 1830s. Four of the 30-­plus prisons authorized in the pre–­Civil War period adopted this approach at one time; all of them were in states associated with Quakers (Pennsylvania, New Jersey, and Rhode Island). This pattern might suggest that whether a state is a Quaker stronghold or not is an important factor for determining whether that state adopts this type of prison. How important? Well, it turns out that other states, like Delaware and New York, were also Quaker stronghold states, but they didn’t adopt this style of prison. This suggests that being a Quaker stronghold state is a necessary but not sufficient factor for adopting this type of prison. (Notice that this is different from saying a Quaker stronghold state is likely to adopt this type of prison because that’s assuming Quakerism is a necessary and sufficient factor. This is the difference between linear and


nonlinear logic.) Ultimately, it means something else has to be going on to make Quakerism relevant during the effort to adopt this type of prison. As with correlations, we use these patterns to get us on the right track or to see how our hunches work on a bigger scale than a few data points or groupings of data. We can also use them to reinforce other data. But we don’t use these 2x2 tables alone. (We’d still want to investigate: Do we see the Quaker religion coming up in those states’ decisions to adopt this style of prison? Do we see Quakers advocating for the prison policies that get adopted or participating in the debates?) Thinking about things as necessary and/or sufficient is helpful for identifying other causal pathways. It is also helpful for ruling them out. In my case, seeing that the Quaker religion was not sufficient was helpful for putting some constraints on the importance of the Quaker religion for adopting this style of prison.
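If you want to double-check your read of a filled-in 2x2 table against the patterns in Tables 10.4a–f, the empty-cell logic is simple enough to write down in a few lines of code. This is only a sketch of that logic with invented counts (loosely echoing the Quaker example); the table alone still can't tell you what the mechanism is, so you would go back to the cases themselves either way.

```python
def classify_2x2(n_x0_y0, n_x0_y1, n_x1_y0, n_x1_y1):
    """Apply the empty-cell logic of Tables 10.4a-f to the four cell counts."""
    # X looks necessary if Y never shows up without X (and does show up with X).
    necessary = n_x0_y1 == 0 and n_x1_y1 > 0
    # X looks sufficient if Y always shows up when X is present.
    sufficient = n_x1_y0 == 0 and n_x1_y1 > 0
    if necessary and sufficient:
        return "X looks necessary and sufficient for Y"
    if necessary:
        return "X looks necessary but not sufficient for Y"
    if sufficient:
        return "X looks sufficient but not necessary for Y"
    return "X looks neither necessary nor sufficient for Y"

# Invented counts: no non-Quaker-stronghold state adopted the system (n_x0_y1 = 0),
# but several Quaker-stronghold states did not adopt it either (n_x1_y0 > 0).
print(classify_2x2(n_x0_y0=20, n_x0_y1=0, n_x1_y0=3, n_x1_y1=4))
# -> X looks necessary but not sufficient for Y
```

The function is only restating the table; the qualitative work of checking whether Quakers actually show up in the adoption debates is still yours to do.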

Counterfactuals and Alternative Explanations

There is another type of comparison that qualitative researchers use, and again some of the terminology comes from the quantitative side of social science. Even though this technique is relevant to theory generation, I'm going to start by explaining the logic with theory testing. In theory testing, whether qualitative or quantitative, there is a built-in assumption that we are always comparing some population or some set of events to a counterfactual. That counterfactual is basically this: What would the outcome have been if condition m had happened instead of condition n? For example, what would my life be like (outcome) if I had taken job a (condition m) instead of job b (condition n), or gone to school a (condition m) instead of school b (condition n)? We can represent that more formulaically by saying what would Y (the outcome) have been if X = 1 (m had happened) instead of X = 0 (n had happened). If we want to know the effect of X on Y (how the outcome would be different between m happening and n happening), we are saying we want to know:

(a) What value would Y have in the world where X = 1? For example, what would the sentence length be if the defendant is Black? What would my life be like if I had taken job a instead of job b?

(b) What value would Y have in the world where X = 0? For example, what would the sentence length be if the defendant is White? What would my life be like if I had taken job b instead of job a?

(c) What is the difference in those two Y's? For example, would the defen-


dant have been sentenced to 20 months or 2 months? What are the differences between the two images of what my life would be like? In practice, we can never observe both of these situations (the values of Y where X = 1 and X = 0) at the same time because, for any given observation, X can only have one value at any given time. So we try to get close with good research design. Quantitative researchers interested in causal inference will set up experiments in which they randomly assign people to different groups (some get the X = 0 group, often described as the control group, and others get assigned to the X = 1 group, often described as the treatment group). In a properly designed experiment, we can compare the overall outcomes of each group to figure out the difference, or the effect of X on Y. For example, in audit studies, researchers will use two résumés that are identical except for one difference: One includes a name common among White people and one includes a name common among Black people. In this way, we can randomly assign a given racial identity (to fictional people) to see employer’s reactions to that racial identity relative to some other racial identity, and thus the effect of racism on an applicant’s job prospects. These creative studies require a question or set up where you can randomly assign something.6 More often, quantitative scholars use other tricks. One trick involves statistical matching: As with comparative multi-­case study designs (see Chapter 6), scholars try to compare like observations within a large sample. Let’s say I’m interested in comparing the employment rate for Black people and White people, but I know there are a lot of confounding factors like different education levels—­that is, other things might be going on to explain a correlation we see. If my sample is large enough, I can probably find some people who share some characteristics (e.g., age, gender, education) but differ on the one I care about (e.g., race). Instead of just “controlling” for these things by seeing individually how age, gender, and race play a role, we try to compare people who are as similar as possible (such as all 23-­year-­old college-­educated White men compared to all 23-­year-­old college-­educated Black men) and see how their employment outcomes differ. Another trick is using a regression discontinuity design. This approach statistically compares the outcome immediately before and after some important change. Again, we expect that the before and after periods, if we keep them


really close together, should be pretty similar in all respects except the change. For example, we might look at how much crime people commit right before and after their 18th birthday to see if being tried as an adult (which usually happens when you turn 18), and the longer prison sentence that would likely follow, deters people from committing crime. In theory, a longer prison sentence is supposed to be sufficiently scary to convince people that committing crime is a bad idea: If there is no difference in crime rates before and after people’s 18th birthdays—­when your chances of a long prison sentence increase dramatically—­that suggests the longer sentence doesn’t make a big difference on crime commission (Lee and McCrary 2017). In these cases, we’re coming up with counterfactual equivalents that we can actually observe—­as opposed to the theoretical ones that we can never observe—­so we can compare their outcomes and see if they are different. In the case of the résumés, the counterfactual is how would the same job applicant be treated if they were White instead of Black. In the case of employment outcomes, the counterfactual is the employment rate for a group of White people who are demographically and educationally similar to a comparable group of Black people. In the case of the crime rates, the counterfactual is the crime rate under a regime where you get tried as a juvenile (and thus have a limit to how long your prison sentence can be) rather than if you are tried as an adult (and are thus eligible for a lengthy prison sentence). We can use this strategy in qualitative methods as well by looking for similar cases to our case (or cases) of interest but that differ in important respects. That’s exactly what I did in my dissertation. While I was interested in this prison that retained a highly criticized approach to incarceration for more than 80 years, there were other prisons that adopted the same approach but then got rid of it (some after a few years, some after a few decades). One was in Rhode Island, one was in New Jersey, and—­very helpfully—­another was in Pennsylvania. I studied each of these prisons and especially what reasons led to them getting rid of this criticized system so I could see if those reasons were at all present at the prison I cared about. This analysis helped me gain confidence in my claim that the prison administrators were definitely important: In each of my counterfactuals, it was always the prison administrators who requested the legislature get rid of this approach to incarceration and authorize another system in its place. That never happened at Eastern: The administrators never asked the legislature to get rid of the criticized system. This finding

strengthened my claim, based on other data from my case, that the prison administrators were responsible for retaining their unique approach to incarceration. This additional evidence was helpful because, based on the literature, I expected the penal reformers and legislators to be more important than the prison administrators. My counterfactuals were helpful in another way, because they also helped me to test out alternative explanations (see also Chapter 2). For example, one common explanation was that Eastern’s unique approach to incarceration required a lot of upfront expenses so the state wouldn’t want to lay out more cash to change the system—­basically, they were stuck with this approach whether they wanted it or not. But the fact that there were three prisons that adopted and then got rid of the approach suggests that cost was not such an important motivation; clearly, they were able to replace their approach to incarceration despite the massive outlay of cash. In fact, one of the prisons, the other prison in Pennsylvania, was initially torn down and rebuilt indicating the state was okay with spending more money. (There were also logical flaws with this explanation, including the fact that it wasn’t super expensive to replace this system with the alternative approach.) I did this exercise with several other theories people offered until I ran out. Then, to paraphrase Sherlock Holmes (a favorite among qualitative political scientists interested in causal inference),7 by ruling out all the other options, the remaining possibility must be right—­okay, I wouldn’t say “must,” but I’m a lot more confident in it after doing this exercise.
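If you like to see the quantitative version of this logic spelled out, here is a rough, entirely invented sketch in Python of the regression discontinuity comparison described a few paragraphs back. Nothing in it comes from a real study; the only point is the move of comparing observations immediately on either side of a cutoff, which stands in for the counterfactual we can never observe directly.

import numpy as np
import pandas as pd

# Invented toy data: age at arrest and whether the person commits a new offense.
rng = np.random.default_rng(42)
n = 10_000
age = rng.uniform(16, 20, n)
tried_as_adult = age >= 18          # the cutoff: adult court (and longer sentences) at 18
# Suppose, just for this toy example, that the threat of adult court changes nothing:
reoffends = rng.random(n) < 0.30

df = pd.DataFrame({"age": age, "adult": tried_as_adult, "reoffends": reoffends})

# The regression discontinuity move: compare only the people just under and just over
# the cutoff, who should be alike in everything except which court handles them.
window = df[(df["age"] >= 17.75) & (df["age"] <= 18.25)]
print(window.groupby("adult")["reoffends"].mean())  # roughly equal, so no visible deterrent effect

The qualitative version of the same move is what I just described: finding real comparison cases (the Rhode Island, New Jersey, and other Pennsylvania prisons) that serve as observable stand-ins for the counterfactual you cannot see.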

Apply These Tricks to Your Study
So, step back and look at what you think you've figured out and think about what you would need to make your claim more convincing. I don't mean more convincing as in "go find support for your claim," but rather think about what you would need to do to test your claim. Kristin Luker (2008, 47) tells her readers to think about what they would need to convince their "smartest, nastiest, most skeptical, and meanest" critic. I wouldn't go that far, particularly because I know there are some critics out there who will not believe what I say regardless of what evidence I bring forward. So maybe think about what a friendly but skeptical person in your subfield would need to be convinced, and then think about the same type of person in your larger field or even from a different field. Then, you might think about someone who is really antithetical to your work—but don't feel like you have to convince

them (that might be an unwinnable battle). A lot of times, though, I really just need to think about what I need to be convinced—­self-­doubt can be very strong, even when intermittently tempered by excitement and familiarity with all my evidence. Here are some questions to ask for empirically dealing with those doubts: What are some counterfactuals you could examine? What are competing explanations you can investigate? Are there other sources of data that might shine a different light on what you are seeing? What are some additional sources of evidence you would expect to find if you are correct? And if you are not correct, what evidence would you expect to find? The actual steps you need to take to do these things are covered in some conventional qualitative methods books, particularly those written by political scientists (some of which I’m drawing on here), but they are described as things you set out to do at the very beginning. That’s much easier to do if you are working in an area that is already well covered, and you know a lot about your topic, case, region, period, group, or social phenomenon from the literature. But if you are doing something that is really pretty new, innovative, or breaks with traditional examinations, it’s a lot more difficult to design your research before doing some research. That’s why I’m discussing these strategies now, after discussing the other major steps of data analysis. I like to think of these strategies as checks that you put in place—­the qualitative equivalent of robustness checks (basically, kicking the tires of your regression to see if your findings ever fall apart)—­rather than part of your central research design.

Lay Extra Pro When You’re Doing a Sketchy Climb There are some sketchy climbs, by which I mean climbs where there is a high chance that you will fall. There are at least two really good reasons to lay down extra nuts and cams to protect yourself from a fall. First, you want to minimize how far you will fall—the bigger the whipper (fall), the more likely it is you get some bad scrapes or an ankle twist. Second, if you are going to fall, you want to be really sure a nut isn’t going to pop out. (That sounds dirty, but a lot of climbing terms are pretty dirty.) This happens in movies all the time—someone is climbing and suddenly their gear starts slipping (it’s very dramatic and makes for great TV, however inaccurately it’s usually presented). In reality, some pieces of gear are really strong, but you may have doubts that others are going to hold. In general, though, it’s a good idea to

have more than one piece of pro just in case. There is no guarantee that you won’t go falling to the ground, but the more pieces of pro you’ve placed, the less likely that’s going to happen. You can even feel bomber—meaning that something is so solid that it’s bombproof, as in a bomb could go off, and your pro would stay put.

Sometimes, we get nervous about our research for annoying, irrational reasons like imposter syndrome and such. But sometimes we get nervous because we know we are making big claims, claims we can't fully support, or just because we don't have direct evidence for our claims. When we're in these somewhat sketchy positions, we can bolster our confidence by doing these additional checks. This doesn't mean it's a bad study (or a bad climb)—it might be awesome if you can pull it off. As with climbing, there is absolutely no guarantee that something bad won't happen—that you won't fall to the ground or make a fool of yourself in front of the academic community. But you can minimize (quite substantially) the chances of that happening. The more of these extra steps you follow, the more confident you can be. You can start to feel bomber.

* * *

This chapter has discussed some of the tools, tricks, and considerations you can use to make causal statements with qualitative data. Part of the point of this chapter is just to remind folks that there is such a thing as qualitative causal inference; but the biggest point is to discuss how to do qualitative causal inference well. Thinking about causation in terms of mapping out correlations, thinking through necessary or sufficient factors, using counterfactuals, and testing alternative explanations can help you make sure you're on the right track and, hopefully, respond to skeptical critics (including the ones in your own head).

11

Living on the Sharp End
Dealing with Skeptics of Qualitative Research

Sharp End  The end of the belay rope that is attached to the lead climber. "Being on the sharp end" refers to the act of lead climbing, which is considered more psychologically demanding than top-roping or following, since it may involve more route-finding, as well as the possibility of longer, more consequential falls.
Wikipedia (2020), "Glossary of Climbing Terms"

War Stories
I was sitting in a seminar room listening to colleagues discuss some new research by a relatively well-known scholar—we'll call him Professor X. During the discussion, one professor chimed in, essentially asking why we were even talking about Professor X, explaining that Professor X's research wasn't very good. In fact, this other professor said, "I read Professor X's book and I couldn't find any data. I kept reading through it looking for data and there weren't any!" I was dumbfounded at this remark. Professor X's book was a qualitative, historical-comparative work that I actually like quite a bit. It has some problems, certainly—no work is perfect—but it is generally well respected among people I know, and it has been frequently cited and not just in a peripheral way. While there are valid critiques of this book, lack of data wasn't one of them. I sat there considering this professor's remarks, trying to figure out what exactly he meant. After all, I was at the time in Canada, which tends to have a more flexible (interdisciplinary, pluralistic) approach to social science research.


The professor who had spoken up, though, was a quantitative scholar, and even though he generally praised a very broad notion of interdisciplinary work, he had just replicated a tendency I’d seen many times before in US academia—­ he’d conflated “data” with numbers. (Another version of this tendency is when people use “empirical” to mean “quantitative,” as discussed in Chapter 2.) After trying to identify an alternative interpretation of his remarks, I decided that’s what he meant: Although full of rich descriptions—­based on a clean research design, systematic data collection, and careful analysis—­the book in question did not include graphs, tables with numbers, or regressions. The few numbers included were dates or some descriptive statistics added for clarity or specificity. Apparently, because of this fact—­the complete absence of quantitative data and methods—­this scholar had dismissed the book out of hand. His reaction was not unique, but it reflected a larger, multifaceted skepticism people—­scholars, students, policymakers, journalists, and citizens—­ have about qualitative research. Here is another example from several years earlier when I was in grad school. I was at a cocktail party at a national conference speaking with a professor—­a quantitative scholar from the department at which I was about to interview for a job. Never one to beat around the bush, he asked me, “So, you are a qualitative scholar. How are you going to be productive enough to get tenure?” I first corrected his misconception: My first two articles were in fact quantitative studies, one of which used a snazzy regression-­discontinuity design on some time-­series data (i.e., above average in its quantitative savvy). I explained that I use the methods needed for the research question at hand; thus, I only happened to use qualitative methods in my dissertation because the research question did not lend itself to quantitative methods. Then I explained that one could be quite productive with qualitative research: When you collect your own data, you end up with a lot of it. It might take extra time in the beginning, but you can mine that dataset for years afterward. He seemed to be satisfied with that answer, but I took the hint: I gave a quantitative job talk and made sure to show off my statistical savvy. Stories like these are far from uncommon and reflect a general tendency in some circles to discount qualitative research. It’s important to realize that folks saying these things aren’t necessarily assholes. The prof who didn’t recognize qualitative data as data is known for being a good advocate for his students; the prof at the cocktail party is someone I like and respect quite a bit. But both are embedded in contexts where qualitative work is perceived to be weaker

than quantitative work. They're also in a discipline that (like several other disciplines) holds up old qualitative studies as classics but then derides any current work in that vein.

* * *

Qualitative scholars frequently face skepticism about their ability to produce high-quality research—and in sufficient amounts. As we've seen throughout this book, there are many implicit critiques of qualitative methods vis-à-vis quantitative methods when it comes to things like defining qualitative methods (Chapter 2) or making causal inference (Chapter 10). Underlying these critiques are basic misconceptions—on the part of not only critics but also over-eager qualitative researchers—about qualitative methods' inherent limitations. (Bad qualitative research is, sadly, one contributor to these misconceptions.) So part of learning about qualitative methods requires understanding common critiques of qualitative methods, both so you are prepared to defend your choice to use qualitative methods in general and so you can defend your specific methodological choices against rote critiques.

Mapping the Terrain
This chapter covers a lot of (frequently misguided) criticism of qualitative methods, which might be a bit off-putting, especially for anyone just starting out. Plus, some readers might find this discussion unnecessary: Qualitative methods are so popular in some circles that many well-established commentators have criticized the general qualitative v. quantitative divide as stupid, and grad students are increasingly being encouraged to produce mixed-methods studies. But not covering these critiques seems irresponsible because I continue to see them—and to see them go unchecked by others. So it seems like it really is worth it to correct some misconceptions. I also want to make sure that anyone I'm convincing to use qualitative methods is prepared to encounter them and able to take them on. Before doing so, this section summarizes where qualitative methods stand across three disciplines.1 Let's start with my primary field of criminology (possibly one of the most quant-obsessed disciplines) where qualitative methods are really starting to make inroads. In 2015, then–graduate student Johann Koehler published an article in the discipline's flagship journal, Criminology; the article was entirely qualitative. It relied on archival documents, interviews and oral histories, and

other textual sources to trace the discipline’s roots and some major sources of tension within the discipline that persist as a result of its conflict-­ridden origins. It was a big deal to see that piece published in a journal widely considered as a generally quantitative-­only venue. Likewise, Keramet Reiter, a primarily qualitative socio-­legal scholar, recently received the American Society of Criminology’s Ruth Shonle Cavan Young Scholar Award, another honor generally reserved for quantitative scholars. She examines the history and current use of supermax prisons—­also relying on interviews, archival documents, and official documents available online or through FOIA (Freedom of Information Act) requests. So things might be changing, even while many scholars in the field continue to look down their nose at qualitative research—­and qualitative scholars. In political science, the general orientation to qualitative research is somewhat more complicated. On the one hand, the 1994 publication of Gary King, Robert O. Keohane, and Sidney Verba’s Designing Social Inquiry: Scientific Inference in Qualitative Research was a sort of invitation to political scientists to use qualitative methods. But it was simultaneously a statement that qualitative methods weren’t good enough on their own: They had to be conducted as though a quantitative scholar was at the helm. The book is basically a guide for applying quantitative logic to make qualitative methods . . . better. Ten years later, Henry E. Brady and David Collier responded with their book Rethinking Social Inquiry: Diverse Tools, Shared Standards, in which they pointed out a number of problems with applying a quantitative model to qualitative research. Their book has been a kind of rallying cry for qualitative political scientists and a bible for defending qualitative research against ignorant questions and misguided critiques from people biased against qualitative research. Particularly within political science, qualitative research has a somewhat defensive tone. Indeed, in the last several decades, political scientists have been leading the way in developing tools for making qualitative research more “rigorous.” For example, process tracing has formalized the steps of qualitative research into a strict recipe and added some jargon to further legitimize the approach. I don’t say this pejoratively—­I used parts of the process tracing approach in my dissertation, and it continues to shape how I think about qualitative methods (as we have seen). But much of this innovation seems (to me) to be an effort to preempt quantitative critiques. Perhaps I’m reading into things, but I don’t think I am. I initially formed this impression after a two-­week bootcamp in qualitative and mixed-­methods research organized by, and aimed at,

political scientists. We spent several days discussing the (frequently misguided or overstated) critiques against qualitative research and how to respond to such critiques à la Brady and Collier—­and prevent them from becoming valid critiques of our own work.2 Even within sociology, one of the strongholds for qualitative research, there is still rampant skepticism about qualitative research. To some extent, this skepticism takes a similar shape to the general quant v. qual divide within political science and criminology. In fact, as with political science, some qualitative sociologists, feeling the skepticism of non-­qualitative scholars, are borrowing the tools of quantitative scholars—­such as random sampling and quantification when working with comparatively small sample sizes—­in order to appear more rigorous. Not too long ago, sociologist Mario Small criticized this trend. Borrowing a metaphor from physicist Richard Feynman, Small likened these moves to New Guinean “cargo cults” that tried to attract cargo drops and air traffic by replicating their old wartime airports (essentially, if you build it they will come). Similarly, it seems some qualitative scholars believe that if they imitate quantitative scholars by copying their trademark tools, they will be taken seriously. But these tools do not work with qualitative methods and ultimately look a little silly, much like the cargo cults hoping their faux airports will attract wartime-­level air travel. As Small put it, “[T]hese practices constitute little more than applying words without adopting their meaning, constructing sticks-­and-­leaves airplanes that will never fly” (Small 2009, 10, 6). But, as we shall see below, sociologists have also added another layer to the skepticism (one they are not alone in expressing): actual disbelief not only in the veracity or validity of one’s results, but sometimes even distrust about whether one has conducted the research one claims to have done. Ethnographers in particular seem to be victims of this particular brand of skepticism: Did you really go there? Did he really say that? Did you really see her do that? These attacks do not come from sociologists alone. Indeed, a prominent legal scholar, Steven Lubet (2017), published a book called Interrogating Ethnography in which he simultaneously evaluates ethnography using a lot of criteria most qualitative scholars (myself included) would say are problematic at the very least and then proceeds to tell ethnographers how to do ethnography by reciting the things good ethnographers already do—­a kind of methodological mansplaining. So why are so many people skeptical of qualitative research—­and why would you want to do a type of research that so many people are critical of?

So Skeptical
One answer to the question of why people seem to be so skeptical of qualitative research is that the results you can get are so rich, so interesting, and so generative—essentially, so awesome—that people are skeptical anything that good can exist. Qualitative methods let you do things that are difficult to conceive of within a quantitative paradigm.

Success and Disbelief
When Alex Honnold free soloed Half Dome—that is, climbed the roughly 2,000-foot sheer face of granite in Yosemite Valley using nothing but his hands, a chalk bag, and his climbing shoes (no other gear, not even ropes)—in 2008, some people didn't believe it. (If he had made any mistakes, he literally would have fallen to his certain death.) Nothing like that had ever been done before. Honnold had free soloed a few other mountains and rock formations in the preceding year (as had others before him like John Bachar and Peter Croft), but nothing as big or as smooth (and thus hard to climb) as Half Dome. It was amazing. So, of course, people were skeptical . . . at first (Honnold and Roberts 2016).

When Alex Honnold started free soloing really difficult big walls, it was almost inconceivable from the current paradigm, which relies on ropes, harnesses, and carabiners to make sure people do not fall thousands of feet to their very squishy death. Likewise, in order to achieve its intended results, the quantitative approach involves its own set of harnesses and ropes to prevent disaster—­you have to have a certain amount of data, you have to be willing to believe certain assumptions, your research design must satisfy certain requirements, and you have to analyze the data in a particular, very formulaic (literally!) way. The idea that one can proceed without all of this gear, so to speak, doesn’t just seem daring, it seems crazy. It also seems too good to be true. Okay, likening qualitative research to Honnold’s free solo of Half Dome might be exaggerating things a bit. Also, qualitative methods have been around for a while, and people are still skeptical (whereas within weeks of Honnold’s first big climb, people got on board). But, despite the failures of my analogy, there is an underlying point that I think is probably true: Something that awesome, and different from the techniques people are used to, breeds skepticism.

The divergence between the two traditions becomes a cause for concern for someone versed in the quantitative tradition, particularly when qualitative scholars make causal claims. There is a very small window in which one can make causal statements with quantitative data—­so small, in fact, that most quantitative research doesn’t actually satisfy these requirements.3 Since they are so difficult to satisfy, scholars have apparently collectively agreed to overlook these requirements (Freedman 2010). But there is something about the power of numbers that suspends disbelief and conveys authority (e.g., Espeland and Stevens 2008), so people tend to get away with it. For those of us who work mostly with words, that can be pretty annoying because the same courtesy (or perhaps blind faith) is not extended to qualitative work. Let me briefly make what should be an obvious point but that somehow people forget—­or are only recently starting to acknowledge on a wide scale. Statistical evidence can be just as easily fabricated as qualitative evidence, and I suspect it happens more often. To fabricate one’s qualitative data, one would actually have to impose new words in their records or say something happened that didn’t. To fabricate one’s quantitative data, one could change numbers in a spreadsheet, but one can also rerun a regression in R, STATA, SAS, or SPSS (i.e., various brands of statistical software), adding and subtracting variables until finding the combination of factors that yields the “right” coefficient on the variable of interest. In fact, I once heard a grad student admit to doing this—­during their job talk—­which understandably garnered a very strong, negative reaction from the audience (even from people I suspect do this themselves but who don’t admit to it publicly). One can also selectively drop outliers, not out of statistical best practices, but because it might change the results or make a graphical depiction look cleaner. Indeed, there are quite a few specifications one can add to the process of quantitative analysis that do not get written up and thereby avoid scrutiny.4 However, except in cases of established fabrication, I’ve not heard of anyone accusing quantitative scholars of fabricating their results without clear evidence.5 By contrast, this is a depressingly common accusation against qualitative scholars. Suggesting I’m right to think that awesome research with powerful findings particularly attracts disbelief, it seems that the more controversial or colorful one’s research, the more likely it is to be accused of inaccuracies or entire fabrication. In his book, The Stickup Kids, Randol Contreras (2012) describes a cadre of failed drug dealers who turned to robbing successful drug dealers. They don’t actually just rob from them—­they also torture them in fairly

gruesome ways. The book is beautifully written, and Contreras deals seriously with the ethical and methodological challenges of researching and writing about this topic. Even so, the book has had its share of critics. Most relevant for our purposes here, though, is the accusation that he fabricated the material: When his book went under review, one reader sent it back suggesting Contreras had made it up (Contenta 2016). People also go after books that make it big. Just a few years after Contreras’s work came out, Alice Goffman (2014) published On the Run, an extremely popular (for a little while) book about a collection of young Black men dealing with outstanding warrants for their arrest while living in a heavily policed Philadelphia neighborhood. For six years, Goffman lived or spent time in this neighborhood, observing and interviewing its residents, visitors, and the people they interacted with. Many of her observations include instances of police brutality and harassment (including their physical abuse of Goffman herself) as well as a variety of crimes committed by the people in her study. In the most notorious portion of her book, Goffman drives one of the young men in her study—­a long-­time friend of hers after years of fieldwork—­to look for the men who just killed their friend, another study participant. Her book initially received extensive praise, but scholarly and public reaction quickly turned—­citing a variety of potential problems, one of which was a charge that she had fabricated portions of her book (see, e.g., Lewis-­Kraus 2016; Singal 2015). Some of the biggest debates in sociology, which have centered around controversial ethnographic research, seem to have no expiration date (especially the debates over Alice Goffman’s work). Notably, all of these questions about honesty and authenticity have been posed well before the era of accusations about “fake news” and may only get worse as we increasingly rely on technology (e.g., camera phones) for proof—­without such “proof,” how do we know it really happened? Whether strategically responding to these claims or not, some qualitative scholars are increasingly turning to visual media—­photographs and video recordings—­as part of their documentation process. But even these moves will not silence critics since researchers’ ethical obligations to their participants (or vulnerability to their institutional review boards in the United States—­research ethics boards/committees in Canada and the UK) often prevent researchers from making some recordings publicly available.6 Indeed, there are some aspects of qualitative research that help explain why people become skeptical and thus why these challenges to its authenticity are endemic. For one, it is impossible to fully replicate a qualitative study (at least

one dealing with live subjects/participants—­historical work is a bit safer in this regard) in the same way one can with quantitative studies. With much quantitative work, one can re-­download a publicly available dataset, try to replicate the exact series of steps the first analyst used to create a particular table or graph, and expect to find identical findings (this is actually easier said than done, but it is the expectation; it’s also rarely done, but it can be done). By contrast, you will never get exactly the same answers from two qualitative studies. You—­even the same you who conducted the study in the first place—­can never be at the same place, in the same time, with the same people, and the same larger context. If someone else tries to replicate the study, even if they are magically able to return to the same place and swap out your body for theirs, they may still collect different data: What you choose to focus on, what you can get access to, and what people tell or reveal to you will probably be measurably different from another person’s experience going through the same process.7 Many people will see this as a problem—­evidence of fabrication or an inherent limitation of qualitative methods. Underlying this misunderstanding—­beyond just a tendency to be skeptical of qualitative work—­is a translation failure. Whenever you re-­collect data, whether qualitative or quantitative, you expect some “margin of error” or difference from the original sample. In a quantitative study, I might take two random samples of Illinois voters on the same day; those samples will be very slightly different from each other. If I look at some variable I care about, the average in each sample will probably be similar but not exactly the same. If I look at 20 variables, just by chance, one of those variables will be really different across the two samples. That’s normal and expected. But, for some reason, when it comes to qualitative work, people seemingly expect to find precisely the same information and get suspicious if they don’t. This skepticism goes back to a misconception we discussed in Chapter 2: the assumption that qualitative work is not generalizable because details about the sample might vary from place to place. But qualitative work is supposed to be conceptually or theoretically generalizable: While the exact details of a repeated study may vary, the conceptual or theoretical results will likely be the same—­in the same way that testing a statistical model on a slightly different sample from the same population would yield similar but not identical results. (This idea is represented in Figure 11.1.) From a qualitative scholars’ perspective, this is to be expected. But from a skeptical observer’s perspective, variation in the technical details would be alarming—­proof of fraud, incompetence, or simply methodological failure.
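To make the "margin of error" point concrete, here is a tiny, made-up simulation (the sample sizes and the 20 traits are arbitrary): draw two random samples from the very same population, compare them, and watch a comparison or two look "off" purely by chance.

import numpy as np

rng = np.random.default_rng(7)

# Two random samples of 1,000 "voters" from the same population, each measured on 20 traits.
a = rng.normal(size=(1_000, 20))
b = rng.normal(size=(1_000, 20))

# Compare the 20 trait averages across the two samples.
gap = np.abs(a.mean(axis=0) - b.mean(axis=0))
se = np.sqrt(a.var(axis=0) / 1_000 + b.var(axis=0) / 1_000)

# With 20 comparisons, roughly one gap will look "really different"
# (more than about 2 standard errors) just by chance.
print((gap > 2 * se).sum())

No one calls that fraud; it's just sampling variability. The same logic should apply when two qualitative researchers come back from "the same" field with somewhat different details.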

Figure 11.1  A Visual Representation of What Replicating Qualitative Research Might Look Like. Just as quantitative estimates from various studies are expected to vary within some range, so too qualitative findings should vary. The details will differ—as the exact point estimates and standard errors will differ across quantitative studies—but the general ideas are expected to be the same.

Beyond questions of replication, there is a frequent mistrust among some readers or audience members at research presentations that the qualitative scholar is somehow lying about their work. Maybe they don’t believe the person is fabricating their data, but just presenting it in a dishonest way such as selectively choosing examples that support their case and ignoring those examples that do not (the qualitative equivalent of rerunning your regression to get the right answer). This mistrust often takes the form of a question that sounds like, “How do we know you aren’t just cherry-­picking your examples?” (No one ever seems to ask quantitative scholars how many specifications of their formula they ran before settling on their results; that would be rude.) In this scenario, you, the qualitative scholar, are seen as hoarding a treasure trove of data, but you are only letting people see tokens of your wealth—­no one knows for sure how big your treasure is (do you only have a few coins and that’s what you are showing people?) or what type it is (all gold coins, some diamonds mixed in,

or a diverse mix of gems and precious metals?).8 Even when scholars take pains to contextualize their data—­making clear what is representative, illustrative, or exceptional—­some readers or audience members will hold onto their skepticism anyway. If the qualitative scholar has done a good job in explaining this, and a critic persists in their skepticism (without any indication that the scholar has been nefarious in their presentation), that’s what I call an asshole. They are quite common.

Valid and Invalid Criticisms of Qualitative Work
Beyond the general, almost irrational mistrust in qualitative methods, there are a number of criticisms that often come from quantitative scholars (or people who are more familiar with quantitative than qualitative work) that profoundly misunderstand the qualitative enterprise:
• Your sample size isn't big enough.
• You have sampled on the dependent variable.
• This is a convenience sample (or worse, this is not a random sample).
I'll just say it: These are really annoying critiques, and they usually demonstrate the critic's ignorance. But because they are so common, other people in the audience don't necessarily realize how dumb these criticisms are. While I was writing this book, a colleague of mine (I'll be honest—he's known for being a total asshat) actually said one of these at a grad student's talk ("What you're doing is actually called, 'sampling on the dependent variable'")—and no one corrected him or chimed in to explain that it was fine in this case (and many others). A few months later, a friend of mine who uses both quant and qual methods used one of these clichés in a paper to explain why they were not using a particular method. Again, there's no correlation between saying stupid things about qualitative work and being a bad person. Unfortunately, as with the accusation of fake news, there is a grain of truth that makes it somewhat complicated to explain why these are ignorant critiques, which is why they stick around. As with some fake news accusations, these are legitimate critiques in some contexts (e.g., some news stations objectively have very problematic relationships with truth and facts), but the person issuing the critique is probably misapplying it (e.g., critiquing the wrong news station). Additionally, these are generally valid critiques for quantitative research, but the critiques are not themselves indicators of bad qualitative research.

Instead, these are important considerations for qualitative work. Unfortunately, these critiques have been imported into the qualitative world by some of the folks using quantitative rules to “improve” qualitative work, like Gary King et al. (1994). These days, such critiques are volleyed indiscriminately at all qualitative research. There certainly are examples of shitty qualitative research where these critiques would be accurate, but more often, the critiques are misapplied. What makes a critique silly or serious, legitimate or not, is the reasoning behind it.

Your Sample Size Isn’t Big Enough! If you only interview five people or you spend a month embedded in a particular context, and those are your only sources of data, that’s probably not enough for a peer-­reviewed article—­and certainly not a book. Of course, I have seen articles or books published that fit something close to that description, but not usually in the venues one aspires to publish and/or usually only by senior scholars who can (unfairly and wrongly) get away with it.9 But the question of sample size is complicated. Around the time that major critiques came out of Alice Goffman’s ethnography on the men with outstanding warrants in the Sixth Street neighborhood, I’d heard other scholars dismiss her work as “basically following around two or three guys”—­or some variation of an attack on her sample size, focusing on her primary informants. There is certainly much we could criticize about her study, but sample size is not one of them. This critique ignores a number of other interviews she did, including a systematic canvass (doing a door-­to-­door survey) of the neighborhood’s residents (more than 100 people) and interviews with other relevant actors scattered throughout Philadelphia and, in some cases, beyond. It also ignores the duration and the embeddedness of her data collection: She collected data over six years, and her data were not restricted to only the three guys but the three guys in context, including everyone they interacted with—­from their family and friends, to the police officers who chased them, to the owner of the corner store where they shopped, to their supplier of clean urine (for “piss tests”), to their neighbors, to the people in the prisons where some of the guys were incarcerated. Given that there is other stuff one could critique about her study, this always seems like a lazy critique to me—­and one that seems to be based on a summary of her work rather than her actual work. So, how much is enough? As we have seen, there are no bright-­line rules about how much is enough; yet people—­by which I mean reviewers, audience

members at a talk, friends and advisors, and others—­seem to have an intuitive sense (which doesn’t make it right) about this. “Enough” is usually measured by the summary counts of how many interviews, hours in the field, or documents you analyzed. We have already discussed ways to figure this out for your project that don’t involve counting (Chapter 7). That’s because counting is a less-­than-­ optimal way to evaluate work. First, what matters more than how many interviews, days in the field, or documents you use is their quality—­how rich are your data? You can have 100 interviews that each last five minutes, and unless you are really good at concise interviews, that dataset might be pretty close to worthless. Likewise, you can have a century’s worth of official reports, but they might not include any data on what you really want to know. Second, whether you are using interviews, ethnography, or documents, your study will be better if you are relying on multiple sources (remember our discussion from Chapter 7 of how this triangulation strategy is beneficial). The more data sources you use, the less data you need from any particular source. So people’s intuitive sense usually underestimates how much data a study is using once multiple data sources are in play. But people will count, so you might as well know what sorts of numbers they look for. Let’s say that your goal is not a book (or dissertation), but a journal article—­and let’s say an article in a subfield journal rather than the flagship journal for your discipline. You usually need to do much, much more for a book/dissertation or an article in a flagship journal: To some extent, the amount of data you need to collect is roughly proportionate to the length of the product (article v. book) but also to the prestige of the publication outlet (rightly or wrongly, top-­tier journals often have higher standards than middling and lower-­tier journals). Focusing on one data source at a time, here are some rough “intuitive” numbers. If you are “only” doing interviews, you need a minimum of 20, but you will be safer with 30. (Yes, articles with fewer interviews have been published, but many others have been rejected because even 20 is too few to some reviewers.) Here, I’m assuming these are mostly around one-­hour interviews as opposed to 15-­ to 30-­minute interviews. If you are doing an embedded ethnography, you probably want a minimum of three to six months in the field (two months could work if you really put in a lot of hours). Again, there is some flexibility here: If you are doing fieldwork one day a week for six months, that might not cut it; but if you are doing really intense daily fieldwork for two months, that might.10

Recognizing this variation, people usually report how many hours they spent in the field as well as other details indicating the quality of that time. For example, in one of my favorite studies, Phil Goodman (2008, 743) explains about his fieldwork at one of two prisons: “I spent nine days, spanning three weeks and about 60 hours, observing at Central. During my observations Central processed as many as 500 inmates a day. As a result, in only three weeks I was able to observe several thousand inmates get categorized.” (He put in a similar amount of time at another prison and also analyzed documents.) As this example illustrates, the numbers in these recommendations can move around a lot depending on the richness of your data—­how well they capture what you care about—­and how many different data sources you are using. It’s much more difficult to come up with numbers for text-­based studies. If you are working with archival records, for example, it’s really going to depend on the richness of the source: You might be able to write an article based on one diary if it’s really rich (detailed on the issues you are interested in), and you have a strong knowledge of the context from other primary and secondary sources (i.e., sources written during the period you study as well as sources written later about that period). But it might be hard to write an article on some sort of serial for which you have a century of data if the serial doesn’t contain much relevant material. Overall, though, assuming reasonably rich relevant documents, I use a similar evaluation I would use for interviews: You should probably aim for 20–­30 documents. (Of course, if you are looking at much shorter documents like newspaper articles, you probably want a minimum of 100.) Realistically, the documents will vary quite a bit in size and richness, so you might end up using somewhere between 500 and 1,000 or more. (If that fills you with panic, don’t worry—­some of these might be really small, and they won’t be equally important!) In general, when trying to identify the minimum to pass muster, more is better; but if you are getting to your 100th interview or your third year in the field, you need to assess your motivation. First, what are the empirical benefits of continuing (for example, have you reached “saturation”—­that is, so much repetition in your new data that you aren’t really learning anything new)? Second, what are your psychological reasons for persisting (is there something else you are avoiding or trying to overcome)? These are just rough guidelines to keep in mind when people start talking about the quantitative inferiority of your qualitative data, or to assuage your nerves if you worry you have not done “enough”; but if you need a refresher on how to really answer this question, return to Chapter 7.

But You Sampled on the Dependent Variable!
A lot of qualitative research can be characterized as sampling on the dependent variable. This critique, like many critiques, is more of a catchphrase that needs some elaboration before it can be understandable. It means that a scholar is looking at a particular outcome of interest—aka the "dependent variable" (conventionally represented as Y)—and studying cases with a particular variation of that outcome or sometimes comparing cases with different outcomes of interest. The reverse would be to start with the full population and then select a sample of cases based on their characteristics or context—aka the "independent variables" (conventionally represented as X)—that we think are important for (potentially) explaining the outcome of interest. Since qualitative scholars, particularly those of the countercultural Dirtbagging variety, don't typically speak in terms of variables, this characterization requires a bit of awkward translation first. Let's go through some examples, starting with my book studying a prison (Eastern) that alone retained an exceptional approach to incarceration called the Pennsylvania System. Here, a prison's approach to incarceration—the thing I want to explain—is the dependent variable (Y):
0 = follow the Auburn System (the more common approach)
1 = follow the Pennsylvania System (the unique approach)

That is, I’m trying to explain Y, or rather why a prison was Y = 1 (followed the unique Pennsylvania System) rather than Y = 0 (followed the more common Auburn System). I selected Eastern because of its unique approach (Eastern was Y = 1, whereas most prisons were Y = 0); by choosing my case based on this outcome of interest, I sampled on the dependent variable. In Jooyoung Lee’s book studying aspiring rap artists in the Los Angeles area, young men’s life and career choices are the dependent variable: 0 = follow the “traditional” life course of going to school and later getting a job 1 = prioritize the dream of becoming a rap star above all else

Lee was trying to explain why some young men were Y = 1 (prioritized their dream of becoming a rap star) instead of Y = 0 (followed a traditional life course). Lee focused on young men who diverged from the traditional approach (Y = 1); by focusing on this outcome of interest, he sampled on the dependent variable.

We could repeat this exercise with a lot of examples. But, boiling them down in this way feels weird for several reasons. One reason is that it’s kind of difficult to imagine how these studies could have been conducted otherwise—­ sure you could do them differently, but they would be very different projects. I tend to think that if certain methods classes or texts didn’t drill in this advice, “Don’t sample on the dependent variable,” a lot of critics would instead see this technique as an intuitive approach. Indeed, for these studies, the sampling selection is legitimate given their research questions. Lee and I both wanted to know what motivations people had, what pressures they faced for their choices, and how they overcame (or didn’t overcome) those pressures. For those sorts of questions, studying Y = 0 prisons, people, or groups would not make sense. Another reason it feels weird is this critique actually mischaracterizes these studies. Embedded in this critique is the idea that these studies only research the unique cases—­the Y = 1 sample (my deviant prison or Lee’s aspiring rap stars). In fact, each of these studies actually does research other (Y = 1) prisons or groups. Likewise, both studies look at prisons or people who eventually move from Y = 1 to Y = 0, which offers useful variation. But we can say the main focus is the Y = 1 sample (while they are still in the Y = 1 condition). Consequently, this critique is extra inappropriate for both of these studies. Scholars who make this critique, however, are drawing from a quantitative perspective in which sampling on the dependent variable would literally break their regression—­in two senses. Statistical regressions require variation on the dependent variable (Y)—­meaning your data cannot consist only of Y = 1 (or Y = 0), but instead every variable included in the model must have a range of values—­or your computer program will return an error message. That is, if we collected useful quantitative data on one outlier group, we’d get stuck pretty early in the analysis. More generally, statistical analyses require a random sample for certain mathematical assumptions to hold, and sampling on the dependent variable is not random sampling. Neither of these technical problems apply to qualitative research. But there is another concern related to causal inference, which may be relevant for qualitative work. As we have seen in Chapters 6 and 10, some causal statements require at least one comparison case if you want to be on surer footing. If you only study a Y = 1 case (especially one that stays in the Y = 1 condition), you have no comparison case in this sense. But let’s say you are studying several Y = 1 cases, and they are your only cases. You technically have some comparison cases, but they might not be useful comparison cases.
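For the curious, here is a minimal, invented sketch of what "breaking the regression" looks like in practice. The variables are made up; the point is only that once you keep nothing but the Y = 1 cases, the arithmetic behind a regression slope stops working.

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "x": rng.integers(0, 2, 200),  # some invented characteristic of 200 hypothetical prisons
    "y": rng.integers(0, 2, 200),  # 1 = Pennsylvania System, 0 = Auburn System (also invented)
})

# "Sampling on the dependent variable," quantitative style: keep only the Y = 1 cases.
only_y1 = df[df["y"] == 1]

# With no variation left in Y, the correlation behind a regression slope is undefined:
# its formula divides by Y's standard deviation, which is now zero.
print(np.corrcoef(only_y1["x"], only_y1["y"])[0, 1])  # nan

Qualitative analysis has no such formula to break, which is part of why the critique does not transfer cleanly.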

As we have seen, there are some specific rules for how to select comparison cases, and they involve looking at the characteristics you think are important (both independent and dependent variables or the X’s and Y’s), rather than looking only at the outcome (the dependent variable, Y). An example is in order. Let’s say that I concluded that the only reason Eastern State Penitentiary, the prison I studied, retained the Pennsylvania System was because it had Quaker roots (LOL, right?—­see Chapter 2). Let’s assume I could trace the influence of Quaker heritage (an independent variable, X) on the prison’s continued use of the Pennsylvania System (the dependent variable, Y). And say I’m pretty confident about that influence, but I want to make a more general statement that Quaker heritage causes prisons to adopt and retain the Pennsylvania System. If I’m only looking at that one prison, I would have a hard time making that statement. For example, what happens when it turns out that other prisons had Quaker roots as well, but they did not follow the Pennsylvania System’s approach to incarceration? Alternatively, maybe some of the other three prisons that temporarily followed the Pennsylvania System also had Quaker roots, but they still got rid of the system. I might be right about my claim, but I would need to explain why Quaker roots played a different role at Eastern and not elsewhere, which requires looking at other cases—­other prisons that had Quaker roots (an independent variable). This is true in other studies. If Lee said that men who face the hardest socio-­ economic challenges (an independent variable, X) pursue their rap dreams (the dependent variable, Y), that statement would be false if it turns out that there are other, more hard-­pressed young men who do not pursue their rap dreams that he just didn’t study or know about. Instead, his study discusses the role of socio-­economic challenges, but he does not make a general, comparative argument about socio-­economic challenges, nor does he suggest such challenges automatically cause people to focus on their big, if improbable, dreams. Notice that you can make some causal statements without comparison cases of this type. Even if you are studying one case, but you are studying it over time, you will probably find what are sometimes called within-­case comparisons. For example, after 80-­plus years, Eastern eventually officially got rid of the Pennsylvania System. The prison administrators basically wrote to the legislature to say, Look, we’ve not been using this system in practice for about 50 years, and we don’t really like breaking the law like this, so can you finally either give us the

resources we need to implement the system for real or pass some legislation that acknowledges that we’re not using this system anymore? We don’t really care either way. Just help us stop breaking the law. Thanks, bye.

I’m paraphrasing, in case that’s not clear, but that’s a pretty close translation of what they said. Then the legislature changed the law. I have some other data that helps me see this—­there were some other actors involved and other moving parts—­but I can demonstrate that the prison administrators ultimately got it deauthorized. I didn’t need to look at other, non–­Pennsylvania System prisons to make this claim because it was pretty clear in the data I was analyzing. However, I did have comparison cases, and I did see a similar thing happen earlier at other prisons that initially followed the Pennsylvania System and then got rid of it—­in each case, right after the prison administrators wrote to the legislature with a similar plea. The fact that this didn’t happen at Eastern for so long led me to claim that administrative support was essential to keeping the Pennsylvania System. Note, however, that none of this required looking at prisons following the Auburn System. I had comparison cases, but not from the Y = 0 crowd—­instead, I was looking at the change over time from prisons that started off as Y = 1 (i.e., followed the Pennsylvania System) and later became Y = 0 (i.e., followed the Auburn System). In this case, doing what is often criticized as sampling on the dependent variable was valid.

This Is a Convenience Sample!
Some people have it in their head that convenience sampling is bad—if it's convenient, it must not be rigorous. They're wrong: There are perfectly good reasons to use convenience sampling. These same people sometimes label some studies as using convenience samples if the sample (or really the case) looks convenient literally (i.e., it looks easy to do), whether or not the author had other reasons for their case selection or sampling strategy. For example, Alice Goffman could be said to have used a convenience sample when she studied a Philadelphia neighborhood when she was going to school in Philadelphia (and, later, nearby New Jersey). The same for Jooyoung Lee who was going to school in Los Angeles when he did his fieldwork studying the aspiring rap stars of LA. And for Randol Contreras who studied the gang of men who robbed drug dealers in the South Bronx when he was going to school in New York City. But in each case, the fact that the site selection was "convenient" (and this should really be read as "more convenient" than, say, going to another state or country) is irrelevant. It's irrelevant because these scholars' sites make sense for

what they were studying (there was probably no better site for Lee’s study than southern California where aspiring rap stars cluster), because it probably did not make much difference (Goffman would likely have had sufficiently similar findings if she was studying a Philadelphia neighborhood or a Chicago neighborhood), or some combination of the two (if Contreras could have found an LA-­based gang that used the technique of robbing from drug dealers, he probably would have had sufficiently similar findings). Certainly, there are examples of scholars selecting sites simply because they are nearby, but good qualitative researchers can give some (additional) justification for why they selected their site—­or why site selection doesn’t really matter for what they study.11 Convenience sampling more typically refers not to the study location, but to the people or documents examined; but here too there is a lot of misunderstanding about when convenience sampling is a problem. Goffman got access to the people she studied by tutoring a young girl from the neighborhood and spent time with the girl’s family (immediate and extended), friends, and neighbors. Before officially beginning her study, Goffman already seemed like family to the people who became some of the central participants in her study (Goffman 2009, 341). Contreras had grown up with some of the men in his study and thus was long-­time friends with them before his study began—­although he had been gone from the neighborhood for some time and was more of an old friend at the time the study began (Contreras 2012, preface). Both of these cases may seem like the height of convenience (in the “easy” sense), but these scholars’ personal histories with their participants prior to the studies speaks to the difficulty of gaining access to fieldsites and people. For Goffman, her friendship and history of hanging out around the neighborhood helped her overcome some of the obstacles like suspicion of outsiders or race and class status that might otherwise have blocked her access (Goffman is a highly educated, middle-­class White woman; the neighborhood’s residents were largely poor Black people with few educational opportunities) (Goffman 2009, 341–­342). Likewise, given the sensitive nature of the gang’s activities—­open not only to criminal charges but also retaliation from the aggrieved drug dealers and their networks—­Contreras’s lengthy personal history with his participants, and his status as someone who grew up in the same area, helped to build trust in a way that may not have been possible for a stranger—­and was still not guaranteed given Contreras’s different career trajectory and lengthy absence from the neighborhood (see Contreras 2012, introduction).

There are plenty of times when using a convenience sample can be legitimate grounds for criticism. (We've already discussed, in Chapters 6 and 7, how to select your case(s) and sampling strategy in a methodical way to avoid this and other criticisms.) But I've seen this criticism more often lobbed inappropriately against good research—either mistakenly calling something a convenience study when there are other reasons for the sampling or assuming "convenience" means bad.

* * *

Doing qualitative methods, unfortunately, makes even the best scholar vulnerable to unfounded skepticism. Some of this skepticism, I have argued, results from a general befuddlement or disbelief that one can make powerful statements, descriptive or causal, without going through all of the hoops quantitative scholars must go through. Some of the skepticism results from ignorance and bias: Some scholars evaluate qualitative work under the same criteria as quantitative work, which is unnecessary and ultimately a bad idea, while others simply repeat criticism that they have heard before without understanding it (such as criticizing someone for sampling on the dependent variable, which is a legitimate sampling strategy—­if used correctly). This chapter has provided some tools for recognizing the distinction between valid and invalid critique and how to respond to the latter.

12

The Sweeper

Sweeper  The last member or the tail of a climbing group. The sweeper's task is to spot and retrieve things that may have accidentally fallen from the preceding climbers; to make sure that no mess or gear is left behind; and to make sure that the rear is keeping up with the whole team.
Wikipedia (2020), "Glossary of Climbing Terms"

The Dirtbagger Manifesto

As mentioned in Chapter 1, I really struggled to give a label to the approach described in this book. The standard choices for describing it—micro-level analysis, Grounded Theory, ethnography, inductive, theory generating, and so on—always left something to be desired. In writing this book, my goal has not been to offer one particular technique or method for qualitative research, but to describe a general style or approach to qualitative methods—my approach and one that I see so many of my colleagues using, but an approach that rarely gets discussed in general overviews of qualitative methods. I also wanted to correct a number of standard misconceptions about how qualitative methods can be done. This combination of characteristics led me to label it the Dirtbagger approach, in honor of early US rock climbers, primarily those who lived in camps (or caves) in Yosemite Valley in the 1950s and 1960s so they could spend all of their time climbing. They accomplished amazing feats by rejecting the social norms of the time. Even though climbing has become much more mainstream, the Dirtbag spirit lives on in contemporary climbers out there sending routes considered impossible in that earlier era but who now believe almost anything is possible. It's this combination of deviance, dedication, vision, and achievement that underlies what I'm calling the Dirtbagger approach.


is possible. It’s this combination of deviance, dedication, vision, and achievement that underlies what I’m calling the Dirtbagger approach. So, as we wrap things up, how might one summarize it? 1. There is no one right way to do qualitative social science (but there are certainly wrong ways), and people should choose the approach that works for them, for the particular project at hand, given whatever constraints and opportunities are happening in their life at the time. 2. People disagree about what constitutes qualitative research, and a lot of definitions are overly narrow and misleading. 3. The research question is important, but it should be broadly construed. We need to understand that it can change, the final question doesn’t always precede data collection, nor does every paper resulting from a qualitative project even have a research question. 4. Your project will be better received if you connect it to some sort of anchor, such as a policy or sexy issue or a theoretical debate in the literature. But don’t let people bully you into thinking you need to do both of those things. 5. There are certain inviolable rules about setting up one’s research design, but executing a study also requires a good deal of flexibility because shit happens, and sometimes you need a do-­over. 6. It’s fine if you selected your case intuitively or out of some inexpressible interest; just make sure to justify why you studied the case(s) you did using some of the established justifications. 7. One area where you have to be really careful is your sample: Think about what you are leaving out, what your data allow you to observe, and what you can do to fill in some of those blanks. 8. Regardless of your actual project, approach your research as a kind of fieldwork—­that means taking fieldnotes (or memos), observing your reactions to your data and setting, and remembering what it means that you yourself are the instrument of data collection. 9. Content analysis and analytic memos are flexible tools that can get you through most of any qualitative project. They provide a systematic yet flexible way to analyze your data, and they are super useful for the write up. 10. Don’t let people tell you qualitative research doesn’t let you make causal claims: Qualitative research excels at unpacking causality either by directly demonstrating the causal mechanisms by which something hap-

The Sweeper253

pens or by allowing for causal inference through hypothesis testing, exploring alternative explanations, counterfactuals, necessary and sufficiency calculations, and logic. 11. People can be skeptical of qualitative research as an empirical tool and as a career choice; qualitative scholars know how powerful these methods are, but they also need to be prepared to defend their choices by understanding the standard criticisms and under what conditions those critiques are valid. I’m not claiming this approach is the best way—­although I’ve definitely found it to be more useful than alternatives most of the time, and many of my colleagues seem to feel the same way. Indeed, my advice certainly does not always lay out the most efficient route (again, for that see Luker 2008). But by following what is in many ways a deviant approach (according to mainstream standards), we can come up with more creative insights and travel to heights others can’t imagine.

No One Right Way Revisited

How do we—you the reader, me the author—know that this book isn't just a manifesto glorifying and defending a highly undisciplined, inefficient, and ultimately incorrect approach to social science research? The answers that most readily come to mind each have their own counterargument.

"It's how a lot of people work." Maybe undisciplined research strategies are more common than disciplined strategies.

"A lot of people (including myself) have had a lot of success in generating creative and highly placed articles." Maybe we could be more productive and come up with better research if we used other strategies.

"There is no one best way that works for everyone or every project." But what about the Boice (1990) writing experiment and others like it that show certain techniques work, individuals' claims that it won't work for them to the contrary?

Well, shit.

Just kidding. I'm still right. It might very well be that the approach I have described is inferior to other approaches. But there's also a reason we haven't all adopted systems that promise to be 100% effective. Maybe we aren't all super disciplined when it comes to following very rigid approaches; approaches that require us to be rigid aren't going to work for those of us who find ourselves in that situation. And it's not just a matter of the frustration that comes from trying to shoehorn yourself into a mold that doesn't fit—it's also the grinding feelings of self-doubt that come with it.

Gendered Climbing Shoes

For a long time, rock climbing was dominated by men. To some extent that is still true, but the last decade has seen an explosion of talented female pro climbers like Sasha DiGiulian, Ashima Shiraishi, and Margo Hayes. Until fairly recently, the companies that manufactured climbing shoes (which first appeared in the 1980s) did not make shoes for women; female climbers had to wear shoes made for male climbers. Some women, like Lynn Hill and later Beth Rodden, still kicked ass at climbing—Hill was the first person to free climb El Capitan's Nose and then the first person to free climb it in a single day, while Rodden (with her climbing partner Tommy Caldwell) completed the second free ascent of that same route. But think about how much better female climbers could have been (and how many more women might have been climbing) if they had had climbing shoes made for women. It's not even just about proper fit, although that matters—even with fairly wide feet, I prefer climbing shoes made for women over gender-neutral shoes. Part of it is also the feeling that you belong and the confidence that brings.

Advice about the One Right Way is damaging. For example, if the only advice out there is to write like a machine (and you have a really hard time writing like a machine), you might feel like you're not good enough and think you can't be a writer or you don't belong in academia. But that's obviously bullshit, and it's messed up that people feel that way about themselves! So yeah, the Dirtbagger approach might not be the best way—and it's certainly not the best way for everyone—but enough people need an alternative, and this approach certainly provides one.

In the end, try out different strategies and see what works for you. See what makes intuitive sense for you, for your particular project, for whatever you are going through right now. When I first read Professors as Writers (Boice 1990), I had my shit together and couldn't relate to a lot of the advice he had for people struggling with their writing. A year later, I found myself having the same struggles his book was aimed at alleviating. Likewise, for some projects, process tracing worked really well and for others it didn't. In some cases, Rocking Qualitative Social Science might work really well, and in other cases you might decide you want to do something more similar to Salsa Dancing into the Social Sciences. It's good to know what your alternatives are and that you don't have to stick to just one approach.

Ultimately, let's be less judgmental about other people's approaches to their work. What works for me might not work for you and vice versa. What works for an advisor might not work for their student. What works for you might not work for your friend. Better to be open to a range of strategies than to shame someone for their approach. (And just to be clear, there's a difference between our preferences against an approach and our knowledge that a particular method simply gives wrong or misleading results—right now, we're talking about preferences.) We have enough anxiety in academia, so let's not add to it by making people feel bad about using a method we don't find useful for ourselves.

Finally, promoting one strategy to the exclusion of others is bad for the field. The last 50 or so years have been a period of epic growth as previously (and still) marginalized scholars are increasingly coming to the fore, introducing important concepts, theories, and methods that demonstrate the major blind spots and severe limitations of previously (and still) dominant approaches. These important contributions are prompting scholars to reconsider things many previously took for granted as basic, natural, or timeless tendencies of human nature and society. For too long these insights were blocked—for example, until very recently, the work of W. E. B. Du Bois was not taken seriously in the several fields to which he contributed (Morris 2017)—because of deeply entrenched biases against marginalized groups like female scholars and scholars of color, especially Black scholars. And yet, once their contributions were recognized, they have significantly improved our collective understanding of the social world.

Discrimination against researchers because of demographic features is obviously a different beast entirely than the dismissive and critical attitude toward scholars using a different approach to methods. But I like this example because it illustrates the appeal of diversity: Lots of research demonstrates that diversity leads to greater creativity and innovation (e.g., Page 2007). Methodological diversity is just one of many types that can help advance science. Additionally, however, there are interaction effects: When a member of a marginalized group presents research using entirely sound but unique methods, criticism tends to be stronger than when that type of research is presented by a member of the dominant group. So encouraging methodological pluralism is one way of minimizing some of the pathways by which research can reify the marginalization of scholars. Ultimately, that's better for everyone.

*  *  *

This final chapter has gone back up the route and cleaned up the gear left on the wall before finishing our climb. It has reviewed the major components of the Dirtbagger approach I have described in this book: It is not necessarily the best approach, as in the most efficient, but it is flexible, and that flexibility makes it something you can adapt to fit your purposes, preferences, and style in ways that still let you achieve great things. My central hope for this book is that it cuts down on some of the anxiety that emerges from the research process. I also hope it reduces the anxiety that comes from people’s judginess about research that follows a different approach from their own—­by giving researchers who follow that style both the tools to defend themselves and the means to remind judgy critics that no one has a monopoly over methods. At the end of the day, there’s no one right way.

Notes

Acknowledgments

1. The video used to be online but seems to be no longer available.
2. Indeed, Luker's Salsa Dancing into the Social Sciences is currently the best general text available for atypical qualitative methods. The collected works of Howard Becker are an alternative contender. However, his books each focus on one specific stage or technique of doing qualitative research, such as how to think about your research while you're doing it (Becker 1998), writing a research paper or dissertation (Becker 1986), how to deal with the challenges of case studies (Becker 2014), and most recently how to deal with "evidence" (Becker 2017). But they don't go into detail on the rest of the research process in the way that Luker's book does, taking our projects from the cradle to the grave.

Chapter 1

1. Gym grades are much easier than outdoor grades. Outdoor grades go up to 5.15d and start at 5.0; but a 5.0 outside is about as difficult as a gym 5.8 or thereabouts. My first time climbing outside, I was a 5.10c gym climber, and I struggled with an outdoor 5.4. Although, to be fair, it was raining.
2. Thankfully, I had a useful precedent. Luker (2008, 1–2) begins her book, Salsa Dancing into the Social Sciences, with a discussion on salsa dancing as a model for research. Yes, I'm totally riffing off that.
3. I'm conflating multiple things here. That's because, while distinct, they overlap. A lot of quantitative approaches follow (or aspire to follow) the scientific method. But quantitative methods also have specific rules about how much data you need, what counts as data, and what types of questions you can ask with that data, which the scientific method is less persnickety about. (Making things even more complicated, I'll sometimes talk about "normal science," which is also different but has some overlap with these other approaches.) Different groups of scholars look more to one or the other approach, and the rest of us—for whom neither approach really works—can fall between the cracks. If that sounds familiar, this book is definitely for you.


4. Throughout this book, I will often use the gender-­neutral pronouns they/their. 5. Again, using a non-­methodological label for describing a particular approach to research is not unprecedented. Luker (2008) refers to her intended readers as “salsa dancers.” 6. Thanks especially to Ellen Berrey, Phil Goodman, and Neda Maghbouleh for making this point. 7. Fatima Minhas, private communication used with permission, March 10, 2019.

Chapter 2

1. I’ll later make the point that data collection and data analysis basically go hand in hand. However, some people do use one without the other, so I’m treating them as distinct actions here. 2. Structured interviews are sometimes the basis for survey research. Here, the interviewer uses a “survey instrument” (basically a standard list of questions, usually asked in a particular order), and the interviewee can respond to either closed-­or open-­ ended options—­that is, they must select an answer from a set list of options, or they can respond however they want. Surveys with closed-­ended options are more often the domain of quantitative research, while open-­ended questions are more often the domain of qualitative research. 3. If you’ve taken a stats class, you know that’s because the Law of Large Numbers says the math that makes regressions work doesn’t hold up when your sample size is that small. Maybe—­maybe—­you can run a regression with a few variables if you have n = 200, but you need to proceed with caution. 4. Here is another wrinkle. Since most courts are operated at the county level, and Maryland has 23 counties, and let’s say for the sake of argument that there is one court for serious offenders per county, and those are the courts we care about, then is our n = 23? Again, we know there are differences in sentencing disparities across counties (some of the theories about sentencing disparities focus on county-­level or court-­level variables); with regressions using hierarchical models (more complicated regressions that calculate statistical significance differently), the sample size gets bumped down to the number of counties or courts, recognizing the lack of independence (or non-­random similarity) between observations within counties. 5. A great resource for dealing with this critique is Becker (2014). 6. This critique also misses another element of these and other similar case studies that focus on a particular group or setting. In many such studies, scholars are relating whatever is going on in that specific case (group, state, organization, etc.) to larger social trends—­what we sometimes call macro-­level social factors or changes. In fact, Michael Burawoy (1991) has formalized a method in which ethnographies of specific contexts specifically relate their micro-­or local-­level observations to larger (macro-­level) state, national, or international factors or trends. He calls it the “extended case method” because the ethnographer is “extending” the details and insights of their case to devel-

Notes 

259

opments beyond that case, usually in an iterative manner, letting the case shape our understanding of those larger trends and also letting the larger trends help shape our understanding of the details of the case. 7. The following information comes from Rubin (2021). 8. Increasingly, scholars are performing mixed methods, which can include both qualitative and quantitative methods, such as sentencing disparities studies that use regressions to analyze sentencing patterns and interviews with judges and other courtroom actors to understand these patterns (e.g., Ulmer and Kramer 1996).

Chapter 3

1. Note that I’m saying “that you think is interesting,” not “that you have selected strategically because it is the best place to study it” or “that someone else finds interesting.” There is some sort of implicit interest driving you that you’ve not yet stated or maybe even identified consciously. 2. Indeed, when writing up my article, I did not include a research question. There are three places where I could have stated a research question but didn’t. First, at the end of the introduction (Rubin 2015b, 24), I wrote: While these agency-­centered accounts correct the understanding of prisons as totalizing institutions in which prisoners are mere automatons, they may go too far. I argue that scholars have overused (and misused) the label “resistance.” The term “resistance” implies consciously disruptive, intentionally political actions. Much behavior labeled resistance does not fit this description, given the evidence presented. After noting both normative and analytical problems with such labeling, I offer the concept of “friction” . . . to describe most reactive behaviors that occur when people find themselves in highly controlled environments.

Then, near the end of the literature review (Rubin 2015b, 27), I wrote: I argue that scholars can more fruitfully analyze prisoner behavior by conceptually distinguishing between prisoners’ frictional activities and resistance, understood as consciously political, grievance-­or justice-­oriented (and often collective) behavior. . . . Importantly, we must not view friction as insignificant or less meaningful than resistance, but as potentially different in its causes, frequency, and consequences. Below, I outline the normative and analytical consequences from their commensuration.

Finally, at the end of the data and methods section, I wrote, “Through these examples, I seek to illustrate overlooked aspects of friction that have been obscured by their characterization as resistance” (Rubin 2015b, 31). Basically, these were insights I derived from my data—­it would be artificial to add in a research question because I had both collected the data and come up with these insights without a research question about prisoner resistance.


Chapter 4

1. There are a few rare exceptions, like Michel Foucault, who totally got away with it, but then he got away with breaking a lot of rules the rest of us could not get away with breaking. Certainly, some folks would argue he did not get away with it, but he’s still really famous and popular, so . . . 2. By scholarship, I mean academic books, book chapters, and articles, and occasionally government or NGO reports. 3. When the social patterns you suddenly start to see relate to power inequalities associated with race, gender, and sexuality, for example, people now tend to call this getting “woke.” But this tendency is also true of more politically neutral realizations, such as when you read up about how architecture and the built environment is designed to encourage particular types of behavior and then you go to Disneyland or Disney World and realize just how much of your experience is subtly shaped by the designers and staff (e.g., Shearing and Stenning 1984). 4. I’m not positive, but I think the origin of this advice—­at least in social science, since it might actually stem from journalism—­comes from Wilson and Gudmundsdottir (1987), although it has also been developed in Luker (2008). 5. During the pandemic, some of my most arcane historical research became relevant as COVID-­19 swept prisons worldwide, just as other diseases had for hundreds of years. So, again, the policy context can change. I’ve seen a similar thing happen with a number of other seemingly esoteric topics in recent years. 6. At different universities, comps will have different titles (like qualifying exams—­ quals for short), but they’ll be something like this. If you don’t have one at your uni, you can find many examples of these reading lists online.

Chapter 5

1. You can have a great route planned, but things you can't control can force you to take a detour or prevent you from completing the climb altogether. Shit happens—the weather, injuries, gear malfunctions, and your personal physical or mental limits can stop you from completing your climb as you expected.
2. When Honnold free soloed Half Dome, he didn't pre-climb all of the pitches to "keep it sporting." That led to a particularly scary moment where he wasn't sure he could do the final hard move. But he was able to do it, didn't fall, and became famous (Honnold and Roberts 2016). So, sometimes it works out, even if it feels a bit sketchy.
3. Random sampling might be the norm for quantitative methods, but it's not the norm for qualitative methods. There are certainly times when one might use random sampling, but more often there will be a more appropriate sampling method for identifying your case and your sources of data (see Chapters 6 and 7).
4. This is helpful to keep in mind for another reason: In grad school, I was frustrated by the different standard to which my research was held. From what I could tell, I was doing things I'd seen the Big Name folks doing, but then I was told by my advisors that I couldn't do that. Only later did I understand that there is this different standard for methods, and that's why I couldn't get away with using the same method a more senior person used in a project they were praised for. (This is also true for theory development, by the way.)
5. Although these threats are discussed in a lot of texts, I'm using Shadish et al. (2002, 53–63) as the basis for my discussion.
6. Just to be sure, I also hired two undergraduates to recode a small random sample of documents that I gave them in random order, and I was able to see that the frequency of the professionalization code increased over time, with the subcodes varying across time. I didn't trust the coding enough to present tables with a frequency analysis, but it was enough to reassure me that I hadn't been mistaken.
7. It's really important to recognize that this is different from rerunning a regression to get significant findings. Recall that there are mathematical reasons why you can't do that. There are also other reasons why you can't do that, if you are doing so strategically and not genuinely. If you are looking over and over again until you get a desired result, that is flat-out wrong; if you are looking over and over to figure out what new analyses you can realize, that's fine.

Chapter 6

1. Since theories should typically be true across time and place—with appropriate adjustments—the absence of historical and international research in many fields always struck me as a major problem: How do we know our theories hold up in other contexts? But I digress.
2. I don't mean to imply that political scientists don't do ethnography at all; instead, those who use ethnographic methods seem to use a different framework than in sociology or anthropology, where we find most ethnographers.
3. I say "about" because you can argue about one or two cases that should or should not be included in that total. See Rubin (2015a).
4. I continue to rely on these sources—particularly info from reports in the Bureau of Justice Statistics (https://www.bjs.gov/index.cfm?ty=tp&tid=18) as well as Banner (2002) and some other scholars—when I give examples using capital punishment as my topic of interest.
5. Incidentally, Seawright and Gerring (2008) subtitle their article, "A Menu of Qualitative and Quantitative Options."
6. Certainly when we try to extend the findings from our US studies to human nature more generally, we face some pretty basic challenges. There is a great TED talk, for example, about how language shapes how we think; because there is such a multiplicity of language styles (including how one would order objects or symbols from left to right, up and down, according to cardinal directions, etc.), much of our research about how humans think, based so often on US populations, is producing heavily biased results (Boroditsky 2017).
7. For more on the death penalty in international context, see, for example, Garland (2010) and Hood (2001).
8. My focus on Eastern State Penitentiary also falls into this category. In many ways, Eastern was not a particularly important prison in its time. It was well known, but it was not influential (if anything, it was the example of what not to do). However, it did receive a lot of attention, both in the nineteenth century and since then. In fact, most criminology undergraduates will recognize the name—they may mix up whether Eastern is the one with solitary or the one with factory-style labor (I've also seen faculty confuse Eastern with its competitor, Auburn State Prison). Eastern is an obligatory reference in most textbooks that say anything about prisons, and part of the reason I wanted to study it was because it's more exciting to learn about a famous prison than a not-famous prison. Of course, there are other reasons to study it—Eastern represents one of the two available models for how to design early prisons in the United States. And, my main reason: It defied outrage, scrutiny, and norms to retain its exceptional, highly criticized approach to incarceration.


Chapter 7

1. There have been times when climbers decided that rules based on environmental protection or the respect for Indigenous lands were stupid. Just to be clear, I disagree, and I’m not counting those rules in this category. 2. That is, as long as you don’t claim to have a perfectly representative sample when you don’t—­or don’t know with great certainty. 3. What made it into the records was (a) what the reformers decided to write down, which depended to some extent on who was taking the minutes at the time; (b) who came to the meeting and what they chose to report; and (c) what the individual reformers were actually privy to when they visited the prison. 4. There is also a growing resistance from vulnerable communities—­especially Indigenous, Black, and other people of color—­that are heavily studied: Many members of these communities do not want to be further studied, particularly for research that is replicating what has already been demonstrated by dozens or hundreds of previous studies. This goes extra for studies conducted by White scholars on communities of color because of the lengthy history of insensitive (and worse) approaches. Again, the consideration here is what sort of payoff are your research participants, or their communities, getting from that additional interview or even that additional study, particularly if we already have enough data (or enough studies). These considerations are one reason why participatory action research is becoming more common so that the members of these communities have a say in how they are studied. While these issues come up in many IRB/REB trainings these days, I was first introduced to these ideas by Indigenous methodologists (e.g., Gray 2015; Smith 2013; Tuck and Yang 2014). These ideas have also been developed and expanded by Black feminist and other critical race scholars (e.g., Zuberi and Bonilla-­Silva 2008). 5. I actually had a lot of other data sources, but these were the most useful sources for this project. 6. I later learned the reform society requested its members to keep diaries for their visits to the prison.

Chapter 8

1. As others have pointed out, White men studying other White men or middle-class people studying other middle-class people are not usually described as doing insider research. Additionally, the label "insider" research often overlooks the many ways in which the researcher is different from the participants, even if they share the same skin tone, ethno-racial category, religion, or first language. As people labeled "insiders" have pointed out, they and their research participants are usually well aware of the multiple differences between them—class, education, gender performances, occupation, status, proficiency in the participants' preferred language, and knowledge of what participants see as basic skills and common sense (e.g., Contreras 2012; Emerson 2001, chs. 6, 8, 9; Flores 2016). Consequently, people who consider certain research to be insider research or mesearch miss the point that what they see as a characteristic shared by the researcher and the people they study is not necessarily a salient characteristic to either party.
2. Now, some caveats. First, these are not necessarily the two strategies ethnographers would say are the most important or distinguishing features of ethnography. Second, these strategies are not unique to ethnographers—which is also why I can assemble them into a generalizable approach.
3. For a few examples, see part II in Emerson (2001). There are also great examples related to criminological research in Contreras (2012), Flores (2012), Goodman (2011), and Jenness (2011).
4. To be clear, some of these are distinct streams—not all feminist research is decolonializing, for example (e.g., Tuck and Yang 2012).
5. Some of the most interesting discussions are taking place in a variety of medico-scientific areas, where researchers are starting to pay attention to gendered differences in ADHD, autism, and other forms of neurodiversity (e.g., Szalavitz 2016).
6. Maybe this is one big reason why we don't speak in terms of standard errors in the variations in our answers. They're not wrong; they're just different perspectives or different angles. For example, if I showed you a picture of Everest, it's accurate whether I showed you the Nepal side or the Tibet side—both are pictures of Everest even though they show different faces and would therefore look different.
7. For an ethnographic description of how these designations are made in California prisons, see Goodman (2008).
8. I'll also point out that there are inaccuracies in these documents: In the various lists of prisoners in the prison I studied, a surprising number of prisoners were variously identified as White or Black at different times and by different (all White male) administrators. Some of this may have been sloppy recordkeeping, and some of it may have been variation in how one identified someone who was on the border between racial categories. Additionally, there was no official racial designation for Asian or Native American. It wasn't unique to my dataset either (see Fyson and Fenchel 2015).
9. It might be helpful to think of them as memos. However, when I use the word "memo," students sometimes think I mean something more official or formal. As we'll see, that's totally not the case. Or, at least, it doesn't have to be. You can make them formal if you want, but they can be super informal, too.

Chapter 9

1. There is an unfortunate tendency among journals’ gatekeepers—­the reviewers and editors—­to reject quantitative research that yields non-­significant findings. By contrast, in qualitative research, your chances depend more on your ability to identify the interesting material and frame it appropriately. 2. This is kind of a funny distinction. Charmaz (1996) laid out six steps that characterize Grounded Theory; I borrow from all but one of these, the last: “delay of the literature review.” But it’s that last step that seems to be most associated with Grounded Theory.


3. Chapter 6 of Emerson et al. (1995) is a particularly useful introduction to the ideas I am describing in this chapter. For a more detailed explanation, see Saldaña (2012). 4. You can do the coding process for separate datasets in parallel, consecutively, or with some mix of those two options. I prefer some mix: Reading documents in parallel lets you get a better sense of what’s going on, but you also want to be able to really focus on a dataset, so splitting the difference is helpful. 5. I didn’t even do true open coding. I coded more than what I thought would be directly relevant to my research question, but I didn’t actually code everything. 6. You might have more than 150 codes and be fine with that. For me, it was too many, and I ended up really focusing on only several distinct coding families, each with their own sets of codes that were more manageable. 7. Many universities have some program by which graduate students and faculty can employ undergraduates in research tasks: at UC Berkeley, it was called URAP (Undergraduate Research Apprenticeship Program); at Florida State University, UROP (Undergraduate Research Opportunity Program); at University of Toronto, ROP (Research Opportunity Program). In each case, a student signs up for course credit in exchange for a certain number of hours of work each week on a research project. Sometimes, there are additional portions like giving a presentation at the end of the semester or keeping a journal, as well as requirements to meet with their supervisor for some set number of hours per semester to make sure faculty aren’t abusing the situation and just using the students as laborers rather than treating it as a two-­way street. 8. One big caveat, though, is the level of sensitivity in your data. I mostly use a type of data that can be collected without going through an IRB/REB. If you are doing an ethnography of a sensitive issue, keeping your notes online may not fly. In that case, you can also replicate the blog format through other, non–­internet-­based programs on your computer. 9. Thanks to Ellen Berrey for this idea.

Chapter 10

1. In addition to the tricks I describe below, another piece of the puzzle that gave me confidence was finding a similar phenomenon in a different context described in the theory of Philip Selznick (1949, 1957). 2. But keep in mind: If you aren’t using the specific steps and jargon, and citing the scholars who pioneered and use process tracing, you’re not officially doing process tracing. It’s more that you are using the same intuition, but the difference matters. 3. Don’t worry too much about what’s Bayesian and what’s frequentist. Basically, these are two philosophies underlying statistical properties, tricks, and rules. Both are pretty neat, but usually people have a favorite. For our purposes, though, the details of these philosophies don’t really matter. If you get into statistics, though, you’ll learn one or the other or maybe both. 4. Sociologist John Sutton (1988, 1990) had used neo-­institutional theory in his work on juvenile incarceration in a later historical period.


5. For a helpful, more detailed guide on this type of approach, see Goertz and Mahoney (2012).
6. I would say you can't randomly assign people different "immutable" traits like race or gender, but since these are socially constructed rather than biological categories, you technically could, but it probably wouldn't be ethical—or it wouldn't get past an ethics review board. More subtly, even if you could change one's race or gender for an experiment, you wouldn't be able to do so in a way that erases all the experiences one had from when they had a different race or gender, which would likely continue to shape their experiences. So, people don't usually do experiments with things that have at least traditionally been seen as immutable, even though we're now pretty clear that they aren't.
7. For example, see Bennett (2008), Brady and Collier (2010), and George and Bennett (2005).

Chapter 11

1. The obsession with quantitative methods is a fairly American trait, not found as much in other countries. In Canada, for example, there is far less quantitative research conducted and less respect for quantitative work than in the United States—although there are plenty of exceptions. In describing the disciplinary tendencies in this section, I am really describing trends in the United States.
2. When teaching my first graduate qualitative methods class in a criminology department, I borrowed this approach. I had previously hosted a monthly workshop on qualitative methods, which gave me a taste of the misconceptions about—as well as the thirst for—qualitative research in a top criminology department: No one had taught the class in recent memory. Even teaching the same class in a sociology department, I still belabor these points because the ghosts of such faulty critiques would show up in the students' answers to midterm questions.
3. With quantitative data, for example, only a randomized experiment (which requires randomly assigning observations to a control group or a treatment group so the difference between the two groups is meaningful) satisfies these requirements; since experiments are difficult to conduct for some types of questions, quasi- and natural experiments can approximate these conditions (for a useful guide, see Angrist and Pischke 2009). However, most quantitative work uses neither. Statistician David Freedman (2010) distinguishes between quantitative and statistical methods, noting that a lot of quantitative methods violate statistical rules required for making causal inferences. Without naming names, I'll just say that some (sub)fields are more aware of these limitations than others. In my experience, for whatever that's worth, it seems to be the fields that are least aware of their own limitations that are the most critical of similar limitations in qualitative methods.
4. For these reasons, there is a growing push in physics and other disciplines to perform "blinded" analyses: finalizing the analysis before examining the results to avoid scientist-imposed biases (i.e., tweaking the analysis to achieve the "right" answer, according to current beliefs) (Rubin et al. 2015). In the social sciences, there is now a move toward pre-registered reports, wherein people will document their analyses before running them. There's also an increasingly common requirement to make your quantitative data publicly accessible so people can check your results or rerun your models according to the specifications that might make more sense.
5. Several years ago, there was a significant case in which a graduate student who had coauthored a highly publicized Science article admitted to fabricating his data after another set of scholars attempted to replicate the study to extend that research (see, e.g., Bohannon 2015). In fact, just months before this manuscript went out for review, a big scandal broke in criminology with allegations that a professor had fabricated their results, and there was a fair amount of evidence provided with that accusation. People don't take these issues lightly, so when they come after quant scholars, they bring their evidence. (Part of the scandal, however, has been over whether or not the allegations should be believed, despite the evidence delivered, and another part has been over how relevant authorities are handling the allegations.) There seems to be a much lower standard for taking on qualitative scholars.
6. In fact, the scholars using photos and videos are often doing so at the behest of their participants and as part of a broader effort to recast research "subjects" (an increasingly outdated phrase) as research participants who engage on their own terms with these projects (e.g., Gurusami 2019; Lee 2016).
7. With archival records, replication is possible but still difficult: One has to go to the same archive (which can be time-consuming and expensive) and find the same documents, which hopefully have not disappeared in the time since the study (this sometimes happens). In her award-winning book on the Attica prison riot of 1971, historian Heather Ann Thompson (2016) begins by explaining the difficulties of collecting archival data on this episode. In particular, some of the data with which she had been working disappeared upon a subsequent trip to the archives.
8. Apparently, we qualitative scholars are pirates in my imagination.
9. There is a clear hierarchy in academic publishing. Each field has its top journals, its middling journals, its low-tier journals, and journals no one has heard of. While you can probably get an article published somewhere, there are certain journals where you are better off not publishing your article at all than having that journal on your CV.
10. I've also noticed a number of scholars these days referring to their personal or other professional experience to supplement their research (e.g., a year-long ethnography plus five years working in [whatever relevant occupation]). I have mixed feelings about this—it does contribute to one's credibility, and you can't unlearn your own experiences, but it is also an ethical gray zone, as we're not supposed to use data collected before going through our universities' research ethics boards/committees.
11. For a great model on how to do this, see McDermott (2006).


Bibliography

Angrist, J. D., and Pischke, J.-­S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press. Banner, S. (2002). The Death Penalty: An American History. Cambridge, MA: Harvard University Press. Becker, H. (1963). Outsiders. New York: Free Press. Becker, H. (1986). Writing for Social Scientists. Chicago: University of Chicago Press. Becker, H. (1998). Tricks of the Trade: How to Think About Your Research While You’re Doing It. Chicago: University of Chicago Press. Becker, H. (2014). What About Mozart? What About Murder? Reasoning from Cases. Chicago: University of Chicago Press. Becker, H. (2017). Evidence. Chicago: University of Chicago Press. Beckett, K., and Herbert, S. (2010). Penal boundaries: Banishment and the expansion of punishment. Law & Social Inquiry, 35(1):1–­38. Bennett, A. (2008). Process tracing: A Bayesian perspective. In J. M. Box-­Steffensmeier, H. E. Brady, and D. Collier, editors, The Oxford Handbook of Political Methodology (pp. 702–­721). New York: Oxford University Press. Bennett, A. (2010). Process tracing and causal inference. In H. Brady and D. Collier, editors, Rethinking Social Inquiry, 2nd edition (pp. 207–­220). Lanham, MD: Rowman & Littlefield. Bohannon, J. (2015, May 28). Science retracts gay marriage paper without agreement of lead author LaCour. Science Magazine. http://www.sciencemag.org/news/2015/05/ science-­retracts-­gay-­marriage-­paper-­without-­agreement-­lead-­author-­lacour Boice, R. (1990). Professors as Writers: A Self-­Help Guide to Productive Writing. Stillwater, OK: New Forums Press. Boroditsky, L. (2017, November). “How language shapes the way we think.” TED, 14:04. https://www.ted.com/talks/lera_boroditsky_how_language_shapes_the_way _we_think Brady, H., and Collier, D. (2010). Rethinking Social Inquiry, 2nd edition. Lanham, MD: Rowman & Littlefield.



Brady, H. E. (2010). Data-­set observations versus causal-­process observations: The 2000 U.S. presidential election. In H. Brady and D. Collier, editors, Rethinking Social Inquiry, 2nd edition (pp. 208–­240). Lanham, MD: Rowman & Littlefield. Brady, H. E., Collier, D., and Seawright, J. (2010). Refocusing the discussion of methodology. In H. Brady and D. Collier, editors, Rethinking Social Inquiry, 2nd edition (pp. 15–­31). Lanham, MD: Rowman & Littlefield. Brown, M. (2009). The Culture of Punishment: Prison, Society, and Spectacle. New York: New York University Press. Burawoy, M. (1991). The extended case method. In M. Burawoy, A. Burton, A. A. Ferguson, and K. J. Fox, editors, Ethnography Unbound: Power and Resistance in the Modern Metropolis (pp. 271–­287). Berkeley and Los Angeles: University of California Press. Burawoy, M. (1998). The extended case method. Sociological Theory, 16(1):4–­33. Burawoy, M. (2003). Revisits: An outline of a theory of reflexive ethnography. American Sociological Review, 68(5):645–­679. Byrne, E. (2018). Swearing Is Good for You: The Amazing Science of Bad Language. New York: W. W. Norton Company. Calavita, K., and Jenness, V. (2013). Inside the pyramid of disputes: Naming problems and filing grievances in California prisons. Social Problems, 60(1):50–­80. Calavita, K., and Jenness, V. (2015). Appealing to Justice: Prisoner Grievances, Rights, and Carceral Logic. Oakland: University of California Press. Caldwell, T. (2015, September 15). “What are you up against?” TEDxKC, 17:39. https:// www.youtube.com/watch?v=PnMs_qLwaes Caldwell, T. (2017). The Push: A Climber’s Journey of Endurance, Risk, and Going Beyond Limits. New York: Viking. Carrabine, E. (2004). Power, Discourse and Resistance. Aldershot Hants, UK, and Burlington, VT: Ashgate Publishing Limited. Charmaz, K. (1983). The grounded theory method: An explication and interpretation. In R. M. Emerson, editor, Contemporary Field Research (pp. 109–­126). Boston: Little, Brown. Charmaz, K. (1996). The search for meaning—­grounded theory. In J. Smith, R. Harre, and L. V. Langenhove, editors, Rethinking Methods in Psychology (pp. 27–­49). London: Sage Publications. Chesney-­Lind, M., and Chagnon, N. (2016). Criminology, gender, and race: A case study of privilege in the academy. Feminist Criminology, 11(4):311–­333. Clemmer, D. (1940). The Prison Community. New York: Holt, Rinehart and Winston. Contenta, S. (2016, October 2). From failed Bronx drug dealer to U of T sociologist. The Star (October 2). https://www.thestar.com/news/insight/2016/10/02/from-­failed-­ bronx-­drug-­dealer-­to-­u-­of-­t-­sociologist.html Contreras, R. (2012). The Stickup Kids: Race, Drugs, Violence, and the American Dream. Berkeley and Los Angeles: University of California Press. Cummins, E. (1994). The Rise and Fall of California’s Radical Prison Movement. Stanford, CA: Stanford University Press.


Davis, F. J. (1991). Who Is Black?: One Nation’s Definition. University Park: Pennsylvania State University Press. DiMaggio, P. J., and Powell, W. W. (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48(2):147–­160. @DirtbagMovie. (2018, July 11). “In the climbing world, being called a ‘Dirtbag’ is a badge of honor, a hard-­earned title not for the meek.” Twitter, 1:37 p.m. https://twitter.com/DirtbagMovie/status/1017100631184568320 Duneier, M. (1999). Sidewalk. New York: Farrar, Straus and Giroux. Eason, J. M. (2017). Big House on the Prairie: Rise of the Rural Ghetto and Prison Proliferation. Chicago: University of Chicago Press. Emerson, R. M. (2001). Contemporary Field Research: Perspectives and Formulations. Prospect Heights, IL: Waveland Press. Emerson, R. M., Fretz, R. I., and Shaw, L. L. (1995). Writing Ethnographic Fieldnotes. Chicago: University of Chicago Press. Espeland, W. N., and Stevens, M. L. (2008). A sociology of quantification. European Journal of Sociology, 49(3):401–­436. Evening Sends (2020). Climbing dictionary. http://eveningsends.com/ climbingclimbing-­definitions/ Feeley, M. M. (1979). The Process Is the Punishment: Handling Cases in a Lower Criminal Court. New York: Russell Sage Foundation. Flashman, J. (2017, June 12). Interview: The first naked ascent of El Capitan. Climbing. https://www.climbing.com/news/the-­first-­naked-­ascent-­of-­el-­capitan/ Flores, J. (2012). Jail pedagogy: Liberatory education inside a California juvenile detention facility. Journal of Education for Students Placed at Risk (JESPAR), 17(4):286–­300. Flores, J. (2016). Caught Up: Girls, Surveillance, and Wraparound Incarceration. Berkeley and Los Angeles: University of California Press. Freedman, D. A. (2010). On types of scientific inquiry: The role of qualitative reasoning. In H. Brady and D. Collier, editors, Rethinking Social Inquiry, 2nd edition (pp. 221–­ 236). Lanham, MD: Rowman & Littlefield. Fyson, D., and Fenchel, F. (2015). Prison registers, their possibilities and their pitfalls: The case of local prisons in nineteenth-­century Quebec. History of the Family, 20(2):163–­188. Gallo, C. (2014). Talk Like TED: The 9 Public-­Speaking Secrets of the World’s Top Minds. New York: St. Martin’s Press. Garland, D. (2001). The Culture of Control: Crime and Social Order in Contemporary Society. Chicago: University of Chicago Press. Garland, D. (2001). Introduction: The meaning of mass imprisonment. Punishment & Society, 3(1):5–­7. Garland, D. (2010). Peculiar Institution: America’s Death Penalty in an Age of Abolition. Cambridge, MA: Harvard University Press. Geertz, C. (1973). The Interpretation of Culture: Selected Essays. New York: Basic Books.


George, A., and Bennett, A. (2005). Case Studies and Theory Development in the Social Sciences. Cambridge, MA: MIT Press. Gibson-­Light, M. (2019). The Prison as Market: How Penal Labor Systems Reproduce Inequality. PhD dissertation, University of Arizona. Glaser, B. G., and Strauss, A. L. (1967). The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine Publishing Company. Goertz, G., and Mahoney, J. (2012). A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton, NJ: Princeton University Press. Goffman, A. (2009). On the run: Wanted men in a Philadelphia ghetto. American Sociological Review, 74(3):339–­357. Goffman, A. (2014). On the Run: Fugitive Life in an American City. Chicago: University of Chicago Press. Goffman, E. (2001). On fieldwork. In R. M. Emerson, editor, Contemporary Field Research (pp. 153–­158). Prospect Heights, IL: Waveland Press. Goffman, E. (2009 [1963]). Stigma: Notes on the Management of Spoiled Identity. New York: Simon & Schuster. Gonzalez Van Cleve, N. (2016). Crook County: Racism and Injustice in America’s Largest Criminal Court. Stanford, CA: Stanford University Press. Goodman, P. (2008). “It’s just Black, White, or Hispanic”: An observational study of racializing moves in California’s segregated prison reception centers. Law & Society Review, 42(4):735–­770. Goodman, P. (2011). From “observation dude” to “an observational study”: Gaining access and conducting research inside a paramilitary organization. Canadian Journal of Law and Society, 26:599–­605. Goodman, P. (2020). “Work your story”: Selective voluntary disclosure, stigma management, and narratives of seeking employment after prison. Law & Social Inquiry, 45(4):1–­29. Goodman, P., Page, J., and Phelps, M. (2017). Breaking the Pendulum: The Long Struggle over Criminal Justice. New York: Oxford University Press. Gracy, K. F. (2004). Documenting communities of practice: Making the case for archival ethnography. Archival Science, 4(3–­4):335–­365. Gray, R. R. R. (2015). Ts’msyen Revolution: The Poetics and Politics of Reclaiming. PhD dissertation, University of Massachusetts, Amherst. Gurusami, S. (2019). Motherwork under the state: The maternal labor of formerly incarcerated Black women. Social Problems, 66(1):128–­143 Harding, S. (2009). Standpoint theories: Productively controversial. Hypatia, 24(4):192–­200. Harris, C. (2001). Archival fieldwork. Geographical Review, 91(1–­2):328–­334. Hartman, S. (2008). Venus in two acts. Small Axe: A Caribbean Journal of Criticism, 12(2):1–­14. Hasson, U. (2016). “This is your brain on communication.” TED2016, 14:44. https://www. ted.com/talks/uri_hasson_this_is_your_brain_on_communication?language=en


Honnold, A., and Roberts, D. (2016). Alone on the Wall. New York: W. W. Norton Company. Hood, R. (2001). Capital punishment: A global perspective. Punishment & Society, 3(3):331–­354. Huang, M. (2016). Vulnerable observers: Notes on fieldwork and rape. Chronicle of Higher Education. https://www.chronicle.com/article/Vulnerable-­Observers-­ Notes-­on/238042 (last accessed: February 5, 2019). Humphreys, L. (1975). Tearoom Trade: Impersonal Sex in Public Places. New York: Routledge. Hutchinson, S. (2006). Countering catastrophic criminology: Reform, punishment and the modern liberal compromise. Punishment & Society, 8(4):443–­467. Intemann, K. (2010). 25 years of feminist empiricism and standpoint theory: Where are we now? Hypatia, 25(4):778–­796. Irwin, J. (1970). The Felon. Englewood Cliffs, NJ: Prentice-­Hall. Irwin, J. (2005). The Warehouse Prison: Disposal of the New Dangerous Class. Los Angeles: Roxbury. Irwin, K., and Umemoto, K. (2016). Jacked Up and Unjust: Pacific Islander Teens Confront Violent Legacies. Oakland: University of California Press. Jacques, S. (2014). The quantitative–­qualitative divide in criminology: A theory of ideas’ importance, attractiveness, and publication. Theoretical Criminology, 18(3):317–­334. Jenness, V. (2011). Getting to know “the girls” in an “alpha-­male community”: Notes on fieldwork on transgender inmates in California prisons. In S. Fenstermaker and N. Jones, editors, Sociologists Backstage: Answers to 10 Questions About What They Do (pp. 139–­161). New York: Routledge. Jordan, W. (1974). The White Man’s Burden: Historical Origins of Racism in the United States. New York: Oxford University Press. King, G., Keohane, R. O., and Verba, S. (1994). Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton, NJ: Princeton University Press. Kohler-­Hausmann, I. (2018). Misdemeanorland: Criminal Courts and Social Control in an Age of Broken Windows Policing. Princeton, NJ: Princeton University Press. Lamont, M., and White, P. (2005). Workshop on Interdisciplinary Standards for Systematic Qualitative Research: Cultural Anthropology, Law and Social Science, Political Science, and Sociology Programs. National Science Foundation. Latour, B. (1986 [1979]). Laboratory Life: The Construction of Scientific Facts. Princeton, NJ: Princeton University Press. Lee, D. S., and McCrary, J. (2017). The deterrence effect of prison: Dynamic theory and evidence. In Regression Discontinuity Designs: Theory and Applications (pp. 73–­146). Emerald Publishing Limited. Lee, J. (2016). Blowin’ Up: Rap Dreams in South Central. Chicago: University of Chicago Press. Leo, R. A. (1996). Inside the interrogation room. Journal of Criminal Law and Criminology, 86(2):266–­303.


Lewis-Kraus, G. (2016, January 12). The trials of Alice Goffman. New York Times Magazine. http://www.nytimes.com/2016/01/17/magazine/the-trials-of-alice-goffman.html?_r=0
Lofland, J., Snow, D., Anderson, L., and Lofland, L. H. (2006 [1971]). Analyzing Social Settings: A Guide to Qualitative Observation and Analysis, 4th edition. Belmont, CA: Wadsworth.
Lorimer, H. (2009). Caught in the nick of time: Archives and fieldwork. In D. DeLyser, S. Aitken, M. Crang, S. Herbert, and L. McDowell, editors, The SAGE Handbook of Qualitative Research in Human Geography (pp. 248–273). London: Sage Publications.
Lowe, C. C., and Fagan, A. A. (2019). Gender composition of editors and editorial boards in seven top criminal justice and criminology journals from 1985 to 2017. Journal of Criminal Justice Education, 30(3):424–443.
Lubet, S. (2017). Interrogating Ethnography: Why Evidence Matters. Oxford: Oxford University Press.
Luker, K. (2008). Salsa Dancing into the Social Sciences: Research in an Age of Info-Glut. Cambridge, MA: Harvard University Press.
Lynch, M. (2000). The disposal of inmate #85271: Notes on a routine execution. Studies in Law, Politics, and Society, 20:3–34.
Lynch, M. (2010). Sunbelt Justice: Arizona and the Transformation of American Punishment. Stanford, CA: Stanford University Press.
Maurutto, P., and Hannah-Moffat, K. (2006). Assembling risk and the restructuring of penal control. British Journal of Criminology, 46(3):438–454.
McDermott, M. (2006). Working-Class White: The Making and Unmaking of Race Relations. Berkeley and Los Angeles: University of California Press.
McLennan, R. M. (2008). The Crisis of Imprisonment: Protest, Politics, and the Making of the American Penal State, 1776–1941. New York: Cambridge University Press.
McNeill, F., editor (2018). Pervasive Punishment: Making Sense of Mass Supervision. Emerald Publishing Limited.
Meranze, M. (1996). Laboratories of Virtue: Punishment, Revolution, and Authority in Philadelphia, 1760–1835. Chapel Hill: University of North Carolina Press.
Merry, S. E. (1990). Getting Justice and Getting Even: Legal Consciousness Among Working-Class Americans. Chicago: University of Chicago Press.
Merry, S. E. (2016). The Seductions of Quantification: Measuring Human Rights, Gender Violence, and Sex Trafficking. Chicago: University of Chicago Press.
Meyer, J. W., and Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2):340–363.
Mill, J. S. (1872 [1843]). A System of Logic, Ratiocinative and Inductive. Vol. 1. London: Longmans, Green, Reader, and Dyer.
Morrill, C. (1995). Conflict Management in Corporations. Chicago: University of Chicago Press.
Morrill, C. (2007). “Systematic Ethnography.” Invited Lecture, Center for Law and Society, Program in Jurisprudence and Social Policy, Boalt Hall School of Law, University of California, Berkeley.


Morrill, C., and Musheno, M. (2018). Navigating Conflict: How Youth Handle Trouble in a High-Poverty School. Chicago: University of Chicago Press.
Morris, A. (2017). The Scholar Denied: W. E. B. Du Bois and the Birth of Modern Sociology. Oakland: University of California Press.
Oxford English Dictionary (2020). https://www.lexico.com/en/definition/empirical
Page, S. E. (2007). The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton, NJ: Princeton University Press.
Pager, D. (2003). The mark of a criminal record. American Journal of Sociology, 108(5):937–975.
Perkinson, R. (2008). Texas Tough: The Rise of America’s Prison Empire. New York: Metropolitan Books/Henry Holt.
Phelps, M. S. (2017). Mass probation: Toward a more robust theory of state variation in punishment. Punishment & Society, 19(1):53–73.
Reel Rock 12: Break on Through (2017). Sender Films. https://senderfilms.com/productions/details/2525/Reel-Rock-12
Reiter, K. (2016). 23/7: Pelican Bay Prison and the Rise of Long-Term Solitary Confinement. New Haven, CT: Yale University Press.
Reiter, K. A. (2012a). The most restrictive alternative: A litigation history of solitary confinement in U.S. prisons, 1960–2006. Studies in Law, Politics, and Society, 57:71–124.
Reiter, K. A. (2012b). Parole, snitch, or die: California’s supermax prisons and prisoners, 1997–2007. Punishment & Society, 14(5):530–563.
Rios, V. M. (2011). Punished: Policing the Lives of Black and Latino Boys. New York: New York University Press.
Rock Climb Every Day (2020). Glossary of climbing terms. https://rockclimbeveryday.com/glossary-of-climbing-terms/
Rogers, E. M. (2003). Diffusion of Innovations. New York: Free Press.
Rubin, A. T. (2015a). A neo-institutional account of prison diffusion. Law & Society Review, 49(2):365–399.
Rubin, A. T. (2015b). Resistance or friction: Understanding the significance of secondary adjustments. Theoretical Criminology, 19(1):23–42.
Rubin, A. T. (2016). Penal change as penal layering: A case study of proto-prison adoption and capital punishment reduction, 1785–1822. Punishment & Society, 18(4):420–441.
Rubin, A. T. (2017a). Professionalizing prison: Primitive professionalization and the administrative defense of Eastern State Penitentiary, 1829–1879. Law & Social Inquiry, 43(1):182–211.
Rubin, A. T. (2017b). Resistance as agency? Incorporating the structural determinants of prisoner behaviour. British Journal of Criminology, 57(3):644–663.
Rubin, A. T. (2019). Punishment’s legal templates: A theory of formal penal change. Law & Society Review, 53(2):518–553.
Rubin, A. T. (2021). The Deviant Prison: Philadelphia’s Eastern State Penitentiary and the Origins of America’s Modern Penal System, 1829–1913. New York: Cambridge University Press.


Rubin, D., Aldering, G., Barbary, K., Boone, K., Chappell, G., Currie, M., Deustua, S., Fagrelius, P., Fruchter, A., Hayden, B., Lidman, C., Nordin, J., Perlmutter, S., Saunders, C., and Sofiatti, C. (2015). UNITY: Confronting supernova cosmology’s statistical and systematic uncertainties in a unified Bayesian framework. Astrophysical Journal, 813:137.
Rudes, D. (2008). Social Control in an Age of Organizational Change: The Construction, Negotiation and Contestation of Policy Reform in a Parole Agency. PhD dissertation, University of California–Irvine.
Saldaña, J. (2012). The Coding Manual for Qualitative Researchers. London: Sage Publications.
Schoenfeld, H. A. (2018). Building the Prison State: Race and the Politics of Mass Incarceration. Chicago: University of Chicago Press.
Seawright, J., and Gerring, J. (2008). Case selection techniques in case study research: A menu of qualitative and quantitative options. Political Research Quarterly, 61(2):294–308.
Seeds, C. (2017). Bifurcation nation: American penal policy in late mass incarceration. Punishment & Society, 19(5):590–610.
Selznick, P. (1949). TVA and the Grass Roots. Berkeley: University of California Press.
Selznick, P. (1957). Leadership in Administration: A Sociological Interpretation. New York: Harper & Row.
Shadish, W., Cook, T. D., and Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.
Shearing, C. D., and Stenning, P. C. (1984). From the panopticon to Disney World: The development of discipline. In A. N. Doob and E. L. Greenspan, editors, Perspectives in Criminal Law: Essays in Honour of John LL.J. Edwards. Canada Law Book Inc.
Simon, J. (2014). Mass Incarceration on Trial. New York: The New Press.
Singal, J. (2015, June 18). The internet accused Alice Goffman of faking details in her study of a Black neighborhood. I went to Philadelphia to check. New York Magazine. http://nymag.com/scienceofus/2015/06/i-fact-checked-alice-goffman-with-her-subjects.html
Small, M. L. (2009). “How many cases do I need?”: On science and the logic of case selection in field-based research. Ethnography, 10(1):5–38.
Smith, L. T. (2013). Decolonizing Methodologies: Research and Indigenous Peoples. London: Zed Books Ltd.
Snow, D., and Anderson, L. (1993). Down on Their Luck. Berkeley: University of California Press.
Stuart, F. (2016). Down, Out, and Under Arrest: Policing and Everyday Life in Skid Row. Chicago: University of Chicago Press.
Sutton, J. R. (1988). Stubborn Children: Controlling Delinquency in the United States, 1640–1981. Berkeley: University of California Press.
Sutton, J. R. (1990). Bureaucrats and entrepreneurs: Institutional responses to deviant children in the United States, 1890–1920s. American Journal of Sociology, 95(6):1367–1400.


Sykes, G. M. (1956). The corruption of authority and rehabilitation. Social Forces, 34(3):257–262.
Szalavitz, M. (2016, March 1). Autism—It’s different in girls. Scientific American. https://www.scientificamerican.com/article/autism-it-s-different-in-girls/
Thelen, K. (2003). How institutions evolve: Insights from comparative historical analysis. In J. Mahoney and D. Rueschemeyer, editors, Comparative Historical Analysis in the Social Sciences (pp. 208–240). Cambridge, UK: Cambridge University Press.
Thompson, H. A. (2016). Blood in the Water: The Attica Uprising of 1971 and Its Legacy. New York: Pantheon Books.
Tolbert, P. S., and Zucker, L. G. (1983). Institutional sources of change in the formal structure of organizations: The diffusion of civil service reform, 1880–1935. Administrative Science Quarterly, 28(1):22–39.
Tuck, E., and Yang, K. W. (2012). Decolonization is not a metaphor. Decolonization: Indigeneity, Education & Society, 1(1):1–40.
Tuck, E., and Yang, K. W. (2014). R-words: Refusing research. In D. Paris and M. T. Winn, editors, Humanizing Research: Decolonizing Qualitative Inquiry with Youth and Communities (pp. 223–247). Thousand Oaks, CA: Sage Publications.
Turner, S. F., Davis, L. M., Fain, T., Braithwaite, H., Lavery, T., Choinski, W., and Camp, G. (2015). A national picture of prison downsizing strategies. Victims & Offenders, 10(4):401–419.
Ugelvik, T. (2011). The hidden food: Mealtime resistance and identity work in a Norwegian prison. Punishment & Society, 13(1):47–63.
Ulmer, J. T., and Kramer, J. H. (1996). Court communities under sentencing guidelines: Dilemmas of formal rationality and sentencing disparity. Criminology, 34(3):383–408.
Urban Dictionary (2020). Dirtbag. https://www.urbandictionary.com/ (last accessed: November 6, 2020).
Vocabulary.com (2020). Foothold. https://www.vocabulary.com/dictionary/foothold (last accessed: November 4, 2020).
Wikipedia (2020). Glossary of climbing terms. https://en.wikipedia.org/wiki/Glossary_of_climbing_terms (last accessed: November 4, 2020).
Wiktionary (2020). Dirtbag. https://en.wiktionary.org/wiki/Wiktionary:Main_Page (last accessed: November 4, 2020).
Willis, J. J., Mastrofski, S. D., and Weisburd, D. (2007). Making sense of COMPSTAT: A theory-based analysis of organizational change in three police departments. Law & Society Review, 41(1):147–188.
Wilson, S. M., and Gudmundsdottir, S. (1987). What is this a case of?: Exploring some conceptual issues in case study research. Education and Urban Society, 20(1):42–54.
Young, K. M. (2014). Everyone knows the game: Legal consciousness in the Hawaiian cockfight. Law & Society Review, 48(3):499–530.
Zimring, F., Kamin, S., and Hawkins, G. (2001). Punishment and Democracy: Three Strikes and You’re Out in California. New York: Oxford University Press.
Zuberi, T., and Bonilla-Silva, E. (2008). White Logic, White Methods: Racism and Methodology. Lanham, MD: Rowman & Littlefield.


Index

academic training, research questions influenced by, 44
advocacy, 34
aid climbing, 13, 179
alternative explanations, 157, 228–29
American Philosophical Society (APS), 153–55, 164–65
American Society of Criminology, 234
analytic memos, 202–6
anchors, 59–60
Anderson, Leon, 28, 33–34, 42–43, 50, 87–88, 128. See also Lofland, John
Angola (prison). See Louisiana State Penitentiary
APS. See American Philosophical Society
asshole questioners, 77, 81–83, 241
ATLAS.ti, 195
Auburn State Prison, 88
Auburn System, 32, 115, 117–18, 248
audience for research: comprehensive exam reading lists as vehicle for knowing, 82; knowing and speaking to, 60–62, 71–74, 82–83, 90–91
Author Meets Critic (AMC) panels, 65
autocoding, 195
Bacher, John, 236
Bayesian statistics, 215, 265n3(Ch.10)
Becker, Howard, 257n2(Acknowledgments)
Beckett, Katherine, 30–31, 57, 96, 210
belaying, 59–60
Bennett, Andrew. See George, Alexander
bias. See selection bias
Biographie, France, 35
blinded analyses, 266n4
Blogger, 196
blogs. See online blogs
Boice, R., Professors as Writers, 253–55
bouldering, 14
Brady, Henry, 21, 28–29, 234–35; Rethinking Social Inquiry (with David Collier), 28, 234
Brown, Michelle, The Culture of Punishment, 212
Burawoy, Michael, 27, 258n6(Ch.2)
Bush, George W., 29
Calavita, Kitty, 131, 160–61
Caldwell, Tommy, 3, 8–9, 35–36, 39–40, 89, 181, 254
CAQDAS (computer-assisted qualitative data analysis software), 195, 207
Carrabine, Eamonn, 224
cases: before-and-after, 134–37; convenient, 127–29; defined, 86; deviant, 123; diverse, 130–31; extreme, 122; influential, 123–25; interesting, 129–30; most-different, 131–32; most-similar, 132–34; prominent, 125–26; randomly selected, 126–27; typical/representative, 120–22, 130–31


case selection, 108–37; approaches to, 109–12; climbing analogy for, 108–9; justification of/fielding questions about, 109–10, 114; multi-case studies, 130–37; number of cases as issue in, 118–19; sampling compared to, 87–89; single-case studies, 120–30; strategies for, 118–19; from typology of cases, 116–18; universe of possible cases for, 112–20
case studies: defining, 86–87; multi-case, 130–37; sample sizes of, 18–22; single-case, 120–30; types of, 70
causal inference/claims: challenge of, 210–11; climbing analogy for, 210, 229–30; correlation in relation to, 215–25; counterfactuals and, 225–29; defending, 209–10; necessary and sufficient logic and, 219–25; process tracing and, 213–14; in qualitative work, 209–11, 237; in quantitative work, 237; strength of, 211–14, 246–48; typologies and, 215–25
causal process observations (CPOs), 21
cause and effect (chicken and egg) problem, 96
Cerro Fitz Roy (mountain), 84
chain-referral sampling, 140–41
Civil War, 97, 99
Clemmer, Donald, 32
climbing: accomplishments in, 35, 40; as analogy for research, 1–3, 60–62, 84–85, 108–9, 138–39, 179, 181–82, 210, 229–30; defining, 13–14; gear for, 119–20; gender in, 254; generalizability of knowledge in, 24–25; mainstreaming of, 6; misconceptions about, 19; as model for researchers, 7–9; personal style in, 1–2, 139, 172; project lists for, 35; route-planning in, 84–85; shoes for, 254; “so what?” question about, 77–78; styles of, 2–3, 13–14, 119–20. See also aid climbing; deep-water free solo climbing; dirtbagging approach to social science; dynamic climbing; free climbing; free solo climbing; ice climbing; off-width crack climbing; speed climbing; sport climbing; static climbing
closed coding, 190
codebooks, 188–89
coding: closed/focused, 190; consistency in, 194–95; content analysis as, 184; defined, 186; example of, 190–93; intuitive nature of, 184–86; iterative nature of, 189–90, 194; open, 52, 189; practices of, 186–89; process of, 189–90, 193–95; technologies for, 195–201
coding families, 187–88
collaborative research projects, 17
Collier, David, 21, 234–35; Rethinking Social Inquiry (with Henry Brady), 28, 234
comparative case studies. See multi-case studies
comprehensive exam reading lists, 82
comprehensive sampling, 141
confounding, 95–100
content analysis, 184, 195. See also coding; data analysis
Contreras, Randol, 23, 73–74, 237–38, 248–49
control group, 226
convenience sampling, 141, 248–50
correlations, 96, 132–34, 214–25
counterfactuals, 157, 225–29
Criminology (journal), 234
criminology, qualitative methods in, 233–34
critical theory, 68, 169, 170
Croft, Peter, 236
data: accusations of dishonesty concerning, 235, 237–38, 240–41; criteria for sufficiency of, 147–53; multiple sources of, 156–61; research question in relation to, 38–39, 45; tainting of, 147
data analysis, 179–207; anxiety associated with, 181, 207; climbing analogy for, 179, 181–82; coding aspect of, 184–201; data collection in relation to, 15, 17, 182–83; intuitive nature of, 184–86; memos for, 202–7; practice of, 183–84; storage of information for, 207
data collection, 164–78; data analysis in relation to, 15, 17, 182–83; fielding questions about, 174; fieldnotes and, 172–78; fieldwork and, 164–68; from multiple sources, 156–61; reflexivity and, 168–72; storage of, 207; techniques of, 173–74; typical methods of, 15–17. See also sampling
dataset observations (DSOs), 21
Dawn Wall, El Capitan, Yosemite National Park, 8, 36, 39–40, 89, 181
decolonializing theory, 169
deductive approaches, 5, 37, 55
deep-water free solo climbing, 14
dependent variables, sampling on, 122, 241, 245–48
diary method, 16
diffusion research, 124, 216
DiGiulian, Sasha, 254
dirtbagging approach to social science: case selection in, 111, 114, 119, 153; causal inference/claims in, 210; characteristics of, 5–6, 39, 42, 103, 105–6; countercultural character of, 5–6, 39, 251–52; data analysis/coding in, 189–90, 193; data collection in, 56; description of, 4, 5, 9; ethnography as instance of, 110; fielding questions about, 41; positive, supportive nature of, 8–9; rationale for, 78, 253–55; research design in, 85, 101–6; research questions and, 50; research questions in, 38–40, 48; sampling in, 139, 141–42; summation of, 252–53; tackling difficulties in, 7–9, 40, 62; theory’s role in, 63
dirtbags, defined, 1, 6
Dirtbag: The Legend of Fred Beckey (documentary), 6
discovery, in research, 105
disenfranchised groups. See vulnerable groups, research on
Du Bois, W. E. B., 255
dynamic climbing, 2–3
early adopters, 33, 124, 216–18
Eastern State Penitentiary: appropriateness of qualitative research for, 29; author’s research on, 21, 56–57; coding of, 185–86, 190–93; confounding factors in research on, 96–99; counterfactuals in study of, 227–28; defending conclusions/argument of study of, 208–9, 212–13, 247–48; fieldnotes taken on, 176–77; framing of research on, 72–73; generalizability of research on, 23–24; memos recorded for, 203–5; process tracing applied to, 26–27, 31–32; research question for study of, 42; sampling in study of, 245–46; selection bias in research on, 153–54; selection of, as research focus, 88, 117–18, 162–63, 261n8; sufficiency of data in research on, 149–52; triangulation of data in, 159–60; typology used in study of, 33
Eckstein, Harry, 70
econometricians, 214
Edelman, Lauren, 39, 47
El Capitan, Yosemite National Park, 13, 35, 179, 254; Dawn Wall, 8, 36, 39–40, 89, 181; The Nose, 179, 254
Elwell, Elizabeth Velora, 154–55
Emerson, Robert, 166, 184
empiricism: defined, 18; puzzles involving, 50, 72–74
endnotes. See footnotes/endnotes
ethics, 11–12, 55, 152–53, 238, 267n10. See also research ethics boards
ethnography: case selection in, 110; confounding effects in, 100; data collection in, 19; defined, 15; as dirtbagging social science, 110; fieldnotes in, 172–78; fieldwork in, 15, 165–66; framing of research in, 73; pilot studies and, 146–47; reflexivity in, 168–72, 174; research questions in, 42, 44; sampling in, 142; skepticism about, 235; sufficiency of data in, 148–53
Everest (mountain), 77–78, 84
Excel software, 198, 199, 206
executions, of inmates, 30
exogenous shocks, 134
external validity, 22, 95
falsifiability, 68
Feeley, Malcolm, 69
feminist theory, 169–71
Feynman, Richard, 235
fieldnotes, 172–78, 202
fieldwork: access to, 166–67; author’s experiences of, 164–65, 175–76; data collection as, 168; defining, 15, 165–68; discomfort/danger associated with, 167–68; locations of, 165–66; notes taken during, 172–78; reflexivity in, 168–72
Fitz Traverse, Patagonia, 35
focused coding, 190
focus groups, 15–16
footholds, 108–9
footnotes/endnotes: for coding, 196–97; familiarity with literature from reading, 61
Foucault, Michel, 260n1
Foulke, William P., 154
free climbing, 13, 60, 179
Freedman, David A., 28
Free Solo (documentary), 59
free solo climbing, 13, 19, 59–60, 143–44, 236, 260n1
frequentist statistics, 215, 265n3(Ch.10)
Garland, David, 67; Culture of Control, 61
Geertz, Clifford, 126, 148
generalizability: case selection and, 91–94, 109, 119, 121, 128; of qualitative studies, 22–24, 239; of theory, 67–68
George, Alexander, and Andrew Bennett, Case Studies and Theory Development, 45–46, 70, 119, 129
Gerring, John, 118–19, 122, 123, 125–27, 129–30, 162
Gibson-Light, Michael, 30
Goffman, Alice, 28, 238, 242, 248–49
Goffman, Erving, Stigma, 61
Goodman, Phil, 145, 244
Google Forms, 198, 200–201
Gore, Al, 28–29
Grounded Theory, 184, 187, 207, 264n2(Ch.9)
Half Dome, Yosemite National Park, 13, 35, 84, 236, 260n1
hangdogging, 102
Hartman, Saidiya, 155–56
Hawthorne effect, 100
Hayes, Margo, 35–36, 172, 254
Herbert, Steve, 30–31, 57, 96, 210
Hill, Lynn, 179, 254
historical studies: confounding effects in, 101; data collection in, 19; research questions in, 42; value of/interest in, 66, 79
history, as confounding factor, 97
homelessness, 28, 30–31, 34, 42–43, 50, 87–88, 128, 210
Honnold, Alex, 19, 35–36, 59, 60, 85, 236, 260n1
hypothesis testing. See theory testing
ice climbing, 14
independent variables, 245
in-depth/intensive interviewing, 15–16
inductive approaches, 5, 37–38, 45
insider research, 263n1(Ch.8)
institutional review boards (IRB), 153, 238, 263n4, 265n8, 267n10
instrumentation, as confounding factor, 98–99
intercoder reliability, 195
internal validity, 95–101, 195
international research, 109–10
interpretation of research, 208–30; anxieties about, 208–9, 230; climbing analogy for, 229–30; correlation and, 215–25; counterfactuals and, 225–29; fielding questions about, 228–29; necessary and sufficient logic and, 219–25; strength of, 211–14; typologies and, 215–25
interview-based studies: causal inference and, 31; coding of, 193–94; ethics of, 152–53; fieldnotes for, 172–73; overview of, 15; research design for, 147; researcher effects on data collection in, 98–99, 169; sampling considerations for, 18–19, 140–41, 147–53, 242–44; types of, 15–16
IRBs. See institutional review boards
Jackson, Albert Green, 154–55
Jenness, Valerie, 131, 160–61
Jorgeson, Kevin, 8, 36, 40, 89, 181
journalism, 61
Keohane, Robert, 234
King, Gary, 234, 242
Klaas, Polly, 135–36
Koehler, Johann, 233–34
K2 (mountain), 84
late adopters, 33, 124, 216–18
Lee, Jooyoung, 23, 29, 75, 245–49
Lijphart, Arend, 70
literature. See research literature
Lofland, John, David Snow, Leon Anderson, and Lyn H. Lofland, Analyzing Social Settings, 44–45, 110
logic, 215, 220
Louisiana State Penitentiary (Angola), 127
Lubet, Steven, 235
Luker, Kristin, 24, 41–44, 228; Salsa Dancing into the Social Sciences, x–xi, 255, 257n2(Acknowledgments), 257n2(Ch.1)
Lynch, Mona, 23, 30
Mallory, George Leigh, 77–78
The Matrix (film), 69
maturation, as confounding factor, 98
McLennan, Rebecca, 217
memos, 202–7
Meranze, Michael, 217
Merry, Sally, 53
methodology, notes about, 206
Mill, John Stuart, 131–32
Mill’s Method of Agreement, 131–32
Mill’s Method of Difference, 132–34
mixed (quantitative and qualitative) methods, 259n8
modern prisons, 33, 88, 112, 115, 117, 124, 215–19
Moonlight Buttress, Zion National Park, 35, 84
Morrill, Cal, 79
mountaineering, 14
Mount Watkins, Yosemite National Park, 13, 35
multi-case studies, 130–37
National Science Foundation (NSF), 18, 55
necessary vs. sufficient logic, 219–25
neo-institutional theory, 124, 216
The Nose, El Capitan, Yosemite National Park, 179, 254
novelty, in research, 4, 67, 71, 83, 89–90
NSF. See National Science Foundation
NVivo, 195
Obama, Barack, 40
Occupy Wall Street, 54
off-width crack climbing, 14
Ondra, Adam, 89
online blogs, 196–97, 207
open coding, 52, 189
Pager, Devah, 64–65
participant observation, 15
participant-researchers, 17
Pennsylvania System, 21, 26–27, 32, 115, 117–18, 208–9, 224, 247–48
people, recording information about, 206
photovoice, 16–17
pilot studies, 146–47
policy-relevant research, 63–66, 78–79. See also Theory-Policy Matrix
political science research: and causal inference, 25, 213, 228, 229, 234; data in, 21; and ethnography, 261n2; qualitative methods in, x, 214, 234–35; research questions in, 45; theory generation and testing in, 111
Popper, Karl, 68
popular books, 61
population, for sampling, 140, 142, 144
positionality, 169
positivism, 169–70
postmodern theory, 68, 169, 171
power: accuracy of data influenced by, 170–71; availability of data influenced by, 155–56
pre-registered reports, 266n4
prisoner resistance, 48–49, 56–57, 93–94, 153
probes, 157–58
process tracing, 25–27, 31–32, 213–14, 234, 265n2
proto-prisons, 33, 74, 76, 88, 117, 217–18
publication: amount of data geared toward venue for, 243; challenges of, 60; hierarchy in, 267n9; non-significant findings and, 264n1
purposive sampling, 141
Quakers (Society of Friends), 26–27, 224–25, 247
qualitative methods: accusations of dishonesty concerning, 235, 237–38, 240–41; appropriate uses of/value of, 29–34; case selection for, 109–10; causal inference in, 209–11; characteristics of, 17–29; confounding in, 96–100; defining, 14–15, 17; disciplinary uses of, 233–35; empirical nature of, 18; generalizability of, 22–24, 239; mainstream (science-inspired), 3–4, 63, 102–3; misconceptions about, 19, 22, 25, 162, 239, 241–42; mixture of quantitative and, 259n8; non-linear/recursive/iterative nature of, 102–6, 161–63; open-mindedness of, 1–5, 43, 106, 146, 251; overview of, 13–34; quantitative methods compared to, 17–29, 58, 109, 233, 236–37, 266n1; quantitative studies based on, 17; replication as issue with, 238–40; sampling as issue in, 18–22, 241–50; skepticism/criticism concerning, 41, 44, 86, 99, 100, 136, 214, 231–50; stress associated with, 179–81, 207; theory generating and testing in, 25–27; typical, 15–17; variables in, 115, 245–48; words vs. numbers in, 27–29, 231–32, 237
quantitative methods: case selection in, 109; confounding in, 95; content analysis in, 195; fabrication of data in, 237; generalizability of, 22–24; mainstream (science-inspired) character of, 3–4, 63, 102–3; misconceptions about, 22, 233; mixture of qualitative and, 259n8; qualitative methods as basis for, 17; qualitative methods compared to, 17–29, 58, 109, 233, 236–37, 266n1; research questions and, 55; sample size as characteristic of, 19–22; sampling on the dependent variable in, 246; theory generating and testing in, 25, 27; words vs. numbers in, 27–29
race, categorization of, 156, 171
racism, 213–14
La Rambla, Spain, 35, 172
random sampling, 22–23, 89, 126–27, 140, 260n3(Ch.5)
REBs. See research ethics boards
reference memos, 206–7
reflexivity, in conduct of research, 168–72, 174
regression discontinuity design, 226–27
regressions (statistics), 17, 19–21, 28–29, 123–24, 179–80, 215, 218, 226, 229, 232, 237, 240, 246
Reiter, Keramet, 65, 234
research design: background on, 84–107; climbing analogy for, 84–85; defined, 86; evaluation of, 89–94; fielding questions about, 86–87, 89, 95, 107, 161–62; idealized/conventional conception of, 101–4, 161–62; novelty in, 89–90; open-minded approach to, 101–6; research question in relation to, 51, 103, 105; sampling as basic feature of, 86–87
researchers: anxieties of, 179–81, 207, 208–9, 230, 255–56; attraction of their fields/topics for, 80–81; diversity of, 255–56; personal qualities of, 3; reactions of, to data and settings, 174; reflexivity of, 168–72
research ethics boards (REBs), 153, 238, 263n4, 265n8, 267n10
research interests, 40–41, 44
research literature: defined, 67; looking for work similar to one’s own in, 76; narrow vs. wide-ranging, 67; for researching universe of cases, 115–16; research questions derived from, 39, 45
research projects: absence of research questions in, 55–58, 259n2; climbing analogy for, 60–62; connecting theory to, 71–83; connection of, to scholarly discipline(s), 81–83; fielding questions about, 62–63, 74–83; framing of, 60–62, 71–74, 81–83; justification of, 60–83, 111–12; prospective lists of, 36; researchers’ fundamental motivations for, 80–81; “so what?” questions about, 77–80; strategies for connecting theory to, 74–83; theory-policy matrix and, 63–71; theory’s role in, 66–71; timing of, 66–67; timing of justification of, 62–63
research questions, 35–58; data in relation to, 38–39, 45; defining (broad vs. narrow), 40–46; mid-research changes in, 51–54; mix-and-match approaches to, 46–49; prominent cases in, 125–26; puzzle-solving approach to, 49–51; research design in relation to, 51, 103, 105; research interests vs., 40–41, 44–45; research project’s lack of, 55–58, 259n2; role of, 36–40; selection of, 46–51; theory in, 47–48; topics in, 46–48
The Right Way, critiques of, 2–4, 58, 191, 253–56
Rodden, Beth, 254
sampling, 139–63; case selection compared to, 87–89; climbing analogy for, 138–39; comprehensive, 141; convenience, 141, 248–50; criteria for sufficiency of, 147–53, 242–44 (see also size of samples); on the dependent variable, 122, 241, 245–48; fielding questions about, 141–43, 157; limitations of, 143–46, 155–56, 219; overview of, 86–87; population available for, 140, 142, 144; purposive, 141; random, 22–23, 89, 126–27, 140, 260n3(Ch.5); representative, 144–46; rules concerning, 139; size of samples, 18–22 (see also criteria for sufficiency of); snowball, 140–41, 159; strategies for, 139–47; stratified random, 140; systematic, 143. See also data collection
saturation, 150–53, 244
science, as model for social science, 3–4, 63, 102–3
Seawright, Jason, 21, 118–19, 122, 123, 125–27, 129–30, 162
selection bias, 153–63; effect of, on worth of studies, 91–94; fielding questions about, 157; hedges against, 156–61, 168–72; historical factors in, 153–55; research design and, 91–94; researcher awareness of, 89; societal power dynamics as factor in, 155–56
semi-structured interviews, 15–16
sexy research, 63–66, 71
Sharma, Chris, 2–3
Shiraishi, Ashima, 254
single-case studies, 120–30
sites, defined, 86
Small, Mario, 91, 235
Snow, David, 28, 33–34, 42–43, 50, 87–88, 128. See also Lofland, John
snowball sampling, 140–41, 159
sociology, qualitative methods in, 235
solitary confinement, 21, 23, 27, 32, 42, 65, 115, 208–9, 224
spatial exclusion ordinances, 30–31, 57, 210
speed climbing, 13
sport climbing, 14, 102
static climbing, 2–3
statistical matching, 226
statistics, 215, 237, 265n3(Ch.10)
stratified random sampling, 140
structural racism, 213–14
structured interviews, 15, 258n2
Stuart, Forrest, 53
surveys, open-ended vs. closed-ended, 258n2
swearing, 7
Sykes, Gresham, 32–33, 210
tables. See typologies
taxonomies, 32–34
testing, as confounding factor, 100
text-based studies, 16
theoretical insights, 23–24
theory: defining, 67–69; falsifiability as criterion of, 68; framing research in terms of, 71–83; perspectival (lens-based) types of, 68–69; prominent cases for, 125–26; publication value of, 66; puzzles involving, 49–50; research questions and, 47; thick description in relation to, 149; uses of, 69–71
theory generation: case selection and, 111; multi-case studies and, 130–31; in qualitative methods, 25–27; in quantitative methods, 25, 27
Theory-Policy Matrix, 63–71, 129
theory testing: case selection and, 111; case selection for, 118, 129–30; multi-case studies and, 130–31; in qualitative methods, 25–27; research design and, 105; as use of theory, 69–71
thick description, 148–50
Thompson, Heather Ann, 54
three strikes law, 135–36
topo, 13
top roping, 60
trad. See free climbing
training. See academic training
treatment group, 226
triangulation, 101, 156–61
typologies, 32–34, 116–18, 215–25
undergraduates, research assistance from, 265n7
unstructured interviews, 16
validity. See external validity; internal validity
variables: dependent, 122, 241, 245–48; independent, 245; in multi-case studies, 130; in qualitative work, 115, 245–48
Verba, Sidney, 234
vulnerable groups, research on, 55, 155–56, 263n4
Wethersfield State Prison, 88
within-case studies, 120, 134, 247
Word software, 196, 207
Yosemite National Park, 24–25, 251; Dawn Wall, 8, 36, 39–40, 89, 181; El Capitan, 13, 35; Half Dome, 13, 35, 84, 236, 260n1; Mount Watkins, 13, 35; The Nose, 179, 236
Young, Katie, 126
