Made by Me

Generative AI shows potential as a tool for synthesis and analysis in history, but we should remain skeptical about its utility and ethical implications.

While the moral, legal, environmental, and ethical questions around generative AI continue to swirl, and likely will for some time, I think we’re at a point where the tool’s utility is becoming clearer. I remain deeply skeptical (perhaps even a curmudgeon) about some of big tech’s promises, which have already proven wrong to the point of being deadly, but I’m still trying to understand the issue well enough to hold an informed opinion. This is an attempt to consider generative AI’s applications beyond what Big Tech’s marketers have tried to convince us of.

Let’s start with where they start. Generative AI1 needs massive training sets representative of the kinds of things it’s asked to generate. Through training, the model learns the patterns in the data and can generate new data that fits within those patterns. It’s statistics all the way down. A Large Language Model (LLM) is, in effect, always asking itself: “what’s the most likely next word, and does that word make sense in the context of the other words in the sentence?” LLMs don’t necessarily understand a text as a text, that is, as a sequence of ideas unfolding logically, but rather as a set of tokens that carry statistical weights. Transformer models compute relationships among all the tokens in a sequence at once, through a mechanism called self-attention. This is a simplification, but that’s the basic idea. OpenAI’s ChatGPT generated the first real public buzz about generative text, though models like Google’s LaMDA and Microsoft’s DialoGPT preceded it. We don’t know exactly which datasets these LLMs are trained on, but we can assume a mixture of websites, digitized books, and other sources of text that may or may not be under copyright.
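That next-word guessing can be illustrated with a toy bigram model: count which word follows which, then always emit the most frequent follower. This is a deliberately tiny sketch (real LLMs use subword tokens, self-attention, and billions of examples, none of which appear here); the corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for a training set.
corpus = (
    "the archive holds letters . the archive holds maps . "
    "the library holds books ."
).split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Pick the statistically most likely next word, ignoring wider context."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("archive"))  # "holds" is the only word seen after "archive"
```

The point of the toy is the limitation: the model has no notion of what an archive *is*, only of which tokens tend to follow it.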

One key debate in the rise of generative AI has been its role in creating things. I’ve already reflected on how I think AI changes the relationship between time-to-create, creative work, and the production of history. Humans create things, and I cannot accept the argument that generative AI is simply doing what humans do, just at a different scale and speed. The act of creation is a human experience, and a program that can generate text, images, or video is not the same thing as a human doing that work. It’s a false equivalence to say a computer program has the same experience (or, even, the same rights) as a human.

I cannot deny the usefulness of generative AI in my day-to-day work. GitHub’s Copilot is nothing short of remarkable. The number of times it correctly guesses what I’m about to type as I’m working in neovim (or VS Code) is wildly good. As an autocomplete, it’s fantastic. The same goes for Copilot Chat, which has helped me reason through some tricky programming issues over the last few months. A key point, however: I do not accept Copilot’s suggestions blindly. These tools don’t understand a project’s entire code base; they don’t understand the needs, wishes, or requirements of a project PI or the details of the historical analysis we’re trying to do; and the suggestions often aren’t quite right, even when they’re close. Which means there’s a degree of expertise, knowledge, and experience one needs to bring to bear on these tools.

Despite this usefulness, there’s also the sense that generative AI is indistinguishable from a bullshit generator. In matters of code, perhaps that’s less of an issue: either code works or it doesn’t, and debugging is part of your process when you’re writing software. But what about prose? The act of writing, as Annie Dillard notes, is a process of discovery. It’s a process of thinking through ideas, of trying to understand what you’re trying to say, of compiling notes and ideas and experiences to craft something original. It’s a process of trying to communicate with others. Generative AI can’t do that. It can only generate text that looks like it’s trying to communicate like a human.

And what of the idea that generative AI is built off the back of content made by humans without their express consent? I certainly understand the anger, horror, or frustration that a VC-funded startup is creating technologies off the back of others’ (or your own) work. After all, this isn’t their work; it’s work you created for a specific purpose, and that purpose did not include ingestion into training models.

Anthony Clark’s “I made this” comic. Source

I’ve come to believe that it’s perhaps the wrong default to try to deprive LLMs of good-quality content to learn from.2 Certainly people have a right to block web crawlers from ingesting their content (as I’ve even done), but what about the considered and careful knowledge we create through books, articles, and chapters? What if LLMs trained on a body of millions of books can better synthesize texts or discover connections otherwise impossible for humans to find within a lifetime? I’ve spent the better part of a decade arguing that computational approaches to doing history open new avenues for us to analyze sources, narrate the past, find new connections for analysis, or work at new scales of space and time. In short, digital history generates new knowledge. What if LLMs can offer us the same?

Which leads me to ask: what does generative AI for history look like? What might it offer us as historians and educators? I can certainly envision some bad ideas: I don’t want to have a conversation with a Founding Father about the affairs of the present.3 I don’t need policymakers asking George Washington how he’d decide on public land disputes in 2024. There are plenty of ways this could go sideways. But what if we trained a local, private, open source LLM against transcriptions of our sources? And what if we can correlate that training data against secondary sources? What avenues might that open for us to think about connections among sources or find new sources of analysis?
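Short of actually training a local model, a simpler computational analogue of “correlating” primary and secondary sources is vocabulary overlap: represent each text as word counts and rank secondary sources by cosine similarity to a primary-source transcription. A hedged sketch only (all titles and snippets below are invented; an LLM-based approach would use learned embeddings rather than raw counts):

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented examples: one primary-source transcription, two secondary sources.
primary = vectorize("homestead claim filed on public land near the river")
secondary = {
    "Land Policy Essay": vectorize("public land policy and homestead claims in the west"),
    "Railroad History": vectorize("railroad expansion and freight rates"),
}

best = max(secondary, key=lambda title: cosine(primary, secondary[title]))
print(best)  # the land-policy essay shares the most vocabulary with the source
```

The design choice to note: everything here is transparent and inspectable, which is exactly the property a local, private model trained on our own transcriptions would need to preserve.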

Within the world of digital humanities, many have strenuously argued that the use of textual and visual material transformed for humanistic analysis, including in-copyright work, is fair use. For historical research, such an argument allows us to do computational work at a greater scale: analyzing newspaper sources, diplomatic papers, modeling texts, understanding homesteading, exploring the photographic record of the Farm Security Administration. How different is it that LLMs are also transforming works, just differently than we’ve explored in digital history? Or put another way, what’s the difference between tokenizing texts for topic models and tokenizing texts for generative AI?
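The overlap between the two pipelines can be made concrete. Both begin by breaking text into tokens; a topic model then throws away word order and keeps counts, while a generative model keeps the ordered sequence. A minimal sketch, using naive whitespace tokenization (real systems use subword tokenizers such as BPE):

```python
from collections import Counter

text = "the past is a foreign country"
tokens = text.split()  # naive whitespace tokenization

# Topic modeling: order is discarded; the document becomes a bag of counts.
bag_of_words = Counter(tokens)

# Generative AI: order is kept; the document becomes a sequence of token IDs.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
sequence = [vocab[tok] for tok in tokens]

print(bag_of_words["the"])  # each word appears once in this sentence
print(len(sequence))        # six tokens, in their original order
```

The first few steps are nearly identical; the divergence is entirely in what the downstream model does with the tokens.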


Generative AI faces a reputation crisis. Ryan Broderick argued a couple of months ago that:

At this point, I’m confident saying that 75% of what generative-AI text and image platforms can do is useless at best and, at worst, actively harmful. Which means that if AI companies want to onboard the millions of people they need as customers to fund themselves and bring about the great AI revolution, they’ll have to perpetually outrun the millions of pathetic losers hoping to use this tech to make a quick buck. Which is something crypto has never been able to do.

Broderick isn’t wrong about the harm generative AI can and does introduce. I should probably reject generative AI in its entirety given its problems: it regurgitates content without attribution; its image generation (I have almost nothing positive to say about AI image, video, and voice generation) tends to render leaders as almost always white, depicts people of color in racist ways, and renders women as elfish and skimpy; and it carries high environmental costs in water and electricity, at a moment when climate change threatens our lives and livelihoods. We’ve also had nearly 75 years of AI thinkers imagining that general-purpose AI is right around the corner. It’s hard not to be dismissive, if not hostile, toward this most recent hype wave.4

And yet.

We’re talking, very simply, about software. Significant new software has often been viewed askance: word processors were going to write our novels for us; phone cameras would replace artistic photography and videography; podcasts would displace books. Software is a tool that contains the ideas, biases, decisions, and experiences of its creators. Generative AI does too, but it may also have a place in the way we do history; that is, we should create and train the tools that serve our methods and purposes.

I asked above what the distance is between tokenizing for a topic model and tokenizing for generative AI. To me, it’s a matter of application. The current wave of high promises and godlike qualities espoused by leaders in big tech is not appealing to me; as I said at the start, I’m still quite skeptical about much of this, even as I try to grapple with what it means to do history in the age of generative AI. But I’m also trying to understand the utility of these tools in my work, and how they might offer new ways to think about the past. Can generative AI help me synthesize my own research notes? Can it find linkages between my notes and secondary sources unknown to me? Thinking of generative AI as a tool for writing (which makes for a very bad tool anyway) or as merely a phenomenal autocomplete misses its potential. I’m more interested in the ways generative AI can accurately synthesize material known to it, suggest avenues through my research notes, or even help build software for computation, data visualization, and analysis.

To lightly rephrase a common description, generative AI could be a tool with thought: not a replacement for the hard work that goes into research, writing, and thinking, but a method by which we can do new things. Taking an example from my own research, how might I think more capaciously about environmental radicalism, tracing themes and ideas not just in one collection of newsletters but across many more groups and sources than I could possibly read? Could generative AI both synthesize a collection of material and pinpoint quotations or other lines of thought across a trove of primary sources? A generative AI that could classify or categorize pieces of text for me, and allow me to gather together, for example, all occurrences of primary sources discussing capitalism, would be monumentally useful. I already do this sort of categorization in the sources I read, and the potential to expand the kinds of methods I already deploy in my work is intriguing. I don’t want these tools thinking and crafting for me; but they could open new ideas that would otherwise be impossible to explore without the help of computation.
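The categorization workflow described here can be approximated today without any AI at all; a plain keyword filter is the crude baseline an LLM classifier would need to beat. A minimal sketch, with invented passages and a hand-picked term list:

```python
# Invented sample passages standing in for transcribed primary sources.
passages = [
    "The newsletter decries the capitalist exploitation of public lands.",
    "Members planned a tree-planting event for the spring.",
    "Industrial capitalism, the author argues, drives ecological collapse.",
]

# A hand-picked term list; an LLM would instead judge meaning in context.
CAPITALISM_TERMS = ("capitalism", "capitalist")

def mentions_capitalism(passage):
    """Crude keyword baseline: does the passage contain any target term?"""
    lowered = passage.lower()
    return any(term in lowered for term in CAPITALISM_TERMS)

hits = [p for p in passages if mentions_capitalism(p)]
print(len(hits))  # two of the three sample passages match
```

The keyword baseline misses paraphrase and euphemism entirely, which is precisely where a well-supervised model could expand on methods historians already use.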

I’m also keenly interested in what Dan Cohen recently wrote: that one way to get good-quality generative AI is not the continued ingestion of (primarily) web content but a reliance on libraries.5 Cohen and his colleagues recently proposed a library consortium that would develop high-quality training sets, allowing AI researchers to take advantage of the vast accumulation of knowledge held by libraries:

Library-held digital texts come from lawfully acquired books—an investment of billions of dollars, it should be noted, just like those big data centers—and libraries are innately respectful of the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation. Furthermore, they have a public-interest disposition that can take into account the particular social and ethical challenges of AI development. A library consortium could distinguish between the different needs and responsibilities of academic researchers, small market entrants, and large commercial actors.

If we don’t look to libraries to guide the training of AI on the profound content of books, we will see a reinforcement of the same oligopolies that rule today’s tech sector.

Almost two thousand words later, I’m not sure whether I’ve landed squarely on how I really feel about generative AI. I’m skeptical of how the tech industry has framed it (and implemented some of it), but I do think we can reframe generative AI as a tool for synthesis and analysis more akin to the kinds of things we’ve been doing in digital history and computational humanities for three decades. That is, a tool that is much more careful and considerate of the content it ingests and the content it generates.

Perhaps the guiding question isn’t “how can I stop this?” but “how can I shape this?”

Further Readings

Here are some of the things I’ve read that are influencing how I think about this.


  1. I am largely talking about Large Language Models (LLMs) in this piece, but if I’m referencing generative AI’s image, video, or audio capabilities I reference that more specifically. ↩︎

  2. This doesn’t quell my frustration with, say, Slack deciding it’s going to use our Slack instance to train generative AI models or share that content with OpenAI. ↩︎

  3. It’s a silly toy idea, at best. I’ll leave the last word to Jamelle Bouie who writes, “[The authors of the Constitution] cannot justify the choices we make while we navigate our world.” ↩︎

  4. My primary interest in generative AI, if it isn’t apparent in this post, is text. I’m considerably more concerned about fake, abusive, and misleading AI-generated content in videos, images, and voices. Some of the problems we’ve seen with generative AI have come from models that are poorly tuned or not properly supervised. I may be proven wrong, but I’m skeptical of the claims that generative AI applied to these areas can be tools for good. ↩︎

  5. Dare I say scraping Reddit to train your model is not exactly the best of the best. ↩︎