A collaborative historical research model in the age of big data

Five years ago the project Living with Machines was funded by UK Research and Innovation’s Strategic Priorities Fund – a scheme established to support ambitious projects that would not normally be funded by a single research council.

The project was designed to explore what had become possible after more than two decades of digitisation, and thanks to huge advances in data science and artificial intelligence (AI).

Its aim was to be both a data-driven history project and a data science project grounded in the values of human experience and expertise.

Specifically, it sought to work with digitised nineteenth-century newspapers, census returns, and Ordnance Survey maps (amongst other collections) to understand the impact of the coming of the machine age on the lives of ordinary people.

To do this, it assembled a large and diverse team – 42 people over the project’s lifetime – including data scientists, historians, curators and library professionals, computational linguists, digital humanists, experts in visualisation, and urban geographers. As such the project was an experiment in a new research paradigm.

Very literally, this project was about historians living with machines: working with computers, with methods from machine learning, and with colleagues from computational backgrounds.

On the other side of the project

Now at the other side of the project, we are reflecting on how far we’ve come as a team and – more importantly – what we have learned about the process of collaboration. We are, of course, only two voices among many, but we both have a kind of bird’s-eye view of the project thanks to Ruth’s position as principal investigator – which meant she was in most meetings that happened on the project – and Léllé’s role as project coordinator.

It’s very easy to get hung up on deliverables and outputs on a research project. We have plenty of those, ranging from more traditional articles and books to datasets, methods papers, software and code, an exhibition and other events (view the achievements list on the Living with Machines website). The topics we’ve tackled in recent and forthcoming publications include: the representativeness of digitised collections of newspapers, and their impact on historical research; the trope of the living machine in Victorian books and newspapers; and the impact of the arrival of rail on communities (in our forthcoming book, ‘Living with Machines: Computational Histories of the Age of Industry’).

But for us, the journey we undertook to deliver our research was as important as the destination. Collaboration is more than just putting the right people in a room and hoping. It requires a proactive approach precisely because of the way that it brings together individuals from different research and professional cultures with different expectations about how to work, and how to disseminate research findings.

Giving everyone a voice

One of the things we’re most proud of is that Living with Machines was a collaboration that gave everyone a voice and agency. It wasn’t a team delivering the vision of a principal investigator and a bunch of co-investigators. We know we were very fortunate to be able to hire such huge talents as postdocs onto our project, and we responded to that talent and expertise by working with datasets we hadn’t considered working with, and designing methods that we had not dreamed of. Because of that we were able to use computer vision for the distant viewing of maps, play games of ‘hide the word’ with language models, and develop methods for finding and linking people across census returns.

Similarly, at the outset we were committed to working with the research software engineers and research data scientists in the team in genuinely collaborative ways, rather than seeing them as technical service providers. Our work with all the members of the Turing’s Research Engineering Group and the research software engineers from Edinburgh and Queen Mary University of London has clearly shown us the vital importance of the profession for the kinds of lateral thinking and cross-domain transfer of methods that lead to innovation. This is why we’ve been able to develop innovative software such as DeezyMatch, alto2txt, MapReader, T-Res and so much more.

As this suggests, our successes and the nature of our outputs have been deeply rooted in the way that we went about working together. In order that this underpinning labour not be rendered invisible, we sought to share our experiences and recommendations in a short open access book, ‘Collaborative Historical Research in the Age of Big Data: Lessons from an Interdisciplinary Project’ (Cambridge University Press, 2023), co-written by Ruth with Emma Griffin, Mia Ridge, and Giorgia Tolfo. The book shares lessons specifically on how to go about starting a new collaboration, and how to sustain a team over the duration of a project – as well as on the practical issues of working with digitised collections within the current landscape of cultural heritage data in the UK, and the infrastructural requirements of large data-driven projects of this kind.

Series of short documentary videos

Behind the scenes, filming with Ruth Ahnert. Credit: the Living with Machines team

Another way we have sought to make visible the collaborative foundation of our project is through a series of short documentary videos, produced and directed by Léllé over the final months of the project.

The episodes of the docuseries aim to highlight the team’s experiences and research objectives, the challenges encountered, and the lessons learnt.

Behind the scenes, filming with Guy Solomon. Credit: the Living with Machines team

The 10-part series follows 22 interviewees, discussing:

collaboration (chapter one)
the digitisation process (chapter two)
the computational methods harvested and deployed (chapter three)
the research on bias and representativeness in historical newspaper collections (chapter four)
the experiments on the language of mechanisation (chapter five)
the development of tools for analysis and annotation of maps (chapter six)
research with census records (chapter seven)
linking heterogeneous datasets (chapter eight)
crowdsourcing (chapter nine)
the Living with Machines exhibition at Leeds City Museum (chapter ten)

Bringing forward the personal voices of the team members, the series emphasises the necessity of adopting a human-centred, historically informed, and ethical perspective in all AI and machine learning ventures.

Changing how we work

What we think comes across both from the book and the docuseries is that, for almost everyone in the team, the experience of this project has radically changed how we work. This is perhaps summed up best by one of our co-investigators, the historian Jon Lawrence, who says in our first episode of the series:

Certainly the highlight of the last five years, is that I’ve learned more in these years than probably the last 20 years, in terms of methods, because I’d been doing incremental developments in the things I knew, now I’ve been trying to grapple with a whole set of opportunities I didn’t even know existed.

By sharing our experience of actually ‘doing’ our work – the behind-the-scenes perspective you rarely get to see on projects – we hope that we might help people to see where our experiences can be useful to them, even if they work in very different fields.

Watch episode one of the Living with Machines series on YouTube.

Although the project has now ended, you can stay up to date with new outputs via our social media (X or Mastodon), by signing up to our newsletter or on our project website.

Top image: The Living with Machines team at the end of the project celebration. Credit: the Living with Machines team

Ruth Ahnert

Professor of Literary History and Digital Humanities, Queen Mary University of London

Professor Ruth Ahnert served as principal investigator on the Living with Machines project, based at The Alan Turing Institute.

Léllé Demertzi

Programme Coordinator in Data Science for Science and Humanities, The Alan Turing Institute

Léllé Demertzi is an artist and cultural producer. She works as Programme Coordinator at The Alan Turing Institute and on the Living with Machines project.