How 21st Century Tech Can Shed Light On 19th Century Newspapers

Sample portrait images from The Graphic newspaper.

The 19th century saw something of an explosion in periodicals. For example, the number of newspapers in Britain alone leapt from 550 in 1846 to more than 2,400 just 60 years later. For humanities scholars, tracking information in such a huge mass of publications poses a daunting challenge.

Digital humanities efforts have made some headway in creating tools that allow scholars to search across all of that text. But the challenge becomes significantly more complex when trying to make sense of the thousands of images also found in newspapers of the period.

This is where Paul Fyfe and Qian Ge come in. Fyfe is an associate professor of English at NC State, where Qian Ge is a Ph.D. student in electrical and computer engineering. Together, they have done some exploratory work on how computer vision might be used in the context of analyzing 19th century newspapers.

A paper describing their findings, “Image Analytics and the Nineteenth-Century Illustrated Newspaper,” was published in October in the Journal of Cultural Analytics.

We recently had the chance to pick Fyfe’s brain to learn more about what this unlikely duo learned while attempting to analyze more than 140,000 images from three period newspapers.

The Abstract: What research questions or challenges were you setting out to address with this project?

Paul Fyfe: Our first question was pretty simple: could you even use computer vision approaches to analyze these historical materials? As we discovered, many image analytics tools are intended for photographs. This makes sense, as digital images have proliferated. But the majority of our materials are not photographs. They are engravings made from carved lines and hatches. Can a computer recognize and sort this stuff? And if so, how?

Our next question had more to do with digital humanities research. Generally speaking, lots of “big data” approaches to historical materials have focused on text. But, as the historian Liz Lorang asks, “What can we do with the millions of images that represent the digitized cultural record?”

Our last question was about the 19th century when, arguably for the first time, millions of images were circulating thanks to illustrated newspapers. What kinds of visual patterns could we find in these illustrations? What things were people seeing? How did the relative amount of text and image on the newspaper page change over time?

TA: What was novel about your approach to the material in addressing these questions?

Fyfe: To date, most large-scale digital humanities research projects have focused on text. Our project is part of a wave of new research on visual materials at scale. It is also novel in trying to use computer vision techniques not simply on photographic images, but on historical engravings.

Computational research is very novel in my own field of 19th century book and media history. We’re not trying to closely analyze a single illustration and its meaning, but studying thousands at a time. It’s a different approach to understanding how texts, newspapers and images work on a larger scale than a single reader. Not that it’s a better approach, of course, but perhaps a complementary approach to what humanities researchers are already good at doing.

Finally, it was a novel collaboration between the Departments of English and Electrical and Computer Engineering, largely facilitated by the activities of NC State’s Visual Narrative cluster.

TA: What do you think are the key findings that stem from the work?

Fyfe: We found a lot of clusters of related images that haven’t been studied before as a group. For example, loads of illustrations of crimes at night; dozens of international maps published by a newspaper known for its art reproductions; thousands of portraits formatted in the same way. These clusters make us think about the patterns of visual knowledge as the press became a multimedia experience.

We also found patterns in how illustrated newspapers presented images over time. For instance, the increasing number of halftone photographs relative to wood engravings; the gradual blending of text and images on a single page, as opposed to pages strictly containing one or the other.

Finally, it’s important to note what we couldn’t find. Our techniques, at least, were not able to identify the contents of images. We had to sort them by low-level measurements of their pixels instead of by subject matter. Additionally, while we worked with thousands of illustrations, those images still offer only a fairly narrow representation of the 19th century’s visual culture. Even working at a large scale, we know we have to qualify what – and more importantly who – our data set represents or excludes.
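To make "low-level measurements of their pixels" concrete, here is a minimal sketch in Python. It is not the method used in the paper. The function names, the three features (brightness, contrast, edge density) and the tiny hand-made images are all assumptions chosen purely for illustration.

```python
from statistics import mean, pstdev


def pixel_features(image):
    """Return (brightness, contrast, edge_density) for a grayscale image.

    `image` is a list of rows, each a list of 0-255 intensity values.
    These three features are illustrative assumptions, not the paper's.
    """
    pixels = [p for row in image for p in row]
    brightness = mean(pixels)   # average intensity
    contrast = pstdev(pixels)   # spread of intensities
    # Edge density: mean absolute difference between horizontal neighbours.
    diffs = [abs(row[i + 1] - row[i])
             for row in image
             for i in range(len(row) - 1)]
    edge_density = mean(diffs) if diffs else 0.0
    return brightness, contrast, edge_density


def sort_by_brightness(named_images):
    """Sort (name, image) pairs from darkest to lightest."""
    return sorted(named_images, key=lambda item: pixel_features(item[1])[0])


# Hypothetical toy "images" standing in for scanned illustrations.
night_scene = [[10, 20, 10], [15, 10, 20]]        # dark overall
line_engraving = [[0, 255, 0], [255, 0, 255]]     # dense black/white hatching
pale_map = [[230, 240, 235], [240, 235, 230]]     # light, smooth

ordered = sort_by_brightness([
    ("pale_map", pale_map),
    ("night_scene", night_scene),
    ("line_engraving", line_engraving),
])
names = [name for name, _ in ordered]
print(names)
```

With these toy values the dark night scene sorts first and the pale map last. The sketch groups images by how they look at the pixel level, so a nighttime crime scene and any other dark engraving would land together regardless of what they depict.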

TA: How could these findings inform future work in the field?

Fyfe: I think it shows other humanities scholars the potential of looking at much larger collections of materials than we’re used to. People might have to broaden the scale of their arguments about what a historical illustration means or how it functions. And I hope it encourages others to try out more computer vision approaches to these materials, or at least to pursue new kinds of interdisciplinary collaboration.

  1. Peter Dowling says:

    Dear Matt (and by extension, Paul Fyfe),

    I am very interested in Paul’s research re illustrated newspapers in terms of large scale quantitative analysis of the imagery in the papers and at a closer level his in-depth analysis of imagery of the ‘accident’ in the papers.

    In the 1990s I did a PhD at Monash University, Melbourne, researching the image subject matter range in colonial Australian illustrated newspapers (publishing period, 1853-96) using a quantitative analysis methodology at a manual level. In essence, I recorded the titles of all images in every (monthly) issue of the papers for every even-numbered year, to then classify the approximately 7,000 images across 27 subject headings such as Panoramas, Buildings, Exhibitions, Commerce, Maritime Transport/Communications, which were then bracketed under Progress Culture; Civic Occasions, Leisure, Portraits, bracketed under Civic Culture; and Landscapes, Rural, Mining, Natural History, Natural Disasters, under the Frontier bracket. This was followed by count-ups to establish percentages for each subject heading according to three time periods. In writing up the PhD, I related the established image subject matter range of the papers to the broader general history of colonial Australia in the second half of the nineteenth century. I also established some themes re the nature and constraints upon the production of pictorial news.

    Towards the end of my research, I realized that my approach had given me the rudiments of an index to imagery in the papers. I applied for post-doc fellowships to establish a full index by going back over the odd-numbered years using my already established methodology and then putting it all together. Wasn’t successful in applications, and in the end decided to do it myself in spare time. The result was a two-volume index of approx 1,500 pages, indexing an estimated 12,500 images by subject (using the same above-mentioned subject headings), place (firstly by colony, then by region) and finally date, this becoming the first volume. In the process, I recorded all available information re creators (illustrators, photographers, wood-engravers) to establish the second volume as an index to creators. This was published privately as P. Dowling, Index to Imagery in Colonial Australian Illustrated Newspapers (2012), as a print run of 50 copies. Have sold about 30 to Institutional and University libraries. There’s a volume in the British Library, which would be the closest one to North Carolina.

    Since then, I’ve had a mental break from it all until this year, in part due to frustration at the lack of interest in the papers here in Australia, as to all intents and purposes I am the only scholar of the papers in the country.

    My ultimate ambition is still to publish the PhD, only now, courtesy of much on-going thought about the papers and in the light of the completed index, in a substantially revised form, beginning hopefully next year.

    In the interim, Paul, would you be interested in reading a couple of my articles and other material produced re the Index? If so please send me your email address.

    Looking forward to your response,


