Muninn WWI

Research

Muninn's research currently comprises four main initiatives.

  1. Computer Science: Data extraction, structuring and statistical quality control
  2. Bio-Medical: Modeling infectious disease through the examination of WWI medical records
  3. History: Creating data sets for social history and institutional reconstitution
  4. Literature: Examining the use of language in the documents

For an idea of how these research elements fit together, please see this flow chart, which illustrates the flow of data from the original archives, through the computer science systems, to our humanities and social science research teams.


Computer Science

The computer science component of this project will focus on the extraction of information from our source documents, and the organisation of that information into large databases for the other researchers. More technical details on our proposal can be found here.

The computer science team is being led by Rob Warren, who is in the process of helping to secure the archival sources and processing power which we need to undertake the project.


Bio-Medical Statistics and Epidemiology

First World War personnel records are full of information about the health of soldiers, both from injuries sustained in combat and from other causes, among them infectious diseases. Indeed, this period is vitally important for the study of modern medical history, as the late period of WWI coincided with the advent of the Spanish Flu, one of the twentieth century's worst pandemics.

These documents promise to give us the ability to reconstruct what happened during the infectious disease epidemics of WWI in a way that we have never before been able to do. They will provide us with a framework through which we can directly test many of our previously untestable assumptions about how large pandemics operate in a real-world environment.

Our biostatistics effort is being led by Crystal Linkletter.


History

The historical benefits of a project such as this one are incalculable. For the first time, historians will have a chance to come face to face with large-scale structures in the history of the Great War which were previously inaccessible merely because the study of them was too resource-intensive. Medium-scale structures, too, will become vastly easier to study than they have heretofore been.

Methodology

Our study will undertake a massive-scale process of institutional reconstitution. This part of the project will aim to construct a partial model of the institutions which fought the war by building up a picture according to the principle of the point cloud, in which millions of observations are added together to create an outline of the larger structure in which they fit. In simple terms, each record becomes like a point of colour in a painting by Seurat. In the reconstitution process we have the computer organise the records such that we can then 'zoom out' and see the patterns embedded between them.

Different types of records will require slightly different processes. In each case, we will take the information for each actor, be it a person, a unit or an administrative structure, and use that information to clump those actors together into groups. We then create relationships between these groups, creating bigger and bigger super-groups, until we have a rough map of the whole institutional system. Experience tells us that both of these steps, the grouping of records and the creation of relationships between these groups, will require some human oversight. The ability of a computer to work out the nuances of military terminology is as yet crude when compared to that of a human military historian. Nonetheless, the computer will be able to suggest connections and, more importantly, instantaneously apply general rules to the whole vast dataset. This computational 'heavy lifting' will reduce the work of many years to a few months or weeks.
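The two steps described above, grouping actors and then linking groups into super-groups, can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the record fields, the unit names and the `parent_of` rule table are all hypothetical, and in practice such rules would be reviewed by a historian.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only.
records = [
    {"person": "A. Smith", "unit": "1st Battalion"},
    {"person": "B. Jones", "unit": "1st Battalion"},
    {"person": "C. Brown", "unit": "2nd Battalion"},
]

# Hypothetical rule table mapping each group to its parent formation.
parent_of = {
    "1st Battalion": "1st Brigade",
    "2nd Battalion": "1st Brigade",
    "1st Brigade": "1st Division",
}


def group_records(records, key):
    """Step 1: clump actors together by a shared attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["person"])
    return dict(groups)


def build_hierarchy(groups, parent_of):
    """Step 2: link groups into progressively larger super-groups.

    Returns a mapping from each parent to the sorted list of its children.
    """
    children = defaultdict(set)
    frontier = list(groups)
    seen = set()
    while frontier:
        name = frontier.pop()
        if name in seen:
            continue  # guards against cycles in the rule table
        seen.add(name)
        parent = parent_of.get(name)
        if parent is not None:
            children[parent].add(name)
            frontier.append(parent)
    return {parent: sorted(kids) for parent, kids in children.items()}


units = group_records(records, "unit")
hierarchy = build_hierarchy(units, parent_of)
```

Here `units` holds the first-level groups and `hierarchy` the relationships between groups. A rule added once to `parent_of` applies to every matching record, which is the kind of 'heavy lifting' the paragraph above refers to.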

Different types of records will also produce different top-level 'models' of the institutions being reconstituted. Data derived from personnel records will inevitably tell only part of the story; data derived from regimental diaries will tell another part. When the first-stage modeling process is complete, it will be possible to compare and integrate these top-level models such that they fact-check one another and fill in one another's gaps. The result will be a complex, multi-source model of the large-scale institutions which fought the Great War, a model richly embedded with geographical and chronological information.
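As a rough illustration of how two source-specific models might fact-check one another and fill in one another's gaps, the sketch below merges two dictionaries keyed by unit and date. The data layout, the placeholder values and the `integrate` function are assumptions made for the example, not a description of the project's software.

```python
def integrate(model_a, model_b):
    """Merge two {key: value} models.

    Returns (merged, conflicts). Keys found in only one model fill gaps
    in the other; keys on which the models disagree are reported for
    human review rather than silently resolved.
    """
    merged = {}
    conflicts = {}
    for key in sorted(set(model_a) | set(model_b)):
        in_a, in_b = key in model_a, key in model_b
        if in_a and in_b and model_a[key] != model_b[key]:
            conflicts[key] = (model_a[key], model_b[key])
        else:
            merged[key] = model_a[key] if in_a else model_b[key]
    return merged, conflicts


# Hypothetical models: (unit, date) -> location, with placeholder values.
personnel_model = {
    ("Unit A", "1917-04-01"): "Location X",
    ("Unit A", "1917-04-02"): "Location Y",
}
diary_model = {
    ("Unit A", "1917-04-02"): "Location Z",
    ("Unit A", "1917-04-03"): "Location Z",
}

merged, conflicts = integrate(personnel_model, diary_model)
```

Entries in `conflicts` are the points at which one source contradicts the other, and so are natural candidates for the human oversight mentioned earlier.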

Because many of the component records which make up the new super-model are explicitly linked to geographic place names and to calendar dates, we will be able to use chronological visualisation tools and geographic information systems (GIS) to 'animate' the model and observe its development through time and through physical space. We can use other embedded information to pick out individual systems, such as the supply chains and communications networks which innervated the system. Other forms of information sampling will be useful for social history. For example, it will be possible to compare the addresses of recruits at enlistment to their post-war addresses, recorded when they received pensions upon leaving the service. This would shed light on the massive population movements created by wartime service. Finally, it will be possible to compare our institutional reconstitution to other large-scale historical databases. Our system will allow us to generate lists of people who served in the front lines as opposed to in the rear. These lists can be compared to census data in order to assess properly, for the first time since the war, the long-term health impacts of front-line service.
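The address comparison mentioned above can be sketched in a few lines of Python. The identifiers, the place names and the assumption that each person has exactly one enlistment address and at most one pension address are hypothetical simplifications for the example.

```python
from collections import Counter

# Hypothetical data: person identifier -> recorded place of residence.
enlistment = {"P1": "Town A", "P2": "Town A", "P3": "Town B"}
pension = {"P1": "Town A", "P2": "Town C", "P3": "Town C", "P4": "Town D"}


def migration_flows(before, after):
    """Count moves between places for people found in both sources.

    People missing from either source, or whose address is unchanged,
    are left out of the count.
    """
    flows = Counter()
    for person, origin in before.items():
        destination = after.get(person)
        if destination is not None and destination != origin:
            flows[(origin, destination)] += 1
    return flows


flows = migration_flows(enlistment, pension)
```

Each key of `flows` is an (origin, destination) pair, so the counts could be fed to a GIS tool to draw movement between places.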

With the creation of these reconstituted institutional models, it will be possible to create models of disease transmission in exactly the same way as one would with any other socio-institutional facet of the model. We will pass these recreated disease transmission models to the epidemiological team for their statistical analysis.

History Team

The historical part of this project is currently being coordinated, although not led, by Nick Gunz. Although there are a few exceptions, historians tend not to work in large groups or research units. In order to gather the advice and cooperation of the historical community, then, we are assembling an advisory group of WWI historians from all subspecialties. These researchers will be given advance access to Muninn WWI data and will have the ability to shape the type of data being produced by the project.


Literature

The literature component of this project will examine how descriptions of disease in wartime medical records alter in the presence of pandemic risk factors.

The arrival online of large datasets of medical records, particularly medical records kept at times and in places that present conditions favourable to pandemics, enables a more comprehensive search for significant language patterns that exchange nosological terms for other descriptors, a practice that, perhaps because unofficial, has heretofore received little attention other than as an exceptional or idiosyncratic practice.

We believe that euphemistic diction is widely used wherever pandemic conditions are likely to cause panic in a larger population. Corpora of digitised wartime medical records are a vital resource for this study, since they describe the ailments of soldiers, who live in close proximity and frequently under circumstances of deprivation.

The literature team is headed by Shelley Hulan.


Brown University, University of Waterloo, University of Western Ontario, University of Cambridge, Carleton University

contact: webteam@muninn-project.org