A critical analysis of data science applied to the exploration, analysis, and visualization of 19th-century social networks hidden in plain sight in the catalogues and indexes of large manuscript and print collections. Millions of 19th-century manuscripts are held in libraries around the world, and local search engines routinely allow them to be broken down into manageable chunks of data for traditional historical analysis. But how effectively can open-source applications, built on freely and globally accessible technologies, be used to examine virtually linked datasets in their entirety? Can we learn new historical information by examining catalogue data alone, without a historical narrative as a starting point?
To assess the usefulness of data science for historical data, I have built a Historical Data Digital Toolkit (HDDT) made up of open-source, free-to-use technologies that historians anywhere in the world can access (SQL databases, Jupyter Notebooks, Gephi). I have taken data from the Royal Anthropological Institute, London; the Quaker national archives, London; the Quaker Family History Society (genealogical data); and my own research, and combined these datasets into a single SQL database. Within the HDDT I can then deploy a range of applications (pandas, NumPy, NetworkX, hvplot, Gephi, datashader) to analyse and visualize the catalogue data.
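As a minimal sketch of the combining step, the code below assumes SQLite (through Python's built-in sqlite3 module) stands in for the SQL database, and that each catalogue entry has already been reduced to hypothetical (source, item_id, person) rows. The table name `mentions`, the two helper functions and the sample records are illustrative inventions, not drawn from the RAI or Quaker catalogues. A self-join then links any two people named in the same catalogue item, which is one simple way of turning catalogue data into network edges.

```python
import sqlite3


def build_combined_db(sources):
    """Load catalogue rows from several sources into one in-memory SQLite table.

    `sources` maps a source label (hypothetical, e.g. "RAI") to a list of
    (item_id, person) pairs, assumed to be one pair per person per item.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mentions (source TEXT, item_id TEXT, person TEXT)")
    for source, rows in sources.items():
        conn.executemany(
            "INSERT INTO mentions (source, item_id, person) VALUES (?, ?, ?)",
            [(source, item_id, person) for item_id, person in rows],
        )
    conn.commit()
    return conn


def co_mentions(conn):
    """Return (person_a, person_b, shared_items) for every pair named together."""
    # Self-join: two people are linked when the same item in the same source
    # names both. a.person < b.person keeps each pair once and drops self-pairs.
    query = """
        SELECT a.person, b.person, COUNT(*) AS shared
        FROM mentions AS a
        JOIN mentions AS b
          ON a.source = b.source
         AND a.item_id = b.item_id
         AND a.person < b.person
        GROUP BY a.person, b.person
        ORDER BY shared DESC, a.person, b.person
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: two sources, three catalogue items.
conn = build_combined_db({
    "RAI": [("ms1", "Person A"), ("ms1", "Person B"),
            ("ms2", "Person A"), ("ms2", "Person C")],
    "Quaker": [("q1", "Person A"), ("q1", "Person B")],
})
print(co_mentions(conn))
# [('Person A', 'Person B', 2), ('Person A', 'Person C', 1)]
```

The resulting edge list is the kind of table that could be loaded into pandas (for example with `pandas.read_sql_query`) or exported as CSV for Gephi.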
The combined virtual dataset (held on my laptop) exposes the social networks among around 3,000 people involved in some way with the development of the discipline of anthropology in Britain between 1830 and 1870, around 600 of whom were Quakers. Can the HDDT reveal insights into these complex social networks that are not readily apparent through traditional historical analysis?
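One question such a network could be asked is how central each person is, and how ties fall within and across the Quaker group. In the HDDT this would normally be done with NetworkX (whose `degree_centrality` likewise divides each degree by n - 1); the sketch below uses only the Python standard library so that it runs without extra packages. It assumes edges arrive as (person_a, person_b, weight) tuples; the function name, the `quakers` set and the sample names are hypothetical.

```python
from collections import Counter, defaultdict


def quaker_tie_summary(edges, quakers):
    """Return (degree_centrality, tie_counts) for an undirected person network.

    edges: iterable of (person_a, person_b, weight); weights are ignored,
           so the network is treated as unweighted (an assumption).
    quakers: set of people flagged as Quakers (hypothetical flag, e.g. taken
             from genealogical records).
    People who never appear in an edge are not included in the result.
    """
    # Normalise to unique undirected edges and drop self-loops.
    unique_edges = {tuple(sorted((a, b))) for a, b, _weight in edges if a != b}

    neighbours = defaultdict(set)
    tie_counts = Counter()
    labels = ("other-other", "quaker-other", "quaker-quaker")
    for a, b in unique_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
        # The number of Quakers at the ends of the tie (0, 1 or 2) picks the label.
        tie_counts[labels[(a in quakers) + (b in quakers)]] += 1

    # Degree centrality: share of the other n - 1 people each person is tied to.
    n = len(neighbours)
    centrality = {
        person: (len(nbrs) / (n - 1) if n > 1 else 0.0)
        for person, nbrs in neighbours.items()
    }
    return centrality, dict(tie_counts)


edges = [
    ("Person A", "Person B", 2),
    ("Person A", "Person C", 1),
    ("Person C", "Person D", 1),
]
centrality, ties = quaker_tie_summary(edges, quakers={"Person A", "Person B"})
print(sorted(ties.items()))
# [('other-other', 1), ('quaker-other', 1), ('quaker-quaker', 1)]
print(round(centrality["Person A"], 3))
# 0.667
```

Ignoring weights keeps the sketch simple; weighting ties by the number of shared catalogue items would be a natural refinement.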