The ARL Data Infrastructure

What is outlined here is a series of publications of academic library data. They were published with paper documentation and in digital formats. Given that most, but not all, are works for hire, I can't make them available unless I have the permission of the owners. With that permission, I will be happy to supply the data and answer any questions I can. The purpose of this page is to list those publications which used the ARL infrastructure. Because they use the same variable names and the same key variable, the various series can be merged with little difficulty. There are adjustments as the years go by and new variables are added.

The Princeton and Purdue compilations are a superset of the original data and include some ARL variables in addition to the original data.

The publication of Cumulated ARL University Library Statistics, 1962-63 through 1978-79 by Kendon Stubbs and David Buxton (Washington, ARL: 1981) was the first digital recompilation of a longitudinal file of library data. The methodology used was adapted in a number of subsequent publications, almost all of which are owned by either the Association of Research Libraries or the Association of College and Research Libraries. The range of these data is from 1907/08 through 1987/88. I do not know if the organization of the ARL data since then has changed this infrastructure radically. Clearly, since then libraries have faced a revolution in the technology of information and have had to adapt. Their data has also had to adapt.

The key variable is INSTNO (Institution Number). In the Stubbs-Buxton compilation, INSTNO was a three-digit number tied to one institution, and the numbers and names sorted in the same order. Kind of like Cutter numbers. Simple and robust. When I started working on The Gerould Statistics, the three-digit numbers did not leave enough room to fit in the extra libraries so that the order by INSTNO would parallel the alphabetical order of the names, so I expanded the three-digit number to four digits. I just multiplied by 10 so it was easy to go back and forth. The Gerould data included a few libraries, like that of Leland Stanford Jr University, that appear under a later name in Stubbs-Buxton (Stanford, in that case) so, naturally, the early data were given the Stanford INSTNO. All subsequent publications discussed on this page used the four-digit key. Somewhere I have a count of the number of libraries that ever appeared in all of these files. It was several hundred and, generally, they were the larger academic libraries.
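The back-and-forth between the two keys can be sketched in a few lines of Python. This is only an illustration, not the code used for the compilations: the function names are made up for the example, the number 123 is not a real INSTNO, and it assumes that libraries slotted in later received four-digit numbers that are not multiples of 10.

```python
def to_four_digit(instno3):
    """Expand a three-digit Stubbs-Buxton INSTNO to the four-digit key."""
    if not 0 <= instno3 <= 999:
        raise ValueError("expected a three-digit INSTNO, got %r" % instno3)
    return instno3 * 10


def to_three_digit(instno4):
    """Go back to the three-digit number.

    Returns None when the four-digit key is not a multiple of 10, which
    is assumed here to mean the library was fitted in between two
    Stubbs-Buxton institutions and has no three-digit equivalent.
    """
    quotient, remainder = divmod(instno4, 10)
    return quotient if remainder == 0 else None


# Hypothetical numbers, for illustration only.
expanded = to_four_digit(123)
restored = to_three_digit(expanded)
inserted = to_three_digit(1235)
```

Multiplying by 10 leaves nine open slots between each pair of original numbers, which is what lets the numeric order keep following the alphabetical order as libraries are added.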

The variable names in these files are short but usually pretty obvious. Long names are hard to program with, so the names are short: VOLS, VOLSADN, TOTSTAFF, YEAR, INAM, and so on. Given that the publications highlighted here are longitudinal files, no variable is dropped; each is carried on even after it is no longer collected. New variables are added, both through disaggregation and for new things that are collected as libraries adapt to the many changes in information technology. Such files tend to gain variables over time for as long as the data are collected.
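As a rough sketch of what the shared names and key make possible, the Python below stacks records from two series into one longitudinal file. It is an illustration under stated assumptions rather than the method used for any of the publications: the record layout (one dictionary per institution per year), the function name, the year format, and all of the numbers are invented for the example. Only INSTNO, YEAR, VOLS, and TOTSTAFF are names taken from the files.

```python
def combine_series(*series):
    """Stack several series into one file keyed on (INSTNO, YEAR).

    The combined file carries every variable that appears in any
    series; a variable that was not collected for a given row is None.
    """
    # Collect the union of variable names in first-seen order.
    fieldnames = []
    for records in series:
        for rec in records:
            for name in rec:
                if name not in fieldnames:
                    fieldnames.append(name)

    # Group records by the shared key; later series add to earlier ones.
    merged = {}
    for records in series:
        for rec in records:
            key = (rec["INSTNO"], rec["YEAR"])
            merged.setdefault(key, {}).update(rec)

    # Emit rows in key order, padding variables that are missing.
    rows = [
        {name: rec.get(name) for name in fieldnames}
        for _, rec in sorted(merged.items())
    ]
    return fieldnames, rows


# Invented sample data: an early series without TOTSTAFF, a later one with it.
early = [{"INSTNO": 1230, "YEAR": "1961/62", "VOLS": 1000}]
later = [{"INSTNO": 1230, "YEAR": "1962/63", "VOLS": 1100, "TOTSTAFF": 50}]

fieldnames, rows = combine_series(early, later)
```

The early row comes out with TOTSTAFF set to None, which mirrors the way a longitudinal file keeps every variable even for years in which it was not collected.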

The list that follows describes the publications.

Academic Data using the ARL infrastructure
Source of data | Title | Years | Notes
ARL | The Gerould Statistics, 1907/08-1961/62 (Washington: Association of Research Libraries, 1986.) | 1907/08-1961/62 | Predecessor series to the ARL data. Background here
ACRL | ACRL Academic Library Statistics, 1978/79-1987/88: A Guide to the Machine Readable Version of the ACRL Statistics (Chicago: Association of College and Research Libraries, 1989.) | 1978/79-1987/88 | This was the second ACRL series. The first was published in annual issues of one of the ALA journals. I don't have my notes handy so that is something to look up. The series went on for years and looks like a universe survey. This second series used the same survey instrument ARL used for its members. Roughly, ARL surveyed its members, which were the 100 largest academic libraries, and this publication caught the second 100. It was published with a booklet and diskettes. I remember Kendon observing that one of the variable definitions disagreed with the ARL definition. That was my mistake and I cannot remember what it was. But: documentation in paper and digital data on diskettes and in several formats. This compilation was based on four separate surveys of these ACRL libraries, published in paper only. The packaging was clever and Mary Ellen K. Davis came up with it.
ARL (Kendon Stubbs and Robert Molyneux) | Research Library Statistics, 1907/08 Through 1987/88: A Guide to the Machine-Readable Version of the Gerould and ARL Statistics (Washington: Association of Research Libraries, 1990.) | 1907/08-1987/88 | This compilation joined the Gerould data with the ARL data through 1987/88. The packaging was copied from Mary Ellen's idea for the ACRL data. It had documentation and diskettes. I suspect that the data have since been made available in one file from ARL. The Internet came later. It is easy to forget that.
ACRL | ACRL/Historically Black Colleges and University Library Statistics, 1988-89 (Chicago: Association of College and Research Libraries, 1991.) | 1988-89 | Issued in paper only.

I wish to pause here and tell a story from 30 years ago.

At the time I was asked to work on a survey of the HBCU libraries, I had done...maybe 10 of these academic library data surveys and published them for ARL and ACRL. Generally, at the time, arrangements were verbal, with maybe a letter or a phone call. There was no email, of course.

ACRL wanted the HBCU libraries surveyed and, again, the ARL form was used. They sent a contract and the contract had an odd stipulation in it: that I was not to write anything based on the data without ACRL's permission. Kathleen de la Peña McCook pointed this out and asked me about this language and we chatted. I had never had a problem writing about anything from these various compilations and I couldn't think of any reason why ACRL would not permit me to write about what I found. Well, Kathleen was always keen and she was correct to draw attention to this proviso. At the time, I had all these data so I would be able to make comparisons and observations. Surely, I thought, that would not be a problem.

In the process of compiling the data, I spoke with a number of the directors at these libraries and with some, as these things go, we struck up friendly chats. When I had all the data together, I compared various ratios with the ARL/ACRL/HBCU libraries and found...things that struck me. For instance, there is a calculation that is easy to make but one has to bear in mind it is maybe not what you think it is: expenditures for materials divided by the number of volumes added. It is an attempt to get the average cost paid for volumes added. Yes, yes, I know. In any case, both of these numbers are in that series and the results of that calculation for the HBCU libraries were very different from the other two groups. Now, these libraries are smaller and the size of libraries is a skewed distribution so caution is in order. Ratios are one tool used to get a sense of things in such distributions. The differences were pronounced and I was curious so I called some of the directors that I got along with and talked to them. There was a common story: the administration gave the libraries a budget and after reporting these data to agencies like ACRL or the Feds, the budgets were cut. The libraries actually had fewer dollars than reported to purchase materials.

Each of the directors who talked to me swore me to secrecy and this is the first time I have told the story here, thirty years later.

I wrote ACRL for permission to write an article. I can't remember all the details but I have the letters and can find them. I believe replies were delayed and the last communication was an arch reply to the effect of: I thought we had already given permission. I wrote back attempting to disabuse her of that notion and never had a reply.

There were other, similar findings when I made these kinds of comparisons with these libraries. I could have helped our colleagues. Thirty years later, this event still rankles.

ARL | The Gerould Statistics, 1907/08-1961/62, 2nd, Web edition, 1998 | 1907/08-1961/62 |
LSU | As yet unpublished | 1926/27-1986/87 | This is a series based on the original Gerould template and covers Southern libraries, largely. 163 libraries ever reported. Some are quite small but it looks like most of the biggest Southern colleges and universities are represented. Not all institutions reported in all years. This digital series was keyed from the original sheets but I have not checked the keyed data against them, I am embarrassed to report. The ARL infrastructure is used. I think it would be neat to finish this series and then, finally, combine these various series and see what we can find. A table from the unfinished documentation gives the institutions' names and data ranges along with the four-digit ARL institution numbers.

December 22, 2022
