PLDF1 and PLDF2
NCES has published a number of series of library data. Its public library series continues a rich tradition of compiling and publishing library data by the Department of Education and its predecessors, one that goes back to the seminal Public Libraries in the United States of America: Their History, Condition, and Management (Washington, GPO: 1876), commonly called the “1876 Report.” The report was published by what was then the Bureau of Education. There are actually older library data published by the US government, but that part of the history will have to wait for another time. A good place to start is Robert V. Williams, “The Making of Statistics of National Scope on American Libraries, 1836-1986: Purposes, Problems, and Issues.” Libraries and Culture 26(2): 464-485 (Spring, 1991).
The name of this dataset, “PLDF3,” reflects the fact that it sits on an evolutionary continuum. In the discussion of variables, the reader will see that over the years a number of variable names were used to describe essentially the same variables. For example, zipcodes are recorded in a variable called ZIP in PLDF3, which was the name used in the 1998-2004 datasets. However, ZIP1 was used from 1991 to 1997, LIBZIP from 1988 to 1990, and FLDE in 1987. (Note that in the dataset, NCES variable names are in uppercase while variables I created are lowercase and italicized here for clarity.) PLDF1 was the first merge of the various datasets, and in it all four of the variables representing the zipcode (indeed, all the various variables used in each year) appear in the merged data. The result was a master dataset that had something close to 150 variables and was about 230 megabytes in SAS format. A spreadsheet available in two formats (pldf3_variables.ods or pldf3_variables.xls) has a list of these variables by year. Using PLDF1 would have required an analyst to sort through the documentation and variable names and to write code to collapse each set of like variables, with its various names, into one variable name. This is what PLDF2 did, and it has to be noted that the excellent and professional documentation made this work possible. As a result of collapsing like variables into one variable name, PLDF2 had far fewer variables than the 150 or so in PLDF1. PLDF2 was 125 megabytes as a SAS dataset.
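To make the collapsing step concrete, here is a minimal sketch in Python of how the four zipcode variables could be folded into the single name ZIP. It is an illustration only, not the code actually used to build PLDF2 (which was prepared in SAS). The year spans and the names FLDE, LIBZIP, ZIP1, and ZIP come from the text above; the function names, the dictionary representation of a record, and the NAME field in the example are assumptions made for the illustration.

```python
# Variable names that held the zipcode in each span of years, as listed in the text.
ZIP_NAMES_BY_SPAN = [
    ((1987, 1987), "FLDE"),
    ((1988, 1990), "LIBZIP"),
    ((1991, 1997), "ZIP1"),
    ((1998, 2004), "ZIP"),
]
ALL_ZIP_NAMES = {name for _, name in ZIP_NAMES_BY_SPAN}


def zip_variable_for(year):
    """Return the variable name that carried the zipcode in a given year."""
    for (first, last), name in ZIP_NAMES_BY_SPAN:
        if first <= year <= last:
            return name
    raise ValueError(f"no zipcode variable known for {year}")


def harmonize(record, year):
    """Copy a record, replacing the year-specific zipcode variable with ZIP.

    `record` is a plain dict standing in for one row of a yearly file
    (a hypothetical representation chosen for this sketch).
    """
    source = zip_variable_for(year)
    # Drop every year-specific zipcode name, then store the value under ZIP.
    out = {k: v for k, v in record.items() if k not in ALL_ZIP_NAMES}
    out["ZIP"] = record.get(source)
    return out


# Hypothetical 1989 record: the zipcode arrives as LIBZIP and leaves as ZIP.
example = harmonize({"LIBZIP": "12345", "NAME": "Example Library"}, 1989)
```

The same table-driven pattern would apply to any other variable whose name changed between years; only the list of year spans and names differs.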
A problem arose if one wanted to use PLDF2 to track changes through time: are the changes we see from year to year the result of underlying changes in the conditions of libraries, or are they the result of measuring a different set of libraries each year as new libraries open and old ones close? For a number of purposes, it would be useful to analyze only libraries reporting each year so that we compare, as the saying goes, “apples and apples,” or to be able to analyze any given set of individual libraries through time. Because of the way the data were published, such analysis was impossible with PLDF2, and this led to the complex process of creating PLDF3. This process is discussed in some detail in the documentation of that dataset.
Creating PLDF3 required the construction of a unique identifier for each library so that individual libraries could be analyzed. The NCES variable FSCSKEY appears at first blush to be such an identifier, but it is not. A large number of libraries have more than one such key in PLDF2, and a few have as many as five. My guess is that, on average, each library in the full dataset has about two FSCSKEYs. This is an artifact of the history of the collection of these data. Once a key variable that is unique to each library over time has been constructed, it is relatively easy for someone analyzing the data to group libraries by the span of years in which they reported data. The new identifier would also have other uses. PLDF3 differs from PLDF2 by virtue of the new variable newkey, which resulted from an attempt to create a key variable unique to each library over time. How this was done is a complicated story and is discussed in the main PLDF3 documentation.
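As a rough illustration of the grouping that newkey makes possible, the following Python sketch finds the span of years each library reported, selects the libraries that reported in every year of a chosen period, and counts the distinct FSCSKEYs attached to each library. The variable names newkey and FSCSKEY come from the dataset; the year field, the key values, and the sample rows are invented for the example and do not reflect real PLDF3 records.

```python
from collections import defaultdict

# Hypothetical rows: one per library per reporting year.
rows = [
    {"newkey": "A", "FSCSKEY": "XX0001", "year": 1998},
    {"newkey": "A", "FSCSKEY": "XX0001", "year": 1999},
    {"newkey": "A", "FSCSKEY": "XX0900", "year": 2000},  # same library, new FSCSKEY
    {"newkey": "B", "FSCSKEY": "XX0002", "year": 1999},
    {"newkey": "B", "FSCSKEY": "XX0002", "year": 2000},
]


def years_reported(rows):
    """Map each newkey to the set of years in which that library reported."""
    years = defaultdict(set)
    for row in rows:
        years[row["newkey"]].add(row["year"])
    return dict(years)


def reporting_span(years_by_key):
    """Map each newkey to its (first year, last year) of reporting."""
    return {key: (min(ys), max(ys)) for key, ys in years_by_key.items()}


def reported_every_year(years_by_key, first, last):
    """List the libraries that reported in every year from first to last."""
    required = set(range(first, last + 1))
    return sorted(key for key, ys in years_by_key.items() if required <= ys)


def fscskeys_per_library(rows):
    """Count the distinct FSCSKEY values attached to each newkey."""
    keys = defaultdict(set)
    for row in rows:
        keys[row["newkey"]].add(row["FSCSKEY"])
    return {key: len(values) for key, values in keys.items()}


by_key = years_reported(rows)
spans = reporting_span(by_key)                         # A: (1998, 2000), B: (1999, 2000)
constant = reported_every_year(by_key, 1998, 2000)     # only library A qualifies
key_counts = fscskeys_per_library(rows)                # A has 2 FSCSKEYs, B has 1
```

Restricting an analysis to the libraries returned by `reported_every_year` gives the kind of constant set of libraries needed for an “apples and apples” comparison across years.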