PLDF3 Data

Enhanced Longitudinal Public Library Data File (PLDF3)

Robert E. Molyneux

Abstract

The purpose of these pages is to document a set of longitudinal public library data constructed from the annual Public Library Survey of the National Center for Education Statistics (NCES) under the Federal State Cooperative System (FSCS) program and more recently by the Institute of Museum and Library Services (IMLS) through the Public Libraries Survey (PLS).

With the addition of the FY 2020 data, there are 303,851 observations in PLDF3. An “observation” presents the data for one library for one year. From FY 1987 through FY 2020, 10,245 library systems have contributed data. The Data Revision History has an accounting of data revisions to the raw NCES and IMLS data and the number of observations resulting from each change. These changes are listed in reverse chronological order according to when the problems were discovered and the changes made, not by the date of the data changed.

What follows here is complex, and examples will be used to illustrate the difficulties with the raw data that arise when one tries to combine many years' data into one usable file. These examples are now largely historical but still important. They are historical because, as each year has passed, IMLS has made each year's data better. The documentation has always been a model for the library world, but the compilers have improved the inter-year connections, so there is no need to rehash these problems with current data. While each year is better, the early years are a bit...rough. For that reason, they are documented here with the tangled history of newkey.

Fair warning: there were knotty problems in assembling PLDF3 and sorting them out took time. What follows is the documentation of that solution: creating a newkey to correct problems with the FSCSKEY. This documentation has to be done but there is little need for most people to read the details that follow. Well, unless you have insomnia.

PLDF3 and Imputed Data

PLDF3 is designed to remove data imputations found in the original FSCS/PLS data. Those data do have imputations, and the documentation has detailed discussions about how those imputations were done. Why did I remove the imputations? That is a complicated story. Briefly, the FSCS/PLS effort and mine have different objectives. If you want to create a longitudinal file with the imputations, you can download the raw data from IMLS or the Archive. I suggest you adapt the newkey and consider examining the data revision history, where duplicates, missing values, and other anomalies are discussed. With 300,000 observations, such anomalies are not many as a percentage, but there are still many of them.

Why go through the trouble of removing the imputations? The answer to that simple question is complicated and, I think, interesting. Most people will not, so I will discuss the matter of imputations here.

Back to the knotty problems that had to be solved to create PLDF3. The first two paragraphs below give a short overview.

FSCSKEY and newkey

The background of how PLDF3 came about is discussed separately where a review of its history is outlined—including a discussion of PLDF1 and PLDF2. In order to use the underlying data for trend analysis, problems with the original data had to be addressed. The process of going from PLDF1 to PLDF3 systematically dealt with these problems. PLDF3 is a superset of PLDF2 and rendered both PLDF1 and PLDF2 obsolete.

The NCES/IMLS data use a key variable, FSCSKEY, to identify libraries. This key consists of the two-letter state postal code and a four-digit sequence number for each public library in the U.S. In recent years there has been good control of this number, so the assignment of keys is becoming more consistent over time, but in the past the FSCSKEY was not applied consistently. Over time, many libraries were assigned more than one FSCSKEY, and some as many as five. In addition, individual FSCSKEYs were issued to more than one library. This had to do with the way NCES published the data, and it resulted in a failure of the one thing a key variable must do: each one must be unique and tied to one library throughout a series. Key variables, unlike library names, should not change and, given their structure, are easier to manipulate than library names, which do change. To fix this problem so that each library had a unique key, a new key variable, newkey, was created for PLDF3. This page now turns to FSCSKEY and newkey in more detail. It is hoped that the newkey will provide analysts with a tool that keeps a library's data together in a consistent fashion.
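As a small illustration of the key structure just described, the sketch below splits an FSCSKEY into its state code and sequence number. The pattern (two letters followed by four digits) is taken from the description above; the function name split_fscskey is hypothetical and is not part of PLDF3 or the NCES/IMLS files.

```python
import re

# Assumed format, per the description above: a two-letter state postal
# code followed by a four-digit sequence number (for example WV0001).
FSCSKEY_PATTERN = re.compile(r"^([A-Z]{2})(\d{4})$")


def split_fscskey(key):
    """Return (state, sequence) for a well-formed key, or None otherwise."""
    match = FSCSKEY_PATTERN.match(key.strip().upper())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(split_fscskey("WV0001"))   # ('WV', 1)
print(split_fscskey("CO0094"))   # ('CO', 94)
print(split_fscskey("Beckley"))  # None
```

A check like this only confirms that a key is well formed. It cannot tell whether the key has been tied to the same library through the whole series, which is the problem newkey addresses.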

FSCSKEY Examples

Here is a simple example from the data in the libraries in West Virginia to illustrate how things would look when the FSCSKEY is stable over time. The data are sorted by FSCSKEY and year:

Example 1: Simple Example from PLDF2
FSCSKEY year ZIP state CITY LIBNAME
WV0001 1988 25801 WV Beckley Raleigh County Public Library
WV0001 1989 25801 WV BECKLEY Raleigh County PL
WV0001 1990 25801 WV BECKLEY Raleigh County PL
WV0001 1991 25801 WV BECKLEY Raleigh County
WV0001 1992 25801 WV BECKLEY Raleigh County
WV0001 1993 25801 WV BECKLEY Raleigh County
WV0001 1994 25801 WV BECKLEY Raleigh County
WV0001 1995 25801 WV BECKLEY Raleigh County Public Library
WV0001 1996 25801 WV BECKLEY Raleigh County Public Library
WV0001 1997 25801 WV BECKLEY Raleigh County Public Library
WV0001 1998 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 1999 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2000 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2001 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2002 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2003 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2004 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2005 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2006 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2007 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2008 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY
WV0001 2009 25801 WV BECKLEY RALEIGH COUNTY PUBLIC LIBRARY

The left column is FSCSKEY, an identifier made up of the two-letter state code (WV) plus a four-digit number originally assigned sequentially (0001). The year column is in order by the year the data were reported to NCES/IMLS, so the listed ZIP, state, CITY, and LIBNAME all come from the data for that year. Note that in 1988 the CITY is “Beckley” while in subsequent years it is “BECKLEY.” In other words, in the 1988 data, Beckley is reported in mixed case and in subsequent years it is uppercase. LIBNAME varies, too, as can be seen. Variations in CITY and LIBNAME are common in this dataset, and those in ZIP are not that unusual; they happen, for instance, when a main library is moved to a new building or as a result of errors. These variations are important, particularly if one wants to examine data from the same library or set of libraries over time. How can one isolate the data from one library when its name, location, or CITY changes from year to year? As discussed above, analyzing libraries with these variations is made easier with a key variable such as the FSCSKEY.
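To make these variations concrete, here is a minimal sketch that collects the distinct ZIP, CITY, and LIBNAME values reported under one FSCSKEY after folding case. The rows are a few lines copied from Example 1; the tuple layout and the helper name normalize are assumptions made for illustration and are not how PLDF3 itself was built.

```python
from collections import defaultdict

# (FSCSKEY, year, ZIP, CITY, LIBNAME): a few rows copied from Example 1.
rows = [
    ("WV0001", 1988, "25801", "Beckley", "Raleigh County Public Library"),
    ("WV0001", 1989, "25801", "BECKLEY", "Raleigh County PL"),
    ("WV0001", 1991, "25801", "BECKLEY", "Raleigh County"),
    ("WV0001", 1998, "25801", "BECKLEY", "RALEIGH COUNTY PUBLIC LIBRARY"),
]


def normalize(text):
    """Uppercase and collapse runs of whitespace."""
    return " ".join(text.upper().split())


# FSCSKEY -> distinct normalized values seen for each descriptive field.
variants = defaultdict(lambda: {"ZIP": set(), "CITY": set(), "LIBNAME": set()})
for key, year, zip_code, city, libname in rows:
    variants[key]["ZIP"].add(zip_code)
    variants[key]["CITY"].add(normalize(city))
    variants[key]["LIBNAME"].add(normalize(libname))

for field, values in variants["WV0001"].items():
    print(field, sorted(values))
```

Folding case removes the Beckley/BECKLEY difference, but three spellings of LIBNAME remain for the same library, which is why the name alone is a poor way to follow a library through time.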

Now to a more typical case from Colorado, sorted again by FSCSKEY and year:

Example 2: A Complex Example from PLDF2
FSCSKEY year ZIP state CITY LIBNAME
CO0001 1987 80421 CO BAILEY PARK CO PL
CO0001 1988 80421 CO Bailey Park Co PL
CO0001 1989 80229 CO Thornton Adams Co PL
CO0001 1990 80229 CO Thornton Adams Co PL
CO0001 1991 80229 CO Thornton Adams Co PL
CO0001 1992 80229 CO Thornton Adams Co Lib Sys
CO0001 1993 80229 CO Thornton Adams Co Lib Sys
CO0001 1994 80229 CO Thornton Adams Co Lib Sys
CO0001 1995 80229 CO THORNTON ADAMS CO LIB SYS
CO0001 1996 80229 CO Thornton Adams Co Lib Sys
CO0001 1997 80229 CO THORNTON ADAMS CO LIB SYS
CO0001 1998 80229 CO THORNTON ADAMS CO LIB SYS
CO0001 1999 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 2000 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 2001 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 2002 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 2003 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 2004 80229 CO THORNTON RANGEVIEW LIBRARY DISTRICT
CO0001 2005 80229 CO THORNTON RANGEVIEW LIBRARY DISTRICT
CO0001 2006 80229 CO THORNTON RANGEVIEW LIBRARY DISTRICT
CO0001 2007 80234 CO NORTHGLEN RANGEVIEW LIBRARY DISTRICT
CO0001 2008 80234 CO NORTHGLEN RANGEVIEW LIBRARY DISTRICT
CO0001 2009 80234 CO NORTHGLEN RANGEVIEW LIBRARY DISTRICT

Here we have selected information on the libraries from 1987-2009 with FSCSKEY CO0001 in Colorado. Note first that 1987 and 1988 appear to be for a different library...but are they? In the following years, there are changes in CITY, ZIP, and LIBNAME. Are we talking about the same library in 1987-88 as we are in the rest of the years?

What happened in FY 2004 with the name change and in 2007 with the change in CITY and ZIP could be just another change in these external details about this library but without affecting the collections, staff, or the library's users. Will data from CO0001 for these years be from the same library? From the West Virginia example, we know we can expect some variation for any given library in these reported variables but how much is too much? Let's look at another example from Colorado to probe the question further:

Example 3: Another Complex Example from PLDF2
FSCSKEY year ZIP state CITY LIBNAME
CO0094 1987 81502 CO GRAND JUNCTION MESA CO PL
CO0094 1988 80027 CO Louisville Louisville PL
CO0094 1989 80421 CO Bailey Park Co PL
CO0094 1990 80421 CO Bailey Park Co PL
CO0094 1991 80421 CO Bailey Park Co PL
CO0094 1992 80421 CO Bailey Park Co PL
CO0094 1993 80421 CO Bailey Park Co PL
CO0094 1994 80421 CO Bailey Park Co PL
CO0094 1995 80421 CO BAILEY PARK CO PL
CO0094 1996 80421 CO Bailey Park Co PL
CO0094 1997 80421 CO BAILEY PARK CO PL
CO0094 1998 80421 CO BAILEY PARK CO PL
CO0094 1999 80421 CO BAILEY PARK COUNTY PUBLIC LIBRARY
CO0094 2000 80421 CO BAILEY PARK COUNTY PUBLIC LIBRARY
CO0094 2001 80421 CO BAILEY PARK COUNTY PUBLIC LIBRARY
CO0094 2002 80421 CO BAILEY PARK COUNTY PUBLIC LIBRARY
CO0094 2003 80421 CO BAILEY PARK COUNTY PUBLIC LIBRARY
CO0094 2004 80421 CO BAILEY PARK COUNTY PUBLIC LIBRARY
CO0094 2005 80440 CO FAIRPLAY PARK COUNTY PUBLIC LIBRARY
CO0094 2006 80440 CO FAIRPLAY PARK COUNTY PUBLIC LIBRARY
CO0094 2007 80440 CO FAIRPLAY PARK COUNTY PUBLIC LIBRARY
CO0094 2008 80440 CO FAIRPLAY PARK COUNTY PUBLIC LIBRARY
CO0094 2009 80440 CO FAIRPLAY PARK COUNTY PUBLIC LIBRARY

It seems clear from this third example that the data for CO0001 for 1987 and 1988 should be with the other data from Bailey in CO0094. As it happens, the Rangeview Library District in CO0001 is the new name for the Adams County Library District so it is in the right place. But what of Grand Junction and Louisville? Alas, the data from each are located in other FSCSKEYs so we have a Gordian Knot of intertwined libraries and data that have to be untangled. This is what PLDF3 attempts to do: to join the data from one library for one year with the data for that same library in other years. In the early years of the data FSCSKEYs were frequently reassigned. How can we look at trends for one library or a set of libraries, given this condition of the data?
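One rough way to surface candidates for this kind of untangling is to look for a ZIP that shows up under more than one FSCSKEY, and for an FSCSKEY that carries more than one ZIP. The sketch below does this for a handful of rows copied from Examples 2 and 3. Using ZIP as the matching field is an assumption made only for illustration: a legitimate move also changes the ZIP, so anything flagged this way still needs to be reviewed by hand.

```python
from collections import defaultdict

# (FSCSKEY, year, ZIP, CITY, LIBNAME): rows copied from Examples 2 and 3.
rows = [
    ("CO0001", 1987, "80421", "BAILEY", "PARK CO PL"),
    ("CO0001", 1988, "80421", "Bailey", "Park Co PL"),
    ("CO0001", 1989, "80229", "Thornton", "Adams Co PL"),
    ("CO0001", 2007, "80234", "NORTHGLEN", "RANGEVIEW LIBRARY DISTRICT"),
    ("CO0094", 1987, "81502", "GRAND JUNCTION", "MESA CO PL"),
    ("CO0094", 1988, "80027", "Louisville", "Louisville PL"),
    ("CO0094", 1989, "80421", "Bailey", "Park Co PL"),
    ("CO0094", 2005, "80440", "FAIRPLAY", "PARK COUNTY PUBLIC LIBRARY"),
]

keys_by_zip = defaultdict(set)   # ZIP -> FSCSKEYs reported with it
zips_by_key = defaultdict(set)   # FSCSKEY -> ZIPs reported under it
for fscskey, year, zip_code, city, libname in rows:
    keys_by_zip[zip_code].add(fscskey)
    zips_by_key[fscskey].add(zip_code)

# ZIPs that appear under more than one FSCSKEY are candidates for review.
shared_zips = {
    zip_code: sorted(keys)
    for zip_code, keys in keys_by_zip.items()
    if len(keys) > 1
}
print(shared_zips)  # {'80421': ['CO0001', 'CO0094']}

for fscskey in sorted(zips_by_key):
    print(fscskey, sorted(zips_by_key[fscskey]))
```

Here the Bailey ZIP, 80421, is flagged because it appears under both CO0001 and CO0094. The Thornton to Northglenn change also gives CO0001 a new ZIP even though, as noted above, that record is in the right place.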

Enter “newkey”

The central change in PLDF3 is the addition of the variable newkey which attempts to create a unique key for each library that is consistent through time. Consider the next example table for newkey CO0001:

Example 4: Example with newkey
newkey FSCSKEY year ZIP state CITY LIBNAME
CO0001 CO0028 1987 80233 CO NORTHGLENN ADAMS CO PL
CO0001 CO0048 1988 80233 CO Northglenn Adams Co PL
CO0001 CO0001 1989 80229 CO Thornton Adams Co PL
CO0001 CO0001 1990 80229 CO Thornton Adams Co PL
CO0001 CO0001 1991 80229 CO Thornton Adams Co PL
CO0001 CO0001 1992 80229 CO Thornton Adams Co Lib Sys
CO0001 CO0001 1993 80229 CO Thornton Adams Co Lib Sys
CO0001 CO0001 1994 80229 CO Thornton Adams Co Lib Sys
CO0001 CO0001 1995 80229 CO THORNTON ADAMS CO LIB SYS
CO0001 CO0001 1996 80229 CO Thornton Adams Co Lib Sys
CO0001 CO0001 1997 80229 CO THORNTON ADAMS CO LIB SYS
CO0001 CO0001 1998 80229 CO THORNTON ADAMS CO LIB SYS
CO0001 CO0001 1999 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 CO0001 2000 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 CO0001 2001 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 CO0001 2002 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 CO0001 2003 80229 CO THORNTON ADAMS COUNTY LIBRARY SYSTEM
CO0001 CO0001 2004 80229 CO THORNTON RANGEVIEW LIBRARY DISTRICT
CO0001 CO0001 2005 80229 CO THORNTON RANGEVIEW LIBRARY DISTRICT
CO0001 CO0001 2006 80229 CO THORNTON RANGEVIEW LIBRARY DISTRICT
CO0001 CO0001 2007 80234 CO NORTHGLENN RANGEVIEW LIBRARY DISTRICT
CO0001 CO0001 2008 80234 CO NORTHGLENN RANGEVIEW LIBRARY DISTRICT
CO0001 CO0001 2009 80234 CO NORTHGLENN RANGEVIEW LIBRARY DISTRICT

This table is in order by newkey and year. Note that the newkey is the same for all these years while the FSCSKEY (see the second column) changes, as does the CITY. However, we still have data from the Adams County Public Library (now the Rangeview Library District) so, in spite of the change in the city, it still appears we are dealing with the same system. In general, the problems with FSCSKEY appear in the early years of the data and not in all states. By 1996 or 1997, FSCSKEY appears to be largely consistent. However, a common occurrence in a few states is for a library to miss a few years and reappear with a completely new FSCSKEY in subsequent years. For instance, in the FY 2015 data I found one, and in the FY 2017 data there were two. There were none in the FY 2018 data, but FY 2019 has one: a library that reported from FY 1988 – FY 1990, then stopped reporting, and then reappeared in FY 2019.
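With a newkey in place, both observations in this paragraph can be checked mechanically. The sketch below, which is an illustration rather than the code used to build PLDF3, summarizes the span of years each FSCSKEY covers under a newkey and lists gaps in a library's reporting years. The (newkey, FSCSKEY, year) triples are taken from Example 4; the function names are hypothetical.

```python
from collections import defaultdict

# (newkey, FSCSKEY, year) triples taken from Example 4.
rows = [("CO0001", "CO0028", 1987), ("CO0001", "CO0048", 1988)]
rows += [("CO0001", "CO0001", year) for year in range(1989, 2010)]


def fscskey_spans(records):
    """Map newkey -> {FSCSKEY: (first_year, last_year)}."""
    spans = defaultdict(dict)
    for newkey, fscskey, year in records:
        first, last = spans[newkey].get(fscskey, (year, year))
        spans[newkey][fscskey] = (min(first, year), max(last, year))
    return spans


def reporting_gaps(years):
    """Return (last_year_before_gap, first_year_after_gap) pairs."""
    ordered = sorted(set(years))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


print(dict(fscskey_spans(rows)["CO0001"]))
# {'CO0028': (1987, 1987), 'CO0048': (1988, 1988), 'CO0001': (1989, 2009)}

# A library that reported FY 1988 through FY 1990 and reappeared in FY 2019.
print(reporting_gaps([1988, 1989, 1990, 2019]))  # [(1990, 2019)]
```

A long gap followed by a new FSCSKEY is the pattern described above for libraries that stop reporting and later reappear.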

The short explanation is that newkey is created by taking the FSCSKEY for the last year the library reported (where there is ambiguity, and unless the library closed before then) and assigning that value to each of the institution's records as newkey. That way, users of the latest data can use the newkey for consistent data for earlier years. As mentioned, the FSCSKEY has largely been stable in recent years, so changes have only rarely been necessary in the last few years. The longer and more formal explanation is given in the Data Revision History and the Schedule of Changes in FSCSKEYs. This step requires programming changes in the data based ultimately on an analysis of all libraries in the dataset. In cases where two libraries would have the same newkey by these two rules, the one that has disappeared is assigned a sequence number above 9000 (for example NE9007). That part of the address space is largely empty. All such changes are in the schedule of changes, where the actual programming code is included by state. As mentioned above, PLDF3 is a superset of the NCES/IMLS data and keeps the FSCSKEY and most other variables from the original data. The NCES/IMLS variables are in uppercase, and the four added variables are in lowercase.
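The rule can be sketched as a lookup: a record keeps its FSCSKEY as newkey unless a schedule entry for that FSCSKEY/year pair says otherwise. What follows is a minimal illustration under stated assumptions, not the actual state programs. The two Adams County entries are read off Example 4; the two Bailey entries are inferred from the discussion of Example 3 rather than copied from the Schedule of Changes. The helper next_retired_key is hypothetical, as is its choice of the first free number above 9000.

```python
# (FSCSKEY, year) -> newkey, for records whose newkey differs from FSCSKEY.
SCHEDULE = {
    ("CO0028", 1987): "CO0001",  # Adams Co PL, from Example 4
    ("CO0048", 1988): "CO0001",  # Adams Co PL, from Example 4
    ("CO0001", 1987): "CO0094",  # Park Co PL (Bailey), inferred from Example 3
    ("CO0001", 1988): "CO0094",  # Park Co PL (Bailey), inferred from Example 3
}


def assign_newkey(fscskey, year, schedule=SCHEDULE):
    """Return the scheduled newkey, or the FSCSKEY itself if none is listed."""
    return schedule.get((fscskey, year), fscskey)


def next_retired_key(state, used_keys, floor=9000):
    """Hypothetical helper: first unused key in the state above `floor`."""
    sequence = floor + 1
    while f"{state}{sequence:04d}" in used_keys:
        sequence += 1
    return f"{state}{sequence:04d}"


print(assign_newkey("CO0028", 1987))  # CO0001
print(assign_newkey("CO0001", 1988))  # CO0094
print(assign_newkey("CO0001", 1995))  # CO0001 (no schedule entry)
print(next_retired_key("NE", {"NE9001", "NE9002"}))  # NE9003
```

Keying the schedule on the FSCSKEY/year pair, rather than on FSCSKEY alone, is what lets the same FSCSKEY point to different libraries in different years, as CO0001 does in Example 2.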

The creation of this dataset is painstaking work and requires careful attention to detail. The process involves programming that takes data from NCES/IMLS and makes sometimes numerous emendations to them. As a result, there will be 57 base programs (one for each of the states plus DC, American Samoa, Guam, the Northern Mariana Islands, Palau, Puerto Rico, and the Virgin Islands), a number of ancillary programs per state, and summary programs. It is a complex process that takes time, and I suspect we will be rooting out the odd error here or there for a time. The complete programs themselves are available to anyone who wants them, although the various state programs have the code that creates the newkeys for each library where it differs from that library's FSCSKEY/year pair.

Now for something completely different

For those who share with me an interest in the vagaries of data, take a look at the curious case of the ZZs. I doubt these anomalies are important but they are certainly diverting. I would welcome an idea of what these are about.

December 14, 2022