Enhanced Longitudinal Public Library Data File (PLDF3)
Robert E. Molyneux
Abstract
The purpose of these pages is to document a set of longitudinal public library data constructed from the annual Public Library Survey of the National Center for Education Statistics (NCES) under the Federal State Cooperative System (FSCS) program and more recently by the Institute of Museum and Library Services (IMLS) through the Public Library Statistics Cooperative (PLSC).
With the addition of the FY 2015 data, there are 257,512 observations in the dataset. An “observation” presents the data for one library for one year. From FY 1987 through FY 2016, 10,157 library systems have contributed data. The Data Revision History has an accounting of data revisions to the raw NCES and IMLS data and the number of observations resulting from each change. These changes are in reverse chronological order according to when the problems were discovered and the changes made—not by the date of the data changed.
What follows is complex. The examples are now largely historical but still important. They are historical because, as each year has passed, NCES/IMLS and the ace data experts at Census have improved each year's data. The documentation has always been a model for the library world, and the compilers have improved the inter-year connections, so there is no need to rehash these problems with current data. While each year is better, the early years are a bit...rough. For that reason, they are documented here along with the tangled history of newkey.
FSCSKEY and newkey
The background of how PLDF3 came about is discussed separately where a review of its history is outlined—including a discussion of PLDF1 and PLDF2. In order to use the underlying data for trend analysis, problems with the original data had to be addressed. The process of going from PLDF1 to PLDF3 systematically dealt with these problems. PLDF3 is a superset of PLDF2 and renders both PLDF1 and PLDF2 obsolete.
The NCES/IMLS data use a key variable, FSCSKEY, to identify libraries. This key consists of the two-letter state postal code followed by a four-digit sequence number for each public library in the U.S. In recent years this key has been well controlled and its assignment has become more consistent, but in the past the FSCSKEY was not applied consistently. Over time, many libraries were assigned more than one FSCSKEY and some as many as five. This fact had to do with the way NCES published the data. To fix this problem, so that each library has a single key that is consistent over time, a new key variable, newkey, was created for PLDF3. This page now turns to FSCSKEY and newkey in more detail.
FSCSKEY Examples
Here is a simple example from the data for libraries in West Virginia to illustrate how things look when the FSCSKEY is stable over time. The data are sorted by FSCSKEY and year:
FSCSKEY | year | ZIP | state | CITY | LIBNAME |
---|---|---|---|---|---|
WV0001 | 1988 | 25801 | WV | Beckley | Raleigh County Public Library |
WV0001 | 1989 | 25801 | WV | BECKLEY | Raleigh County PL |
WV0001 | 1990 | 25801 | WV | BECKLEY | Raleigh County PL |
WV0001 | 1991 | 25801 | WV | BECKLEY | Raleigh County |
WV0001 | 1992 | 25801 | WV | BECKLEY | Raleigh County |
WV0001 | 1993 | 25801 | WV | BECKLEY | Raleigh County |
WV0001 | 1994 | 25801 | WV | BECKLEY | Raleigh County |
WV0001 | 1995 | 25801 | WV | BECKLEY | Raleigh County Public Library |
WV0001 | 1996 | 25801 | WV | BECKLEY | Raleigh County Public Library |
WV0001 | 1997 | 25801 | WV | BECKLEY | Raleigh County Public Library |
WV0001 | 1998 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 1999 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2000 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2001 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2002 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2003 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2004 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2005 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2006 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2007 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2008 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
WV0001 | 2009 | 25801 | WV | BECKLEY | RALEIGH COUNTY PUBLIC LIBRARY |
The left column is FSCSKEY, an identifier made up of the two-letter state code (WV) plus a four-digit number originally assigned sequentially (0001). The year column is in order by the year the data are reported to NCES/IMLS, so the listed ZIP, state, CITY, and LIBNAME all come from the data for that year. Note that in 1988 the CITY is “Beckley” while in subsequent years it is “BECKLEY.” In other words, in the 1988 data, Beckley is reported in mixed case and in subsequent years it is uppercase. LIBNAME varies, too, as can be seen. Variations in CITY and LIBNAME are common in this dataset, and variations in ZIP are not that unusual; they happen, for instance, when a main library is moved to a new building or as a result of errors. These variations are important, particularly if one wants to examine data from the same library or set of libraries over time. How can one isolate the data from one library when its name, location, or CITY changes from year to year? As discussed above, analyzing libraries with these variations is made easier with a key variable such as the FSCSKEY.
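The key layout and the case variation described above can be illustrated with a small sketch. It assumes a well-formed key is exactly two uppercase letters followed by four digits, as in the examples on this page; the function names `split_fscskey` and `same_ignoring_case` are hypothetical and are not part of PLDF3 or the NCES/IMLS files.

```python
import re

# Assumed layout (from the examples on this page): two uppercase letters
# for the state, then a four-digit sequence number, e.g. "WV0001".
FSCSKEY_PATTERN = re.compile(r"([A-Z]{2})(\d{4})")


def split_fscskey(key):
    """Return (state, sequence number) for a well-formed key."""
    match = FSCSKEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed FSCSKEY: {key!r}")
    return match.group(1), int(match.group(2))


def same_ignoring_case(a, b):
    """Compare two reported values while ignoring case and outer spaces."""
    return a.strip().casefold() == b.strip().casefold()


state, sequence = split_fscskey("WV0001")
city_matches = same_ignoring_case("Beckley", "BECKLEY")
```

A comparison like this handles differences in case only. It would not treat “Raleigh County PL” and “Raleigh County Public Library” as the same name, which is one reason a stable key matters more than the descriptive fields.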
Now to a more typical case from Colorado, sorted again by FSCSKEY and year:
FSCSKEY | year | ZIP | state | CITY | LIBNAME |
---|---|---|---|---|---|
CO0001 | 1987 | 80421 | CO | BAILEY | PARK CO PL |
CO0001 | 1988 | 80421 | CO | Bailey | Park Co PL |
CO0001 | 1989 | 80229 | CO | Thornton | Adams Co PL |
CO0001 | 1990 | 80229 | CO | Thornton | Adams Co PL |
CO0001 | 1991 | 80229 | CO | Thornton | Adams Co PL |
CO0001 | 1992 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | 1993 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | 1994 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | 1995 | 80229 | CO | THORNTON | ADAMS CO LIB SYS |
CO0001 | 1996 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | 1997 | 80229 | CO | THORNTON | ADAMS CO LIB SYS |
CO0001 | 1998 | 80229 | CO | THORNTON | ADAMS CO LIB SYS |
CO0001 | 1999 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | 2000 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | 2001 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | 2002 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | 2003 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | 2004 | 80229 | CO | THORNTON | RANGEVIEW LIBRARY DISTRICT |
CO0001 | 2005 | 80229 | CO | THORNTON | RANGEVIEW LIBRARY DISTRICT |
CO0001 | 2006 | 80229 | CO | THORNTON | RANGEVIEW LIBRARY DISTRICT |
CO0001 | 2007 | 80234 | CO | NORTHGLEN | RANGEVIEW LIBRARY DISTRICT |
CO0001 | 2008 | 80234 | CO | NORTHGLEN | RANGEVIEW LIBRARY DISTRICT |
CO0001 | 2009 | 80234 | CO | NORTHGLEN | RANGEVIEW LIBRARY DISTRICT |
Here we have selected information on the libraries from 1987-2009 with FSCSKEY CO0001 in Colorado. Note first that 1987 and 1988 appear to be for a different library...but are they? In the following years, there are changes in CITY, ZIP, and LIBNAME. Are we talking about the same library in 1987-88 as in the rest of the years?
What happened in FY 2004 with the name change and in 2007 with the change in CITY and ZIP could be just another change in these external details about this library but without affecting the collections, staff, or the library's users. Will data from CO0001 for these years be from the same library? From the West Virginia example, we know we can expect some variation for any given library in these reported variables but how much is too much? Let's look at another example from Colorado to probe the question further:
FSCSKEY | year | ZIP | state | CITY | LIBNAME |
---|---|---|---|---|---|
CO0094 | 1987 | 81502 | CO | GRAND JUNCTION | MESA CO PL |
CO0094 | 1988 | 80027 | CO | Louisville | Louisville PL |
CO0094 | 1989 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1990 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1991 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1992 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1993 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1994 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1995 | 80421 | CO | BAILEY | PARK CO PL |
CO0094 | 1996 | 80421 | CO | Bailey | Park Co PL |
CO0094 | 1997 | 80421 | CO | BAILEY | PARK CO PL |
CO0094 | 1998 | 80421 | CO | BAILEY | PARK CO PL |
CO0094 | 1999 | 80421 | CO | BAILEY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2000 | 80421 | CO | BAILEY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2001 | 80421 | CO | BAILEY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2002 | 80421 | CO | BAILEY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2003 | 80421 | CO | BAILEY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2004 | 80421 | CO | BAILEY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2005 | 80440 | CO | FAIRPLAY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2006 | 80440 | CO | FAIRPLAY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2007 | 80440 | CO | FAIRPLAY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2008 | 80440 | CO | FAIRPLAY | PARK COUNTY PUBLIC LIBRARY |
CO0094 | 2009 | 80440 | CO | FAIRPLAY | PARK COUNTY PUBLIC LIBRARY |
It seems clear from this third example that the data for CO0001 for 1987 and 1988 should be with the other data from Bailey in CO0094. As it happens, the Rangeview Library District in CO0001 is the new name for the Adams County Library System, so it is in the right place. But what of Grand Junction and Louisville? Alas, the data from each are located in other FSCSKEYs, so we have a Gordian Knot of intertwined libraries and data that have to be untangled. This is what PLDF3 attempts to do: to join the data from one library for one year with the data for that same library in other years. In the early years of the data, FSCSKEYs were frequently reassigned. How can we look at trends for one library or a set of libraries, given this condition of the data?
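One way to find candidates for this kind of untangling is to flag every year in which the ZIP reported under an FSCSKEY differs from the year before. The sketch below is only an illustration using a few rows from the CO0094 table; the function name `zip_changes` is hypothetical, and this is not the procedure used to build PLDF3.

```python
from itertools import groupby


def zip_changes(rows):
    """Flag year-to-year ZIP changes within each FSCSKEY.

    rows: iterable of (fscskey, year, zip_code) tuples.
    Returns a list of (fscskey, year, previous_zip, new_zip).
    """
    flagged = []
    ordered = sorted(rows)  # sorts by FSCSKEY, then year
    for key, group in groupby(ordered, key=lambda row: row[0]):
        previous_zip = None
        for _, year, zip_code in group:
            if previous_zip is not None and zip_code != previous_zip:
                flagged.append((key, year, previous_zip, zip_code))
            previous_zip = zip_code
    return flagged


# A few rows from the CO0094 table above.
rows = [
    ("CO0094", 1987, "81502"),
    ("CO0094", 1988, "80027"),
    ("CO0094", 1989, "80421"),
    ("CO0094", 2004, "80421"),
    ("CO0094", 2005, "80440"),
]
candidates = zip_changes(rows)
```

With these rows the function flags 1988, 1989, and 2005. The first two are key reassignments, while the 2005 move from Bailey to Fairplay keeps the same library name, so a flag is only a prompt for a closer look and does not prove that the library changed.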
Enter “newkey”
The central change in PLDF3 is the addition of the variable newkey, which attempts to create a unique key for each library that is consistent through time. Consider the next example table for newkey CO0001:
newkey | FSCSKEY | year | ZIP | state | CITY | LIBNAME |
---|---|---|---|---|---|---|
CO0001 | CO0028 | 1987 | 80233 | CO | NORTHGLENN | ADAMS CO PL |
CO0001 | CO0048 | 1988 | 80233 | CO | Northglenn | Adams Co PL |
CO0001 | CO0001 | 1989 | 80229 | CO | Thornton | Adams Co PL |
CO0001 | CO0001 | 1990 | 80229 | CO | Thornton | Adams Co PL |
CO0001 | CO0001 | 1991 | 80229 | CO | Thornton | Adams Co PL |
CO0001 | CO0001 | 1992 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | CO0001 | 1993 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | CO0001 | 1994 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | CO0001 | 1995 | 80229 | CO | THORNTON | ADAMS CO LIB SYS |
CO0001 | CO0001 | 1996 | 80229 | CO | Thornton | Adams Co Lib Sys |
CO0001 | CO0001 | 1997 | 80229 | CO | THORNTON | ADAMS CO LIB SYS |
CO0001 | CO0001 | 1998 | 80229 | CO | THORNTON | ADAMS CO LIB SYS |
CO0001 | CO0001 | 1999 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | CO0001 | 2000 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | CO0001 | 2001 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | CO0001 | 2002 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | CO0001 | 2003 | 80229 | CO | THORNTON | ADAMS COUNTY LIBRARY SYSTEM |
CO0001 | CO0001 | 2004 | 80229 | CO | THORNTON | RANGEVIEW LIBRARY DISTRICT |
CO0001 | CO0001 | 2005 | 80229 | CO | THORNTON | RANGEVIEW LIBRARY DISTRICT |
CO0001 | CO0001 | 2006 | 80229 | CO | THORNTON | RANGEVIEW LIBRARY DISTRICT |
CO0001 | CO0001 | 2007 | 80234 | CO | NORTHGLENN | RANGEVIEW LIBRARY DISTRICT |
CO0001 | CO0001 | 2008 | 80234 | CO | NORTHGLENN | RANGEVIEW LIBRARY DISTRICT |
CO0001 | CO0001 | 2009 | 80234 | CO | NORTHGLENN | RANGEVIEW LIBRARY DISTRICT |
This table is in order by newkey and year. Note that the newkey is the same for all these years while the FSCSKEY (see the second column) changes, as does the CITY. However, we still have data from the Adams County Public Library (or now Rangeview Library District) so, in spite of the change in the city, it still appears we are dealing with the same system. In general, the problems with FSCSKEY appear to be with the early years of the data and not for all states. By 1996 or 1997, FSCSKEY appears to be largely consistent. However, a common occurrence in a few states is for a library to miss a few years and reappear with a completely new FSCSKEY in subsequent years. I found one such case in the FY 2015 data.
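To see which FSCSKEYs have been folded into a given newkey, the observations can be grouped by newkey. The following is a minimal sketch using the first few rows of the table above; the function name `fscskeys_by_newkey` and the tuple layout are assumptions made for illustration only.

```python
def fscskeys_by_newkey(rows):
    """Map each newkey to the distinct FSCSKEYs it has carried, by year.

    rows: iterable of (newkey, fscskey, year) tuples.
    """
    history = {}
    # Sort by newkey, then year, so keys are listed in order of first use.
    for newkey, fscskey, _year in sorted(rows, key=lambda r: (r[0], r[2])):
        keys = history.setdefault(newkey, [])
        if fscskey not in keys:
            keys.append(fscskey)
    return history


# The first rows of the newkey CO0001 table above.
rows = [
    ("CO0001", "CO0028", 1987),
    ("CO0001", "CO0048", 1988),
    ("CO0001", "CO0001", 1989),
    ("CO0001", "CO0001", 1990),
]
history = fscskeys_by_newkey(rows)
```

For these rows the result lists CO0028, CO0048, and CO0001 under newkey CO0001, which matches the second column of the table.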
The short explanation of how newkey is created is this: take the FSCSKEY for the last year the library reported (usually FY 2015 if there is ambiguity, unless the library closed before then) and assign that value to each of the institution's records as newkey. That way, users of the latest data can use the newkey to get consistent data for earlier years. As mentioned, the FSCSKEY has largely been stable in recent years, so changes have only rarely been necessary in the last few years. The longer and more formal explanation is discussed with the “Schedule of Changes” in FSCSKEYs. This step requires programming changes in the data based ultimately on an analysis of all libraries in the dataset. In cases where two libraries would have the same newkey under these rules, the one that has disappeared is assigned a sequence number above 9000 (for example NE9007). That part of the address space is largely empty. All such changes are in the schedule of changes, where the actual programming code is included by state. Of course, the FSCSKEY is kept in PLDF3.
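The short rule can be sketched in code. This is an illustration under stated assumptions, not the programs described in the Schedule of Changes. It assumes that each library's records have already been linked by hand under a hypothetical `library_id`, that replacement sequence numbers are handed out starting at 9001, and that nothing else already uses those numbers. The Colorado histories are abbreviated from the tables on this page; the two Nebraska libraries are invented to show a collision.

```python
def assign_newkeys(histories):
    """Sketch of the short rule for newkey.

    histories: dict of library_id -> list of (year, fscskey) observations,
    where library_id is a hypothetical label from a prior manual linkage.
    Returns a dict of library_id -> newkey.
    """
    # Candidate newkey: the FSCSKEY from the last year the library reported.
    last = {lib: max(obs) for lib, obs in histories.items()}

    newkeys = {}
    claimed = set()  # candidate keys already given to a library
    next_seq = {}    # state -> next unused number above 9000 (assumption)

    # The most recent reporters go first, so a library that has
    # disappeared is the one that gives way in a collision.
    for lib in sorted(last, key=lambda name: (-last[name][0], name)):
        _year, key = last[lib]
        if key not in claimed:
            claimed.add(key)
            newkeys[lib] = key
        else:
            state = key[:2]
            seq = next_seq.get(state, 9001)
            next_seq[state] = seq + 1
            newkeys[lib] = f"{state}{seq}"
    return newkeys


histories = {
    "adams_co": [(1987, "CO0028"), (1988, "CO0048"), (1989, "CO0001"), (2009, "CO0001")],
    "park_co": [(1987, "CO0001"), (1988, "CO0001"), (1989, "CO0094"), (2009, "CO0094")],
    # Invented libraries that end on the same FSCSKEY:
    "closed_lib": [(1990, "NE0010"), (1991, "NE0010")],
    "current_lib": [(2014, "NE0010"), (2015, "NE0010")],
}
newkeys = assign_newkeys(histories)
```

Under these assumptions the Adams County records all receive CO0001 and the Park County records all receive CO0094, while the invented closed library is moved into the 9000 range so that the library still reporting keeps NE0010. The hard part, deciding which records belong to the same library in the first place, is not something this sketch attempts.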
The creation of this dataset is painstaking work and requires careful attention to detail. The process involves programs that take the data from NCES/IMLS and make sometimes numerous emendations to them. As a result, there will be 56 base programs (the states plus DC, Guam, the Northern Marianas, Palau, Puerto Rico, and the Virgin Islands), a number of ancillary programs per state, and summary programs. It is a complex process that takes time, and I suspect we will be rooting out the odd error here or there for a time. The complete programs are available to anyone who wants them; the various state programs contain the code that creates the newkey for each library where it differs from that library's FSCSKEY/year pair.
Now for something completely different
For those who share with me an interest in the vagaries of data, take a look at the curious case of the ZZs. I doubt these anomalies are important but they are certainly diverting. I would welcome an idea of what these are about.
March 26, 2018