Data revision history

Enhanced Longitudinal Public Library Data File (PLDF3)

Robert E. Molyneux


Abstract

The purpose of these pages is to document a set of longitudinal public library data constructed from the annual Public Library Survey of the National Center for Education Statistics (NCES) under the Federal State Cooperative System (FSCS) program and more recently by the Institute of Museum and Library Services (IMLS) through the Public Library Statistics Cooperative (PLSC).

With the addition of the FY 2011 data, there are 220,408 observations in the dataset. An “observation” is the data for one library for one year. From FY 1987 through FY 2011, 10,090 library systems contributed data. The Data Revision History has an accounting of data revisions to the raw NCES and IMLS data and the number of observations resulting from each change. These changes are in reverse chronological order according to when the problems were discovered and the changes made—not by the date of the data changed.

What follows here is complex. These examples are now largely historical but still important. They are historical because, as each year has passed, NCES/IMLS and the ace data experts at Census have made each year's data better. The documentation has always been a model for the library world, but the compilers have improved the inter-year connections, so there is no need to rehash these problems with current data. While each year is better, the early years are a bit...rough. For that reason, we document them here, along with the tangled history of newkey.


FSCSKEY and newkey

The background of how PLDF3 came about is discussed separately where a review of its history is outlined—including a discussion of PLDF1 and PLDF2. In order to use the underlying data for trend analysis, problems with the original data had to be addressed. The process of going from PLDF1 to PLDF3 systematically dealt with these problems. PLDF3 is a superset of PLDF2 and renders both PLDF1 and PLDF2 obsolete.

The NCES/IMLS data use a key variable, FSCSKEY, to identify libraries. This key consists of the two-letter state postal code and a four-digit sequence number for each public library in the U.S. In recent years there has been good control of this number, so the assignment of keys is becoming more consistent over time, but in the past the FSCSKEY was not applied consistently. Over time, many libraries were assigned more than one FSCSKEY, and some as many as five. This had to do with the way NCES published the data. To fix this problem and give each library a single unique key, a new key variable, newkey, was created for PLDF3. This page now turns to FSCSKEY and newkey in more detail.

FSCSKEY Examples

Here is a simple example from the data for libraries in West Virginia to illustrate how things look when the FSCSKEY is stable over time. The data are sorted by FSCSKEY and year:

Example 1: Simple Example from PLDF2
FSCSKEY  year  ZIP    state  CITY     LIBNAME
WV0001   1988  25801  WV     Beckley  Raleigh County Public Library
WV0001   1989  25801  WV     BECKLEY  Raleigh County PL
WV0001   1990  25801  WV     BECKLEY  Raleigh County PL
WV0001   1991  25801  WV     BECKLEY  Raleigh County
WV0001   1992  25801  WV     BECKLEY  Raleigh County
WV0001   1993  25801  WV     BECKLEY  Raleigh County
WV0001   1994  25801  WV     BECKLEY  Raleigh County
WV0001   1995  25801  WV     BECKLEY  Raleigh County Public Library
WV0001   1996  25801  WV     BECKLEY  Raleigh County Public Library
WV0001   1997  25801  WV     BECKLEY  Raleigh County Public Library
WV0001   1998  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   1999  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2000  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2001  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2002  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2003  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2004  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2005  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2006  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2007  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2008  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY
WV0001   2009  25801  WV     BECKLEY  RALEIGH COUNTY PUBLIC LIBRARY

The left column is FSCSKEY, a key made up of the two-letter state code (WV) plus a four-digit number originally assigned sequentially (0001). The year column is the year the data were reported to NCES/IMLS, so the listed ZIP, state, CITY, and LIBNAME all come from the data for that year. Note that in 1988 the CITY is “Beckley” while in subsequent years it is “BECKLEY.” In other words, in the 1988 data, Beckley is reported in mixed case and in subsequent years it is uppercase. LIBNAME varies, too, as can be seen. Variations in CITY and LIBNAME are common in this dataset, and those in ZIP are not that unusual; they happen, for instance, when a main library is moved to a new building or as a result of errors. These variations are important, particularly if one wants to examine data from the same library or set of libraries over time. How can one isolate the data from one library with changes in its name, location, or its CITY from year to year? As discussed above, analyzing libraries with these variations is made easier with a key variable such as the FSCSKEY.
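The structure of the key and the case variations described above can be checked mechanically. What follows is a minimal Python sketch and not code from the PLDF3 programs. The names split_fscskey and normalize are hypothetical, and the sample rows are copied from Example 1.

```python
import re

# Two uppercase letters (state postal code) followed by a four-digit sequence number.
FSCSKEY_PATTERN = re.compile(r"^([A-Z]{2})(\d{4})$")

def split_fscskey(key):
    """Split an FSCSKEY such as 'WV0001' into ('WV', 1); raise ValueError if malformed."""
    match = FSCSKEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a valid FSCSKEY: {key!r}")
    return match.group(1), int(match.group(2))

def normalize(text):
    """Fold case and collapse whitespace so 'Beckley' and 'BECKLEY' compare equal."""
    return " ".join(text.upper().split())

# Rows from Example 1: (FSCSKEY, year, CITY, LIBNAME)
rows = [
    ("WV0001", 1988, "Beckley", "Raleigh County Public Library"),
    ("WV0001", 1989, "BECKLEY", "Raleigh County PL"),
    ("WV0001", 1998, "BECKLEY", "RALEIGH COUNTY PUBLIC LIBRARY"),
]

state, sequence = split_fscskey(rows[0][0])
cities = {normalize(city) for _, _, city, _ in rows}
names = {normalize(name) for _, _, _, name in rows}
print(state, sequence)  # WV 1
print(cities)           # {'BECKLEY'}
print(len(names))       # 2: case folding does not reconcile "PL" with "Public Library"
```

Case folding is enough to reconcile CITY in this example, but LIBNAME still has two distinct spellings afterward. That is why matching on names alone is unreliable and a stable key is needed.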

Now to a more typical case from Colorado, sorted again by FSCSKEY and year:

Example 2: A Complex Example from PLDF2
FSCSKEY  year  ZIP    state  CITY       LIBNAME
CO0001   1987  80421  CO     BAILEY     PARK CO PL
CO0001   1988  80421  CO     Bailey     Park Co PL
CO0001   1989  80229  CO     Thornton   Adams Co PL
CO0001   1990  80229  CO     Thornton   Adams Co PL
CO0001   1991  80229  CO     Thornton   Adams Co PL
CO0001   1992  80229  CO     Thornton   Adams Co Lib Sys
CO0001   1993  80229  CO     Thornton   Adams Co Lib Sys
CO0001   1994  80229  CO     Thornton   Adams Co Lib Sys
CO0001   1995  80229  CO     THORNTON   ADAMS CO LIB SYS
CO0001   1996  80229  CO     Thornton   Adams Co Lib Sys
CO0001   1997  80229  CO     THORNTON   ADAMS CO LIB SYS
CO0001   1998  80229  CO     THORNTON   ADAMS CO LIB SYS
CO0001   1999  80229  CO     THORNTON   ADAMS COUNTY LIBRARY SYSTEM
CO0001   2000  80229  CO     THORNTON   ADAMS COUNTY LIBRARY SYSTEM
CO0001   2001  80229  CO     THORNTON   ADAMS COUNTY LIBRARY SYSTEM
CO0001   2002  80229  CO     THORNTON   ADAMS COUNTY LIBRARY SYSTEM
CO0001   2003  80229  CO     THORNTON   ADAMS COUNTY LIBRARY SYSTEM
CO0001   2004  80229  CO     THORNTON   RANGEVIEW LIBRARY DISTRICT
CO0001   2005  80229  CO     THORNTON   RANGEVIEW LIBRARY DISTRICT
CO0001   2006  80229  CO     THORNTON   RANGEVIEW LIBRARY DISTRICT
CO0001   2007  80234  CO     NORTHGLEN  RANGEVIEW LIBRARY DISTRICT
CO0001   2008  80234  CO     NORTHGLEN  RANGEVIEW LIBRARY DISTRICT
CO0001   2009  80234  CO     NORTHGLEN  RANGEVIEW LIBRARY DISTRICT

Here we have selected information from 1987-2009 on the libraries with FSCSKEY CO0001 in Colorado. Note first that 1987 and 1988 appear to be for a different library...but are they? In the following years, there are changes in CITY, ZIP, and LIBNAME. Are we talking about the same library in 1987-88 as we are with the rest of the years?

What happened in FY 2004 with the name change and in 2007 with the change in CITY and ZIP could be just another change in these external details about this library but without affecting the collections, staff, or the library's users. Will data from CO0001 for these years be from the same library? From the West Virginia example, we know we can expect some variation for any given library in these reported variables but how much is too much? Let's look at another example from Colorado to probe the question further:

Example 3: Another Complex Example from PLDF2
FSCSKEY  year  ZIP    state  CITY            LIBNAME
CO0094   1987  81502  CO     GRAND JUNCTION  MESA CO PL
CO0094   1988  80027  CO     Louisville      Louisville PL
CO0094   1989  80421  CO     Bailey          Park Co PL
CO0094   1990  80421  CO     Bailey          Park Co PL
CO0094   1991  80421  CO     Bailey          Park Co PL
CO0094   1992  80421  CO     Bailey          Park Co PL
CO0094   1993  80421  CO     Bailey          Park Co PL
CO0094   1994  80421  CO     Bailey          Park Co PL
CO0094   1995  80421  CO     BAILEY          PARK CO PL
CO0094   1996  80421  CO     Bailey          Park Co PL
CO0094   1997  80421  CO     BAILEY          PARK CO PL
CO0094   1998  80421  CO     BAILEY          PARK CO PL
CO0094   1999  80421  CO     BAILEY          PARK COUNTY PUBLIC LIBRARY
CO0094   2000  80421  CO     BAILEY          PARK COUNTY PUBLIC LIBRARY
CO0094   2001  80421  CO     BAILEY          PARK COUNTY PUBLIC LIBRARY
CO0094   2002  80421  CO     BAILEY          PARK COUNTY PUBLIC LIBRARY
CO0094   2003  80421  CO     BAILEY          PARK COUNTY PUBLIC LIBRARY
CO0094   2004  80421  CO     BAILEY          PARK COUNTY PUBLIC LIBRARY
CO0094   2005  80440  CO     FAIRPLAY        PARK COUNTY PUBLIC LIBRARY
CO0094   2006  80440  CO     FAIRPLAY        PARK COUNTY PUBLIC LIBRARY
CO0094   2007  80440  CO     FAIRPLAY        PARK COUNTY PUBLIC LIBRARY
CO0094   2008  80440  CO     FAIRPLAY        PARK COUNTY PUBLIC LIBRARY
CO0094   2009  80440  CO     FAIRPLAY        PARK COUNTY PUBLIC LIBRARY

It seems clear from this third example that the data for CO0001 for 1987 and 1988 should be with the other data from Bailey in CO0094. As it happens, the Rangeview Library District in CO0001 is the new name for the Adams County Library District so it is in the right place. But what of Grand Junction and Louisville? Alas, the data from each are located in other FSCSKEYs so we have a Gordian Knot of intertwined libraries and data that have to be untangled. This is what PLDF3 attempts to do: to join the data from one library for one year with the data for that same library in other years. In the early years of the data FSCSKEYs were frequently reassigned. How can we look at trends for one library or a set of libraries, given this condition of the data?
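One way to find candidates for this kind of untangling is to flag every year-to-year change in ZIP within a single FSCSKEY. The sketch below is only an illustration of that idea in Python, not the method used to build PLDF3. The function name zip_changes is hypothetical, and the rows are a subset of Examples 2 and 3.

```python
from collections import defaultdict

# (FSCSKEY, year, ZIP, CITY) rows taken from Examples 2 and 3.
rows = [
    ("CO0001", 1987, "80421", "BAILEY"),
    ("CO0001", 1988, "80421", "Bailey"),
    ("CO0001", 1989, "80229", "Thornton"),
    ("CO0001", 1990, "80229", "Thornton"),
    ("CO0094", 1987, "81502", "GRAND JUNCTION"),
    ("CO0094", 1988, "80027", "Louisville"),
    ("CO0094", 1989, "80421", "Bailey"),
    ("CO0094", 1990, "80421", "Bailey"),
]

def zip_changes(rows):
    """Return {FSCSKEY: [(year, old_zip, new_zip), ...]} for each year-to-year ZIP change."""
    by_key = defaultdict(list)
    for key, year, zip_code, _city in rows:
        by_key[key].append((year, zip_code))
    changes = {}
    for key, history in by_key.items():
        history.sort()  # order each key's records by year
        flagged = [
            (year, prev_zip, zip_code)
            for (_, prev_zip), (year, zip_code) in zip(history, history[1:])
            if zip_code != prev_zip
        ]
        if flagged:
            changes[key] = flagged
    return changes

for key, flagged in sorted(zip_changes(rows).items()):
    print(key, flagged)
# CO0001 [(1989, '80421', '80229')]
# CO0094 [(1988, '81502', '80027'), (1989, '80027', '80421')]
```

A flag is only a prompt for review. ZIP can also change for legitimate reasons, such as a main library moving to a new building, so each flagged case still has to be examined by hand.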

Enter “newkey”

The central change in PLDF3 is the addition of the variable newkey which attempts to create a unique key for each library that is consistent through time. Consider the next example table for newkey CO0001:

Example 4: Example with newkey
newkey  FSCSKEY  year  ZIP    state  CITY        LIBNAME
CO0001  CO0028   1987  80233  CO     NORTHGLENN  ADAMS CO PL
CO0001  CO0048   1988  80233  CO     Northglenn  Adams Co PL
CO0001  CO0001   1989  80229  CO     Thornton    Adams Co PL
CO0001  CO0001   1990  80229  CO     Thornton    Adams Co PL
CO0001  CO0001   1991  80229  CO     Thornton    Adams Co PL
CO0001  CO0001   1992  80229  CO     Thornton    Adams Co Lib Sys
CO0001  CO0001   1993  80229  CO     Thornton    Adams Co Lib Sys
CO0001  CO0001   1994  80229  CO     Thornton    Adams Co Lib Sys
CO0001  CO0001   1995  80229  CO     THORNTON    ADAMS CO LIB SYS
CO0001  CO0001   1996  80229  CO     Thornton    Adams Co Lib Sys
CO0001  CO0001   1997  80229  CO     THORNTON    ADAMS CO LIB SYS
CO0001  CO0001   1998  80229  CO     THORNTON    ADAMS CO LIB SYS
CO0001  CO0001   1999  80229  CO     THORNTON    ADAMS COUNTY LIBRARY SYSTEM
CO0001  CO0001   2000  80229  CO     THORNTON    ADAMS COUNTY LIBRARY SYSTEM
CO0001  CO0001   2001  80229  CO     THORNTON    ADAMS COUNTY LIBRARY SYSTEM
CO0001  CO0001   2002  80229  CO     THORNTON    ADAMS COUNTY LIBRARY SYSTEM
CO0001  CO0001   2003  80229  CO     THORNTON    ADAMS COUNTY LIBRARY SYSTEM
CO0001  CO0001   2004  80229  CO     THORNTON    RANGEVIEW LIBRARY DISTRICT
CO0001  CO0001   2005  80229  CO     THORNTON    RANGEVIEW LIBRARY DISTRICT
CO0001  CO0001   2006  80229  CO     THORNTON    RANGEVIEW LIBRARY DISTRICT
CO0001  CO0001   2007  80234  CO     NORTHGLENN  RANGEVIEW LIBRARY DISTRICT
CO0001  CO0001   2008  80234  CO     NORTHGLENN  RANGEVIEW LIBRARY DISTRICT
CO0001  CO0001   2009  80234  CO     NORTHGLENN  RANGEVIEW LIBRARY DISTRICT

This table is in order by newkey and year. Note that the newkey is the same for all these years while the FSCSKEY (see the second column) changes, as does the CITY. However, we still have data from the Adams County Public Library (or now Rangeview Library District) so, in spite of the change in the city, it still appears we are dealing with the same system. In general, the problems with FSCSKEY appear to be with the early years of the data and not for all states. By 1996 or 1997, FSCSKEY appears to be largely consistent. However, a common occurrence in a few states is for a library to miss a few years and reappear with a completely new FSCSKEY in subsequent years. For instance, in the FY 2009 data I have found three such cases.
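The practical difference between the two keys shows up when selecting one library's records for a trend analysis. The following Python sketch uses only the first five rows of Example 4. The helper years_for and the column constants are hypothetical names introduced for illustration.

```python
# (newkey, FSCSKEY, year) rows taken from Example 4 (first five years only).
rows = [
    ("CO0001", "CO0028", 1987),
    ("CO0001", "CO0048", 1988),
    ("CO0001", "CO0001", 1989),
    ("CO0001", "CO0001", 1990),
    ("CO0001", "CO0001", 1991),
]

NEWKEY, FSCSKEY = 0, 1  # column positions within each row

def years_for(rows, column, value):
    """Years selected when filtering the given key column for the given value."""
    return sorted(row[2] for row in rows if row[column] == value)

by_newkey = years_for(rows, NEWKEY, "CO0001")
by_fscskey = years_for(rows, FSCSKEY, "CO0001")
keys_used = sorted({row[FSCSKEY] for row in rows if row[NEWKEY] == "CO0001"})

print(by_newkey)   # [1987, 1988, 1989, 1990, 1991]
print(by_fscskey)  # [1989, 1990, 1991]
print(keys_used)   # ['CO0001', 'CO0028', 'CO0048']
```

Filtering on FSCSKEY misses the 1987 and 1988 records in this sample. In the full PLDF2 data the same filter would also pull in the Bailey records shown in Example 2, which belong to a different library.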

The short explanation of how newkey is created: take the FSCSKEY for the last year the library reported (usually FY 2009 if there is ambiguity, unless the library closed before then) and assign that value to each of the institution's records as newkey. That way, users of the latest data can use the newkey to get consistent data for earlier years. As mentioned, the FSCSKEY has largely been stable in recent years, so changes have only rarely been necessary recently. The longer and more formal explanation is discussed with the “Schedule of Changes” in FSCSKEYs. This step requires programming changes in the data based ultimately on an analysis of all libraries in the dataset. In cases where two libraries would have the same newkey under these rules, the one that has disappeared is assigned a sequence number above 9000 (for example NE9007). That part of the address space is largely empty. All such changes are in the schedule of changes, where the actual programming code is included by state. Of course, the FSCSKEY is kept in PLDF3.
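The short explanation above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual PLDF3 state programs. The lib_id labels stand in for the manual analysis that decides which records belong to the same library, the function name assign_newkeys is hypothetical, the starting value 9001 for collision keys is an assumption, and the XX rows are invented purely to show a collision. The Colorado rows follow Examples 2 through 4.

```python
from collections import defaultdict

def assign_newkeys(records):
    """records: list of (lib_id, fscskey, year). Returns {lib_id: newkey}.

    lib_id is a hypothetical label; in practice that grouping comes from
    an analysis of all libraries in the dataset.
    """
    last = {}  # lib_id -> (last year reported, FSCSKEY in that year)
    for lib_id, fscskey, year in records:
        if lib_id not in last or year > last[lib_id][0]:
            last[lib_id] = (year, fscskey)

    claimants = defaultdict(list)  # candidate newkey -> [(last year, lib_id), ...]
    for lib_id, (year, fscskey) in last.items():
        claimants[fscskey].append((year, lib_id))

    newkeys = {}
    next_seq = defaultdict(lambda: 9001)  # per-state counter; start value is an assumption
    for fscskey, libs in sorted(claimants.items()):
        libs.sort(reverse=True)  # the most recently reporting library keeps the key
        newkeys[libs[0][1]] = fscskey
        for _, lib_id in libs[1:]:  # libraries that disappeared earlier get 9000-series keys
            state = fscskey[:2]
            newkeys[lib_id] = f"{state}{next_seq[state]:04d}"
            next_seq[state] += 1
    return newkeys

records = [
    ("adams", "CO0028", 1987), ("adams", "CO0048", 1988),
    ("adams", "CO0001", 1989), ("adams", "CO0001", 2009),
    ("park", "CO0001", 1987), ("park", "CO0001", 1988),
    ("park", "CO0094", 1989), ("park", "CO0094", 2009),
    # Invented rows: two libraries whose last reported FSCSKEY is the same.
    ("lib_a", "XX0005", 2009), ("lib_b", "XX0005", 1990),
]

for lib_id, newkey in sorted(assign_newkeys(records).items()):
    print(lib_id, newkey)
# adams CO0001
# lib_a XX0005
# lib_b XX9001
# park CO0094
```

The sketch does not check whether a generated 9000-series key is already in use, and it does not handle the ambiguous cases mentioned above. Those are settled case by case in the schedule of changes.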

The creation of this dataset is painstaking work and requires careful attention to detail. The process involves programming to take data from NCES/IMLS and make sometimes numerous emendations to them. As a result, there will be 56 base programs (the states plus DC, Guam, the Northern Marianas, Palau, Puerto Rico, and the Virgin Islands), a number of ancillary programs per state, plus summary programs. It is a complex process that takes time, and I suspect we will be rooting out the odd error here or there for a time. The complete programs themselves are available to anyone who wants them; the various state programs have the code that creates the newkey for each library where it differs from that library's FSCSKEY/year pair.

Now for something completely different

For those who share with me an interest in the vagaries of data, take a look at the curious case of the ZZs. I doubt these anomalies are important but they are certainly diverting. I would welcome an idea of what these are about.


July 11, 2013
