To create PLDF3, a series of programs was run on the combined dataset of all years’ data for each state. The data were printed to a file, and this file was examined for anomalies, particularly in the library’s name and address. The process is described on the PLDF3 index page and, in a bit more detail, below.
In effect, a library’s FSCSKEY in the last year of the library’s reporting determines that library’s newkey, with collisions resolved by the methods described below. Next, these programs were edited to make changes in the newkey as a result of that examination. The programs were rerun and the files re-examined until all anomalies appeared to have been resolved. In some cases, this process was repeated a number of times. When the printout looked correct, the initial plan was to send it to the State Data Coordinators (SDCs) for examination. This idea was not particularly effective because the lists were often long and daunting, and SDCs had more than enough to do. There were no replies to these emails, although I subsequently found that questions about the odd case here or there often got a reply. After many year-to-year changes in the FSCSKEY in the early years, the lack of systematic assignment of keys has been addressed, and anomalies are increasingly rare each year. The two changes in the FY 20
(see Utah and Vermont) are the first in several years. This aspect of these data shows yet another of the many incremental improvements in this remarkable series since I first started working with it. It has come a long way.
As discussed previously, the principle followed is that the newkey assigned to each library is identical to the FSCSKEY of the last year of reported data for that library. In recent years, because few of the FSCSKEYs assigned to libraries have changed, there have been few changes in newkeys each year, and those cause small adjustments here and there in the number of libraries reporting by state in earlier years. If you had an earlier copy of the table below, you would find evidence of this effect.
In the case of the states listed below with asterisks, there are no changes in the FSCSKEY from year to year, so the newkey is the same as the FSCSKEY. So far, in these states, writing the program to create the state’s PLDF3 file has generally been straightforward because there have been only a few cases of ambiguity and one case where untangling the early years of the FSCSKEY was quite difficult. However, the very complexity of the task, with over 276,000 cases, has meant that changes in the FSCSKEY for individual libraries that were missed in earlier years have often been found later, and adjustments made in the newkey.
The most important variable is newkey
As we saw on the index page, the FSCSKEY is usually stable after about 1996 or 1997, so in those cases the stable FSCSKEY is also the newkey for that library. If there is ambiguity in the FSCSKEYs, the FSCSKEY from the last year of the reported data is that library’s newkey, and it is included in the List of Libraries. Otherwise, by definition, the FSCSKEY and newkey should be the same for the year listed here. In the List of Libraries the last year also supplies the ZIP, state, CITY, and LIBNAME. Hence, of the many forms of LIBNAME, CITY, and so forth that one sees in the annual data, the latest form of each variable is what appears in the List. If a LIBNAME changes when a new set of data comes out, it will be changed in the updated List. One characteristic of the CITY and LIBNAME now is the use of upper case. I have been told it is a postal regulation that makes it easier for machines to read. It sure makes it hard for people to read the
The purpose of newkey is to be a unique identifier for any one library through all years of the data. The newkeys for libraries that die are not reused unless an FSCSKEY is assigned to that key. When two libraries merge, they become a new library and get a different newkey. If they divorce, they get their old newkeys back where possible (assuming no ambiguity), but each library will have years missing for the period in which it was merged, and the outcome is occasionally ambiguous. In some cases, libraries died and the FSCSKEY was reassigned, so using the last year of reporting resulted in two different libraries having the same newkey. These kinds of collisions have been resolved by creating a separate newkey, starting with XX9????, for the library that closed or wherever a collision occurs for any other reason. These keys are assigned sequentially, with ‘XX’ standing for the two-character state code, and the sequence will usually begin with 9975.
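The last-year rule and the collision handling just described can be sketched in a few lines. This is a Python illustration, not the SAS that built PLDF3, and it rests on several assumptions: the `library_id` labels stand in for the manual inspection that decides which records belong to one library, the replacement key is read as the state code plus a four-digit number counting upward from 9975, and the two colliding libraries in the sample data are hypothetical. Only the Otis keys come from the text.

```python
from collections import defaultdict

def assign_newkeys(records, start=9975):
    """records: iterable of (library_id, year, fscskey) tuples.

    library_id is a hypothetical label produced by manual inspection.
    Returns {library_id: newkey}.
    """
    # Find each library's FSCSKEY in its last reported year.
    last = {}
    for lib, year, key in records:
        if lib not in last or year > last[lib][0]:
            last[lib] = (year, key)

    # Group libraries by that provisional key to detect collisions.
    by_key = defaultdict(list)
    for lib, (year, key) in last.items():
        by_key[key].append((year, lib))

    newkey = {}
    next_seq = defaultdict(lambda: start)  # one counter per state (assumed ascending)
    for key in sorted(by_key):
        claimants = sorted(by_key[key], reverse=True)  # latest reporter first
        newkey[claimants[0][1]] = key                  # it keeps the FSCSKEY
        for _, lib in claimants[1:]:                   # closed libraries get a 99xx key
            state = key[:2]
            newkey[lib] = f"{state}{next_seq[state]}"
            next_seq[state] += 1
    return newkey

records = [
    ("otis", 1987, "CO0071"), ("otis", 1988, "CO0102"), ("otis", 1989, "CO0091"),
    ("closed_lib", 1990, "CO0200"), ("closed_lib", 1991, "CO0200"),  # hypothetical
    ("later_lib", 2017, "CO0200"), ("later_lib", 2018, "CO0200"),    # hypothetical
]
result = assign_newkeys(records)
print(result)
```

In this sketch the library that reported most recently keeps the shared FSCSKEY, and the one that closed earlier receives the sequential key, which matches the rule stated above for closed libraries; other kinds of collision would need a case-by-case decision.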
span is a variable that indicates a characteristic of the data for each library. It has three possible values. A indicates that the library reports data for All years of the data. Given that states started reporting in different years, A refers to reporting all possible years for that state, so the notion of years covered is a function of the first year the state reported to FSCS or the Public Libraries Survey (PLS). S libraries report Some years. E is a special case of S where the library reports at least the End years of the span of years but is missing years in the middle. A libraries would be useful in measuring the behavior of all libraries through the years, while A and E libraries would be useful in measuring changes from the first to the last year of a span of years. Minnesota did not report data in 2001. As a result, all Minnesota libraries are coded at least with a span of S, but many have an E, meaning that they reported Some years and that two of those were the end years of the range. For those libraries, one could easily calculate a difference over the range of years.
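As a rough illustration of the three span codes, here is a minimal Python sketch. It assumes that "all possible years" means every year from the state's first reporting year through the last year of the dataset, which is the reading consistent with the Minnesota case. The function name, its parameters, and the first year of 1992 used in the example are hypothetical and are not taken from the PLDF3 programs.

```python
def span_code(years_reported, state_first_year, dataset_last_year):
    """Return 'A', 'E', or 'S' for one library (illustrative reading of the rules)."""
    possible = set(range(state_first_year, dataset_last_year + 1))
    reported = set(years_reported) & possible
    if reported == possible:
        return "A"  # reported in All possible years for its state
    if state_first_year in reported and dataset_last_year in reported:
        return "E"  # both End years present, gaps in the middle
    return "S"      # Some years only

# A library that reported every year except 2001 (hypothetical first year 1992).
years = [y for y in range(1992, 2019) if y != 2001]
print(span_code(years, 1992, 2018))               # gap in the middle
print(span_code(range(1992, 2019), 1992, 2018))   # complete run
print(span_code(range(1995, 2010), 1992, 2018))   # neither end year present
```

A library coded A or E has both end years, so a first-to-last difference can be computed for it directly.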
first year is the first year this library’s state reports data to FSCS or PLS.
last year is the last year this library reports data. Last year is most often 2018 for the current dataset.
The changes documented in the Schedules are primarily to newkey but will be an indicator of deeper changes in the annual data. If interested, I have more detail…
It is a temptation to correct obvious errors that are noted in the printouts and not just the FSCSKEYs. Inspection of the files finds clear typographic errors in the demographic variables used in the construction of PLDF3 and, too, there are not-so-obvious errors that the SDCs will be aware of. In addition, there are changes in the formats used for the names of systems and libraries. Why not “correct” them? The temptation to correct all these errors must be resisted. The Hippocratic Oath of data is: First Do No Harm. There is no telling what mischief might result from an attempt to correct what is a digital text in an archival sense; these datasets constitute a text worthy of preservation. Creating a superset with new variables, as with newkey, allows the old variables to be preserved and the new “corrected” (we hope) variables to exist at the same time. Anyone using this edition of these data for analysis may then make such changes as he or she fancies.
Those looking at the code in the Schedules will note that there are cases where changes for CITY for 2002 were commented out. There are two examples where the SDCs agreed that the CITY was wrong in 2002 and it was obvious by inspection. Iowa had three examples; in one, LAPORTE CITY was corrected to LA PORTE CITY. The changes were made so that the List of Libraries, which is drawn from the last years of the data, would be as accurate as possible for anyone using it and unambiguous when checked against the data file itself. As I was going over the data in 2003, I decided that changing the CITY was not a good idea, so I did not do so subsequently, but I left the code in as a reminder of the folly of such changes. As a result, these changes have been reversed so that the errors in those files are, again, as they were in the original NCES data.
The Schedules are pasted from the SAS code for each state and are in ASCII for manipulation. The programs themselves are available to anyone who wants to see them.
There are a total of 285,242 observations in PLDF3 through the FY 2018 data. More information on the actual counts over time can be found in the revision history of PLDF3.
The changes in each program file are in order by the FSCSKEY except for Colorado, Oregon, and Washington which are in order by the CITY. The FSCSKEY is often in order by CITY at least in the early years. In retrospect, I should not have rearranged the data by CITY but left them in the order they were in the data so that my coding could be more easily followed for anyone checking for errors. Live and learn.
How to Read the Schedules of Changes
The programs creating newkey from FSCSKEY first create a dummy newkey from the FSCSKEY and then changes are made in the newkey in the program. The programs are written in SAS and the code changes are what appear in the Schedule of Changes. They are pasted from the programs without editorial changes. They should be readable by people who do not know SAS. Here is an example:
/* OTIS */ if year = 1987 and FSCSKEY = 'CO0071' then newkey = 'CO0091';
if year = 1988 and FSCSKEY = 'CO0102' then newkey = 'CO0091';
These are the changes for Otis, which was used as an example from the List of Libraries. The “/*” and “*/” indicate comments and are used here as an easy means of finding lines of code related to different libraries. These comments typically contain the CITY. The first line tells SAS to look for any case where the year is 1987 and the FSCSKEY is CO0071. In those cases (and there is only one where both conditions hold), the newkey is changed to CO0091 from the dummy CO0071 that was used to seed this variable. The last year of Otis was 1989, as noted, and no other library was using that year’s FSCSKEY in a later year, so it is used as the newkey. In 1988, Otis had the FSCSKEY of CO0102, and this, too, was changed to CO0091. The semicolon is the SAS end-of-statement indicator.
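For readers who do not use SAS, the seed-then-override pattern in the Otis example can be mimicked with a lookup table. The following is a minimal Python sketch, not the author's program: the function name and the row layout are hypothetical, and the 1989 row assumes, as the explanation above implies, that Otis reported under CO0091 in its last year.

```python
def apply_schedule(rows, changes):
    """Seed newkey from FSCSKEY, then override it where the schedule has an entry.

    rows:    list of dicts with 'year' and 'FSCSKEY'
    changes: {(year, FSCSKEY): newkey}, one entry per SAS 'if' statement
    """
    out = []
    for row in rows:
        updated = dict(row)  # leave the original variables untouched
        updated["newkey"] = changes.get((row["year"], row["FSCSKEY"]), row["FSCSKEY"])
        out.append(updated)
    return out

# The two Otis statements from the Schedule, expressed as a lookup table.
otis_changes = {
    (1987, "CO0071"): "CO0091",
    (1988, "CO0102"): "CO0091",
}
otis_rows = [
    {"year": 1987, "FSCSKEY": "CO0071"},
    {"year": 1988, "FSCSKEY": "CO0102"},
    {"year": 1989, "FSCSKEY": "CO0091"},  # assumed from the text: last year, key kept
]
for row in apply_schedule(otis_rows, otis_changes):
    print(row["year"], row["FSCSKEY"], row["newkey"])
```

Because newkey is written to a new field and FSCSKEY is never modified, the original variable is preserved alongside the new one, in keeping with the principle of not altering the archival data.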
The Schedules of Changes are available as text files by state. Most of the state Schedules are in order by the FSCSKEYs and years changed in the code, that is, sorted by FSCSKEY and then by year, and the changes often come in pairs involving different FSCSKEYs. This method gives the changes no apparent order, but they are in the order of the data. In a handful of cases, I sorted the libraries by CITY, as mentioned, thinking this would make things easier to find. I regret having done that because, rather than making changes easier to find, it has made them harder to find when you have the data in front of you. The most likely use of these files is for people to check my work, and sorting by CITY cannot make this process easier. My apologies to those who follow.
Observations in PLDF3
|District of Columbia *||31|
|Northern Mariana Islands *||20|
|Virgin Islands *||16|
|West Virginia *||3,012|
* indicates states with no changes.
** Minnesota did not report in 2001.