Ontario Public Library Statistics
Ontario Public Library Statistics Documentation, 2021
Robert E. Molyneux
This document discusses three spreadsheets that comprise the Ontario Public Libraries’ Statistics (OPLS) dataset and related matters. This dataset reports data from 8,357 libraries published from 2000-2021. “Library” is used here to mean the data for one library for one year. The file is a flat file presented here with two related files as Excel spreadsheets. 480 Ontario libraries have ever reported data that are included in this dataset.
The data are organized in three spreadsheets. The actual library data are in a .csv file; they were worked on in LibreOffice Calc as .ods files and saved as .csv. The dataset, OPLS22.csv, is left in .csv format while the other two, which have calculations that generated the summary numbers reported here, are in .xlsx (that is, Excel) and .ods (created by LibreOffice Calc) formats. The “22” in the file name indicates it has 22 years of data.
The .csv format is a general one for data and can be read into any spreadsheet program (including Excel or LibreOffice Calc) or by statistical utilities such as R, SAS, SPSS, or Stata. However, I have found that once a file goes into Excel, it is sometimes difficult to get it out and generate a clean .csv file.
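As a minimal sketch of reading the .csv without going through a spreadsheet program, Python's standard csv module is enough. The sample rows below are invented for illustration. LIBID, TOTCIR, and POPU_LSA are short variable names mentioned in this document, while the YEAR column name and the file encoding are assumptions.

```python
import csv
import io

# Invented sample standing in for OPLS22.csv (the real file has 441 columns).
# YEAR is an assumed column name; LIBID, TOTCIR and POPU_LSA appear in this document.
SAMPLE = """YEAR,LIBID,TOTCIR,POPU_LSA
2020,5,1200,2500
2021,5,1350,2510
"""

def read_rows(handle):
    """Return the rows of a .csv file as a list of dicts keyed by the title row."""
    return list(csv.DictReader(handle))

rows = read_rows(io.StringIO(SAMPLE))

# For the real file (the encoding is an assumption):
# with open("OPLS22.csv", newline="", encoding="utf-8") as f:
#     rows = read_rows(f)

print(len(rows), rows[0]["TOTCIR"])
```

Every value arrives as text, so numeric columns need an explicit conversion (for example `int(row["TOTCIR"])`) before any arithmetic.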
The dataset was created by recompiling into a longitudinal file the separate annual releases of survey data on Ontario’s public libraries compiled by Ontario’s Ministry of Tourism, Culture and Sport, as it is currently known. The data were published under an Open Government Licence and are available in separate annual spreadsheets via the Ontario Data Catalogue. The OPLS dataset is a superset of the Ministry’s data. That is, the dataset is designed to have the data as published (or as corrected by the Ministry) with the addition of variables necessary to aid its recompilation as a longitudinal file or its use. I note that I am not sure that I have caught all Ministry revisions or corrections. There are two exceptions noted at the end of Section 3 below where the spreadsheet of Ontario libraries ever reporting is discussed. The original data were well organized, making the recompilation relatively painless as such things go.
Please report any errors or missed corrections to Bob Molyneux (drdata@molyneux.com). Comments are also welcome.
There are three files with data, and these are listed immediately below. Following this list, anomalies within the data are discussed, then where the population data come from, and last, links to a related series of publications about Ontario’s libraries.
The data files:
- OPLS22.csv presents 22 years of the Ontario public library data that make up the OPLS dataset. It has 8,357 rows of data plus one title row. There are 441 columns, one for each variable ever reported. The raw data began in 1999 but those early data are sufficiently different from what followed that it was decided to recompile the data beginning in 2000 when data practices had settled down. Those practices have proved consistent since then. There are still data in 1999 awaiting our attention.
- The Ontario variables ever reported are available in .xlsx and .ods formats. Each lists the short variable names as used in OPLS22.csv along with the longer versions of the variable names from the Ministry. These long variable names are based, generally, on the question that was asked in the survey. For example:
B2.6 –Self Generated Revenue (e.g. fines, fees, sales/fundraising, room rentals, cafe revenue, etc.)
The data for this example variable are in the column in OPLS22.csv labeled SELFGEN. The shorter variable names are much easier to deal with in programming but create their own problem: how to map SELFGEN to B2.6 –Self Generated Revenue...? This variables spreadsheet has the short variable names in columns by year to indicate which years have which variables reported. The right-hand column has the longer name based on the question and/or the column head in the Ministry’s spreadsheet. The right-hand column is dealt with again below. The order of these variables in the variables spreadsheet follows the order of the questions in the survey instrument. The data .csv file does not use the same order because I moved what I thought were the major variables for sorting and identification to the early columns. The rest follow in the order of the survey instrument.
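The mapping problem just described can be sketched in Python as a lookup from short name to long name and years reported. This is only an illustration: the rows mimic the layout of the variables spreadsheet as described here, but the years shown and the "long" key are invented, and the real spreadsheet would first have to be exported or read with a spreadsheet library.

```python
# Rows mimic the variables spreadsheet: one cell per year holding the short name
# when the variable was reported (empty otherwise), plus the long name on the right.
# The years shown and the "long" key are invented for illustration.
ROWS = [
    {"2000": "SELFGEN", "2001": "SELFGEN",
     "long": "B2.6 –Self Generated Revenue (e.g. fines, fees, sales/fundraising, "
             "room rentals, cafe revenue, etc.)"},
    {"2000": "", "2001": "TOTCIR",
     "long": "F1.0 - Total Annual Circulation (Actual Annual Direct Circulation)"},
]

def build_lookup(rows, long_key="long"):
    """Map each short variable name to its long name and the years it was reported."""
    lookup = {}
    for row in rows:
        # Year columns with a non-empty cell are the years the variable was reported.
        years = sorted(col for col, cell in row.items() if col != long_key and cell)
        if not years:
            continue
        short_name = row[years[0]]
        lookup[short_name] = {"long": row[long_key], "years": years}
    return lookup

lookup = build_lookup(ROWS)
print(lookup["SELFGEN"]["long"])
print(lookup["TOTCIR"]["years"])
```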
Where did these short variable names come from? Two places:
First, those variables which seemed similar to those found in the US public library data received US variable names in order to ease comparative library analysis. Generally, they are understandable. For example, TOTCIR is where the reported “F1.0 - Total Annual Circulation (Actual Annual Direct Circulation)” will be found. The resident population has an odd short variable name: POPU_LSA (population of the legal service area) because the US has two methods of counting resident population of libraries depending on the practices of the several states and this one most closely resembles the number found in “P1.1 – Resident Population Served.”
Second, those variables that did not seem to match US variables were named with an attempt to be descriptive and memorable. There are about 250 more variables reported in the Ontario Public Library Statistics dataset than those reported in the US public library data series. This difference is one of the notable strengths of these data and is discussed in an article by Stephen Abram and me that appeared in Public Library Quarterly (see note 1). An attempt has been made to use mnemonic devices. Variables beginning with “DWN” relate to downloadable content; “PR” begins the variable names related to programs; those beginning with “TW” are “typical week” data.
A few variables changed over the years. For instance, there were two ways of describing the Ontario Library System’s Regions: short name from 2000-2010 (SOLS) and long name beginning in 2011 (Southern Ontario Library Service).
There are two different (but similar) key variables. A key variable is one used in digital files or databases to identify a library (in this case) for analysis. The key variable must be unique and unchanging over the span of years in spite of library reorganizations or name changes. For example, the data for Addington Highlands Twp are found in the LIBID “5” and the LIBNUMB “L0005.” The change from LIBID to LIBNUMB also occurred in 2011. What was done in both of these cases and similar ones was to add these variables for the years they didn’t exist. That is, SOLS was added to the data for 2011+ and Southern Ontario Library Service was added from 2000-2010. Similarly, the key variables were added for the years when they weren’t published. These added variable names are in red in this table. This change in the urtext was done as a convenience to analysts. I have misgivings about such monkeying around with a text but I am an analyst, too, and I worked on these data in order to analyze them. It is my belief that by maintaining a core of the data as published, my additions will not interfere with the data. To that end, newkey is another added variable.
This variable was a necessary creation in the longitudinal recompilation of the US public library data (PLDF3). The OPLS dataset has two key variables but there is no case (but for the two “Anomalies” listed below) where any library had more than one key. There are cases in the US data where a library had five during the run of data from 1987 to 2019. It is a very difficult subject and the documentation of the creation of newkey is probably the most boring thing I ever wrote, but the result is that with the US data, the only workable key variable is newkey. I added it to the OPLS dataset for a specific analytical purpose: to provide a ready means to compare Ontario and US libraries. An example would be ON0005 for Addington Highlands Twp. Newkeys have a two-letter prefix (based on state or province) plus a four-digit sequence key. Each refers to only one library. As a result, Stephen Abram and I were able to do the previously mentioned preliminary comparative analysis of the data from libraries in Ontario and the US. This article was the genesis of adding the 2021 Ontario data to the then-dormant Ontario Public Library Statistics. A second projected article will require that the 2022 data be added to the OPLS data.
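The relationship among the three keys can be sketched in Python. The examples in this document (LIBID 5, LIBNUMB L0005, newkey ON0005) suggest that all three share one number, zero-padded to four digits. That is an assumption drawn from the examples, not a stated rule, and the function names below are hypothetical.

```python
def libnumb_from_libid(libid: int) -> str:
    """Format a LIBID as a LIBNUMB, e.g. 5 -> 'L0005' (pattern assumed from the examples)."""
    if not 0 <= libid <= 9999:
        raise ValueError("LIBID does not fit in four digits")
    return f"L{libid:04d}"

def newkey_from_libid(libid: int, prefix: str = "ON") -> str:
    """Format a newkey as a two-letter prefix plus a four-digit sequence, e.g. 'ON0005'."""
    if len(prefix) != 2 or not prefix.isalpha():
        raise ValueError("prefix must be two letters")
    if not 0 <= libid <= 9999:
        raise ValueError("LIBID does not fit in four digits")
    return f"{prefix.upper()}{libid:04d}"

def libid_from_libnumb(libnumb: str) -> int:
    """Recover the numeric LIBID from a LIBNUMB such as 'L0336'."""
    if not (libnumb.startswith("L") and libnumb[1:].isdigit()):
        raise ValueError(f"unexpected LIBNUMB format: {libnumb!r}")
    return int(libnumb[1:])

print(libnumb_from_libid(5), newkey_from_libid(5), libid_from_libnumb("L0336"))
```

Anyone relying on such a conversion should check it against the key columns in OPLS22.csv rather than trust the pattern.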
The entries in this right-hand column (“Data survey questions”) have been edited for clarity because many of the original column headers are ambiguous: in the annual spreadsheets they appear without the context of the questionnaires. This is tricky business and one does not like to alter an original text but such action was necessary because of the ambiguity of replies in the annual spreadsheets. For instance, the original Ministry spreadsheets have multiple variables named simply Y or N. Yes or no...what? There are a number of these in the original spreadsheets, which is a source of ambiguity. The order of the original spreadsheets is based on the various survey forms so that it was possible to annotate these Y or Ns. For example, H1.3.1.T was listed as “Y” in the original csv files but here appears as:
H1.3.1.T - Yes (Social Media?)
==> that is: Yes, the library does have a social media presence.
H1.4.1. F - No (Social Media?)
==> that is: No, the library does not have a social media presence.
I think it an odd way to construct this question. It would be more economical and less ambiguous to ask: Does your library have a social media presence? The answer would be Y or N, so there would be one answer and not two. As the years go by, the number of libraries not having a social media presence will decline and one day, I expect, the answer will be Y for all. This is a well-organized data collection effort so I may well be missing something.
In any case, the survey instrument titles H1.3 as Social Media. H1.3.1 has the header "Social Media" and then lists a set of social media types, each with a check box. Over time, the survey instruments maintain their organizational structure but take account of changes in technology. As the years roll by, readers of this spreadsheet will see that the social media sources listed change as the fortunes of companies change. Similarly, the Ministry adapts the survey impressively quickly to changes in technology. That is another strength of this series.
By looking at the spreadsheet, the reader will see which variables were reported which years.
- Ontario_Libraries_reporting_2000-2021.xlsx and Ontario_Libraries_reporting_2000-2021.ods
This spreadsheet is organized similarly to the variables spreadsheet: columns by year with the libraries reporting. The total reporting is in red above. The count comes from summing the year columns and then dividing by that year’s number, so be careful because changing a library’s designation will change the count. I have copied this row, but without the formulas, to a row above the one with the calculated numbers so it is not a calculated number. This row is labelled “Not Dynamic.” The total currently sums the numbers in the dynamic row. The count of libraries reporting comes from summing the counts for the years in cell CR6 (the last row with data). The number in CR6 is the sum of the 1’s in the CR column. If a library ever reported, it gets a 1 so the count now is 480 libraries ever reporting.
Note the use of the LIBID. As previously discussed, that is the variable name assigned to the key variable before the change in 2011. LIBNUMB was the variable name assigned to the key variable from 2011 on. They are effectively equivalent and are used to order these libraries. This is a live spreadsheet so the libraries could be sorted alphabetically if necessary.
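The counts described for the libraries-reporting spreadsheet (libraries reporting per year, and a 1 for every library that ever reported) can be reproduced independently as a check. The Python sketch below uses an invented table of LIBIDs and reporting years; it is not the spreadsheet's own formula.

```python
from collections import Counter

# Invented reporting table: LIBID -> set of years in which that library has data.
reporting = {
    5: {2000, 2001, 2021},
    161: {2000, 2010, 2011},
    336: {2021},
    999: set(),  # hypothetical library listed but never reporting
}

def counts_by_year(table):
    """Number of libraries reporting in each year."""
    counts = Counter()
    for years in table.values():
        counts.update(years)
    return dict(sorted(counts.items()))

def ever_reported(table):
    """Each library that reported at least once counts as 1; return the sum."""
    return sum(1 for years in table.values() if years)

print(counts_by_year(reporting))
print(ever_reported(reporting))
```

Run against the real data, the second figure should match the 480 libraries ever reporting and the per-year counts should sum to the 8,357 rows in OPLS22.csv.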
Miscellany
- Library Reporting Anomalies
In this series, there were only two apparent errors where a library appeared in two places in a year’s data. West Grey has reported each of the 21 years in the Ministry’s data in Library ID 336 (L0336). In 2007, there were two entries. One in 336 for West Grey and another in 494 for West Grey Twp. The latter entry has the apparently preferred name but aside from the address and such data, the rest of this entry is zeros. In the OPLS dataset, I have deleted the entry at 494.
Gauthier reported from 2000-2008, then 2010 in LIBID 161. In 2011, it was listed in 1096. In the OPLS dataset, I moved the 2011 entry to 161.
- Data anomalies
These were noted in the process of working with the data. I have made no systematic search for such things.
Sheshegwaning FN (L0419) reported total expenditures (TOTOPEXP) of $1,881,740 for 2006. In library data, almost anything is possible but this value seems unlikely. Averaging the values for 2005 and 2007 gives just under $26,000.
Municipality of Clarington (L0133) in 2012 reported a resident population of 8,861. The figures for 2011 and 2013 are over 89,000.
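Both anomalies above were spotted by comparing a value with its neighbouring years. A rough Python sketch of that comparison follows. It is an illustrative screen, not a method used in preparing this dataset. The 2012 figure is the one reported above, the 2011 and 2013 figures are placeholders standing in for "over 89,000", and the threshold of 5 is arbitrary.

```python
def flag_outliers(series, ratio=5.0):
    """Flag years whose value differs from the mean of the two adjacent years
    by more than `ratio` in either direction.

    `series` maps year -> value; years lacking both neighbours are skipped.
    Returns a list of (year, reported value, mean of neighbours).
    """
    flagged = []
    for year in sorted(series):
        before, after = series.get(year - 1), series.get(year + 1)
        if before is None or after is None:
            continue
        expected = (before + after) / 2
        value = series[year]
        if expected <= 0 or value <= 0:
            continue  # ratios are meaningless for zero or negative values
        if value / expected > ratio or expected / value > ratio:
            flagged.append((year, value, expected))
    return flagged

# 8861 is the reported 2012 population; the other two values are placeholders.
clarington_popu_lsa = {2011: 89500, 2012: 8861, 2013: 89900}
print(flag_outliers(clarington_popu_lsa))
```

A screen like this only suggests candidates for review, since genuine changes such as amalgamations can also produce large jumps.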
- Source of population data
Where does “P1.1 - Resident Population Served” come from?
Adam Haviaras, Culture Services Advisor / Conseiller en services culturels, at the Ministry on August 3, 2021 said:
The population and household data we use for the statistics comes from Municipal Affairs’ FIR (Financial Information Return) data which that ministry tries to collect from all municipalities annually. So, the data we receive is submitted by the actual municipalities.
However, as Municipal Affairs pointed out to me last year, they had population and household data ranging from 2017 to 2019 as many municipalities did not submit their data on time.
We use the Provincial FIR data because it is updated annually, vs the census data which is gathered every few years. However, the last few years, there have been many municipalities that have not submitted the latest population and household data. We only have what is submitted to work with.
Other information about Ontario Public Libraries
Public Libraries in Ontario, 1882-1920 by Lorne D. Bruce. This link goes to the publication cited but the link includes a number of articles on aspects of Ontario Public Libraries particularly their history. There is a great deal of information here.
Notes