Introduction
In my previous post in this series, ArcaneBooks – Library of Congress Control Number (LCCN) – An Overview, I provided an overview of the LCCN and the basics of calling its public web API to retrieve data based on the LCCN.
In this post I will demonstrate how to call the API and dissect the data using PowerShell. This will be a code intensive post.
You can find the full ArcaneBooks project on my GitHub site. Please note as of the writing of this post the project is still in development.
The code examples for this post can be located at https://github.com/arcanecode/ArcaneBooks/tree/main/Blog_Posts/005.00_LCCN_API. It contains the script that we’ll be dissecting here.
XML from Library of Congress
For this demo, we’ll be using an LCCN of 54-9698
, Elements of radio servicing by William Marcus. When we call the web API URL in our web browser, we get the following data.
<zs:searchRetrieveResponse xmlns:zs="http://docs.oasis-open.org/ns/search-ws/sruResponse">
<zs:numberOfRecords>2</zs:numberOfRecords>
<zs:records>
<zs:record>
<zs:recordSchema>mods</zs:recordSchema>
<zs:recordXMLEscaping>xml</zs:recordXMLEscaping>
<zs:recordData>
<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.loc.gov/mods/v3" version="3.8" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-8.xsd">
<titleInfo>
<title>Elements of radio servicing</title>
</titleInfo>
<name type="personal" usage="primary">
<namePart>Marcus, William. [from old catalog]</namePart>
</name>
<name type="personal">
<namePart>Levy, Alex,</namePart>
<role>
<roleTerm type="text">joint author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<originInfo>
<place>
<placeTerm type="code" authority="marccountry">nyu</placeTerm>
</place>
<dateIssued encoding="marc">1955</dateIssued>
<issuance>monographic</issuance>
<place>
<placeTerm type="text">New York</placeTerm>
</place>
<agent>
<namePart>McGraw Hill</namePart>
</agent>
<dateIssued>[1955]</dateIssued>
<edition>2d ed.</edition>
</originInfo>
<language>
<languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>
<physicalDescription>
<form authority="marcform">print</form>
<extent>566 p. illus. 24 cm.</extent>
</physicalDescription>
<subject authority="lcsh">
<topic>Radio</topic>
<topic>Repairing. [from old catalog]</topic>
</subject>
<classification authority="lcc">TK6553 .M298 1955</classification>
<identifier type="lccn">54009698</identifier>
<recordInfo>
<recordContentSource authority="marcorg">DLC</recordContentSource>
<recordCreationDate encoding="marc">820525</recordCreationDate>
<recordChangeDate encoding="iso8601">20040824072855.0</recordChangeDate>
<recordIdentifier>6046000</recordIdentifier>
<recordOrigin>Converted from MARCXML to MODS version 3.8 using MARC21slim2MODS3-8_XSLT1-0.xsl (Revision 1.172 20230208)</recordOrigin>
</recordInfo>
</mods>
</zs:recordData>
<zs:recordPosition>1</zs:recordPosition>
</zs:record>
</zs:records>
<zs:nextRecordPosition>2</zs:nextRecordPosition>
<zs:echoedSearchRetrieveRequest>
<zs:version>2.0</zs:version>
<zs:query>bath.lccn=54009698</zs:query>
<zs:maximumRecords>1</zs:maximumRecords>
<zs:recordXMLEscaping>xml</zs:recordXMLEscaping>
<zs:recordSchema>mods</zs:recordSchema>
</zs:echoedSearchRetrieveRequest>
<zs:diagnostics xmlns:diag="http://docs.oasis-open.org/ns/search-ws/diagnostic">
<diag:diagnostic>
<diag:uri>info:srw/diagnostic/1/5</diag:uri>
<diag:details>2.0</diag:details>
<diag:message>Unsupported version</diag:message>
</diag:diagnostic>
</zs:diagnostics>
</zs:searchRetrieveResponse>
Let’s see how to retrieve this data then parse it using PowerShell.
Parsing LCCN Data
First, we’ll start by setting the LCCN in a variable. This is the LCCN for "Elements of radio servicing" by William Marcus
$LCCN = '54-9698'
To pass in the LCCN to the web API, we need to remove any dashes or spaces.
$lccnCleaned = $LCCN.Replace('-', '').Replace(' ', '')
After 2001 the LCCN started using a four digit year. By that time however, books were already printing the ISBN instead of the LCCN. For those books we’ll be using the ISBN, so for this module we can safely assume the LCCNs we are receiving only have a two digit year.
With that said, we’ll use the following code to extract the two digit year.
$lccnPrefix = $lccnCleaned.Substring(0,2)
Since digits 0 and 1 are the year, we’ll start getting the rest of the LCCN at the third digit, which is in position 2 and go to the end of the string, getting the characters.
Next, the API requires the remaining part of the LCCN must be six digits. So we’ll use the PadLeft
method to put 0’s in front to make it six digits.
$lccnPadded = $lccnCleaned.Substring(2).PadLeft(6, '0')
Now combine the reformatted LCCN and save it to a variable.
$lccnFormatted ="$($lccnPrefix)$($lccnPadded)"
Now we’ll combine all the parts to create the URL needed to call the web API.
$baseURL = "http://lx2.loc.gov:210/lcdb?version=3&operation=searchRetrieve&query=bath.lccn="
$urlParams = "&maximumRecords=1&recordSchema=mods"
$url = "$($baseURL)$($lccnFormatted)$($urlParams)"
It’s time now to get the LCCN data from the Library of Congress site. We’ll wrap it in a try/catch
so in case the call fails, for example from the internet going down, it will provide a message and exit.
Note at the end of the Write-Host
line we use the PowerShell line continuation character of ` (a single backtick) so we can put the foreground color on the next line, making the code a bit more readable.
try {
$bookData = Invoke-RestMethod $url
}
catch {
Write-Host "Failed to retrieve LCCN $LCCN. Possible internet connection issue. Script exiting." `
-ForegroundColor Red
# If there's an error, quit running the script
exit
}
Now we need to see if the book was found in the archive. If not the title will be null. We’ll use an if
to check to see if the LCCN was found in their database. If not, the title property will be null. If so we display a message to that effect.
If it was found, we fall through into the else
clause to process the data. The remaining code resides within the else
.
# We let the user know, and skip the rest of the script
if ($null -eq $bookData.searchRetrieveResponse.records.record.recordData.mods.titleInfo.title)
{
Write-Host = "Retrieving LCCN $LCCN returned no data. The book was not found."
}
else # Great, the book was found, assign the data to variables
{
To get the data, we start at the root object, $bookData
. The main node in the returned XML is searchRetrieveResponse
. From here we can use standard dot notation to work our way down the XML tree to get the properties we want.
Our first entry gets the Library of Congress Number. The syntax is a little odd. If we walk XML tree, we find this stored in:
<identifier type="lccn">54009698</identifier>
If we display the identifier property using this code:
$bookData.searchRetrieveResponse.records.record.recordData.mods.identifier
We get this result.
type #text
---- -----
lccn 54009698
The LCCN we want is stored in the property named #text
. But #text
isn’t a valid property name in PowerShell. We can still use it though if we wrap the name in quotes.
$LibraryOfCongressNumber = $bookData.searchRetrieveResponse.records.record.recordData.mods.identifier.'#text'
From here we can process other properties that are easy to access.
$Title = $bookData.searchRetrieveResponse.records.record.recordData.mods.titleInfo.title
$PublishDate = $bookData.searchRetrieveResponse.records.record.recordData.mods.originInfo.dateIssued.'#text'
$LibraryOfCongressClassification = $bookData.searchRetrieveResponse.records.record.recordData.mods.classification.'#text'
$Description = $bookData.searchRetrieveResponse.records.record.recordData.mods.physicalDescription.extent
$Edition = $bookData.searchRetrieveResponse.records.record.recordData.mods.originInfo.edition
Now we get to the section where an XML property can contain one or more values.
Books can have multiple authors, each is returned in its own item in an array. One example is the book subjects. Here is a sample of the XML:
<subject authority="lcsh">
<topic>Radio</topic>
<topic>Repairing. [from old catalog]</topic>
</subject>
As you can see, this has two topics. What we need to do is retrieve the root, in this case subject
, then loop over each item.
For our purposes we don’t need them individually, a single string will do. So in the PowerShell we’ll create a new object of type StringBuilder
. For more information on how to use StringBuilder, see my post Fun With PowerShell – StringBuilder.
In the loop if the variable used to hold the string builder is empty, we’ll just add the first item. If it’s not empty, we’ll append a comma, then append the next value.
$authors = [System.Text.StringBuilder]::new()
foreach ($a in $bookData.searchRetrieveResponse.records.record.recordData.mods.name)
{
if ($a.Length -gt 1)
{ [void]$authors.Append(", $($a.namePart)") }
else
{ [void]$authors.Append($a.namePart) }
}
$Author = $authors.ToString()
As a final step we used the ToString
method to convert the data in the string builder back to a normal string and store it in the $Author
variable.
From here, we’ll repeat this logic for several other items that can hold multiple values. The books subjects is one example.
$subjects = [System.Text.StringBuilder]::new()
$topics = $bookData.searchRetrieveResponse.records.record.recordData.mods.subject | Select topic
foreach ($s in $topics.topic)
{
if ($subjects.Length -gt 1)
{ [void]$subjects.Append(", $($s)") }
else
{ [void]$subjects.Append($s) }
}
$Subject = $subjects.ToString()
A book could have multiple publishers over time. The author could shift to a new publisher, or more likely a publishing house could be purchased and the new owners name used. The data is returned as an array, so combine them as we did with authors and subjects.
Note that in the returned data, the publisher is stored as an "agent". We’ll use the name Publisher to keep it consistent with the ISBN data.
$thePublishers = [System.Text.StringBuilder]::new()
foreach ($p in $bookData.searchRetrieveResponse.records.record.recordData.mods.originInfo.agent)
{
if ($thePublishers.Length -gt 1)
{ [void]$thePublishers.Append(", $($p.namePart)") }
else
{ [void]$thePublishers.Append($p.namePart) }
}
$Publishers = $thePublishers.ToString()
Since there could be multiple publishers, logically there could be multiple publishing locations. This section will combine them to a single location.
$locations = [System.Text.StringBuilder]::new()
foreach ($l in $bookData.searchRetrieveResponse.records.record.recordData.mods.originInfo.place.placeTerm)
{
if ($locations.Length -gt 1)
{ [void]$locations.Append(", $($l.'#text')") }
else
{ [void]$locations.Append($l.'#text') }
}
$PublisherLocation = $locations.ToString()
All done! We’ll give a success message to let the user know.
Write-Host "Successfully retrieved data for LCCN $LCCN" -ForegroundColor Green
Finally, we’ll display the results. Note some fields may not have data, that’s fairly normal. The Library of Congress only has the data provided by the publisher. In addition some of the LCCN data dates back many decades, so the data supplied in the 1940’s may be different than what is supplied today.
"LCCN: $LCCN"
"Formatted LCCN: $lccnFormatted"
"Library Of Congress Number: $LibraryOfCongressNumber"
"Title: $Title"
"Publish Date: $PublishDate"
"Library Of Congress Classification: $LibraryOfCongressClassification"
"Description: $Description"
"Edition: $Edition"
"Author: $Author"
"Subject: $Subject"
"Publishers: $Publishers"
"Publisher Location: $PublisherLocation"
}
The Result
Here is the result of the above code.
LCCN: 54-9698
Formatted LCCN: 54009698
Library Of Congress Number: 54009698
Title: Elements of radio servicing
Publish Date: 1955
Library Of Congress Classification: TK6553 .M298 1955
Description: 566 p. illus. 24 cm.
Edition: 2d ed.
Author: Marcus, William. [from old catalog], Levy, Alex,
Subject: Radio, Repairing. [from old catalog]
Publishers: McGraw Hill
Publisher Location: nyu, New York
As you can see it returned a full dataset. Not all books my have data for all the fields, but this one had the full details on record with the Library of Congress.
See Also
This section has links to other blog posts or websites that you may find helpful.
The ArcaneBooks Project – An Introduction
ArcaneBooks – ISBN Overview, PowerShell, and the Simple OpenLibrary ISBN API
ArcaneBooks – PowerShell and the Advanced OpenLibrary ISBN API
ArcaneBooks – Library of Congress Control Number (LCCN) – An Overview
Fun With PowerShell – StringBuilder
The GitHub Site for ArcaneBooks
Conclusion
In this document we covered the basics of the LCCN as well as the web API provided by the Library of Congress. Understanding this information is important when we integrate the call into our PowerShell code.