Collection Group Designation
| CGD | Shareable | Retention Commitment | Notes |
| --- | --- | --- | --- |
| Committed | Shareable | Yes | |
| Shared | Shareable | Yes | |
| Open | Shareable | No | |
| Uncommittable | Shareable | No | |
| Private | Not shareable | No | |
Ongoing Matching Algorithm
The ongoing matching algorithm runs in either of the following cases, and it compares only items that have the same Material Type:
- new items are accessioned to SCSB with a CGD of “Shared”
- existing items are updated using the Submit Collection API to a CGD of “Shared”
The new/updated item is compared to all the existing items in the database with a CGD of Shared or Committed.
When two Bibs match:
- The first check determines whether either Bib has undergone Initial Matching. If so, the Bib that underwent Initial Matching remains Shared and the others are set to Open.
- If neither Bib underwent Initial Matching, the item that was accessioned earlier retains the Shared status and the newer item is set to Open.
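The two rules above can be sketched in Python as follows. This is a minimal illustration, not SCSB's implementation: the `MatchedBib` record and its fields are hypothetical, and choosing the earliest-accessioned Bib when more than one Bib has undergone Initial Matching is an assumption the text does not settle.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MatchedBib:
    # Hypothetical record shape; field names are illustrative, not SCSB's schema.
    bib_id: str
    underwent_initial_matching: bool
    accessioned_at: datetime


def resolve_cgd(bibs: list[MatchedBib]) -> dict[str, str]:
    """Return the CGD each matching Bib ends up with: one Shared, the rest Open."""
    initially_matched = [b for b in bibs if b.underwent_initial_matching]
    if initially_matched:
        # A Bib that went through Initial Matching keeps Shared.
        # Assumption: if several did, the earliest-accessioned one is kept.
        keeper = min(initially_matched, key=lambda b: b.accessioned_at)
    else:
        # Otherwise the earliest-accessioned Bib keeps Shared.
        keeper = min(bibs, key=lambda b: b.accessioned_at)
    return {b.bib_id: ("Shared" if b is keeper else "Open") for b in bibs}
```

For example, with two Bibs that never went through Initial Matching, the one accessioned in 2020 stays Shared and the one accessioned in 2021 becomes Open.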
Bibs are considered a “Multi Match” when at least two of the following control numbers match:
- ISBN
- ISSN
- LCCN
- OCLC
In the case of a “Multi Match” the title field is not compared.
Bibs are considered a “Single Match” when only one control number matches and the first four words of the title also match.
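A sketch of the Multi Match / Single Match decision, assuming the control numbers have already been normalized and each Bib's title has already been reduced to a comparison key built from its first four words. It also assumes that "two control numbers match" means two different number types (for example ISBN and OCLC) and that a Bib can carry several values per type. The function and argument names are hypothetical.

```python
from typing import Optional

CONTROL_FIELDS = ("isbn", "issn", "lccn", "oclc")


def classify_match(a: dict[str, set[str]], b: dict[str, set[str]],
                   title_key_a: str, title_key_b: str) -> Optional[str]:
    """Classify two Bibs as 'multi', 'single', or None (no match)."""
    # Count how many control-number types share at least one value.
    matching_fields = sum(
        1 for f in CONTROL_FIELDS if a.get(f, set()) & b.get(f, set())
    )
    if matching_fields >= 2:
        return "multi"  # titles are not compared for a Multi Match
    if matching_fields == 1 and title_key_a == title_key_b:
        return "single"
    return None
```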
Normalization
- ISBN, ISSN, LCCN, OCLC - non-numeric characters are removed
- Title - diacritics are removed and the case is ignored
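The normalization rules can be illustrated with Python's standard library. This sketch strips diacritics by decomposing text to Unicode NFD and dropping combining marks; that is one common technique, assumed here rather than taken from SCSB's code, and the function names are hypothetical.

```python
import re
import unicodedata


def normalize_control_number(value: str) -> str:
    """Remove every non-numeric character (ISBN, ISSN, LCCN, OCLC)."""
    return re.sub(r"\D", "", value)


def strip_diacritics(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_title(text: str) -> str:
    """Remove diacritics and ignore case by lower-casing."""
    return strip_diacritics(text).lower()
```

For example, `normalize_control_number("(OCoLC)ocm12345678")` yields `"12345678"`. Applied literally, the "non-numeric characters are removed" rule also drops an ISBN-10 check digit of X.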
Title Match Comparison
- Subfield a of MARC field 245 is used for the title match comparison. Diacritics and blank spaces are removed from the title, and all titles are converted to lower case before comparison.
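A possible way to build the title comparison key from 245 $a. The text says both that blank spaces are removed and that the first four words are compared, so this sketch assumes the first four words are selected first and the spaces between them removed afterwards; `title_key` is a hypothetical name.

```python
import unicodedata


def title_key(title_245a: str, words: int = 4) -> str:
    """Build a comparison key from the first `words` words of 245 $a."""
    # Assumption: the first four words are taken before spaces are removed.
    first_words = title_245a.split()[:words]
    decomposed = unicodedata.normalize("NFD", " ".join(first_words))
    no_diacritics = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Remove blank spaces and lower-case the result.
    return no_diacritics.replace(" ", "").lower()
```

Under these assumptions, "The Great Gatsby and other stories" and "The  great GATSBY and friends" both reduce to `thegreatgatsbyand` and would count as a title match.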
Exceptions
- Matching copies from a single institution are not compared. Submitting institutions are responsible for submitting only one copy of matching items to SCSB as “Shared” and the rest as “Open.”
- When two items match on a single control number but have different bib levels (for example, one is a monograph and the other a serial), their material types do not match and the matching algorithm is not applied.
- When the CGD of an item is changed manually to Shared in the SCSB UI, the matching algorithm is not run on that item. As a result, two matching items can both have a CGD of Shared.
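The candidate selection for a new or updated Shared item, including the first two exceptions, might look like this. The `Item` fields are hypothetical. Manual CGD changes made in the SCSB UI are not modeled because they never trigger the algorithm.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical fields for illustration only.
    barcode: str
    institution: str
    material_type: str
    cgd: str


def comparison_candidates(new_item: Item, existing: list[Item]) -> list[Item]:
    """Items the ongoing matching algorithm would compare new_item against."""
    return [
        other for other in existing
        if other.cgd in ("Shared", "Committed")            # only Shared/Committed items
        and other.material_type == new_item.material_type  # material types must match
        and other.institution != new_item.institution      # same-institution copies are skipped
    ]
```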
Reports
The following reports are generated after each run of the matching algorithm:
- Matching Summary Report
- Matching Serial MVM Report
- Title Exception Report
- CGD Round Trip Report
Matching Summary Report
- Example file name: MatchingSummaryReport-27Jul2021080127.csv
- Columns: Institution, Total Bibs, Total Items, Shared Items Before Matching, Shared Items After Matching, Difference of Shared Items, Open Items Before Matching, Open Items After Matching, Difference of Open Items
- Rows: PUL, CUL, NYPL, HL
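A sketch of how the file name and one row of the Matching Summary Report could be produced. The timestamp pattern is inferred from the example file name; treating each "Difference" column as the after-matching count minus the before-matching count is an assumption, and the function names are hypothetical.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["Institution", "Total Bibs", "Total Items",
           "Shared Items Before Matching", "Shared Items After Matching",
           "Difference of Shared Items", "Open Items Before Matching",
           "Open Items After Matching", "Difference of Open Items"]


def summary_file_name(run_at: datetime) -> str:
    # e.g. MatchingSummaryReport-27Jul2021080127.csv (%b assumes the default C locale)
    return f"MatchingSummaryReport-{run_at.strftime('%d%b%Y%H%M%S')}.csv"


def summary_row(institution: str, bibs: int, items: int,
                shared_before: int, shared_after: int,
                open_before: int, open_after: int) -> list:
    # Assumption: each "Difference" column is after minus before.
    return [institution, bibs, items,
            shared_before, shared_after, shared_after - shared_before,
            open_before, open_after, open_after - open_before]


def write_summary(rows: list[list]) -> str:
    """Render the header plus one row per institution as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()
```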
Matching Serial MVM Report
- Includes items that were changed from Shared to Open for Serial and MVM material types
- All matching items of these material types are changed from Shared to Open.
- Example file name: MatchingSerialMvmReport-27Jul2021080114.csv
- Columns: OwningInstitutionId, Title, Summary Holdings, Volume Part Year, Use Restriction, BibId, OwningInstitutionBibId, Barcode
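The Serial/MVM rule (every matching Shared item becomes Open, with no item kept as Shared) and the corresponding report rows could be sketched as follows. The `SerialMvmItem` record and the function name are hypothetical; the attributes simply mirror the report columns.

```python
from dataclasses import dataclass


@dataclass
class SerialMvmItem:
    # Hypothetical record; attribute names mirror the report columns.
    owning_institution_id: str
    title: str
    summary_holdings: str
    volume_part_year: str
    use_restriction: str
    bib_id: str
    owning_institution_bib_id: str
    barcode: str
    material_type: str  # "Serial" or "MVM"
    cgd: str


def open_matching_serial_mvm(matched: list[SerialMvmItem]) -> list[list[str]]:
    """Set every matching Shared Serial/MVM item to Open; return report rows."""
    rows = []
    for item in matched:
        if item.material_type in ("Serial", "MVM") and item.cgd == "Shared":
            item.cgd = "Open"
            rows.append([item.owning_institution_id, item.title,
                         item.summary_holdings, item.volume_part_year,
                         item.use_restriction, item.bib_id,
                         item.owning_institution_bib_id, item.barcode])
    return rows
```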
Title Exception Report
- This report includes items that match on only one control number but not on the first four words of the normalized title. Before v4.3 (July 2021), these items were considered a match and their CGD was changed; as of v4.3, they are not considered a match.
- Columns: OwningInstitution, BibId, OwningInstitutionBibId, MaterialType, OCLCNumber, ISBN, ISSN, LCCN, Title1, Title2, Title3, Title4, Title5, Title6, Title7, Title8, Title9, Title10, Title11, Title12, Title13, Title14, Title15, Title16, Title17, Title18, Title19
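The condition that places a pair of Bibs in this report can be expressed as a small predicate. It assumes normalized control numbers grouped by type and precomputed four-word title keys; the function name is hypothetical.

```python
def is_title_exception(numbers_a: dict[str, set[str]], numbers_b: dict[str, set[str]],
                       title_key_a: str, title_key_b: str) -> bool:
    """True when exactly one control-number type matches but the title keys differ."""
    fields = ("isbn", "issn", "lccn", "oclc")
    matching = sum(
        1 for f in fields if numbers_a.get(f, set()) & numbers_b.get(f, set())
    )
    return matching == 1 and title_key_a != title_key_b
```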
CGD Round Trip Report
- A report will be created if any item’s CGD is changed by the matching algorithm.
- All the items with a change to the CGD will be included in the report.
- The report will be written to the SCSB AWS S3 bucket.
- The report is institution-specific and will be put into the corresponding directory for that report and institution.
- The directory in the S3 bucket will be:
  - reports/cgd-round-trip/<institution>/
- The name of the report will be CGD_RoundTripReport_<timestamp>.csv
  - Example: CGD_RoundTripReport_20210322_185905.csv
- Columns: Item Barcode, Old CGD, CGD, Date of Action
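The object key and CSV body for each institution's report can be derived from the naming rules above. In this sketch the timestamp format is inferred from the example file name, the institution codes are illustrative, and the function names are hypothetical.

```python
import csv
import io
from datetime import datetime


def round_trip_key(institution: str, created_at: datetime) -> str:
    """S3 key, e.g. reports/cgd-round-trip/PUL/CGD_RoundTripReport_20210322_185905.csv."""
    stamp = created_at.strftime("%Y%m%d_%H%M%S")
    return f"reports/cgd-round-trip/{institution}/CGD_RoundTripReport_{stamp}.csv"


def round_trip_csv(changes: list[tuple[str, str, str, str]]) -> str:
    """changes: (item barcode, old CGD, new CGD, date of action) tuples."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Item Barcode", "Old CGD", "CGD", "Date of Action"])
    writer.writerows(changes)
    return buf.getvalue()


def build_round_trip_reports(changes_by_institution: dict[str, list],
                             created_at: datetime) -> dict[str, str]:
    """Return {s3_key: csv_text}, only for institutions with at least one CGD change."""
    return {round_trip_key(inst, created_at): round_trip_csv(rows)
            for inst, rows in changes_by_institution.items() if rows}
```

Uploading is left out. With boto3, for example, each entry could be written using an S3 client's `put_object(Bucket=..., Key=key, Body=text)`, where the bucket name is whatever the SCSB deployment uses.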
Prior to v4.3
- As of v4.3 (July 2021), the matching algorithm no longer considers Use Restrictions.
- As of v4.3, the matching algorithm no longer considers two items to be a match when only one control number matches and the first four words of the normalized title do not match.
Note / Submit Collection API
When items are updated with the Submit Collection API, the CGD in SCSB is not updated when the data files are placed in the “cgd_protection” folder, but it is updated when they are placed in the “cgd_no_protection” folder.
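The folder rule can be summarized as a predicate on the file's location. The path layout and function name are hypothetical (only the folder names come from this page), and raising an error for files outside both folders is an assumption.

```python
from pathlib import PurePosixPath


def cgd_update_allowed(file_path: str) -> bool:
    """Whether a Submit Collection file may change the CGD, based on its folder."""
    parts = PurePosixPath(file_path).parts
    if "cgd_protection" in parts:
        return False  # CGD in SCSB is left unchanged
    if "cgd_no_protection" in parts:
        return True   # CGD in SCSB is updated from the file
    # Assumption: files outside both folders are treated as an error.
    raise ValueError(f"file is not in a CGD folder: {file_path}")
```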
For technical details, see Technical Documentation for Matching Algorithm.