Collection Group Designation
CGD | Shareable | Retention Commitment | Notes |
Committed | Shareable | Yes | |
Shared | Shareable | Yes | |
Open | Shareable | No | |
Uncommittable | Shareable | No | |
Private | Not shareable | No | |
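The table above can be captured as a small lookup structure. This is only an illustrative sketch: the names CgdPolicy, CGD_POLICIES, and is_shareable are hypothetical and are not taken from the SCSB code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CgdPolicy:
    """Hypothetical record mirroring one row of the CGD table."""
    shareable: bool
    retention_commitment: bool


# Values copied from the table; the structure itself is an assumption.
CGD_POLICIES = {
    "Committed": CgdPolicy(shareable=True, retention_commitment=True),
    "Shared": CgdPolicy(shareable=True, retention_commitment=True),
    "Open": CgdPolicy(shareable=True, retention_commitment=False),
    "Uncommittable": CgdPolicy(shareable=True, retention_commitment=False),
    "Private": CgdPolicy(shareable=False, retention_commitment=False),
}


def is_shareable(cgd: str) -> bool:
    """Return True if items with this CGD may be shared."""
    return CGD_POLICIES[cgd].shareable
```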
Ongoing Matching Algorithm
The ongoing matching algorithm is run on items with the same Material Type when:
- new items are accessioned to SCSB with a CGD of “Shared”
- existing items are updated to a CGD of “Shared” and are in the “cgd_no_protection” folder
The new/updated item is compared to all the existing items in the database with a CGD of Shared or Committed.
When two items match, the item that was accessioned earlier will retain the Shared status, while the newer item will be bumped to Open.
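The rule that the earlier-accessioned item keeps its status can be sketched as follows. The Item class, its field names, and the resolve_match function are hypothetical and chosen for illustration; how SCSB actually stores accession dates, and how it handles identical dates, is not stated here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """Hypothetical item record; field names are assumptions."""
    barcode: str
    cgd: str
    accessioned: date


def resolve_match(a: Item, b: Item) -> Item:
    """Bump the more recently accessioned item to Open and return it.

    The earlier item is left untouched. If both dates are equal,
    this sketch arbitrarily treats `b` as the newer item.
    """
    newer = a if a.accessioned > b.accessioned else b
    newer.cgd = "Open"
    return newer
```

For example, an item accessioned in 2020 keeps Shared when it matches an item accessioned in 2021, and the 2021 item becomes Open.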
Items are considered a “match” when two of the following points match:
- ISBN
- ISSN
- LCCN
- OCLC
- Title (first 4 words)
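A minimal sketch of the two-point rule is shown below. It assumes records are plain dictionaries with the hypothetical keys isbn, issn, lccn, oclc, and title, that values have already been normalized, that missing values never count as a match, and that "two of the following points" means at least two.

```python
NUMBER_FIELDS = ("isbn", "issn", "lccn", "oclc")


def first_four_words(title: str) -> str:
    """Return the first four whitespace-separated words of a title."""
    return " ".join(title.split()[:4])


def count_matching_points(a: dict, b: dict) -> int:
    """Count how many of the five match points agree between two records."""
    count = 0
    for field in NUMBER_FIELDS:
        # Empty or missing identifiers are not counted (an assumption).
        if a.get(field) and a.get(field) == b.get(field):
            count += 1
    title_a, title_b = a.get("title"), b.get("title")
    if title_a and title_b and first_four_words(title_a) == first_four_words(title_b):
        count += 1
    return count


def is_match(a: dict, b: dict) -> bool:
    """Treat two records as a match when at least two points agree."""
    return count_matching_points(a, b) >= 2
```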
Normalization
- ISBN, ISSN, LCCN, OCLC - non-numeric characters are removed
- Title - diacritics are removed
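The two normalization rules can be sketched as below. This is a literal reading of the rules, not the SCSB implementation: the function names are hypothetical, and stripping diacritics through Unicode decomposition is one common technique that may differ from what SCSB does.

```python
import re
import unicodedata


def normalize_number(value: str) -> str:
    """Remove every character that is not an ASCII digit.

    Read literally, this also drops a trailing 'X' check digit.
    """
    return re.sub(r"[^0-9]", "", value)


def normalize_title(title: str) -> str:
    """Strip diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Under these assumptions, "978-0-14-044913-6" becomes "9780140449136" and "Les Misérables" becomes "Les Miserables".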
Exceptions
- Matching copies from a single institution are not compared. It is up to the submitting institutions to only submit one copy of matching items to SCSB as “Shared,” and the rest as “Open.”
- When two items match on a single number but have different bib levels (for example, one is a monograph and the other a serial), that is, when the material type does not match, the matching algorithm is not applied.
- When the CGD of an item is changed manually to Shared using the SCSB UI, the matching algorithm is not run on the item. This means that two matching items could both have the Shared CGD.
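The exceptions above amount to a few guard conditions checked before any comparison. The sketch below is illustrative only: Candidate, should_compare, and should_run_matching are hypothetical names, and the material type strings are assumed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of an item with only the fields the guards need."""
    institution: str
    material_type: str


def should_compare(new: Candidate, existing: Candidate) -> bool:
    """Return True if the pair is eligible for comparison."""
    # Copies from a single institution are not compared.
    if new.institution == existing.institution:
        return False
    # Items with different material types are not compared.
    if new.material_type != existing.material_type:
        return False
    return True


def should_run_matching(changed_via_ui: bool) -> bool:
    """Manual CGD changes made in the SCSB UI do not trigger matching."""
    return not changed_via_ui
```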
Reports
The following reports are generated after each run of the matching algorithm:
- Matching Summary Report
- Matching Serial MVM Report
- Title Exception Report
- CGD Round Trip Report
Matching Summary Report
- Example file name: MatchingSummaryReport-27Jul2021080127.csv
- Columns: Institution, Total Bibs, Total Items, Shared Items Before Matching, Shared Items After Matching, Difference of Shared Items, Open Items Before Matching, Open Items After Matching, Difference of Open Items
- Rows: PUL, CUL, NYPL, HL
- Example data:
Matching Serial MVM Report
- Includes items that were changed from Shared to Open for Serial and MVM material types
- All items that match are changed from Shared to Open
- Example file name: MatchingSerialMvmReport-27Jul2021080114.csv
- Columns: OwningInstitutionId, Title, Summary Holdings, Volume Part Year, Use Restriction, BibId, OwningInstitutionBibId, Barcode
- Example data:
Title Exception Report
- Items that match on only one number and not the first four words of the normalized title will be included in this report. Previously, these items were considered a match and the CGD was affected, but as of v4.3 (July 2021), they are not considered matched.
- Columns: OwningInstitution, BibId, OwningInstitutionBibId, MaterialType, OCLCNumber, ISBN, ISSN, LCCN, Title1, Title2, Title3, Title4, Title5, Title6, Title7, Title8, Title9, Title10, Title11, Title12, Title13, Title14, Title15, Title16, Title17, Title18, Title19
- Example data:
CGD Round Trip Report
- A report will be created if any item’s CGD is changed by the matching algorithm.
- All the items with a change to the CGD will be included in the report.
- The report will be written to the SCSB AWS S3 bucket.
- The report is institution specific and will be put into the corresponding directory for that report and institution.
- The directory in the S3 bucket will be:
- reports/cgd-round-trip/<institution>/
- The name of the report will be CGD_RoundTripReport_<timestamp>.csv
- ex: CGD_RoundTripReport_20210322_185905.csv
- Columns: Item Barcode, Old CGD, CGD, Date of Action
- Example data:
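The directory and file-name conventions described above can be sketched as a small helper. The function name is hypothetical, the timestamp format is inferred from the single example file name, and the exact form of the institution segment (for example "PUL") is an assumption.

```python
from datetime import datetime


def round_trip_report_key(institution: str, when: datetime) -> str:
    """Build the S3 object key for a CGD Round Trip Report.

    The format string below is inferred from the example
    CGD_RoundTripReport_20210322_185905.csv.
    """
    timestamp = when.strftime("%Y%m%d_%H%M%S")
    return f"reports/cgd-round-trip/{institution}/CGD_RoundTripReport_{timestamp}.csv"
```

Calling round_trip_report_key("PUL", datetime(2021, 3, 22, 18, 59, 5)) yields a key ending in the example file name given above.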
Prior to v4.3
- Before v4.3, the matching algorithm took Use Restrictions into account. As of v4.3 (July 2021), it no longer considers them.
For implementation details, see Technical Documentation for Matching Algorithm.