The matching algorithm was developed to meet the specific needs of the ReCAP partners: to identify a single item in each set of duplicates whose CGD (Collection Group Designation) is set to “Shared.” Additional copies in a matched set are designated “Open.”

...

  1. If duplicate copies from within the same institution are in ReCAP and HD, they will not be resolved unless there is also a copy from another partner. Therefore, it is up to each submitting institution to submit only one copy of a duplicate to SCSB as “Shared” and the rest as “Open.”
  2. If one or more numbers match, the items will be processed so that only one item is “Shared.” The initial designations are assigned in a way designed to create an even distribution of “Shared” designations among the partners, while considering use restrictions.
  3. However, if only one number matches, an additional title verification will take place. The data will be normalised and the first four words of the title will be compared. The report will be placed on AWS S3. Partners can then decide whether to take any further action on the set.
    1. Even though the titles show an exception, the matches will be processed according to the initial rubric (⅓) or the ongoing rubric (first in gets the designation). These title exception records will not be eligible for grouping of bibs or for the CGD update process (this applies to both the Initial Matching Algorithm and the Ongoing Matching Algorithm).
  4. Occasionally two items will match on a single number but have different bib levels: one is a monograph and the other a serial. In this case, no further processing is done.
  5. The full set of 13+ million records created initially in SCSB will be submitted to the matching algorithm and processed as follows:
    1. A summary report will be created and made available on AWS S3.
    2. Monographs and multi-volume monographs (MVMs) with only one item: designated per the ⅓ algorithm.
    3. Serials (with one to N items) and MVMs with more than one item: all items attached to matching bibs will be designated “Open.”
    4. A title report of the serials that match will be available on AWS S3 at /scsb-{environment}/reports/matching-reports.
  6. ONGOING MATCHING: The matching algorithm will be run on an ongoing basis after accession. For ongoing matching, the date on which each item was first accessioned into SCSB is considered, and the item that was first in gets the designation.
  7. All items that are designated as “Shared” via the accession process or the submit collection update will be matched against the entire database of shared items if they are submitted through the “unprotected” process. Items can instead be processed through the CGD Protected Submit Collection Process; in that case, regardless of the CGD, the items will not be submitted to the Matching Algorithm (“Shared” and “Committed” items).
  8. The title exception report, the serials title report and the CGD change round trip report will be written after the matching algorithm has run.
    1. The ongoing reports are available at the [AWS S3 Location].
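
The title verification and the ongoing "first in" rubric described above can be illustrated with a minimal Python sketch. It is not the SCSB implementation. The normalisation rules (lower-casing, removing accents and punctuation), the `Item` fields and all function names are assumptions made for illustration. The sketch also assumes that "the designation" in the ongoing rubric means “Shared.” Only the four-word comparison and the first-in ordering come from the description above.

```python
import re
import unicodedata
from dataclasses import dataclass
from datetime import date


def normalise_title(title: str) -> list[str]:
    """Assumed normalisation: drop accents and punctuation, lower-case, split into words."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return cleaned.split()


def titles_agree(title_a: str, title_b: str, words: int = 4) -> bool:
    """Compare the first `words` normalised words of two titles."""
    return normalise_title(title_a)[:words] == normalise_title(title_b)[:words]


@dataclass
class Item:
    # Hypothetical record shape; the real SCSB item model is not shown in this page.
    barcode: str
    institution: str
    accession_date: date
    cgd: str = "Shared"


def designate_first_in(matched_set: list[Item]) -> list[Item]:
    """Ongoing rubric sketch: the earliest accessioned item stays Shared, the rest become Open."""
    ordered = sorted(matched_set, key=lambda item: item.accession_date)
    for position, item in enumerate(ordered):
        item.cgd = "Shared" if position == 0 else "Open"
    return ordered
```

Under these assumptions, a pair whose titles fail `titles_agree` would be a candidate for the title exception report, while `designate_first_in` shows how a matched set ends up with exactly one “Shared” item.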


Technical documentation for the matching algorithm: Technical Documentation for Matching Algorithm