Matching Algorithm

The matching algorithm was developed to meet a specific need of the ReCAP partners: to identify a single item in each set of duplicates whose CGD (Collection Group Designation) will be set to “Shared.” Additional copies in a matched set are designated “Open.”

All items attached to matched bibs for serials and for multi-volume monographs with more than one holding will be designated “Open.”
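
The two designation rules above can be sketched in a few lines of Python. This is only an illustration, not the SCSB implementation: the `Item` fields, the bib type labels, and the `pick_shared` callback (which stands in for whichever selection rubric applies) are all hypothetical names. The sketch also assumes that a single serial or multi-holding MVM bib in the set is enough to make every item “Open.”

```python
from dataclasses import dataclass

SHARED, OPEN = "Shared", "Open"


@dataclass
class Item:
    barcode: str              # hypothetical identifier
    institution: str          # hypothetical partner code
    bib_type: str             # assumed labels: "monograph", "mvm", "serial"
    holdings_on_bib: int = 1  # number of holdings on the item's bib
    cgd: str = SHARED


def designate(matched_set, pick_shared):
    """Assign CGDs in place for one matched set and return the set.

    pick_shared is a caller-supplied function that chooses which item
    keeps "Shared"; the real selection rubric is not modelled here.
    """
    # Assumption: one serial or multi-holding MVM bib makes the whole set Open.
    all_open = any(
        it.bib_type == "serial"
        or (it.bib_type == "mvm" and it.holdings_on_bib > 1)
        for it in matched_set
    )
    keeper = None if all_open else pick_shared(matched_set)
    for it in matched_set:
        it.cgd = SHARED if it is keeper else OPEN
    return matched_set
```

For example, `designate(items, pick_shared=lambda s: s[0])` on three matching monographs leaves the first item “Shared” and the other two “Open,” while a set of matching serials comes back entirely “Open.”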

Business Rules:

Some revisions and modifications have taken place throughout the development process and are summarized here:

  1. If duplicate copies from within the same institution are in ReCAP, they will not be resolved unless there is also a copy from another partner. Therefore, it is up to each submitting institution to submit only one copy of its duplicates to SCSB as “Shared,” and the rest as “Open.”
  2. If one or more numbers match, the items will be processed so only one item is “Shared.” The initial designations are assigned in a way designed to create an even distribution of “Shared” designations among the partners, while considering use restrictions.
  3. However, if only one number matches, an additional title verification will take place. The data will be normalised and the first four words of the titles will be compared. Sets whose titles do not match will be listed in a title exception report, which will be placed on the FTP server. Partners can then decide whether to take any further action on a set.
    1. Even when the titles show an exception, the matches will still be processed according to the initial rubric (⅓) or the ongoing rubric (first in gets the designation).
  4. Occasionally, items will have a single number match but different bib levels: one is a monograph and the other a serial. In this case, no further processing is done.
  5. The full set of 13+ million records initially created in SCSB will be submitted to the matching algorithm and processed as follows:
    1. A summary report will be created and made available on the FTP server.
    2. Monographs and multi-volume monographs (MVMs) with only one item will be designated per the ⅓ algorithm.
    3. For serials (with one to N items) and MVMs with more than one item, all items attached to matching bibs will be designated “Open.”
    4. A title report of the serials that match will be available on the FTP server at [FTP address].
  6. ONGOING MATCHING: The matching algorithm will be run on an ongoing basis after accession. For ongoing matching, the date on which each item was first accessioned into SCSB is considered, and the item that was accessioned first gets the “Shared” designation.
  7. All items that are designated as Shared via the accession process or the submit collection update will be matched against the entire database of shared items if they are submitted through the “unprotected” process.
  8. Items can instead be processed through the CGD Protected Submit Collection Process. Regardless of their CGD, these items will not be submitted to the matching algorithm.
  9. The Title Exception Report and the Serials Title Report will be written after the matching algorithm has run.
    1. The ongoing reports are available at the [FTP Location]
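
The title verification in rule 3 and the “first in” rubric in rule 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the SCSB code: the normalisation steps (lowercasing, stripping accents, replacing punctuation with spaces) are assumed, since the rules above only say the data is normalised, and the function names and the `accessioned` key are hypothetical.

```python
import re
import unicodedata
from datetime import date


def title_words(title):
    """Split a title into normalised words (normalisation rules are assumed)."""
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop combining marks so accented and unaccented forms compare equal.
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace punctuation with spaces, then split on whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return cleaned.split()


def titles_agree(title_a, title_b, words=4):
    """Compare the first `words` normalised words of two titles."""
    return title_words(title_a)[:words] == title_words(title_b)[:words]


def first_in(items):
    """Return the item with the earliest accession date (ongoing rubric)."""
    return min(items, key=lambda item: item["accessioned"])


matched = [
    {"barcode": "x", "accessioned": date(2017, 5, 1)},
    {"barcode": "y", "accessioned": date(2016, 3, 9)},
]
keeper = first_in(matched)  # the item with barcode "y" was accessioned first
```

A pair for which `titles_agree` returns `False` would go on the title exception report, but per rule 3.1 the set is still processed, so `first_in` (or the initial rubric) is applied either way.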


For technical details, see: Technical Documentation for Matching Algorithm