Submit Collection... (in progress)

This use case handles updates to records that have already been accessioned. The partner ILS provides collection information and updates to the SCSB database.

Related JIRAs: RECAP-12, RECAP-42, RECAP-313

The Submit Collection API is a REST service through which users can submit MARC content in either SCSB XML or MARCXML format to update the corresponding records in SCSB. Sample XML content can be found here.
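A minimal Python sketch of assembling a Submit Collection request follows. The host, the /sharedCollection/submitCollection path, the institution and isCGDProtected query parameters, and the api_key header are all assumptions for illustration; check your SCSB deployment's API documentation for the actual names. The sketch only builds the request; passing it to urllib.request.urlopen would send it.

```python
import urllib.parse
import urllib.request

# Hypothetical values: replace with your environment's actual host, path and key.
BASE_URL = "https://scsb.example.org/sharedCollection/submitCollection"
API_KEY = "your-api-key"


def build_submit_collection_request(xml_content: str, institution: str,
                                    cgd_protected: bool) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying SCSB XML or MARCXML."""
    query = urllib.parse.urlencode({
        "institution": institution,                    # assumed parameter name
        "isCGDProtected": str(cgd_protected).lower(),  # assumed parameter name
    })
    return urllib.request.Request(
        url=f"{BASE_URL}?{query}",
        data=xml_content.encode("utf-8"),
        headers={
            "Content-Type": "application/xml",
            "api_key": API_KEY,                        # assumed auth header
        },
        method="POST",
    )


# Usage (institution code "PUL" is illustrative):
# req = build_submit_collection_request(xml_text, "PUL", cgd_protected=False)
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```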

Bulk Upload

Submit Collection also supports bulk uploading of data. XML files are uploaded to <AWS S3>/share/recap/submitcollection/<Environment>/<InstitutionName>/<folder> (for example, tst for the test environment), from where SCSB picks up the data and loads it into the database. SCSB checks the S3 location for new files every 20 seconds. An email is sent once bulk processing of the files has completed. The recipient email addresses are configured per institution in the external properties file (for example, external.submit.collection.email.pul.to for Princeton).
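The bulk pickup can be pictured as a polling loop. The Python sketch below is hypothetical and is not SCSB's implementation: the callbacks stand in for S3 listing, record loading, moving files to the .done folder and sending the notification email, and it assumes one email per polling pass. Files are processed oldest upload first, matching the FIFO behavior described in the upload steps.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

POLL_INTERVAL_SECONDS = 20  # SCSB checks the S3 location every 20 seconds


@dataclass
class UploadedFile:
    name: str
    uploaded_at: float  # upload time, e.g. the S3 object's last-modified time in epoch seconds


def run_poll_cycle(list_pending: Callable[[], List[UploadedFile]],
                   process: Callable[[UploadedFile], None],
                   move_to_done: Callable[[UploadedFile], None],
                   send_email: Callable[[List[str]], None]) -> List[str]:
    """One hypothetical polling pass: load pending files oldest-first (FIFO)."""
    pending = sorted(list_pending(), key=lambda f: f.uploaded_at)
    processed = []
    for f in pending:
        process(f)         # load the file's records into the database
        move_to_done(f)    # processed files end up in the .done folder
        processed.append(f.name)
    if processed:
        send_email(processed)  # assumed: one notification after the batch completes
    return processed


def poll_forever(list_pending, process, move_to_done, send_email) -> None:
    """Run a polling pass every POLL_INTERVAL_SECONDS (never returns)."""
    while True:
        run_poll_cycle(list_pending, process, move_to_done, send_email)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Because files are sorted by upload time, a record edited in two files ends up with the values from the file uploaded last.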

The bulk upload process offers two ways of processing records. If the library wishes to protect the CGD as assigned in SCSB, it can use the protected folder (cgd_protection). Records uploaded to this folder won't have their CGD updated in SCSB; whatever designation is already in SCSB is retained, regardless of the designation in the XML. The unprotected folder (no_cgd_protection) is for items that should be processed in the regular way: Shared items are sent through the matching algorithm and may be designated Open depending on whether they are duplicates of titles already in SCSB. (Open and Private items are not sent through the matching algorithm.) (Related JIRA: RECAP-781)
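The folder rules above can be summarized as a small decision function. This is a simplified, hypothetical Python sketch: the real matching algorithm is more involved, and the matched_as_duplicate flag merely stands in for its outcome.

```python
CGD_PROTECTION = "cgd_protection"
NO_CGD_PROTECTION = "no_cgd_protection"


def needs_matching(folder: str, incoming_cgd: str) -> bool:
    """Only Shared items from the unprotected folder go through matching."""
    return folder == NO_CGD_PROTECTION and incoming_cgd == "Shared"


def resulting_cgd(folder: str, existing_cgd: str, incoming_cgd: str,
                  matched_as_duplicate: bool = False) -> str:
    """Return the CGD an item would end up with (simplified illustration)."""
    if folder == CGD_PROTECTION:
        return existing_cgd  # designation already in SCSB is retained
    if folder != NO_CGD_PROTECTION:
        raise ValueError(f"unknown folder: {folder}")
    if needs_matching(folder, incoming_cgd) and matched_as_duplicate:
        return "Open"        # hypothetical stand-in for the matching outcome
    return incoming_cgd      # Open/Private (or unmatched Shared) taken from the XML
```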

Steps to upload files for Submit Collection:

  • Access the AWS S3 location using your valid credentials.
  • Navigate to the <AWS S3>/share/recap/submitcollection/<Environment>/<InstitutionName> folder. For example, in the production environment the path for Columbia is /share/recap/submitcollection/prod/cul.
  • This folder contains three subfolders. The cgd_protection folder, as the name suggests, protects the CGD of the records in the uploaded file: their CGD is not updated with the value in the file. The no_cgd_protection folder offers no such protection: the CGD of these records is updated with the value in the uploaded file (with Shared items still going through the matching algorithm). The .done folder holds files that have already been processed.
  • The filename of the uploaded file must not exceed 30 characters and must not contain spaces or special characters.
  • The filename may include a date and time stamp for the partner's reference (within the 30-character limit), but this is not required by SCSB.
  • SCSB processes files on a First In, First Out (FIFO) basis. If the same record is edited in two different files, the version in the last processed file prevails.
  • FIFO order is based on when each file was uploaded to the S3 location.
  • The file content must be XML, either MARCXML or SCSB XML (see here for a sample).
  • Uploaded files can be either plain XML files with an .xml extension or gzip-compressed files with a .gz extension.
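Partners may want to check filenames before uploading. The Python sketch below encodes the rules in the list above. It assumes that "special characters" means anything other than letters, digits, underscore, hyphen and period, that the 30-character limit includes the extension, and that the returned key is relative to the <AWS S3> root; validate_filename and upload_key are hypothetical helper names.

```python
import re
from typing import List

MAX_FILENAME_LENGTH = 30
# Assumption: allowed characters are letters, digits, '_', '-' and '.'.
_ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")
_EXTENSIONS = (".xml", ".gz")
FOLDERS = ("cgd_protection", "no_cgd_protection")


def validate_filename(name: str) -> List[str]:
    """Return a list of rule violations (empty if the name looks acceptable)."""
    problems = []
    if len(name) > MAX_FILENAME_LENGTH:
        problems.append("longer than 30 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("contains spaces or special characters")
    if not name.lower().endswith(_EXTENSIONS):
        problems.append("extension must be .xml or .gz")
    return problems


def upload_key(environment: str, institution: str, folder: str, filename: str) -> str:
    """Build the upload location (relative to the <AWS S3> root) for a valid file."""
    if folder not in FOLDERS:
        raise ValueError(f"folder must be one of {FOLDERS}")
    problems = validate_filename(filename)
    if problems:
        raise ValueError(f"{filename}: {', '.join(problems)}")
    return f"share/recap/submitcollection/{environment}/{institution}/{folder}/{filename}"
```

For example, upload_key("prod", "cul", "cgd_protection", "cul_batch1.xml.gz") yields a key under the Columbia production cgd_protection folder.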

Constraints on what can be edited

Except for Availability Status and Customer Code, all fields can be updated through the Submit Collection API. However, the Collection Group Designation (CGD) cannot be updated while the item is out (not available in ReCAP). The number of records that can be edited in a single API call is limited; the limit is currently 100 and is configured in the external properties file under external.submit.collection.input.limit. Items that have been deaccessioned cannot be edited through Submit Collection.
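A client submitting more than 100 records can split them into batches and pre-check edits against the constraints above. The Python sketch below is hypothetical: the field names (availability_status, customer_code, cgd) and the helper functions are illustrative, and the limit of 100 is hard-coded to mirror the current value of external.submit.collection.input.limit, which may change.

```python
from typing import Iterable, Iterator, List, Set

SUBMIT_COLLECTION_INPUT_LIMIT = 100  # mirrors external.submit.collection.input.limit

# Hypothetical field names for the attributes Submit Collection cannot change.
NON_EDITABLE_FIELDS = {"availability_status", "customer_code"}


def batches(records: Iterable[str],
            limit: int = SUBMIT_COLLECTION_INPUT_LIMIT) -> Iterator[List[str]]:
    """Yield lists of at most `limit` records, one list per API call."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        yield batch


def check_edit(changed_fields: Set[str], item_available: bool,
               deaccessioned: bool) -> List[str]:
    """Return reasons an edit would be rejected (empty list if none)."""
    problems = []
    if deaccessioned:
        problems.append("deaccessioned items cannot be edited")
    blocked = changed_fields & NON_EDITABLE_FIELDS
    if blocked:
        problems.append(f"non-editable fields: {sorted(blocked)}")
    if "cgd" in changed_fields and not item_available:
        problems.append("CGD cannot change while the item is out")
    return problems
```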