Export Data Dump


The Export Data Dump is a REST service that allows partners to export bibliographic records from the SCSB database in their chosen format (MARCXML or SCSB Schema XML). The use case detailing this service can be found here. The parameters to be passed in the REST call are described in the Swagger documentation at <environment>:9093/swagger-ui/index.html#/data-dump-rest-controller, where <environment> is either http://uat-recap.htcinc.com or http://tst-recap.htcinc.com.

Related JIRA - RECAP-39

The above diagram (Source) depicts the optimizations carried out to improve the performance of the REST service API when handling large volumes of data.

The parameters required by the API call are defined in the Swagger documentation.

NOTE: Generating a full dump (fetchType = Full) is a time- and resource-intensive job that is typically run only once, the very first time, to get all data from SCSB. To prevent unauthorized or unintentional initiation of a full export, SCSB allows configuring the value (external.datadump.fetchtype.full) that is used as the full data dump parameter in the API.

The resulting XML is generated as either MARCXML or SCSBXML; the schema is defined here.

Bulk Export

The response, which contains the XML content, can be returned either in the HTTP response or through FTP. This is controlled by the request parameter transmissionType. A cap of 100 records is set for the HTTP response. If the result exceeds this cap, or if the parameter is left empty, the result is transmitted through FTP. The location where the XML files are uploaded on the FTP server is configured in the application-<environment>.properties file, which is found under the scsb-etl/src/main/resources/ folder.
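The rule in the paragraph above can be sketched as a small check. This is only an illustration of the documented behaviour, not SCSB's actual implementation: the function name `fits_in_http_response` is hypothetical, and the codes "0" (FTP) and "1" (HTTP) are taken from the transmissionType entry in the Parameters section.

```python
HTTP_RECORD_CAP = 100  # cap on records returned over HTTP, as stated in this document


def fits_in_http_response(transmission_type: str, record_count: int) -> bool:
    """Return True only when the result can be returned inline over HTTP.

    Hypothetical helper: it models the documented rule, not SCSB code.
    """
    # An empty transmissionType is treated as FTP ("0").
    effective = transmission_type.strip() or "0"
    # HTTP ("1") is only usable when the result stays within the cap.
    return effective == "1" and record_count <= HTTP_RECORD_CAP
```

A caller could use such a check to decide up front whether to request FTP transmission and supply an email address for the completion notice.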

Usually, the FTP location is /share/storagelocation/data-dump/<environment>/<partnercode>/<outputformat>. For example, a file generated for an institution (the requestingInstitutionCode input parameter) in the tst environment with MarcXml as the output format is placed under /share/storagelocation/data-dump/tst/Institution/MarcXml
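As a minimal sketch of the path pattern above, the directory can be assembled from its three parts. The helper name `ftp_export_dir` is hypothetical, the base path is the "usual" location quoted above (the real one comes from the properties file), and PUL is used as an example partner code.

```python
import posixpath

# Usual base location quoted in this document; the actual value is
# configured in application-<environment>.properties.
FTP_BASE = "/share/storagelocation/data-dump"


def ftp_export_dir(environment: str, partner_code: str, output_format: str) -> str:
    """Build <base>/<environment>/<partnercode>/<outputformat> (hypothetical helper)."""
    # posixpath is used so the separator is always "/", regardless of local OS.
    return posixpath.join(FTP_BASE, environment, partner_code, output_format)


print(ftp_export_dir("tst", "PUL", "MarcXml"))
```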

Parameters

| Parameter | Data Type | Description |
| --- | --- | --- |
| institutionCodes | String | Institution code(s) of the partner(s) whose shared or open item information updates are being requested. Use PUL for Princeton, CUL for Columbia and NYPL for New York Public Library. Two institution codes can be requested at once by separating the codes with a comma (,) and no space. For example, to request Princeton and Columbia records, use PUL,CUL as the value. |
| requestingInstitutionCode | String | Institution code of the partner requesting the shared or open item information updates. Use PUL for Princeton, CUL for Columbia and NYPL for New York Public Library. |
| fetchType | String | There are two types of export: Incremental and Deleted. Incremental returns records that have been incrementally added (through Ongoing Accession) to SCSB. Deleted returns records that have been removed (through Deaccession) from SCSB, as well as records that earlier had an Open or Shared Collection Group Designation (CGD) but have been moved to Private. Use 1 for Incremental and 2 for Deleted. |
| outputFormat | String | SCSB allows exporting records in two different formats: MARCXML and SCSBXML (more details here). Deleted record information is always exported in the JSON format. The value of this parameter selects the format type. Use 0 for MARCXML, 1 for SCSBXML and 2 for the JSON format. |
| date | String | Only data that was added, removed or modified since the date and time given here is retrieved and exported. The format is YYYY-MM-DD HH:MM. For example, to consider data from 1:00 PM on 9 June 2017, use 2017-06-09 13:00 |
| collectionGroupIds | String | Restricts the export to records with a CGD of Shared or Open. Use 1 to export records with Shared CGD and 2 to export records with Open CGD. Leaving this parameter empty considers both Shared and Open CGD records for export. |
| transmissionType | String | Transmission is either through HTTP (shown as part of the response) or through FTP (files are exported and saved to an FTP location configured in SCSB). At most 100 records can be shown with the HTTP transmission type. Requests that return more than 100 records are exported through FTP automatically. Use 0 for FTP and 1 for HTTP. Leaving this parameter empty sets the transmission type to FTP. |
| emailToAddress | String | Email address to which a notification is sent upon completion of the export process. |
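The parameter codes above are easy to mix up, so a client may prefer to build the request from readable names. The following is a sketch under stated assumptions: `build_export_params` and the lookup tables are hypothetical names, only the parameter names and numeric codes come from the table, and the endpoint path is deliberately left out because it is defined in the Swagger documentation rather than here.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

# Numeric codes as listed in the parameter table.
FETCH_TYPES = {"incremental": "1", "deleted": "2"}
OUTPUT_FORMATS = {"marcxml": "0", "scsbxml": "1", "json": "2"}
COLLECTION_GROUPS = {"shared": "1", "open": "2"}
TRANSMISSION_TYPES = {"ftp": "0", "http": "1"}


def build_export_params(institution_codes, requesting_institution, fetch_type,
                        output_format, since: datetime, email: str,
                        collection_group: Optional[str] = None,
                        transmission: str = "ftp") -> dict:
    """Map readable names to the documented parameter values (hypothetical helper)."""
    return {
        # Codes are joined with a comma and no space, e.g. "PUL,CUL".
        "institutionCodes": ",".join(institution_codes),
        "requestingInstitutionCode": requesting_institution,
        "fetchType": FETCH_TYPES[fetch_type],
        "outputFormat": OUTPUT_FORMATS[output_format],
        # Documented format is YYYY-MM-DD HH:MM.
        "date": since.strftime("%Y-%m-%d %H:%M"),
        # Empty means both Shared and Open records are considered.
        "collectionGroupIds": COLLECTION_GROUPS[collection_group] if collection_group else "",
        "transmissionType": TRANSMISSION_TYPES[transmission],
        "emailToAddress": email,
    }


params = build_export_params(["PUL", "CUL"], "NYPL", "incremental", "marcxml",
                             datetime(2017, 6, 9, 13, 0), "user@example.org")
query_string = urlencode(params)  # append to the endpoint URL from Swagger
```

Unknown names raise a `KeyError` instead of sending an invalid code, which is one simple way to catch typos before the request is made.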

Response

Each case below lists the input parameters followed by the response.

Input parameters: Incremental fetch type, output format as MARCXML and transmission type as HTTP
Response: MARCXML as defined here.

Input parameters: Incremental fetch type, output format as SCSBXML and transmission type as HTTP
Response: SCSBXML as defined here.

Input parameters: Incremental fetch type, output format as JSON (deaccessioned items and those that were moved from shared and open CGD to private) and transmission type as HTTP
Response:
[
   {
      "bibId":"5",
      "itemBarcodes":[
         "32101075649143"
      ]
   },
   {
      "bibId":"786782",
      "itemBarcodes":[
         "32101075852192"
      ]
   },
   {
      "bibId":"786787",
      "itemBarcodes":[
         "32101075852226"
      ]
   }
]

Input parameters: Incremental fetch type and transmission type as HTTP, with more than 100 records expected in the response
Response: There are more than 100 records. Use transmission type ftp to dump the data

Input parameters: Incremental fetch type and transmission type as FTP
Response: Export process has started and we will send an email notification upon completion

Input parameters: Transmission type as FTP with date and email address empty
Response:
1. Please enter the date
2. Please enter a valid email address
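
The JSON response shown in this section is a list of objects, each with a bibId and a list of itemBarcodes. A consumer might index it by bibliographic id as sketched below. The function name `barcodes_by_bib` is hypothetical, and the sample string is a shortened copy of the response above.

```python
import json


def barcodes_by_bib(deleted_json: str) -> dict:
    """Group item barcodes by bibId from the JSON export (hypothetical helper)."""
    grouped = {}
    for record in json.loads(deleted_json):
        # Merge barcodes if the same bibId appears more than once.
        grouped.setdefault(record["bibId"], []).extend(record["itemBarcodes"])
    return grouped


sample = """[
  {"bibId": "5", "itemBarcodes": ["32101075649143"]},
  {"bibId": "786782", "itemBarcodes": ["32101075852192"]}
]"""

deleted = barcodes_by_bib(sample)
```

Note that the other responses in this section are plain-text messages rather than JSON, so a client should only parse the body as JSON when it requested the JSON output format.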