Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

1. How it works in the current coreMIS version? <TO-DO>

  • De duplication

    • This only works if there are unique field(s) present. If there are no unique fields de duplication does not work.

    • If there are unique field(s), identify both in the Extraction and the Loading part of the ETL process, for example “National Identification” column in the Excel file uploaded to “national_id” column in the PostGreSQL database.

    • During the Loading stage of the ETL check and confirm if data exists, this is a simple count query with the data in the Excel column, if it exists flag and do not insert record. Possible data to flag; number of record in Excel file, entire line of the record in the Excel file to help with identification.

    • After ETL process is completed generate a report summary of all records that were flagged as duplicates.

    • Stakeholders are granted a means to fix duplicates, either by accepting that flagged duplicates should not be saved or undo the Excel file upload fix the duplicates and reimport for the cycle to start again.

  • Validation

    • This only works if there fields to be validated. If there are no unique fields de duplication does not work.

    • If there are field(s) to be validated, for example ID format, phone number, age. Define regex or validation parameters.

    • During the Loading stage of the ETL check and confirm if data meets validation criteria, if it does not meet the validation criteria and do not insert record. Possible data to flag; number of record in Excel file, entire line of the record in the Excel file to help with identification.

    • After ETL process is completed generate a report summary of validation

    • Stakeholders are granted a means to fix validation issues, either by accepting that validation errors should not be saved or undo the Excel file upload fix the data entry issues and reimport for the cycle to start again.

2. Our knowledge based on the meetings

...