Database migration to PostgreSQL (explorative pilot)

Content

Objective

Explore the migration of openIMIS database from Microsoft SQL Server to PostgreSQL

Description

The project includes the following parts:

  1. Become familiar with the existing openIMIS implementation in general and the openIMIS database in particular.

  2. Create the openIMIS database structure in PostgreSQL: tables, views, user-defined types.

  3. Migrate Stored Procedures and Functions to PostgreSQL. Analyse and propose optimisation during the migration of Stored Procedures. This can also include the replacement of stored procedures by application-level code.

  4. Migrate in-line SQL scripts in all openIMIS related components: Web Application (legacy and modular), REST API, Web Services, Windows Services. Optional: construct SQL scripts using framework's ORM (Entity Framework or Django ORM) to become DBMS independent.

  5. Develop a tool for the transfer of data from the MSSQL database to the PostgreSQL database.

  6. Start the migration of SP from the database level to Python code in the modular backend using Django ORM (DBMS independent).

Further Reading

Discussions

  • DB-structure & fields:

    • how compatible are MS & PSQL data types?

    • include the data migration tool:

      • pages or streaming

  • Stored procedures:

    • how many stored procedures still need migration in openIMIS?

    • can stored procedures (which are not business logic) be integrated into github?

    • go incremental

    • start with a small one to get used to the migration

    • reports are important (21 SP)

    • how to unittests (python in stored proc)?

  • In-line SQL scripts:

    • avoid in-line SQL, instead use django flavor

    • look at generic DB access (is it possible to have the current functional DB access work on MS SQL and PostgreSQL?)

  • Data migration tool:

    • is already a goal in itself:

    • certainly if we need to cope with large data sets (streaming/pagination,…)

    • data types may require transformations (dates, -large- text fields)

    • … use existing (open source?) ETL?

 

  • Report of difficulties, attention points ('hacks' put in place),… is more important than the result itself as it would anticipate a difficulty to migrate the related ‘module’ in new architecture later on.

Preliminary Notes

  • Using a different database system requires edits to the settings.py file within the folder “<openIMIS base directory>\openIMIS-be\openimis-be_py\openIMIS\openIMIS

    • We still need to look into this: Can we have one universal settings.py file that can figure out dynamically what type of database server is being used and load drivers and preferences accordingly? Otherwise the file needs to be changed manually whenever the database is migrated. I suspect that doing this manually in all facilities in a country could lead to issues.

    • PostgreSQL - unlike SQL Server - does not require a specific driver to be named within the database options of that settings file. But Django / Python still require a driver to be installed.

      • For a new system this can be done by adding a new line to requirements.txt (where Python / Pip gets all requirements for an openIMIS installation): The package psycopg2 should be added.

      • In a system that already has openIMIS installed, running the command “pip install psycopg2“ from within openIMIS’s virtual environment satisfies the requirement.

      • Django requires psycopg2 version 2.5.4 or higher (from the official documentation: https://docs.djangoproject.com/en/3.0/ref/databases/#postgresql-notes ). Currently we’re using version 2.8.3

      • More information on the driver, should You be interested: https://www.psycopg.org/

  • PostgreSQL can be installed alongside an “Application Stack Builder“. This allows PostgreSQL users to install tools and extensions for their server.

    • One of those is the Migration Toolkit (mtk). mtk allows us to migrate a database from one type of SQL to another (For us: From SQL Server to PostgreSQL). However: It can only migrate tables, views, and the data in those tables and view. Stored Procedures still need to be migrated and adapted manually.

      • Considering these limitations: It may be preferable to create our own tool which can be more flexible (e.g: Have an option to not migrate historic data or only migrate data that is newer than a specified date). Also, if we have an in-house tool it can easily be adapted in the future.

    • Another important tool is pgAdmin4. This is PostgreSql’s equivalent to the Microsoft SQL Server Management Studio program. pgAdmin4 uses a web interface.

  • Currently, we are working with PostgreSQL version 10.12. It is the newest version with considerable compatibility for older operating systems: It supports both 32-bit and 64-bit architectures (newer versions have dropped 32-bit support) and it supports Windows versions all the way back to Windows 7 and Windows Server versions all the way back to Windows Server 2008 R1. It also supports Mac OSX 10.10 and newer as well as many different flavours of Linux.

    • Upgrading to a newer version should easily be possible. As far as we know our setup also supports newer versions (we can even use the same driver). However, if this is a concern, it should preferably be done before we send this to our clients

  • Migration should be possible and simple from within Django. However, we need to be careful because all our modules have many columns in their tables commented out (in each module’s models.py file): Are these columns still required (e.g. for backwards compatibility) or can they be safely ignored? This needs to be investigated. Keeping the new database as lean as possible will certainly help with speed.

Updates to Preliminary Notes

  • PostgreSQL versions 11.5 and newer should be considered instead of version 10 that was chosen. Considering You are already requiring 64-bit installations for openIMIS, losing support for 32-bit systems is no downside. Additionally, only versions 11.5 and newer support stored procedures. In older versions these SPs are stored as functions instead.

  • Some of the commented out columns are still required by some functions or procedures, most notably the startcyle1 - startcycle4 columns in the tblProduct table. Thankfully, most of them are correctly defined so that no additional work (beyond uncommenting the lines) is needed. The migration tool also had no issue filling the tables with data, suggesting that the columns are properly defined.

Child Pages

Did you encounter a problem or do you have a suggestion?

Please contact our Service Desk



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. https://creativecommons.org/licenses/by-sa/4.0/