Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 2 Current »

The original openIMIS software was using Microsoft SQL Server with a heavy use of stored procedures, making the port to another (open source) database difficult. Within the modular transformation, the software was adapted to support PostgreSQL.

This page explains how to configure, migrate the databases and provides guidelines for proper support of both databases during development.

The original migration tool was documented here:

Database migration to PostgreSQL (explorative pilot)

Configuring the database

Starting from a clean database

Migrating an existing database

Portable development guidelines

Django ORM to the rescue

Django ORM was designed to make applications portable. So, almost all code written with the Django ORM will work on any underlying database.

Switching between databases

To provide raw SQL for each database system, one can use the database engine specification:

'select [db] from [microsoft]' if "sql_server" in settings.DB_ENGINE else 'select "DB" from "Microsoft"'

When the need for native SQL is limited, the use RawSQL limits the intervention to the missing part, not the whole query. For example in the “sort by random” situation:

RawSQL("NEWID()", params=[]) \
if "sql_server" in settings.DB_ENGINE else \
RawSQL("RANDOM()", params=[])

Case sensitivity

MS SQL uses SELECT [ClaimID] from [tblClaims]format while PostgreSQL uses SELECT "ClaimId" from "tblClaims" When the columns/tables are created in all lowercase, the SQL is always case insensitive. But when they were created with mixed case, MS SQL remains insensitive while PostgreSQL requires the specific case to be used, between double quotes.

MS SQL tolerates the PostgreSQL spelling, so all raw SQL should be specified with the PostgreSQL format for compatibility.

This forced the code to use proper casing, especially in LegacyId vs LegacyID.

Use proper column types

UUID should be stored as UUID data type. It will be faster, case and dash insensitive.

Bit data type should be replaced with Boolean. Beware that in SQL scripts, you can’t insert 0 or 1 as litterals but you can append ::boolean to convert the value. It will also work with null::boolean.

TinyInt doesn’t exist in Postgres, use smallint.

JSON

JSON is poorly supported in SQL Server. The columns are defined as TEXT data type. This means that all conditions on those will require the database to parse the whole table instead of using an index.

PostgreSQL supports the JSONB data type with pretty good performance.

In the migration, if using SQL, to add the column:

migrations.RunSQL('ALTER TABLE [tblClaim] ADD [JsonExt] TEXT'
                  if "sql_server" in settings.DB_ENGINE else
                  'ALTER TABLE "tblClaim" ADD "JsonExt" jsonb')

In the ORM, either specify the field like:

json_ext = FallbackJSONField(db_column="JsonExt", blank=True, null=True)

or set the Model to extend core_models.ExtendableModel

The huge advantage of PostgreSQL with JSONB is that the individual fields can be indexed:

create index idx_example on "tblClaim" (("JsonExt" ->> 'foo'))

For some other operations like checking whether a certain key exists anywhere in the JSON, it is also possible to use GIN indexes. Please see documentation if the need arises.

To query the data, one can simply use the same format as when traversing table joins:

Claim.objects.filter(json_ext__AI__evaluation="pass")

In plain SQL, the corresponding operator is -> or ->>. The → gets the subelement JSON object while ->> gets the value of the field.

select jsonExt -> 'AI' ->> 'evaluation' for tblClaims;

jsonExt -> ‘AI' -> ‘evaluation would be a JSONB object like {"evaluation": "pass"} while ->> 'evaluation' would return a string “pass”.

Sequences and auto-increment

PostgreSQL uses a sequence to implement auto-increment keys. When loading data manually, it is important to make sure that the next ID is greater than the data in the table. The scripts use commands like:

 SELECT setval(
     '"public"."tblUsers_UserID_seq"', 
     (select coalesce(max("UserID"), 1) from "tblUsers"),
     true);

DateAdd()

SQL Server had a peculiar way of adding months to dates that required a specific component for Django. Obviously, that is supported by default on PostgreSQL but in raw queries, this is a different to consider.

DateAdd(m, months, date)  -- SQL Server
date + months * INTERVAL ‘1 month’  -- PostgreSQL

Django migrations

Most migrations in Django work transparently between databases. However, we had a few SQL Server specific migrations: clustered indexes, creation of missing role rights…

The creation of role rights was using short TSQL procedures. To make them more portable, they were ported to Python code: insert_role_right_for_system (core.utils) It make the migrations more readable and portable.

One migration had to be split because SQL queries affecting the columns created in the same migration (therefore transaction) were not possible.

MD5

HashBytes('MD5', ‘string’)  -- SQL Server
MD5(‘string’)  -- PostgreSQL

MM/DD/YYYY format

CONVERT(nvarchar(10), getdate(), 103)  -- SQL Server
TO_CHAR(now(), ‘DD-MM-YYYY’)  -- PostgreSQL (actually, standard SQL)

ISNULL

ISNULL(column, ‘default’)  -- SQL Server
NULLIF(column, ‘default’)  -- PostgreSQL, 2 elements
COALESCE(item1, item2, item3…)  -- PostgreSQL, first null of any number of elements

  • No labels