Proposed optimisation Strategies

Current consensus suggest that moving away from stored procedures (and using in-app code instead) is the preferred method of operation. This will take time. In the meantime, I’ll quickly look at the uspSSRSPolicyStatus procedure to propose a few optimisations that can be implemented. Further down I’ll propose a few other, more general strategies.

A Specific Example

As is the nature of this explorative pilot, the new (and hopefully optimised) code is made for PostgreSQL. On the previous page I uploaded a file with the complete code of the postgres script of this procedure. Ill copy it here for ease of access so You can look at it. The original procedure is in the current SQL Server database and can be looked at there. I’ll post snippets of both versions so You can see what I’m talking about. Here’s the new version:

Now let’s look at a few things. First, the original procedure declares a loop (line 77 of the original script) and immediately after enters this loop.

/* Original */ DECLARE LOOP1 CURSOR LOCAL FORWARD_ONLY FOR SELECT /*[ I've cut out the long select statement for convenience]*/ OPEN LOOP1 FETCH NEXT FROM LOOP1 INTO /*[...]*/

Using a for-loop can simplify the syntax (and hopefully speed up execution). If nothing else, it vastly improves readability. Aslo note that loop has been labelled “loop1”:

/* Optimised */ << loop1 >> FOR tempRow IN SELECT /*[...]*/ LOOP SELECT /*[...]*/

Next, the original script uses this block of select statements starting at line 132:

/* Original */ SELECT @ProductCode = ProductCode from tblProduct WHERE ProdID = @ProductID SELECT @ProductName = ProductName from tblProduct WHERE ProdID = @ProductID SELECT @ProductFromDate = DateFrom from tblProduct WHERE ProdID = @ProductID SELECT @ProductToDate = DateTo from tblProduct WHERE ProdID = @ProductID

Notice how all four statements select from the same table and with the same conditions? It would be faster to combine them into one select statement:

Again, this optimised version is using postgres syntax, but the same thing can be done in SQL server by just assigning the four variables in the same select statement. Minimising the number of select statements will speed up any procedure.

Next, the original script uses goto statements on lines 130 and 138. On line 130 it looks like this:

All that “NextPolicy” does is setting up the next iteration of the loop (get the next line of the result set). Because the proposed optimised version uses a proper for-loop, this setup is automated. Instead we can just continue the labelled loop:

In the thesis I have mentioned that the proposed new script introduces an issue of its own. The results of this loop are stored in a temporary table and at the end the contents of this table are returned. I have noticed that the new version returns the correct number of lines, but all lines have the same data (the last line of the table). The only way I’ve found to fix this, is taking every line of the temp table as soon it is written and inserting it in the return query and then truncating the temp table (so that the same line only appears once in the return statement). This is two operations for every line of data returned (instead of just one operation for the entire table). If any one of You can find out what I did wrong, feel free to fix it. I left the “wrong” solution in the script as a comment block.

General Strategies

There are a few more general optimisations that come to mind. The team is already good at limiting results when only one or a few entries are required (with the limit statement) and, as far as observed, the team never selects more columns than are necessary. Here are two more strategies to employ when writing or optimising scripts:

  • Indexing: Creating indices for large tables can improve performance. It is assumed that this may change from client to client as each client may prioritise different data. I also do not know how many indices already exist in the database.

  • Implicit Casting: OpenIMIS has quite a few columns with ambiguous data types. For example many tables have a columns with the phrase “ID” in their name. Some of them (HfID, ProdID, ClaimID) are integers while others (mostly RowID, across multiple tables) are text. Checking for an integer when the columns is defined as text evokes an implicit cast. Avoiding these can save time and boost performance. No cases of this has been found but some may have been overlooked.

Finally, when moving over to PostgreSQL there is one last thing to note: Capital letters. For example take the two tables tblClaim and tblHF. Both have a column named HFID, but in tblClaim it is called “HFID“ while in tblHF it is named “HfID“. The difference is that the letter f is only capitalised in one column. Postgres expects you to get this difference right every time or You will get errors. There are also instances where verious IDs are either “ID” or “Id”. I've mostly notices that in views it tends to be “Id” with lower case d and in tables it tends to be “ID” with capital D. SQL Server does checking in the background and gives You the right results no matter if You got the name wrong (or at least the capitalisation). PostgreSQL does not do this for You. Going over the eintire database and unifying everything so that all ID’s (all other names) are always traded the same way may speed up development (at the cost of the initial unification work).

I dont know if fixing (is it okay to call this change “fixing“ it?) this will lead to a performance increase because I don’t know how much performance is affected whenever SQL Server needs to look for the correct column name. Maybe performance is not affected at all. But (once you switch to postgres)your developers will be thankful if this is fixed. Also note that all functions and views would need to be updated in postgres once you change names.

 

Did you encounter a problem or do you have a suggestion?

Please contact our Service Desk



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. https://creativecommons.org/licenses/by-sa/4.0/