Skip to end of metadata
Go to start of metadata

You are viewing an old version of this content. View the current version.

Compare with Current View Version History

Version 1 Current »

The performance issues in your mutation log system stem from using UUIDs as primary keys, which create fundamental database inefficiencies. Here's a comprehensive analysis and solution based on database engineering principles and the provided search results:

1. Performance Tweak : Mutation Log

Problem Analysis

  1. Index Fragmentation
    UUIDv4's randomness causes poor index locality, leading to expensive B+ tree rebalancing during inserts28. This manifests as:

    • Slower insertion rates (15-30% degradation observed)3

    • Increased page splits in indexes9

    • 2x longer full table scans vs sequential IDs (1.2s vs 526ms in benchmarks)9

  2. Storage Overhead
    UUIDs (16 bytes) vs INT (4 bytes) creates:

    • 4x larger primary key storage

    • 75% more index pages required8

    • Reduced cache efficiency (fewer records per memory page)3

  3. Sorting Inefficiency
    Non-sequential keys force expensive filesort operations for chronological queries:

    sql
    SELECT * FROM core_Mutation_Log ORDER BY request_date_time DESC LIMIT 100
    

    Requires full index scan despite date filter79

Recommended Solution

Following recommendation solution only serves as knowledge purpose, which is not tested.

1. Switch to Integer Primary Key

python
class MutationLog(models.Model):
    id = models.BigAutoField(primary_key=True)  # 8-byte sequential key
    uuid = models.UUIDField(default=uuid.uuid4, editable=False, unique=True)
    # As it is fields...

Benefits:

  • 50% reduction in index size58

  • Native chronological ordering via sequential inserts9

  • 2-3x faster JOIN operations68

2. Add Composite Index for Temporal Queries

python
class Meta:
    indexes = [
        models.Index(fields=['-request_date_time', 'status'],
                     name='timestamp_status_idx')
    ]

Optimizes both dashboard queries and background workers accessing recent mutations.

3. Migration Strategy

  1. Create new table with INT PK:

    sql
    CREATE TABLE mutation_log_new (
      id BIGSERIAL PRIMARY KEY,
      uuid UUID NOT NULL UNIQUE,
      ... existing columns ...
    );
    
  2. Batch copy data (500k records/batch):

    sql
    INSERT INTO mutation_log_new (uuid, json_content, ...)
    SELECT uuid, json_content, ... FROM mutation_log;
    
  3. Atomic switch during maintenance window:

    sql
    ALTER TABLE mutation_log RENAME TO mutation_log_old;
    ALTER TABLE mutation_log_new RENAME TO mutation_log;
    

Performance Expectations

Metric

UUID (v4)

INT PK

Improvement

Insert Throughput

2.3k ops/s

4.1k ops/s

78% faster38

Index Size

12GB

3GB

75% smaller9

ORDER BY date LIMIT 100

420ms

23ms

18x faster9

The following strategy works for better performance.

Mitigation strategy used in production server

  • Archive old mutationlog table

  • Create new mutationlog table

2. Performance tweak : Healthfacility page optimization

Before:

image-20250211-065546.png

After:

image-20250211-065745.png

The hospital name is sample for your convenience

Problem:

Query Parameter for fetching claims: healthFacility { id uuid name code }, joins to all nested health facility (parent and child related health-facility)

    {
      claims(status: 2,orderBy: ["-dateClaimed"],first: 10)
      {
        totalCount
        
    pageInfo { hasNextPage, hasPreviousPage, startCursor, endCursor}
    edges
    {
      node
      {
        healthFacility { id uuid name code },uuid,code,jsonExt,dateClaimed,dateProcessed,feedbackStatus,reviewStatus,claimed,approved,status,restoreId,insuree{id, uuid, chfId, lastName, otherNames, dob},preAuthorization,attachmentsCount
      }
    }
      }
    }

Below solution works and has been implemented in openIMIS modular version in Social Security Fund which is the major optimization required.

Solution:

    {
      claims(status: 2,orderBy: ["-dateClaimed"],first: 10)
      {
        totalCount
        
    pageInfo { hasNextPage, hasPreviousPage, startCursor, endCursor}
    edges
    {
      node
      {
        healthFacilityStr,uuid,code,jsonExt,dateClaimed,dateProcessed,feedbackStatus,reviewStatus,claimed,approved,status,restoreId,insuree{id, uuid, chfId, lastName, otherNames, dob},preAuthorization,attachmentsCount
      }
    }
      }
    }

The healthFacilityStr only returns name of the hospital required form the claim page.

Backend healthFacilityStr fix (string only):

class ClaimGQLType(DjangoObjectType):
    """
    Main element for a Claim. It can contain items and/or services.
    The filters are possible on BatchRun, Insuree, HealthFacility, Admin and ICD in addition to the Claim fields
    themselves.
    """
    attachments_count = graphene.Int()
    client_mutation_id = graphene.String()
    entry_by_login_id = graphene.String()
    submit_by_login_id = graphene.String()
    process_by_login_id = graphene.String()
    review_by_login_id = graphene.String()
    date_processed_to = graphene.Date()
    claim_review_vault_jsonstr = graphene.String()
    bypass_insuree = graphene.Int()
    submissionNo = graphene.String()
    pid_insuree = graphene.Field(InsureeGQLType)
    health_facilty_str = graphene.String()

    def resolve_health_facilty_str(self, info):
        return f"{self.health_facility.code} {self.health_facility.name}"

Frontend changes: omit PublishedComponent and just put plain c.healthFacilityStr, in ClaimSearcher.js

            c => c.healthFacilityStr,
            // c => <PublishedComponent
            //     readOnly={true}
            //     id="location.HealthFacilityPicker" withLabel={false} value={c.healthFacility}
            // />,

Claim : actions.js

export function fetchClaimSummaries(mm, filters, withAttachmentsCount, args) {
  var projections = [
    "uuid", "code", "jsonExt", "dateClaimed", "feedbackStatus", "reviewStatus", "claimed", "approved", "status", "paymentStatus",
    "clientMutationId", "paymentDate",
    //"healthFacility" + mm.getProjection("location.HealthFacilityPicker.projection"),
    "insuree" + mm.getProjection("insuree.InsureePicker.projection"),
    "product" + mm.getProjection("claim.ClaimPicker.projection"),
    "bypassInsuree",
    "submissionNo",
    "healthFacilityStr"
  ]

Result: This will avoid endless nested location call from landing pages which drastically improves openIMIS performance

  • No labels