The performance issues in your mutation log system stem from using UUIDs as primary keys, which create fundamental database inefficiencies. Here's a comprehensive analysis and solution based on database engineering principles and the provided search results:
1. Performance Tweak : Mutation Log
Problem Analysis
Index Fragmentation
UUIDv4's randomness causes poor index locality, leading to expensive B+ tree rebalancing during inserts28
. This manifests as:Slower insertion rates (15-30% degradation observed)
3
Increased page splits in indexes
9
2x longer full table scans vs sequential IDs (1.2s vs 526ms in benchmarks)
9
Storage Overhead
UUIDs (16 bytes) vs INT (4 bytes) creates:4x larger primary key storage
75% more index pages required
8
Reduced cache efficiency (fewer records per memory page)
3
Sorting Inefficiency
Non-sequential keys force expensive filesort operations for chronological queries:sql SELECT * FROM core_Mutation_Log ORDER BY request_date_time DESC LIMIT 100
Requires full index scan despite date filter
79
Recommended Solution
Following recommendation solution only serves as knowledge purpose, which is not tested.
1. Switch to Integer Primary Key
python class MutationLog(models.Model): id = models.BigAutoField(primary_key=True) # 8-byte sequential key uuid = models.UUIDField(default=uuid.uuid4, editable=False, unique=True) # As it is fields...
Benefits:
50% reduction in index size
58
Native chronological ordering via sequential inserts
9
2-3x faster JOIN operations
68
2. Add Composite Index for Temporal Queries
python class Meta: indexes = [ models.Index(fields=['-request_date_time', 'status'], name='timestamp_status_idx') ]
Optimizes both dashboard queries and background workers accessing recent mutations.
3. Migration Strategy
Create new table with INT PK:
sql CREATE TABLE mutation_log_new ( id BIGSERIAL PRIMARY KEY, uuid UUID NOT NULL UNIQUE, ... existing columns ... );
Batch copy data (500k records/batch):
sql INSERT INTO mutation_log_new (uuid, json_content, ...) SELECT uuid, json_content, ... FROM mutation_log;
Atomic switch during maintenance window:
sql ALTER TABLE mutation_log RENAME TO mutation_log_old; ALTER TABLE mutation_log_new RENAME TO mutation_log;
Performance Expectations
Metric | UUID (v4) | INT PK | Improvement |
---|---|---|---|
Insert Throughput | 2.3k ops/s | 4.1k ops/s | 78% faster |
Index Size | 12GB | 3GB | 75% smaller |
ORDER BY date LIMIT 100 | 420ms | 23ms | 18x faster |
The following strategy works for better performance.
Mitigation strategy used in production server
Archive old mutationlog table
Create new mutationlog table
2. Performance tweak : Healthfacility page optimization
Before:
After:
The hospital name is sample for your convenience
Problem:
Query Parameter for fetching claims: healthFacility { id uuid name code },
joins to all nested health facility (parent and child related health-facility)
{ claims(status: 2,orderBy: ["-dateClaimed"],first: 10) { totalCount pageInfo { hasNextPage, hasPreviousPage, startCursor, endCursor} edges { node { healthFacility { id uuid name code },uuid,code,jsonExt,dateClaimed,dateProcessed,feedbackStatus,reviewStatus,claimed,approved,status,restoreId,insuree{id, uuid, chfId, lastName, otherNames, dob},preAuthorization,attachmentsCount } } } }
Below solution works and has been implemented in openIMIS modular version in Social Security Fund which is the major optimization required.
Solution:
{ claims(status: 2,orderBy: ["-dateClaimed"],first: 10) { totalCount pageInfo { hasNextPage, hasPreviousPage, startCursor, endCursor} edges { node { healthFacilityStr,uuid,code,jsonExt,dateClaimed,dateProcessed,feedbackStatus,reviewStatus,claimed,approved,status,restoreId,insuree{id, uuid, chfId, lastName, otherNames, dob},preAuthorization,attachmentsCount } } } }
The healthFacilityStr only returns name of the hospital required form the claim page.
Backend healthFacilityStr fix (string only):
class ClaimGQLType(DjangoObjectType): """ Main element for a Claim. It can contain items and/or services. The filters are possible on BatchRun, Insuree, HealthFacility, Admin and ICD in addition to the Claim fields themselves. """ attachments_count = graphene.Int() client_mutation_id = graphene.String() entry_by_login_id = graphene.String() submit_by_login_id = graphene.String() process_by_login_id = graphene.String() review_by_login_id = graphene.String() date_processed_to = graphene.Date() claim_review_vault_jsonstr = graphene.String() bypass_insuree = graphene.Int() submissionNo = graphene.String() pid_insuree = graphene.Field(InsureeGQLType) health_facilty_str = graphene.String() def resolve_health_facilty_str(self, info): return f"{self.health_facility.code} {self.health_facility.name}"
Frontend changes: omit PublishedComponent and just put plain c.healthFacilityStr, in ClaimSearcher.js
c => c.healthFacilityStr, // c => <PublishedComponent // readOnly={true} // id="location.HealthFacilityPicker" withLabel={false} value={c.healthFacility} // />,
Claim : actions.js
export function fetchClaimSummaries(mm, filters, withAttachmentsCount, args) { var projections = [ "uuid", "code", "jsonExt", "dateClaimed", "feedbackStatus", "reviewStatus", "claimed", "approved", "status", "paymentStatus", "clientMutationId", "paymentDate", //"healthFacility" + mm.getProjection("location.HealthFacilityPicker.projection"), "insuree" + mm.getProjection("insuree.InsureePicker.projection"), "product" + mm.getProjection("claim.ClaimPicker.projection"), "bypassInsuree", "submissionNo", "healthFacilityStr" ]
Result: This will avoid endless nested location call from landing pages which drastically improves openIMIS performance