CarneiroTech/Content/Cases/en/cnpj-migration-database.md


---
title: Alphanumeric CNPJ Migration - 100 Million Records
slug: cnpj-migration-database
summary: Execution of a massive CNPJ migration from numeric to alphanumeric in a database with ~100M records, using a phased-commit strategy to avoid database locks.
client: Collection Agency
industry: Collections & Financial Services
timeline: In execution
role: Database Architect & Tech Lead
image:
tags:
  - SQL Server
  - Database Migration
  - CNPJ
  - Performance Optimization
  - Batch Processing
  - Big Data
featured: true
order: 4
date: 2024-11-01
seo_title: Alphanumeric CNPJ Migration - 100M Records | Carneiro Tech
seo_description: Case study of a massive CNPJ migration in a database with 100 million records using phased commits and performance optimizations.
seo_keywords: database migration, SQL Server, CNPJ, batch processing, performance optimization, phased commits
---

Overview

A collection agency that works with databases of transitory data (no proprietary software) needs to adapt its systems to the new Brazilian alphanumeric CNPJ format.

Main challenge: Migrate ~100 million records, converting BIGINT and NUMERIC CNPJ columns to VARCHAR, without locking the production database.

Status: Project in execution (migration script preparation).
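For reference, the new alphanumeric format keeps 14 positions: a 12-character base (which may contain letters) plus two numeric check digits. Per the published rules, each character is mapped to its ASCII code minus 48 before the usual mod-11 weighting, so the algorithm reduces to the classic one for all-numeric CNPJs. A minimal Python sketch of the check-digit calculation (illustrative only, not part of the project's migration scripts):

```python
# Sketch: check-digit calculation for the alphanumeric CNPJ format.
# Assumption (per the published format rules): each character maps to its
# ASCII code minus 48, then the traditional mod-11 weights apply, so for
# all-digit inputs this matches the classic numeric algorithm exactly.

def cnpj_check_digits(base12: str) -> str:
    """Compute the two check digits for a 12-character CNPJ base."""
    def dv(chars: str, weights: list) -> str:
        total = sum((ord(c) - 48) * w for c, w in zip(chars, weights))
        r = total % 11
        return "0" if r < 2 else str(11 - r)

    d1 = dv(base12, [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    d2 = dv(base12 + d1, [6, 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    return d1 + d2

# Classic numeric example: 11.222.333/0001-81
print(cnpj_check_digits("112223330001"))  # -> "81"
```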


Challenge

Massive Data Volume

Company context:

  • Collection agency (does not develop proprietary software)
  • Works with transitory data (high turnover)
  • SQL Server database with critical volume

Initial analysis revealed:

| Table | Column | Current Type | Records | Size |
|---|---|---|---|---|
| Debtors | CNPJ_Debtor | BIGINT | 8,000,000 | 60 GB |
| Transactions | CNPJ_Payer | NUMERIC(14) | 90,000,000 | 1.2 TB |
| Companies | CNPJ_Company | BIGINT | 2,500,000 | 18 GB |
| TOTAL | - | - | ~100,000,000 | ~1.3 TB |

Identified problems:

  1. Tables with 8M+ rows using BIGINT for CNPJ
  2. 90 million records in transactions table
  3. CNPJ as primary key in some tables
  4. Foreign keys relating multiple tables
  5. Extended downtime is not an option (24/7 operation)
  6. Disk space restrictions (requires efficient strategy)

Strategic Decision: Phased Commits

Why NOT do ALTER COLUMN directly?

Naive approach (DOESN'T work):

-- NEVER DO THIS ON LARGE TABLES
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18);

Problems:

  • Locks entire table during conversion
  • Can take hours/days on large tables
  • Blocks all operations (INSERT, UPDATE, SELECT)
  • Risk of timeout or failure mid-operation
  • Complex rollback if something goes wrong

Chosen Strategy: Column Swap with Phased Commits

Based on previous experience, I decided to use a gradual approach:

┌─────────────────────────────────────────────┐
│  1. Create new VARCHAR column at END        │
│     (fast operation, doesn't lock table)    │
└─────────────────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────┐
│  2. UPDATE in batches (phased commits)      │
│     - 100k records at a time                │
│     - Pause between batches (avoid lock)    │
└─────────────────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────┐
│  3. Remove PKs and FKs                      │
│     (after 100% migrated)                   │
└─────────────────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────┐
│  4. Rename columns (swap)                   │
│     - CNPJ → CNPJ_Old                       │
│     - CNPJ_New → CNPJ                       │
└─────────────────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────┐
│  5. Recreate PKs/FKs with new column        │
└─────────────────────────────────────────────┘
                    ▼
┌─────────────────────────────────────────────┐
│  6. Validation and old column deletion      │
└─────────────────────────────────────────────┘

Why this approach?

  • No complete table lock (incremental operation)
  • Can pause/resume at any time
  • Real-time progress monitoring
  • Simple rollback (just drop the new column)
  • Minimizes production impact (small commits)

Decision based on:

  • Previous experience with large volume migrations
  • Knowledge of SQL Server locks
  • Need for zero downtime

Note: This decision was made without consulting AI - based purely on practical experience from previous projects.


Implementation Details

Phase 1: Create New Column

-- Fast operation (metadata change only)
ALTER TABLE Transactions
ADD CNPJ_Payer_New VARCHAR(18) NULL;

-- Temporary filtered index to speed up finding unmigrated rows
CREATE NONCLUSTERED INDEX IX_Temp_CNPJ_New
ON Transactions(CNPJ_Payer_New)
WHERE CNPJ_Payer_New IS NULL;

Estimated time: ~1 second (independent of table size)


Phase 2: Batch Migration (Core Strategy)

-- Migration script with phased commits
DECLARE @BatchSize INT = 100000;  -- 100k records per batch
DECLARE @RowsAffected INT = 1;
DECLARE @TotalProcessed INT = 0;
DECLARE @StartTime DATETIME = GETDATE();

WHILE @RowsAffected > 0
BEGIN
    BEGIN TRANSACTION;

    -- Update batch of 100k records not yet migrated
    UPDATE TOP (@BatchSize) Transactions
    SET CNPJ_Payer_New = RIGHT('00000000000000' + CAST(CNPJ_Payer AS VARCHAR(14)), 14)
    WHERE CNPJ_Payer_New IS NULL;

    SET @RowsAffected = @@ROWCOUNT;
    SET @TotalProcessed = @TotalProcessed + @RowsAffected;

    COMMIT TRANSACTION;

    -- Progress log
    PRINT 'Processed: ' + CAST(@TotalProcessed AS VARCHAR) + ' rows. Batch: ' + CAST(@RowsAffected AS VARCHAR);
    PRINT 'Elapsed time: ' + CAST(DATEDIFF(SECOND, @StartTime, GETDATE()) AS VARCHAR) + ' seconds';

    -- Pause between batches (reduces contention)
    WAITFOR DELAY '00:00:01';  -- 1 second between batches
END;

PRINT 'Migration completed! Total rows: ' + CAST(@TotalProcessed AS VARCHAR);
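The RIGHT('00000000000000' + CAST(...), 14) expression zero-pads the numeric CNPJ to its 14-character string form. The same conversion, expressed in Python for clarity (the function name is illustrative):

```python
def numeric_cnpj_to_varchar(cnpj: int) -> str:
    """Zero-pad a numeric CNPJ to its 14-character string form,
    mirroring the T-SQL RIGHT('00000000000000' + CAST(...), 14) expression."""
    return str(cnpj).zfill(14)

print(numeric_cnpj_to_varchar(191))             # -> "00000000000191"
print(numeric_cnpj_to_varchar(11222333000181))  # -> "11222333000181"
```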

Configurable parameters:

  • @BatchSize: 100k (balanced between performance and lock time)
    • Too small = many transactions, overhead
    • Too large = prolonged lock, production impact
  • WAITFOR DELAY: 1 second (gives time for other queries to run)

Time estimates:

| Records | Batch Size | Estimated Time |
|---|---|---|
| 8,000,000 | 100,000 | ~2-3 hours |
| 90,000,000 | 100,000 | ~20-24 hours |

Advantages:

  • Doesn't freeze application
  • Other queries can run between batches
  • Can pause (Ctrl+C) and resume later (the `WHERE ... IS NULL` filter picks up where it left off)
  • Real-time progress log
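The loop is resumable precisely because of the `WHERE ... IS NULL` filter: already-migrated rows are never touched again. A minimal Python simulation (an in-memory list stands in for the table; no real database involved) illustrates the mechanics:

```python
# Simulation of the phased-commit loop against an in-memory "table".
# None marks an unmigrated row, mirroring CNPJ_Payer_New IS NULL.

def migrate_in_batches(rows: list, batch_size: int) -> int:
    """Migrate rows whose new value is None, batch_size at a time.
    Returns the number of batches executed (one commit per batch)."""
    batches = 0
    while True:
        # "UPDATE TOP (@BatchSize) ... WHERE CNPJ_Payer_New IS NULL"
        pending = [i for i, (old, new) in enumerate(rows) if new is None][:batch_size]
        if not pending:
            break
        for i in pending:
            old, _ = rows[i]
            rows[i] = (old, str(old).zfill(14))  # the conversion itself
        batches += 1
    return batches

rows = [(n, None) for n in range(250)]
print(migrate_in_batches(rows, 100))  # -> 3 (batches of 100 + 100 + 50)
# Interrupt-and-resume: a second run finds nothing pending.
print(migrate_in_batches(rows, 100))  # -> 0
```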

Phase 3: Constraint Removal

-- Identify PK constraints on the column (FKs can be listed via sys.foreign_keys)
SELECT kc.name
FROM sys.key_constraints kc
JOIN sys.index_columns ic
    ON ic.object_id = kc.parent_object_id
   AND ic.index_id = kc.unique_index_id
WHERE kc.type = 'PK'
  AND kc.parent_object_id = OBJECT_ID('Transactions')
  AND COL_NAME(ic.object_id, ic.column_id) = 'CNPJ_Payer';

-- Remove PKs
ALTER TABLE Transactions
DROP CONSTRAINT PK_Transactions_CNPJ;

-- Remove FKs (tables that reference)
ALTER TABLE Payments
DROP CONSTRAINT FK_Payments_Transactions;

Estimated time: ~10 minutes (depends on how many constraints exist)


Phase 4: Column Swap (Renaming)

-- Rename old column to _Old
EXEC sp_rename 'Transactions.CNPJ_Payer', 'CNPJ_Payer_Old', 'COLUMN';

-- Rename new column to original name
EXEC sp_rename 'Transactions.CNPJ_Payer_New', 'CNPJ_Payer', 'COLUMN';

-- Change to NOT NULL (after validating 100% populated)
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18) NOT NULL;

Estimated time: ~1 second (metadata change)


Phase 5: Constraint Recreation

-- Recreate PK with new VARCHAR column
ALTER TABLE Transactions
ADD CONSTRAINT PK_Transactions_CNPJ
PRIMARY KEY CLUSTERED (CNPJ_Payer);

-- Recreate FKs
ALTER TABLE Payments
ADD CONSTRAINT FK_Payments_Transactions
FOREIGN KEY (CNPJ_Payer) REFERENCES Transactions(CNPJ_Payer);

Estimated time: ~30-60 minutes (depends on volume)


Phase 6: Validation and Cleanup

-- Validate that 100% was migrated
SELECT COUNT(*)
FROM Transactions
WHERE CNPJ_Payer IS NULL OR CNPJ_Payer = '';

-- Validate referential integrity
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;

-- If everything OK, remove old column
ALTER TABLE Transactions
DROP COLUMN CNPJ_Payer_Old;

-- Remove temporary index
DROP INDEX IX_Temp_CNPJ_New ON Transactions;

CNPJ Fast Process Customization

Differences vs. Original Process

The original CNPJ Fast process was restructured for this client:

Main changes:

| Aspect | Original CNPJ Fast | Client (Customized) |
|---|---|---|
| Focus | Applications + DB | DB only (no proprietary software) |
| Discovery | App inventory | Schema analysis only |
| Execution | Multiple applications | Massive SQL scripts |
| Batch Size | 50k-100k | 100k (optimized for volume) |
| Monitoring | Manual + tools | Real-time SQL logs |
| Rollback | Complex process | Simple (DROP COLUMN) |

Reason for restructuring:

  • Client has no proprietary applications (only consumes data)
  • 100% focus on database optimization
  • Much larger volume than typical cases (100M vs ~10M)

Tech Stack

  • SQL Server
  • T-SQL
  • Batch Processing
  • Performance Tuning
  • Database Optimization
  • Migration Scripts
  • Phased Commits
  • Index Optimization
  • Constraint Management


Key Decisions & Trade-offs

Why 100k per batch?

Performance tests:

| Batch Size | Time/Batch | Lock Duration | Contention |
|---|---|---|---|
| 10,000 | 2s | Low | Minimal |
| 50,000 | 8s | Medium | Acceptable |
| 100,000 | 15s | Medium | Balanced |
| 500,000 | 90s | High | Production impact |
| 1,000,000 | 180s | Very high | Unacceptable |

Choice: 100k offers best balance between performance and impact.
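As a sanity check, the raw loop time for any batch size follows from simple arithmetic (the production estimate of ~20-24 h for 90M rows is higher because it also absorbs index maintenance, transaction-log growth, and contention pauses). A small Python sketch using the measured per-batch times from the table:

```python
import math

def total_hours(records: int, batch_size: int, secs_per_batch: float,
                delay: float = 1.0) -> float:
    """Raw wall-clock estimate: number of batches times (batch time + delay)."""
    batches = math.ceil(records / batch_size)
    return batches * (secs_per_batch + delay) / 3600

# 90M rows in 100k batches at the measured 15 s/batch:
print(total_hours(90_000_000, 100_000, 15))  # -> 4.0 (raw loop time only)
# The 1 s inter-batch delay costs 1/(15+1) ≈ 6% of that wall-clock time.
```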


Why create column at END?

SQL Server internals:

  • Add column at end = metadata change (fast)
  • Add in middle = page rewrite (slow)
  • For large tables, position matters

Why WAITFOR DELAY of 1 second?

Without delay:

  • Batch processing consumes 100% of I/O
  • Application queries slow down
  • Lock escalation may occur

With 1s delay:

  • Other queries have window to execute
  • Distributed I/O
  • User experience preserved

Trade-off: The delay adds ~1s to each ~15s batch (~7% more wall-clock time), but the system remains responsive.


Current Status & Next Steps

Current Status (December 2024)

Preparation Phase:

  • Discovery complete (100M records identified)
  • Migration scripts developed
  • Tests in staging environment
  • Performance validation in progress
  • Awaiting production maintenance window

Next Steps

  1. Complete production backup
  2. Production execution (24/7 environment)
  3. Real-time monitoring during migration
  4. Post-migration validation (integrity, performance)
  5. Lessons learned documentation

Lessons Learned (So Far)

1. Previous Experience is Gold

Decision to use phased commits came from practical experience in previous projects, not from documentation or AI.

Similar previous situations:

  • E-commerce data migration (50M records)
  • Encoding conversion (UTF-8 in 100M+ rows)
  • Historical table partitioning

2. "Measure Twice, Cut Once"

Before executing in production:

  • Exhaustive tests in staging
  • Scripts validated and reviewed
  • Rollback tested
  • Time estimates confirmed

Preparation time: 3 weeks
Execution time: estimated at 48 hours

Ratio: 10:1 (preparation vs execution)


3. Customization > One-Size-Fits-All

The original CNPJ Fast process needed to be restructured for this client.

Lesson: Processes should be:

  • Structured enough to repeat
  • Flexible enough to adapt

4. Monitoring is Crucial

Scripts with detailed progress logs allow:

  • Estimate remaining time
  • Identify bottlenecks
  • Pause/resume with confidence
  • Report status to stakeholders

-- Log example
Processed: 10,000,000 rows. Batch: 100,000
Elapsed time: 3600 seconds (10% complete, ~9h remaining)
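The "~9h remaining" figure is a linear extrapolation from the rows processed so far. A helper like this (illustrative, not part of the production script) reproduces the estimate:

```python
def eta_seconds(processed: int, total: int, elapsed_secs: float) -> float:
    """Linear extrapolation: time for the remaining rows at the observed rate."""
    return (total - processed) * elapsed_secs / processed

# 10M of 100M rows done in 3600 s -> 9 more hours at the same throughput
print(eta_seconds(10_000_000, 100_000_000, 3600) / 3600)  # -> 9.0
```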

Performance Optimizations

Optimizations Implemented

  1. Temporary index WHERE NULL

    • Speeds up lookup of unmigrated records
    • Removed after completion
  2. Optimized batch size

    • Balanced between performance and lock time
  3. Transaction log management

    -- Check log growth
    DBCC SQLPERF(LOGSPACE);
    
    -- Adjust recovery model (if allowed)
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
    
  4. Execution during low-load hours

    • Overnight maintenance window
    • Weekend (if possible)

Expected result: Migration of 100 million records in ~48 hours, without significant downtime and with possibility of fast rollback.

Need to migrate massive data volumes? Get in touch