| title | slug | summary | client | industry | timeline | role | image | tags | featured | order | date | seo_title | seo_description | seo_keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alphanumeric CNPJ Migration - 100 Million Records | cnpj-migration-database | Execution of massive CNPJ migration from numeric to alphanumeric in a database with ~100M records, using a phased commit strategy to avoid database locks. | Collection Agency | Collections & Financial Services | In execution | Database Architect & Tech Lead | | | true | 4 | 2024-11-01 | Alphanumeric CNPJ Migration - 100M Records \| Carneiro Tech | Case study of a massive CNPJ migration in a database with 100 million records using phased commits and performance optimizations. | database migration, SQL Server, CNPJ, batch processing, performance optimization, phased commits |
Overview
A collection agency that works with transitory data databases (no proprietary software) needs to adapt its systems to the new Brazilian alphanumeric CNPJ format.
Main challenge: Migrate ~100 million records in tables with BIGINT and NUMERIC columns to VARCHAR, without locking the production database.
Status: Project in execution (migration script preparation).
Challenge
Massive Data Volume
Company context:
- Collection agency (does not develop proprietary software)
- Works with transitory data (high turnover)
- SQL Server database with critical volume
Initial analysis revealed:
| Table | Column | Current Type | Records | Size |
|---|---|---|---|---|
| Debtors | CNPJ_Debtor | BIGINT | 8,000,000 | 60 GB |
| Transactions | CNPJ_Payer | NUMERIC(14) | 90,000,000 | 1.2 TB |
| Companies | CNPJ_Company | BIGINT | 2,500,000 | 18 GB |
| TOTAL | - | - | ~100,000,000 | ~1.3 TB |
Identified problems:
- Tables with 8M+ rows using BIGINT for CNPJ
- 90 million records in the transactions table
- CNPJ as primary key in some tables
- Foreign keys relating multiple tables
- Impossibility of extended downtime (24/7 operation)
- Disk space restrictions (requires efficient strategy)
Strategic Decision: Phased Commits
Why NOT do ALTER COLUMN directly?
Naive approach (DOESN'T work):
-- NEVER DO THIS ON LARGE TABLES
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18);
Problems:
- Locks entire table during conversion
- Can take hours/days on large tables
- Blocks all operations (INSERT, UPDATE, SELECT)
- Risk of timeout or failure mid-operation
- Complex rollback if something goes wrong
Chosen Strategy: Column Swap with Phased Commits
Based on previous experience, I decided to use a gradual approach:
┌─────────────────────────────────────────────┐
│ 1. Create new VARCHAR column at END │
│ (fast operation, doesn't lock table) │
└─────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 2. UPDATE in batches (phased commits) │
│ - 100k records at a time │
│ - Pause between batches (avoid lock) │
└─────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 3. Remove PKs and FKs │
│ (after 100% migrated) │
└─────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 4. Rename columns (swap) │
│ - CNPJ → CNPJ_Old │
│ - CNPJ_New → CNPJ │
└─────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 5. Recreate PKs/FKs with new column │
└─────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 6. Validation and old column deletion │
└─────────────────────────────────────────────┘
Why this approach?
- No complete table lock (incremental operation)
- Can pause/resume at any time
- Real-time progress monitoring
- Simple rollback (just drop the new column)
- Minimizes production impact (small commits)
Decision based on:
- Previous experience with large volume migrations
- Knowledge of SQL Server locks
- Need for zero downtime
Note: This decision was made without consulting AI - based purely on practical experience from previous projects.
Implementation Details
Phase 1: Create New Column
-- Fast operation (metadata change only)
ALTER TABLE Transactions
ADD CNPJ_Payer_New VARCHAR(18) NULL;
-- Add temporary index to speed up lookups
CREATE NONCLUSTERED INDEX IX_Temp_CNPJ_New
ON Transactions(CNPJ_Payer_New)
WHERE CNPJ_Payer_New IS NULL;
Estimated time: ~1 second (independent of table size)
Phase 2: Batch Migration (Core Strategy)
-- Migration script with phased commits
DECLARE @BatchSize INT = 100000; -- 100k records per batch
DECLARE @RowsAffected INT = 1;
DECLARE @TotalProcessed INT = 0;
DECLARE @StartTime DATETIME = GETDATE();
WHILE @RowsAffected > 0
BEGIN
BEGIN TRANSACTION;
-- Update batch of 100k records not yet migrated
UPDATE TOP (@BatchSize) Transactions
SET CNPJ_Payer_New = RIGHT('00000000000000' + CAST(CNPJ_Payer AS VARCHAR(14)), 14)
WHERE CNPJ_Payer_New IS NULL;
SET @RowsAffected = @@ROWCOUNT;
SET @TotalProcessed = @TotalProcessed + @RowsAffected;
COMMIT TRANSACTION;
-- Progress log
PRINT 'Processed: ' + CAST(@TotalProcessed AS VARCHAR) + ' rows. Batch: ' + CAST(@RowsAffected AS VARCHAR);
PRINT 'Elapsed time: ' + CAST(DATEDIFF(SECOND, @StartTime, GETDATE()) AS VARCHAR) + ' seconds';
-- Pause between batches (reduces contention)
WAITFOR DELAY '00:00:01'; -- 1 second between batches
END;
PRINT 'Migration completed! Total rows: ' + CAST(@TotalProcessed AS VARCHAR);
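The padding expression inside the UPDATE can be sanity-checked outside the database. A minimal Python sketch (illustrative only, not part of the production script) reproducing the RIGHT('00000000000000' + CAST(... AS VARCHAR), 14) logic:

```python
def pad_cnpj(cnpj_numeric: int) -> str:
    """Equivalent of RIGHT('00000000000000' + CAST(cnpj AS VARCHAR), 14):
    left-pad the numeric CNPJ with zeros to a 14-character string."""
    return str(cnpj_numeric).zfill(14)[-14:]

# Leading zeros lost by BIGINT storage are restored in the VARCHAR form.
print(pad_cnpj(191))             # 00000000000191
print(pad_cnpj(12345678000195))  # 12345678000195
```

This is also a quick way to build expected values for staging tests before running the real batch script.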
Configurable parameters:
- @BatchSize: 100k (balance between performance and lock time)
  - Too small = many transactions, overhead
  - Too large = prolonged locks, production impact
- WAITFOR DELAY: 1 second (gives other queries time to run)
Time estimates:
| Records | Batch Size | Estimated Time |
|---|---|---|
| 8,000,000 | 100,000 | ~2-3 hours |
| 90,000,000 | 100,000 | ~20-24 hours |
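As a back-of-the-envelope check, a naive lower bound for these figures can be computed from the batch count, assuming a fixed time per batch (a simplifying assumption: real runs are slower, since batch time grows with log activity and index maintenance, which is why the table above budgets extra headroom):

```python
import math

def estimated_hours(rows: int, batch_size: int,
                    secs_per_batch: float, delay_secs: float = 1.0) -> float:
    """Naive lower bound: number of batches times (batch time + delay)."""
    batches = math.ceil(rows / batch_size)
    return batches * (secs_per_batch + delay_secs) / 3600

# With ~15 s per 100k batch this gives the theoretical floor of the
# estimate; production estimates above include safety margin.
print(round(estimated_hours(90_000_000, 100_000, 15.0), 1))  # 4.0
```

The gap between this floor and the published estimate reflects real-world overhead (log growth, checkpoint pressure, competing workload), not wasted time.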
Advantages:
- Doesn't freeze application
- Other queries can run between batches
- Can pause (Ctrl+C) and resume later (WHERE NULL picks up where it left off)
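The resume-by-NULL behavior can be illustrated with a small in-memory sketch (Python, purely illustrative): each batch touches only rows whose new value is still NULL, so restarting the loop naturally continues from where it stopped.

```python
def migrate_batch(rows, batch_size):
    """One 'batch': convert up to batch_size rows whose new value is None,
    mirroring UPDATE TOP (@BatchSize) ... WHERE CNPJ_Payer_New IS NULL."""
    pending = [r for r in rows if r["new"] is None][:batch_size]
    for r in pending:
        r["new"] = str(r["old"]).zfill(14)
    return len(pending)  # plays the role of @@ROWCOUNT

rows = [{"old": n, "new": None} for n in range(5)]
migrate_batch(rows, 2)  # first run, "interrupted" after one batch

migrated = 0
while (n := migrate_batch(rows, 2)) > 0:  # resume: loop until rowcount = 0
    migrated += n
print(migrated)  # 3 -> only the rows left unmigrated are picked up
```

The same property is what makes Ctrl+C safe in the T-SQL script: nothing tracks progress except the data itself.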
- Real-time progress log
Phase 3: Constraint Removal
-- Identify the PK constraints involving the column
-- (sys.key_constraints has no column info; join through the
--  underlying unique index to reach the column names)
SELECT kc.name
FROM sys.key_constraints AS kc
JOIN sys.index_columns AS ic
  ON ic.object_id = kc.parent_object_id
 AND ic.index_id  = kc.unique_index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE kc.type = 'PK'
  AND kc.parent_object_id = OBJECT_ID('Transactions')
  AND c.name = 'CNPJ_Payer';
-- Remove PKs
ALTER TABLE Transactions
DROP CONSTRAINT PK_Transactions_CNPJ;
-- Remove FKs (tables that reference)
ALTER TABLE Payments
DROP CONSTRAINT FK_Payments_Transactions;
Estimated time: ~10 minutes (depends on how many constraints exist)
Phase 4: Column Swap (Renaming)
-- Rename old column to _Old
EXEC sp_rename 'Transactions.CNPJ_Payer', 'CNPJ_Payer_Old', 'COLUMN';
-- Rename new column to original name
EXEC sp_rename 'Transactions.CNPJ_Payer_New', 'CNPJ_Payer', 'COLUMN';
-- Change to NOT NULL (after validating 100% populated)
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18) NOT NULL;
Estimated time: ~1 second (metadata change)
Phase 5: Constraint Recreation
-- Recreate PK with new VARCHAR column
ALTER TABLE Transactions
ADD CONSTRAINT PK_Transactions_CNPJ
PRIMARY KEY CLUSTERED (CNPJ_Payer);
-- Recreate FKs
ALTER TABLE Payments
ADD CONSTRAINT FK_Payments_Transactions
FOREIGN KEY (CNPJ_Payer) REFERENCES Transactions(CNPJ_Payer);
Estimated time: ~30-60 minutes (depends on volume)
Phase 6: Validation and Cleanup
-- Validate that 100% was migrated
SELECT COUNT(*)
FROM Transactions
WHERE CNPJ_Payer IS NULL OR CNPJ_Payer = '';
-- Validate referential integrity
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;
-- If everything OK, remove old column
ALTER TABLE Transactions
DROP COLUMN CNPJ_Payer_Old;
-- Remove temporary index
DROP INDEX IX_Temp_CNPJ_New ON Transactions;
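Beyond the NULL count and the DBCC check, a row-level spot check can confirm that converted values round-trip to the original numbers. A hedged Python sketch of that invariant (it applies to the legacy numeric values being migrated; alphanumeric CNPJs inserted after the cutover would intentionally fail the digits-only check):

```python
def valid_conversion(old: int, new: str) -> bool:
    """A migrated value must be exactly 14 characters, all digits
    (for pre-migration numeric data), and round-trip to the
    original BIGINT value."""
    return len(new) == 14 and new.isdigit() and int(new) == old

assert valid_conversion(191, "00000000000191")
assert not valid_conversion(191, "191")              # not padded
assert not valid_conversion(191, "00000000000192")   # wrong value
```

In practice this check would run as a sampled comparison between CNPJ_Payer_Old and the new column before the old column is dropped.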
CNPJ Fast Process Customization
Differences vs. Original Process
The original CNPJ Fast process was restructured for this client:
Main changes:
| Aspect | Original CNPJ Fast | Client (Customized) |
|---|---|---|
| Focus | Applications + DB | DB only (no proprietary software) |
| Discovery | App inventory | Schema analysis only |
| Execution | Multiple applications | Massive SQL scripts |
| Batch Size | 50k-100k | 100k (optimized for volume) |
| Monitoring | Manual + tools | Real-time SQL logs |
| Rollback | Complex process | Simple (DROP COLUMN) |
Reason for restructuring:
- Client has no proprietary applications (only consumes data)
- 100% focus on database optimization
- Much larger volume than typical cases (100M vs ~10M)
Tech Stack
SQL Server T-SQL Batch Processing Performance Tuning Database Optimization Migration Scripts Phased Commits Index Optimization Constraint Management
Key Decisions & Trade-offs
Why 100k per batch?
Performance tests:
| Batch Size | Time/Batch | Lock Duration | Contention |
|---|---|---|---|
| 10,000 | 2s | Low | Minimal |
| 50,000 | 8s | Medium | Acceptable |
| 100,000 | 15s | Medium | Balanced |
| 500,000 | 90s | High | Production impact |
| 1,000,000 | 180s | Very high | Unacceptable |
Choice: 100k offers best balance between performance and impact.
Why create column at END?
SQL Server internals:
- Add column at end = metadata change (fast)
- Add in middle = page rewrite (slow)
- For large tables, position matters
Why WAITFOR DELAY of 1 second?
Without delay:
- Batch processing consumes 100% of I/O
- Application queries slow down
- Lock escalation may occur
With 1s delay:
- Other queries have window to execute
- Distributed I/O
- User experience preserved
Trade-off: each batch takes one extra second (roughly 6-7% overhead on a ~15-second batch), but the system remains responsive.
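Given the ~15 s measured per 100k batch, the cost of the delay works out as a small fraction of total runtime (simple arithmetic sketch):

```python
def delay_overhead(secs_per_batch: float, delay_secs: float) -> float:
    """Fraction of total per-batch time spent in WAITFOR DELAY."""
    return delay_secs / (secs_per_batch + delay_secs)

print(f"{delay_overhead(15.0, 1.0):.1%}")  # 6.2%
```

A cheap price for keeping lock escalation at bay and leaving I/O windows for the application.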
Current Status & Next Steps
Current Status (December 2024)
Preparation Phase:
- Discovery complete (100M records identified)
- Migration scripts developed
- Tests in staging environment
- Performance validation in progress
- Awaiting production maintenance window
Next Steps
- Complete production backup
- Production execution (24/7 environment)
- Real-time monitoring during migration
- Post-migration validation (integrity, performance)
- Lessons learned documentation
Lessons Learned (So Far)
1. Previous Experience is Gold
Decision to use phased commits came from practical experience in previous projects, not from documentation or AI.
Similar previous situations:
- E-commerce data migration (50M records)
- Encoding conversion (UTF-8 in 100M+ rows)
- Historical table partitioning
2. "Measure Twice, Cut Once"
Before executing in production:
- Exhaustive tests in staging
- Scripts validated and reviewed
- Rollback tested
- Time estimates confirmed
Preparation time: 3 weeks
Execution time: estimated at 48 hours
Ratio: ~10:1 (preparation vs. execution)
3. Customization > One-Size-Fits-All
The original CNPJ Fast process needed to be restructured for this client.
Lesson: Processes should be:
- Structured enough to repeat
- Flexible enough to adapt
4. Monitoring is Crucial
Scripts with detailed progress logs allow:
- Estimate remaining time
- Identify bottlenecks
- Pause/resume with confidence
- Report status to stakeholders
-- Log example
Processed: 10,000,000 rows. Batch: 100,000
Elapsed time: 3600 seconds (10% complete, ~9h remaining)
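The remaining-time figure in the log follows from a linear extrapolation over the running totals, e.g.:

```python
def eta_seconds(processed: int, total: int, elapsed_secs: float) -> float:
    """Linear extrapolation: remaining = elapsed * (rows left / rows done)."""
    return elapsed_secs * (total - processed) / processed

# 10M of 100M rows in 3600 s -> ~9 hours remaining, as in the log above.
print(eta_seconds(10_000_000, 100_000_000, 3600) / 3600)  # 9.0
```

Linear extrapolation is optimistic for this workload (later batches tend to run slower as the filtered index shrinks and the log grows), so stakeholder estimates were padded accordingly.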
Performance Optimizations
Optimizations Implemented
- Temporary filtered index (WHERE ... IS NULL)
  - Speeds up lookup of unmigrated records
  - Removed after completion
- Optimized batch size
  - Balanced between performance and lock time
- Transaction log management
  -- Check log growth
  DBCC SQLPERF(LOGSPACE);
  -- Adjust recovery model (if allowed)
  ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
- Execution during low-load hours
  - Overnight maintenance window
  - Weekend (if possible)
Expected result: Migration of 100 million records in ~48 hours, without significant downtime and with possibility of fast rollback.