---
title: "Alphanumeric CNPJ Migration - 100 Million Records"
slug: "cnpj-migration-database"
summary: "Execution of massive CNPJ migration from numeric to alphanumeric in database with ~100M records, using phased commit strategy to avoid database locks."
client: "Collection Agency"
industry: "Collections & Financial Services"
timeline: "In execution"
role: "Database Architect & Tech Lead"
image: ""
tags:
- SQL Server
- Database Migration
- CNPJ
- Performance Optimization
- Batch Processing
- Big Data
featured: true
order: 4
date: 2024-11-01
seo_title: "Alphanumeric CNPJ Migration - 100M Records | Carneiro Tech"
seo_description: "Case study of massive CNPJ migration in database with 100 million records using phased commits and performance optimizations."
seo_keywords: "database migration, SQL Server, CNPJ, batch processing, performance optimization, phased commits"
---
## Overview
A collection agency that works with transitory, high-turnover databases (and develops no proprietary software) needs to adapt its systems to the new Brazilian **alphanumeric CNPJ** format.

**Main challenge:** migrate ~**100 million records** in tables with `BIGINT` and `NUMERIC` columns to `VARCHAR`, without locking the production database.

**Status:** project in execution (migration script preparation).

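
For context, the new format keeps the CNPJ's classic mod-11 check digits but, per the Receita Federal technical note, computes them over each character's ASCII code minus 48, which is what allows letters in the 12-character base. A minimal Python sketch of the validation (illustrative only, not part of the migration scripts):

```python
def cnpj_check_digits(base12: str) -> str:
    """Compute the two mod-11 check digits for a 12-character CNPJ base.

    Each character contributes its ASCII code minus 48, so digits keep
    their usual values ('0'-'9' -> 0-9) and letters map to 'A' -> 17, etc.
    """
    def dv(chars: str, weights: list[int]) -> str:
        total = sum((ord(c) - 48) * w for c, w in zip(chars, weights))
        remainder = total % 11
        return '0' if remainder < 2 else str(11 - remainder)

    d1 = dv(base12, [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    d2 = dv(base12 + d1, [6, 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    return d1 + d2

def is_valid_cnpj(cnpj: str) -> bool:
    """Validate a numeric or alphanumeric CNPJ, with or without punctuation."""
    cnpj = cnpj.replace('.', '').replace('/', '').replace('-', '').upper()
    if len(cnpj) != 14 or not cnpj[:12].isalnum():
        return False
    return cnpj[12:] == cnpj_check_digits(cnpj[:12])
```

For purely numeric inputs this reduces to the traditional algorithm, so existing CNPJs remain valid unchanged.
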
---
## Challenge
### Massive Data Volume
**Company context:**
- Collection agency (does not develop proprietary software)
- Works with **transitory data** (high turnover)
- SQL Server database with critical volume
**Initial analysis revealed:**

| Table | Column | Current Type | Records | Size |
|--------|--------|------------|-----------|---------|
| Debtors | CNPJ_Debtor | BIGINT | 8,000,000 | 60 GB |
| Transactions | CNPJ_Payer | NUMERIC(14) | 90,000,000 | 1.2 TB |
| Companies | CNPJ_Company | BIGINT | 2,500,000 | 18 GB |
| **TOTAL** | - | - | **~100,000,000** | **~1.3 TB** |
**Identified problems:**
1. **Tables with 8M+ rows** using `BIGINT` for CNPJ
2. **90 million records** in transactions table
3. **CNPJ as primary key** in some tables
4. **Foreign keys** relating multiple tables
5. **Impossibility of extended downtime** (24/7 operation)
6. **Disk space restrictions** (requires efficient strategy)
---
## Strategic Decision: Phased Commits
### Why NOT do ALTER COLUMN directly?
**Naive approach (DOESN'T work):**
```sql
-- NEVER DO THIS ON LARGE TABLES
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18);
```
**Problems:**
- Locks entire table during conversion
- Can take hours/days on large tables
- Blocks all operations (INSERT, UPDATE, SELECT)
- Risk of timeout or failure mid-operation
- Complex rollback if something goes wrong
---
### Chosen Strategy: Column Swap with Phased Commits
**Based on previous experience**, I decided to use a gradual approach:
```
┌─────────────────────────────────────────────┐
│ 1. Create new VARCHAR column at END │
│ (fast operation, doesn't lock table) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 2. UPDATE in batches (phased commits) │
│ - 100k records at a time │
│ - Pause between batches (avoid lock) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 3. Remove PKs and FKs │
│ (after 100% migrated) │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 4. Rename columns (swap) │
│ - CNPJ → CNPJ_Old │
│ - CNPJ_New → CNPJ │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 5. Recreate PKs/FKs with new column │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ 6. Validation and old column deletion │
└─────────────────────────────────────────────┘
```
**Why this approach?**

- **No complete table lock** (incremental operation)
- **Can pause/resume** at any time
- **Real-time progress monitoring**
- **Simple rollback** (just drop the new column)
- **Minimizes production impact** (small commits)

**Decision based on:**
- Previous experience with large volume migrations
- Knowledge of SQL Server locks
- Need for zero downtime
**Note:** This decision was made **without consulting AI** - based purely on practical experience from previous projects.

---
## Implementation Details
### Phase 1: Create New Column
```sql
-- Fast operation (metadata change only)
ALTER TABLE Transactions
ADD CNPJ_Payer_New VARCHAR(18) NULL;
-- Temporary filtered index: speeds up the batch UPDATE's search for rows not yet migrated
CREATE NONCLUSTERED INDEX IX_Temp_CNPJ_New
ON Transactions(CNPJ_Payer_New)
WHERE CNPJ_Payer_New IS NULL;
```
**Estimated time:** adding the column is ~1 second (a metadata-only change, independent of table size); building the filtered index scans the table and can take several minutes on the largest tables.

---
### Phase 2: Batch Migration (Core Strategy)
```sql
-- Migration script with phased commits
DECLARE @BatchSize INT = 100000; -- 100k records per batch
DECLARE @RowsAffected INT = 1;
DECLARE @TotalProcessed INT = 0;
DECLARE @StartTime DATETIME = GETDATE();
WHILE @RowsAffected > 0
BEGIN
BEGIN TRANSACTION;
-- Update batch of 100k records not yet migrated
UPDATE TOP (@BatchSize) Transactions
SET CNPJ_Payer_New = RIGHT('00000000000000' + CAST(CNPJ_Payer AS VARCHAR), 14)
WHERE CNPJ_Payer_New IS NULL;
SET @RowsAffected = @@ROWCOUNT;
SET @TotalProcessed = @TotalProcessed + @RowsAffected;
COMMIT TRANSACTION;
-- Progress log
PRINT 'Processed: ' + CAST(@TotalProcessed AS VARCHAR) + ' rows. Batch: ' + CAST(@RowsAffected AS VARCHAR);
PRINT 'Elapsed time: ' + CAST(DATEDIFF(SECOND, @StartTime, GETDATE()) AS VARCHAR) + ' seconds';
-- Pause between batches (reduces contention)
WAITFOR DELAY '00:00:01'; -- 1 second between batches
END;
PRINT 'Migration completed! Total rows: ' + CAST(@TotalProcessed AS VARCHAR);
```
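
The conversion expression in the `UPDATE` simply left-pads the numeric CNPJ with zeros to a fixed 14 characters (the leading zeros are exactly what the `BIGINT`/`NUMERIC` columns silently dropped). Mirroring it outside the database makes it easy to unit-test; a minimal Python equivalent (illustrative only):

```python
def to_varchar_cnpj(value: int) -> str:
    """Python mirror of RIGHT('00000000000000' + CAST(value AS VARCHAR), 14):
    left-pad the numeric CNPJ with zeros to a fixed width of 14 characters."""
    return str(value).zfill(14)
```
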
**Configurable parameters:**

- `@BatchSize`: 100k (balanced between performance and lock time)
  - Too small = many transactions, more overhead
  - Too large = prolonged locks, production impact
- `WAITFOR DELAY`: 1 second (gives other queries time to run)

**Time estimates:**

| Records | Batch Size | Estimated Time |
|-----------|------------|----------------|
| 8,000,000 | 100,000 | ~2-3 hours |
| 90,000,000 | 100,000 | ~20-24 hours |
**Advantages:**
- Doesn't freeze application
- Other queries can run between batches
- Can be stopped and resumed at any time: an interrupted batch simply rolls back, and the `WHERE CNPJ_Payer_New IS NULL` predicate picks up where it left off
- Real-time progress log
---
### Phase 3: Constraint Removal
```sql
-- Identify PKs whose key includes the column
SELECT kc.name
FROM sys.key_constraints AS kc
JOIN sys.index_columns AS ic
    ON ic.object_id = kc.parent_object_id
   AND ic.index_id = kc.unique_index_id
WHERE kc.type = 'PK'
  AND kc.parent_object_id = OBJECT_ID('Transactions')
  AND COL_NAME(ic.object_id, ic.column_id) = 'CNPJ_Payer';
-- Identify FKs that reference the column
SELECT fk.name
FROM sys.foreign_key_columns AS fkc
JOIN sys.foreign_keys AS fk
    ON fk.object_id = fkc.constraint_object_id
WHERE fkc.referenced_object_id = OBJECT_ID('Transactions')
  AND COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id) = 'CNPJ_Payer';
-- Remove the PK
ALTER TABLE Transactions
DROP CONSTRAINT PK_Transactions_CNPJ;
-- Remove FKs (from referencing tables)
ALTER TABLE Payments
DROP CONSTRAINT FK_Payments_Transactions;
```
**Estimated time:** ~10 minutes for the FKs (depends on how many constraints exist); note that dropping a *clustered* PK rebuilds the table as a heap, which can take considerably longer on the largest tables.

---
### Phase 4: Column Swap (Renaming)
```sql
-- Rename old column to _Old
EXEC sp_rename 'Transactions.CNPJ_Payer', 'CNPJ_Payer_Old', 'COLUMN';
-- Rename new column to original name
EXEC sp_rename 'Transactions.CNPJ_Payer_New', 'CNPJ_Payer', 'COLUMN';
-- Change to NOT NULL (after validating 100% populated)
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18) NOT NULL;
```
**Estimated time:** the renames are ~1 second each (metadata changes); the `NOT NULL` change requires a full scan to validate existing rows, so expect minutes rather than seconds on the largest tables.

---
### Phase 5: Constraint Recreation
```sql
-- Recreate PK with new VARCHAR column
ALTER TABLE Transactions
ADD CONSTRAINT PK_Transactions_CNPJ
PRIMARY KEY CLUSTERED (CNPJ_Payer);
-- Recreate FKs
ALTER TABLE Payments
ADD CONSTRAINT FK_Payments_Transactions
FOREIGN KEY (CNPJ_Payer) REFERENCES Transactions(CNPJ_Payer);
```
**Estimated time:** ~30-60 minutes (depends on volume)

---
### Phase 6: Validation and Cleanup
```sql
-- Validate that 100% was migrated
SELECT COUNT(*)
FROM Transactions
WHERE CNPJ_Payer IS NULL OR CNPJ_Payer = '';
-- Validate referential integrity
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;
-- If everything OK, remove old column
ALTER TABLE Transactions
DROP COLUMN CNPJ_Payer_Old;
-- Remove temporary index
DROP INDEX IX_Temp_CNPJ_New ON Transactions;
```
---
## CNPJ Fast Process Customization
### Differences vs. Original Process
The original **CNPJ Fast** process was **restructured** for this client:
**Main changes:**

| Aspect | Original CNPJ Fast | Client (Customized) |
|---------|-------------------|---------------------|
| **Focus** | Applications + DB | DB only (no proprietary software) |
| **Discovery** | App inventory | Schema analysis only |
| **Execution** | Multiple applications | Massive SQL scripts |
| **Batch Size** | 50k-100k | 100k (optimized for volume) |
| **Monitoring** | Manual + tools | Real-time SQL logs |
| **Rollback** | Complex process | Simple (DROP COLUMN) |
**Reason for restructuring:**
- Client has no proprietary applications (only consumes data)
- 100% focus on database optimization
- Much larger volume than typical cases (100M vs ~10M)
---
## Tech Stack
`SQL Server` `T-SQL` `Batch Processing` `Performance Tuning` `Database Optimization` `Migration Scripts` `Phased Commits` `Index Optimization` `Constraint Management`

---
## Key Decisions & Trade-offs
### Why 100k per batch?
**Performance tests:**

| Batch Size | Time/Batch | Lock Duration | Contention |
|------------|-------------|---------------|-----------|
| 10,000 | 2s | Low | Minimal |
| 50,000 | 8s | Medium | Acceptable |
| **100,000** | 15s | **Medium** | **Balanced** |
| 500,000 | 90s | High | Production impact |
| 1,000,000 | 180s | Very high | Unacceptable |
**Choice:** 100k offers the best balance between throughput and per-batch lock time.

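
Sanity-checking the loop arithmetic with a hypothetical helper (using the per-batch times measured above) shows why: total wall-clock time is fairly flat across batch sizes, so the deciding factor is the lock duration of each batch, not throughput. Note this is a lower bound for the `UPDATE` loop alone; the 20-24 h estimate for the largest table adds headroom for log growth, index maintenance, and live contention.

```python
import math

def migration_hours(total_rows: int, batch_size: int,
                    secs_per_batch: float, delay_secs: float = 1.0) -> float:
    """Lower-bound wall-clock estimate (in hours) for the batched UPDATE loop."""
    batches = math.ceil(total_rows / batch_size)
    return batches * (secs_per_batch + delay_secs) / 3600

# 90M rows with the measured per-batch times:
#   10k  batches of 2 s   -> 7.5 h (the 1 s delay dominates)
#   100k batches of 15 s  -> 4.0 h
#   1M   batches of 180 s -> ~4.5 h (but 3-minute locks per batch)
```
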
---
### Why create column at END?
**SQL Server internals:**
- Add column at end = metadata change (fast)
- Add in middle = page rewrite (slow)
- For large tables, position matters
---
### Why WAITFOR DELAY of 1 second?
**Without delay:**
- Batch processing consumes 100% of I/O
- Application queries slow down
- Lock escalation may occur
**With 1s delay:**
- Other queries have window to execute
- Distributed I/O
- User experience preserved
**Trade-off:** each batch takes 1 s longer (~7% at the measured ~15 s/batch), but the system remains responsive.

---
## Current Status & Next Steps
### Current Status (December 2024)
**Preparation Phase:**
- Discovery complete (100M records identified)
- Migration scripts developed
- Tests in staging environment
- Performance validation in progress
- Awaiting production maintenance window
### Next Steps
1. **Complete production backup**
2. **Production execution** (24/7 environment)
3. **Real-time monitoring** during migration
4. **Post-migration validation** (integrity, performance)
5. **Lessons learned documentation**
---
## Lessons Learned (So Far)
### 1. Previous Experience is Gold
Decision to use phased commits came from **practical experience** in previous projects, not from documentation or AI.
**Similar previous situations:**
- E-commerce data migration (50M records)
- Encoding conversion (UTF-8 in 100M+ rows)
- Historical table partitioning
---
### 2. "Measure Twice, Cut Once"
Before executing in production:
- Exhaustive tests in staging
- Scripts validated and reviewed
- Rollback tested
- Time estimates confirmed
- **Preparation time:** 3 weeks
- **Execution time:** estimated at 48 hours
- **Ratio:** roughly 10:1, preparation vs. execution

---
### 3. Customization > One-Size-Fits-All
The original CNPJ Fast process needed to be **restructured** for this client.
**Lesson:** Processes should be:
- Structured enough to repeat
- Flexible enough to adapt
---
### 4. Monitoring is Crucial
Scripts with **detailed progress logs** allow:
- Estimate remaining time
- Identify bottlenecks
- Pause/resume with confidence
- Report status to stakeholders
```text
-- Log example (script output)
Processed: 10,000,000 rows. Batch: 100,000
Elapsed time: 3600 seconds (10% complete, ~9h remaining)
```
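
The "remaining" figure in the log can be derived from the counters the script already tracks; a hypothetical helper showing the arithmetic:

```python
def eta_seconds(processed: int, total: int, elapsed_secs: float) -> float:
    """Estimate remaining time from the processing rate observed so far."""
    rate = processed / elapsed_secs  # rows per second
    return (total - processed) / rate

# 10M of 100M rows in 3600 s -> ~32,400 s (~9 h) remaining, as in the log above
```
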
---
## Performance Optimizations
### Optimizations Implemented
1. **Temporary index WHERE NULL**
- Speeds up lookup of unmigrated records
- Removed after completion
2. **Optimized batch size**
- Balanced between performance and lock time
3. **Transaction log management**
```sql
-- Check log growth
DBCC SQLPERF(LOGSPACE);
-- Adjust recovery model (if allowed); note: switching to SIMPLE
-- breaks the log backup chain - take a full backup after reverting to FULL
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
```
4. **Execution during low-load hours**
- Overnight maintenance window
- Weekend (if possible)
---
**Expected result:** Migration of 100 million records in ~48 hours, without significant downtime and with possibility of fast rollback.
[Need to migrate massive data volumes? Get in touch](#contact)