---
title: "Alphanumeric CNPJ Migration - 100 Million Records"
slug: "cnpj-migration-database"
summary: "Execution of a massive CNPJ migration from numeric to alphanumeric in a database with ~100M records, using a phased-commit strategy to avoid database locks."
client: "Collection Agency"
industry: "Collections & Financial Services"
timeline: "In execution"
role: "Database Architect & Tech Lead"
image: ""
tags:
  - SQL Server
  - Database Migration
  - CNPJ
  - Performance Optimization
  - Batch Processing
  - Big Data
featured: true
order: 4
date: 2024-11-01
seo_title: "Alphanumeric CNPJ Migration - 100M Records | Carneiro Tech"
seo_description: "Case study of a massive CNPJ migration in a database with 100 million records using phased commits and performance optimizations."
seo_keywords: "database migration, SQL Server, CNPJ, batch processing, performance optimization, phased commits"
---

## Overview

A collection agency that works with transitory data (no proprietary software) needs to adapt its systems to the new Brazilian **alphanumeric CNPJ** format.

**Main challenge:** Migrate ~**100 million records** in tables with `BIGINT` and `NUMERIC` columns to `VARCHAR`, without locking the production database.

**Status:** Project in execution (migration script preparation).

---

## Challenge

### Massive Data Volume

**Company context:**

- Collection agency (does not develop proprietary software)
- Works with **transitory data** (high turnover)
- SQL Server database with a critical data volume

**Initial analysis revealed:**

| Table | Column | Current Type | Records | Size |
|-------|--------|--------------|---------|------|
| Debtors | CNPJ_Debtor | BIGINT | 8,000,000 | 60 GB |
| Transactions | CNPJ_Payer | NUMERIC(14) | 90,000,000 | 1.2 TB |
| Companies | CNPJ_Company | BIGINT | 2,500,000 | 18 GB |
| **TOTAL** | - | - | **~100,000,000** | **~1.3 TB** |

**Identified problems:**

1. **Tables with 8M+ rows** using `BIGINT` for CNPJ
2. **90 million records** in the transactions table
3. **CNPJ as primary key** in some tables
4. **Foreign keys** relating multiple tables
5. **No extended downtime possible** (24/7 operation)
6. **Disk space restrictions** (requires an efficient strategy)

---

## Strategic Decision: Phased Commits

### Why NOT do ALTER COLUMN directly?

**Naive approach (DOESN'T work):**

```sql
-- NEVER DO THIS ON LARGE TABLES
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18);
```

**Problems:**

- Locks the entire table during conversion
- Can take hours or days on large tables
- Blocks all operations (INSERT, UPDATE, SELECT)
- Risk of timeout or failure mid-operation
- Complex rollback if something goes wrong

---

### Chosen Strategy: Column Swap with Phased Commits

**Based on previous experience**, I decided to use a gradual approach:

```
┌─────────────────────────────────────────────┐
│ 1. Create new VARCHAR column at END         │
│    (fast operation, doesn't lock table)     │
└─────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│ 2. UPDATE in batches (phased commits)       │
│    - 100k records at a time                 │
│    - Pause between batches (avoid locks)    │
└─────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│ 3. Remove PKs and FKs                       │
│    (after 100% migrated)                    │
└─────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│ 4. Rename columns (swap)                    │
│    - CNPJ → CNPJ_Old                        │
│    - CNPJ_New → CNPJ                        │
└─────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│ 5. Recreate PKs/FKs with new column         │
└─────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│ 6. Validation and old column deletion       │
└─────────────────────────────────────────────┘
```

**Why this approach?**

- **No complete table lock** (incremental operation)
- **Can pause/resume** at any time
- **Real-time progress monitoring**
- **Simple rollback** (just drop the new column)
- **Minimizes production impact** (small commits)

**Decision based on:**

- Previous experience with large-volume migrations
- Knowledge of SQL Server locking behavior
- Need for zero downtime

**Note:** This decision was made **without consulting AI** - based purely on practical experience from previous projects.

---

## Implementation Details

### Phase 1: Create New Column

```sql
-- Fast operation (metadata change only)
ALTER TABLE Transactions
ADD CNPJ_Payer_New VARCHAR(18) NULL;

-- Filtered index to speed up the lookup of unmigrated rows
CREATE NONCLUSTERED INDEX IX_Temp_CNPJ_New
ON Transactions(CNPJ_Payer_New)
WHERE CNPJ_Payer_New IS NULL;
```

**Estimated time:** ~1 second for the new column itself (independent of table size); building the filtered index takes longer.

---

### Phase 2: Batch Migration (Core Strategy)

```sql
-- Migration script with phased commits
DECLARE @BatchSize INT = 100000; -- 100k records per batch
DECLARE @RowsAffected INT = 1;
DECLARE @TotalProcessed INT = 0;
DECLARE @StartTime DATETIME = GETDATE();

WHILE @RowsAffected > 0
BEGIN
    BEGIN TRANSACTION;

    -- Update a batch of 100k records not yet migrated
    UPDATE TOP (@BatchSize) Transactions
    SET CNPJ_Payer_New = RIGHT('00000000000000' + CAST(CNPJ_Payer AS VARCHAR(14)), 14)
    WHERE CNPJ_Payer_New IS NULL;

    SET @RowsAffected = @@ROWCOUNT;
    SET @TotalProcessed = @TotalProcessed + @RowsAffected;

    COMMIT TRANSACTION;

    -- Progress log
    PRINT 'Processed: ' + CAST(@TotalProcessed AS VARCHAR(20))
        + ' rows. Batch: ' + CAST(@RowsAffected AS VARCHAR(20));
    PRINT 'Elapsed time: '
        + CAST(DATEDIFF(SECOND, @StartTime, GETDATE()) AS VARCHAR(20)) + ' seconds';

    -- Pause between batches (reduces contention)
    WAITFOR DELAY '00:00:01'; -- 1 second between batches
END;

PRINT 'Migration completed! Total rows: ' + CAST(@TotalProcessed AS VARCHAR(20));
```

**Configurable parameters:**

- `@BatchSize`: 100k (balance between throughput and lock time)
  - Too small = many transactions, more overhead
  - Too large = prolonged locks, production impact
- `WAITFOR DELAY`: 1 second (gives other queries time to run)

**Time estimates:**

| Records | Batch Size | Estimated Time |
|------------|------------|----------------|
| 8,000,000 | 100,000 | ~2-3 hours |
| 90,000,000 | 100,000 | ~20-24 hours |

**Advantages:**

- Doesn't freeze the application
- Other queries can run between batches
- Can pause (Ctrl+C) and resume later (the `WHERE ... IS NULL` filter picks up where it left off)
- Real-time progress log

---

### Phase 3: Constraint Removal

```sql
-- Identify PK/unique constraints that include the column
SELECT kc.name
FROM sys.key_constraints kc
JOIN sys.index_columns ic
    ON ic.object_id = kc.parent_object_id
   AND ic.index_id = kc.unique_index_id
WHERE kc.parent_object_id = OBJECT_ID('Transactions')
  AND COL_NAME(ic.object_id, ic.column_id) = 'CNPJ_Payer';

-- Remove PKs
ALTER TABLE Transactions
DROP CONSTRAINT PK_Transactions_CNPJ;

-- Remove FKs (referencing tables)
ALTER TABLE Payments
DROP CONSTRAINT FK_Payments_Transactions;
```

**Estimated time:** ~10 minutes (depends on how many constraints exist)

---

### Phase 4: Column Swap (Renaming)

```sql
-- Rename the old column to _Old
EXEC sp_rename 'Transactions.CNPJ_Payer', 'CNPJ_Payer_Old', 'COLUMN';

-- Rename the new column to the original name
EXEC sp_rename 'Transactions.CNPJ_Payer_New', 'CNPJ_Payer', 'COLUMN';

-- Change to NOT NULL (after validating it is 100% populated)
ALTER TABLE Transactions
ALTER COLUMN CNPJ_Payer VARCHAR(18) NOT NULL;
```

**Estimated time:** ~1 second for the renames (metadata changes); the `NOT NULL` change requires a full scan to validate the column.

---

### Phase 5: Constraint Recreation

```sql
-- Recreate the PK with the new VARCHAR column
ALTER TABLE Transactions
ADD CONSTRAINT PK_Transactions_CNPJ
PRIMARY KEY CLUSTERED (CNPJ_Payer);

-- Recreate FKs
ALTER TABLE Payments
ADD CONSTRAINT FK_Payments_Transactions
FOREIGN KEY (CNPJ_Payer)
REFERENCES Transactions(CNPJ_Payer);
```

**Estimated time:** ~30-60 minutes (depends on
volume)

---

### Phase 6: Validation and Cleanup

```sql
-- Validate that 100% was migrated
SELECT COUNT(*)
FROM Transactions
WHERE CNPJ_Payer IS NULL OR CNPJ_Payer = '';

-- Validate referential integrity
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS;

-- If everything is OK, remove the old column
ALTER TABLE Transactions
DROP COLUMN CNPJ_Payer_Old;

-- Remove the temporary index
DROP INDEX IX_Temp_CNPJ_New ON Transactions;
```

---

## CNPJ Fast Process Customization

### Differences vs. the Original Process

The original **CNPJ Fast** process was **restructured** for this client.

**Main changes:**

| Aspect | Original CNPJ Fast | Client (Customized) |
|--------|--------------------|---------------------|
| **Focus** | Applications + DB | DB only (no proprietary software) |
| **Discovery** | App inventory | Schema analysis only |
| **Execution** | Multiple applications | Massive SQL scripts |
| **Batch Size** | 50k-100k | 100k (optimized for volume) |
| **Monitoring** | Manual + tools | Real-time SQL logs |
| **Rollback** | Complex process | Simple (DROP COLUMN) |

**Reasons for restructuring:**

- Client has no proprietary applications (it only consumes data)
- 100% focus on database optimization
- Much larger volume than typical cases (100M vs ~10M)

---

## Tech Stack

`SQL Server` `T-SQL` `Batch Processing` `Performance Tuning` `Database Optimization` `Migration Scripts` `Phased Commits` `Index Optimization` `Constraint Management`

---

## Key Decisions & Trade-offs

### Why 100k per batch?

**Performance tests:**

| Batch Size | Time/Batch | Lock Duration | Contention |
|------------|------------|---------------|------------|
| 10,000 | 2s | Low | Minimal |
| 50,000 | 8s | Medium | Acceptable |
| **100,000** | 15s | **Medium** | **Balanced** |
| 500,000 | 90s | High | Production impact |
| 1,000,000 | 180s | Very high | Unacceptable |

**Choice:** 100k offers the best balance between performance and impact.

---

### Why create the column at the END?
**SQL Server internals:**

- Adding a column at the end is a metadata-only change (fast)
- Inserting a column in the middle forces a full table rebuild (slow)
- For large tables, position matters

---

### Why a WAITFOR DELAY of 1 second?

**Without the delay:**

- Batch processing consumes 100% of the I/O
- Application queries slow down
- Lock escalation may occur

**With a 1s delay:**

- Other queries have a window to execute
- I/O is distributed over time
- User experience is preserved

**Trade-off:** Migration takes an extra second per batch (~7% slower at 15s per batch), but the system remains responsive.

---

## Current Status & Next Steps

### Current Status (December 2024)

**Preparation phase:**

- Discovery complete (100M records identified)
- Migration scripts developed
- Tests in the staging environment
- Performance validation in progress
- Awaiting a production maintenance window

### Next Steps

1. **Complete production backup**
2. **Production execution** (24/7 environment)
3. **Real-time monitoring** during migration
4. **Post-migration validation** (integrity, performance)
5. **Lessons learned documentation**

---

## Lessons Learned (So Far)

### 1. Previous Experience is Gold

The decision to use phased commits came from **practical experience** in previous projects, not from documentation or AI.

**Similar previous situations:**

- E-commerce data migration (50M records)
- Encoding conversion (UTF-8 in 100M+ rows)
- Historical table partitioning

---

### 2. "Measure Twice, Cut Once"

Before executing in production:

- Exhaustive tests in staging
- Scripts validated and reviewed
- Rollback tested
- Time estimates confirmed

**Preparation time:** 3 weeks
**Execution time:** estimated at 48 hours
**Ratio:** ~10:1 (preparation vs. execution)

---

### 3. Customization > One-Size-Fits-All

The original CNPJ Fast process needed to be **restructured** for this client.

**Lesson:** Processes should be:

- Structured enough to repeat
- Flexible enough to adapt

---

### 4. Monitoring is Crucial

Scripts with **detailed progress logs** allow you to:

- Estimate remaining time
- Identify bottlenecks
- Pause/resume with confidence
- Report status to stakeholders

```
-- Log example
Processed: 10,000,000 rows. Batch: 100,000
Elapsed time: 3600 seconds (10% complete, ~9h remaining)
```

---

## Performance Optimizations

### Optimizations Implemented

1. **Filtered index (`WHERE ... IS NULL`)**
   - Speeds up the lookup of unmigrated records
   - Removed after completion
2. **Optimized batch size**
   - Balanced between throughput and lock time
3. **Transaction log management**

   ```sql
   -- Check log growth
   DBCC SQLPERF(LOGSPACE);

   -- Adjust the recovery model (if allowed)
   -- Note: switching to SIMPLE breaks the log backup chain;
   -- take a full backup after switching back to FULL
   ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
   ```

4. **Execution during low-load hours**
   - Overnight maintenance window
   - Weekend (if possible)

---

**Expected result:** Migration of 100 million records in ~48 hours, without significant downtime and with the possibility of a fast rollback.

[Need to migrate massive data volumes? Get in touch](#contact)
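
---

## Appendix: The Phased-Commit Loop in Miniature

The core pattern of Phase 2 (update a batch of not-yet-migrated rows, commit, repeat until zero rows are affected) is database-agnostic. Below is a minimal Python sketch of that loop, using SQLite purely for illustration - the production target is SQL Server, and the table and column names (`transactions`, `cnpj`, `cnpj_new`) are hypothetical stand-ins, not the client's schema:

```python
import sqlite3

def migrate_in_batches(conn, batch_size):
    """Phased-commit migration: copy the numeric CNPJ into the new text
    column one batch at a time, committing after every batch so no
    long-running transaction holds locks on the whole table."""
    total = 0
    while True:
        cur = conn.execute(
            """UPDATE transactions
               SET cnpj_new = substr('00000000000000' || cnpj, -14)
               WHERE rowid IN (SELECT rowid FROM transactions
                               WHERE cnpj_new IS NULL LIMIT ?)""",
            (batch_size,),
        )
        conn.commit()          # phased commit: locks are released per batch
        if cur.rowcount == 0:  # nothing left to migrate -> done
            break
        total += cur.rowcount  # progress counter (resumable: WHERE IS NULL)
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (cnpj INTEGER, cnpj_new TEXT)")
conn.executemany(
    "INSERT INTO transactions (cnpj) VALUES (?)",
    [(11222333000181,), (45997418000153,), (191,)],
)
migrated = migrate_in_batches(conn, batch_size=2)
print(migrated)  # 3
print(conn.execute(
    "SELECT cnpj_new FROM transactions WHERE cnpj = 191").fetchone()[0])
# 00000000000191 - zero-padded to 14 digits, like the T-SQL RIGHT(...) trick
```

The `WHERE cnpj_new IS NULL` filter gives the loop the same property highlighted in Phase 2: it can be interrupted at any point and rerun, and it resumes exactly where it stopped.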