Masking
Privacy
Replacing sensitive data values with realistic substitutes that preserve format, joins, distributions, and application behavior.
SOFI ships 50+ masking rules, including Faker pt_BR for names, valid CPF/CNPJ generators, format-preserving email, region-aware phone numbers, distribution-preserving numerics, JSON path masking, and XML XPath masking. Masking is deterministic by default, so the same source value always maps to the same masked value. That is what keeps joins between databases consistent.
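As a rough illustration of deterministic masking that still yields a checksum-valid CPF, the sketch below derives a replacement from a keyed hash of the source value and then recomputes the two check digits. It is not SOFI's implementation: the function names `cpf_check_digits` and `mask_cpf`, the choice of HMAC-SHA256, and the demo secret are assumptions made for the example.

```python
import hashlib
import hmac


def cpf_check_digits(base: str) -> str:
    """Return the two CPF check digits for a 9-digit base."""
    digits = [int(c) for c in base]
    for _ in range(2):
        # Weights run from len+1 down to 2 (10..2, then 11..2).
        weights = range(len(digits) + 1, 1, -1)
        total = sum(d * w for d, w in zip(digits, weights))
        check = (total * 10) % 11
        digits.append(0 if check == 10 else check)
    return f"{digits[9]}{digits[10]}"


def mask_cpf(cpf: str, secret: bytes) -> str:
    """Map a CPF to a substitute CPF; same input and secret give same output."""
    raw = "".join(c for c in cpf if c.isdigit())  # ignore punctuation
    digest = hmac.new(secret, raw.encode(), hashlib.sha256).digest()
    # Derive a 9-digit base from the keyed hash, then add valid check digits.
    base = f"{int.from_bytes(digest[:8], 'big') % 10**9:09d}"
    masked = base + cpf_check_digits(base)
    return f"{masked[:3]}.{masked[3:6]}.{masked[6:9]}-{masked[9:]}"


if __name__ == "__main__":
    key = b"demo-only-secret"  # hypothetical; a real key would be managed securely
    print(mask_cpf("529.982.247-25", key))
```

Because the output depends only on the secret and the digits of the input, `mask_cpf("529.982.247-25", key)` and `mask_cpf("52998224725", key)` return the same masked CPF.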
See also: Deterministic Masking, PII Detection, Format Preserving
PII Detection
Privacy
Automatically identifying which columns hold Personally Identifiable Information so masking rules can be applied.
SOFI detects 27 PII categories including CPF, CNPJ, RG, email, name, phone, address, IBAN, credit card, IP, geolocation, date of birth, salary. Detection combines column-name heuristics, value pattern matching, and statistical profiling so it works even when columns are named cryptically.
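To make the combination of signals concrete, here is a minimal sketch that checks the column name first and falls back to value pattern matching when the name is uninformative. It covers only two categories and leaves out statistical profiling; the names `NAME_HINTS`, `VALUE_PATTERNS`, `detect_pii`, and the 0.8 threshold are hypothetical and not taken from SOFI.

```python
import re
from typing import Optional, Sequence

# Hypothetical column-name hints, one regex per PII category.
NAME_HINTS = {
    "cpf": re.compile(r"cpf", re.IGNORECASE),
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
}

# Hypothetical value patterns; deliberately simple, not exhaustive validators.
VALUE_PATTERNS = {
    "cpf": re.compile(r"\d{3}\.?\d{3}\.?\d{3}-?\d{2}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def detect_pii(
    column_name: str, samples: Sequence[str], threshold: float = 0.8
) -> Optional[str]:
    """Return a PII category for the column, or None if nothing matches."""
    # 1) Column-name heuristic.
    for category, hint in NAME_HINTS.items():
        if hint.search(column_name):
            return category
    # 2) Value pattern matching on non-empty samples.
    values = [s for s in samples if s]
    if not values:
        return None
    for category, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            return category
    return None


if __name__ == "__main__":
    print(detect_pii("col_17", ["529.982.247-25", "111.444.777-35"]))
    print(detect_pii("user_email", []))
```

The second step is what lets a cryptically named column such as `col_17` still be classified from its contents.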
See also: Masking, Schema Profiling, Compliance
Deterministic Masking
Privacy
Same input always produces the same masked output. Keeps joins between databases consistent even when masked separately.
SOFI uses a `consistency_key` (typically `customer_id`) so that the user with CPF X gets the same masked CPF X' across every database in the org. Without determinism, a join between `users` and `orders` masked separately would break.
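The effect of a consistency key can be sketched as follows: the masked value is derived from `customer_id` rather than from the raw column, so two tables masked in separate passes still agree. This is an illustration under assumptions, not SOFI's code; `mask_with_key`, `mask_rows`, the sample rows, and the demo secret are all hypothetical, and the output is an 11-digit token rather than a checksum-valid CPF.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # hypothetical; shared by every masking run


def mask_with_key(field: str, consistency_key: str, secret: bytes = SECRET) -> str:
    """Derive an 11-digit token from the field name and the consistency key."""
    message = f"{field}:{consistency_key}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return f"{int.from_bytes(digest[:8], 'big') % 10**11:011d}"


def mask_rows(rows: list) -> list:
    """Mask the cpf column of each row using customer_id as the key."""
    return [{**r, "cpf": mask_with_key("cpf", r["customer_id"])} for r in rows]


# Two tables that store the same CPF in different formats.
users = [{"customer_id": "42", "cpf": "529.982.247-25"}]
orders = [{"customer_id": "42", "cpf": "52998224725", "total": 99.9}]

# Masked in separate passes, as if they lived in different databases.
masked_users = mask_rows(users)
masked_orders = mask_rows(orders)

# The join on the masked column still finds the matching pair.
joined = [
    (u, o) for u in masked_users for o in masked_orders if u["cpf"] == o["cpf"]
]

if __name__ == "__main__":
    print(len(joined), "matching row pair(s)")
```

Keying on `customer_id` also means the two tables agree even though they store the source CPF with different punctuation.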
See also: Masking, Cross-DB Consistency