
// glossary // 12 terms

The vocabulary of safe test data.

Definitions our solutions engineers actually use on customer calls. Short, plain, useful.

// Provisioning

Virtual Database (VDB)

Provisioning

An isolated, writable database created from a snapshot of a source database. It behaves like a real database but is backed by virtual clone storage.

SOFI VDBs are provisioned in seconds because they reference unchanged blocks in the source snapshot instead of copying them. Every VDB has its own connection URL, isolated schemas, a configurable TTL, and an audit trail linking it back to the policy and the approver that authorized it.

See also: Virtual Clone, Snapshot, Copy-on-Write (CoW)
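
A minimal sketch of the bookkeeping a VDB carries, assuming a simple in-memory record. The `VDB` class, its field names, and the sample URL are hypothetical illustrations, not SOFI's schema or API. It shows how a TTL gives every VDB a computable expiry and how the authorizing policy and approver travel with the record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VDB:
    """Hypothetical VDB record; field names are illustrative, not SOFI's schema."""
    name: str
    snapshot_id: str      # source snapshot whose unchanged blocks are referenced
    connection_url: str   # each VDB gets its own endpoint
    created_at: datetime
    ttl: timedelta        # configurable time-to-live
    policy_id: str        # audit link: policy that authorized provisioning
    approver: str         # audit link: who approved it

    @property
    def expires_at(self) -> datetime:
        return self.created_at + self.ttl

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


created = datetime(2024, 5, 1, tzinfo=timezone.utc)
vdb = VDB(
    name="qa-checkout",
    snapshot_id="snap-0142",
    connection_url="postgresql://vdb-host.example:5432/qa_checkout",
    created_at=created,
    ttl=timedelta(days=7),
    policy_id="policy-masked-dev",
    approver="alice",
)
print(vdb.expires_at)                               # 2024-05-08 00:00:00+00:00
print(vdb.is_expired(created + timedelta(days=8)))  # True
```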

Subsetting

Provisioning

Producing a smaller VDB that preserves referential integrity. Useful when a 100GB database is too big for a dev laptop.

SOFI subsetting starts from a set of rows in a root table (e.g., 1,000 customers) and follows foreign keys to produce a coherent slice. The output is a real, working database, not random rows that violate FK constraints.

See also: Virtual Database (VDB), Referential Integrity
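
One way to implement the foreign-key traversal, sketched under simplifying assumptions: tables are in-memory lists of rows, every primary key is a column named `id`, and each foreign key points at its parent's `id`. The first pass walks downward (customers to orders to order items); the second walks upward so that every foreign key in the slice resolves to an included parent row. The function and sample data are hypothetical, not SOFI's implementation.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
# Hypothetical FK description: (child_table, child_column, parent_table); references parent.id
ForeignKey = Tuple[str, str, str]


def subset(tables: Dict[str, List[Row]], fks: List[ForeignKey],
           root: str, root_ids: Set[object]) -> Dict[str, List[Row]]:
    keep: Dict[str, Set[object]] = {name: set() for name in tables}
    keep[root] |= set(root_ids)

    # Pass 1 (downward): include rows that reference rows already kept,
    # repeating until nothing new is added.
    changed = True
    while changed:
        changed = False
        for child, column, parent in fks:
            for row in tables[child]:
                if row[column] in keep[parent] and row["id"] not in keep[child]:
                    keep[child].add(row["id"])
                    changed = True

    # Pass 2 (upward): include every parent that a kept row points to,
    # so no foreign key in the slice dangles.
    changed = True
    while changed:
        changed = False
        for child, column, parent in fks:
            for row in tables[child]:
                ref = row[column]
                if row["id"] in keep[child] and ref is not None and ref not in keep[parent]:
                    keep[parent].add(ref)
                    changed = True

    return {name: [r for r in rows if r["id"] in keep[name]] for name, rows in tables.items()}


tables = {
    "customers": [{"id": 1}, {"id": 2}],
    "products": [{"id": 10}, {"id": 11}],
    "orders": [{"id": 100, "customer_id": 1}, {"id": 101, "customer_id": 2}],
    "order_items": [{"id": 1000, "order_id": 100, "product_id": 10},
                    {"id": 1001, "order_id": 101, "product_id": 11}],
}
fks = [("orders", "customer_id", "customers"),
       ("order_items", "order_id", "orders"),
       ("order_items", "product_id", "products")]
slice_ = subset(tables, fks, root="customers", root_ids={1})
```

The upward pass adds parents only, so product 10 is pulled in for customer 1's order while other order items for that product are not. That keeps the slice small while still satisfying every FK constraint.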

// Privacy

Masking

Privacy

Replacing sensitive data values with realistic substitutes that preserve format, joins, distributions, and application behavior.

SOFI ships 50+ masking rules: Faker's pt_BR locale for names, valid CPF/CNPJ generators, format-preserving email, region-aware phone numbers, distribution-preserving numerics, JSON path masking, and XML XPath masking. Masking is deterministic by default, so the same source value always maps to the same masked value; that is what keeps joins between databases consistent.

See also: Deterministic Masking, PII Detection, Format Preserving
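
A valid CPF generator has to produce numbers that pass the standard CPF mod-11 check-digit test, or application-side validation will reject the masked data. A minimal sketch of such a generator, which also keeps the original punctuation so the output is format-preserving. The function names are hypothetical, and the seeded `random.Random` only makes the demo reproducible; it is not how deterministic masking would be keyed in practice.

```python
import random


def cpf_check_digits(base9: str) -> str:
    """Return the two CPF check digits for a 9-digit base (mod-11 scheme)."""
    digits = [int(c) for c in base9]
    for first_weight in (10, 11):  # weights 10..2 for the 1st digit, 11..2 for the 2nd
        total = sum(d * w for d, w in zip(digits, range(first_weight, 1, -1)))
        remainder = total % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return f"{digits[9]}{digits[10]}"


def is_valid_cpf(cpf: str) -> bool:
    digits = "".join(c for c in cpf if c.isdigit())
    # Repeated-digit numbers such as 111.111.111-11 pass the arithmetic
    # but are rejected by validators by convention.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    return cpf_check_digits(digits[:9]) == digits[9:]


def fake_cpf_like(original: str, rng: random.Random) -> str:
    """Generate a valid CPF that keeps the punctuation layout of `original`."""
    if sum(c.isdigit() for c in original) != 11:
        raise ValueError("expected an 11-digit CPF")
    while True:
        base = "".join(str(rng.randrange(10)) for _ in range(9))
        if len(set(base)) > 1:  # avoid repeated-digit bases
            break
    new_digits = iter(base + cpf_check_digits(base))
    return "".join(next(new_digits) if c.isdigit() else c for c in original)


print(cpf_check_digits("111444777"))           # 35
masked = fake_cpf_like("111.444.777-35", random.Random(7))
print(is_valid_cpf(masked))                    # True
print(masked[3], masked[7], masked[11])        # . . -
```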

PII Detection

Privacy

Automatically identifying which columns hold Personally Identifiable Information so masking rules can be applied.

SOFI detects 27 PII categories, including CPF, CNPJ, RG, email, name, phone, address, IBAN, credit card, IP address, geolocation, date of birth, and salary. Detection combines column-name heuristics, value pattern matching, and statistical profiling, so it works even when columns have cryptic names.

See also: Masking, Schema Profiling, Compliance
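
The first two detection signals can be sketched as a name check followed by a value check over a sample. This is a minimal sketch assuming a tiny, hypothetical set of hints and regular expressions; the statistical-profiling step is omitted, and a real detector covers far more categories.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical hint lists; a real detector covers many more categories and languages.
NAME_HINTS: Dict[str, Tuple[str, ...]] = {
    "email": ("email", "e_mail"),
    "cpf": ("cpf",),
    "phone": ("phone", "telefone", "celular"),
}

# Hypothetical value patterns, checked in insertion order.
VALUE_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "cpf": re.compile(r"\d{3}\.?\d{3}\.?\d{3}-?\d{2}"),
    "ipv4": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
}


def classify_column(name: str, sample: List[str], threshold: float = 0.8) -> Optional[str]:
    """Guess a PII category from the column name, then from sampled values."""
    lowered = name.lower()
    for category, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            return category
    values = [v for v in sample if v]  # ignore empty values
    if not values:
        return None
    for category, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            return category
    return None


print(classify_column("email_address", []))                                 # email
print(classify_column("col_17", ["ana@corp.com.br", "bruno@corp.com.br"]))  # email
print(classify_column("f3", ["111.444.777-35", "123.456.789-09"]))          # cpf
```

Patterns alone are ambiguous: an unpunctuated 11-digit Brazilian mobile number also matches the CPF pattern. Validating check digits or profiling value distributions is what would separate the two.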

Deterministic Masking

Privacy

Same input always produces the same masked output. Keeps joins between databases consistent even when each database is masked separately.

SOFI uses a `consistency_key` (typically `customer_id`) so that the user with CPF X gets the same masked CPF X' across every database in the organization. Without determinism, joining `users` and `orders` after masking them separately would break, because the same customer would receive different masked values in each table.

See also: Masking, Cross-DB Consistency
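
A minimal sketch of consistency-key masking, assuming masked values are derived with HMAC-SHA256 from an org-wide secret plus the consistency key. The glossary does not say which keyed function SOFI uses; the secret, the function names, and the `example.com` address format are hypothetical. `users` and `orders` are masked in separate passes, yet the same customer gets the same token, so the join survives.

```python
import hashlib
import hmac

# Hypothetical org-wide secret; in practice it would live in a secrets manager.
SECRET = b"org-wide-masking-secret"


def masked_token(consistency_key: str, field: str, secret: bytes = SECRET) -> str:
    """Same (secret, key, field) always yields the same token; different keys diverge."""
    message = f"{field}:{consistency_key}".encode()
    # Truncated to 12 hex characters for readability in this sketch.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:12]


def mask_email(customer_id: str) -> str:
    return f"user_{masked_token(customer_id, 'email')}@example.com"


users = [{"customer_id": "c-1", "email": "ana@corp.com.br"},
         {"customer_id": "c-2", "email": "bruno@corp.com.br"}]
orders = [{"order_id": 1, "customer_id": "c-1", "customer_email": "ana@corp.com.br"}]

# Each table is masked independently, as if in two different databases.
masked_users = [{**u, "email": mask_email(u["customer_id"])} for u in users]
masked_orders = [{**o, "customer_email": mask_email(o["customer_id"])} for o in orders]

# Joining on the masked email still pairs order 1 with customer c-1.
by_email = {u["email"]: u for u in masked_users}
joined = [(o["order_id"], by_email[o["customer_email"]]["customer_id"]) for o in masked_orders]
print(joined)  # [(1, 'c-1')]
```

Using a keyed HMAC rather than a plain hash matters here: CPFs have only about a billion possible values, so an unkeyed hash could be reversed by brute force.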

// Replication

Change Data Capture (CDC)

Replication

A pattern for streaming database changes — inserts, updates, deletes — to downstream systems in real time.

SOFI tails the PostgreSQL WAL (logical replication with the pgoutput plugin) and the MySQL binlog (row-based format with GTIDs) to produce streams of CDCEvent records. Each event carries the operation, schema, table, data, old_data, primary_key, LSN, and transaction ID. Events feed incremental snapshots, refreshes, and downstream pipelines.

See also: Logical Replication, WAL, Binlog
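
A sketch of what consuming such a stream can look like, assuming a dataclass that mirrors the fields listed above and an in-memory replica keyed by primary key. Real PostgreSQL LSNs are 64-bit log positions usually written like `0/16B3748`; here they are simplified to integers that strictly increase per event. The class and method names are hypothetical, not SOFI's types. Tracking the last applied position makes replaying a stream idempotent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

PK = Tuple[Any, ...]


@dataclass
class CDCEvent:
    """Illustrative shape of a change event; field names are assumptions."""
    operation: str                       # "insert", "update", or "delete"
    schema: str
    table: str
    data: Optional[Dict[str, Any]]       # row image after the change (None for delete)
    old_data: Optional[Dict[str, Any]]   # row image before the change (None for insert)
    primary_key: PK
    lsn: int                             # log position, simplified to an int
    transaction_id: int


@dataclass
class Replica:
    tables: Dict[Tuple[str, str], Dict[PK, Dict[str, Any]]] = field(default_factory=dict)
    last_lsn: int = -1

    def apply(self, event: CDCEvent) -> None:
        if event.lsn <= self.last_lsn:   # already applied: makes replays idempotent
            return
        rows = self.tables.setdefault((event.schema, event.table), {})
        if event.operation in ("insert", "update"):
            rows[event.primary_key] = dict(event.data or {})
        elif event.operation == "delete":
            rows.pop(event.primary_key, None)
        else:
            raise ValueError(f"unknown operation: {event.operation}")
        self.last_lsn = event.lsn


events = [
    CDCEvent("insert", "public", "users", {"id": 1, "name": "Ana"}, None, (1,), 10, 500),
    CDCEvent("update", "public", "users", {"id": 1, "name": "Ana S."},
             {"id": 1, "name": "Ana"}, (1,), 11, 501),
    CDCEvent("insert", "public", "users", {"id": 2, "name": "Bruno"}, None, (2,), 12, 501),
    CDCEvent("delete", "public", "users", None, {"id": 2, "name": "Bruno"}, (2,), 13, 502),
]
replica = Replica()
for e in events + events:  # the second pass simulates a replayed stream
    replica.apply(e)
print(replica.tables[("public", "users")])  # {(1,): {'id': 1, 'name': 'Ana S.'}}
```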

// Storage

Copy-on-Write (CoW)

Storage

A storage technique where new VDBs reference unchanged blocks instead of copying them. Writes allocate new blocks only when data diverges.

SOFI splits source databases into 4MB blocks, each identified by its SHA-256 hash and compressed with zstd. New VDBs allocate only the blocks that diverge from the source. A 100GB production database typically produces an 8GB clone, a ratio of roughly 12x that is normal for OLTP workloads.

See also: Virtual Clone, Snapshot, Block Storage
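
A minimal sketch of the content-addressed block layout described above: data is cut into fixed-size blocks, each block is named by its SHA-256 hash, and identical blocks are stored only once. To keep the sketch dependency-free, `zlib` stands in for zstd; the class and function names are hypothetical. The demo uses 4KB blocks instead of 4MB so it runs instantly.

```python
import hashlib
import zlib
from typing import Dict, List

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB, the block size described above


class BlockStore:
    """Content-addressed store: each distinct block is kept once, compressed."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:  # dedup: identical content is stored once
            self.blocks[digest] = zlib.compress(block)
        return digest

    def get(self, digest: str) -> bytes:
        return zlib.decompress(self.blocks[digest])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


def ingest(store: BlockStore, data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split `data` into blocks and return the manifest (ordered list of block hashes)."""
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def restore(store: BlockStore, manifest: List[str]) -> bytes:
    return b"".join(store.get(h) for h in manifest)


store = BlockStore()
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096  # first and last blocks are identical
manifest = ingest(store, data, block_size=4096)
print(len(manifest), len(store.blocks))          # 3 2
print(len(data) / store.stored_bytes())          # logical-to-physical ratio
```

The ratio printed by the demo is inflated because the sample data is trivially compressible; it says nothing about real workloads.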

Virtual Clone

Storage

A clone that uses copy-on-write to share storage with its source. Provisioning is fast and cheap.

SOFI virtual clones boot in seconds because no bytes are copied up front. Storage cost grows only as the clone diverges: a VDB with a 7-day TTL that runs daily ETL tests typically ends up using 200–500MB of new storage on top of the shared base.

See also: Copy-on-Write (CoW), Virtual Database (VDB)
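
The "storage grows only as the clone diverges" behavior can be sketched as a private overlay on top of shared, read-only base blocks. This is a minimal sketch assuming in-memory blocks; the `VirtualClone` class is hypothetical. Reads fall through to the base unless the clone has rewritten that block, and a write copies only the block it touches.

```python
from typing import Dict, List


class VirtualClone:
    """Copy-on-write view over read-only base blocks shared with the source snapshot."""

    def __init__(self, base: List[bytes]) -> None:
        self._base = base                     # shared; never modified by this class
        self._overlay: Dict[int, bytes] = {}  # blocks this clone has rewritten

    def read(self, index: int) -> bytes:
        return self._overlay.get(index, self._base[index])

    def write(self, index: int, offset: int, data: bytes) -> None:
        # Copy-on-write: the touched block is copied into the private overlay.
        block = bytearray(self.read(index))
        if offset + len(data) > len(block):
            raise ValueError("write crosses a block boundary")
        block[offset:offset + len(data)] = data
        self._overlay[index] = bytes(block)

    def private_bytes(self) -> int:
        """Storage used by this clone beyond the shared base."""
        return sum(len(b) for b in self._overlay.values())


base = [b"a" * 8, b"b" * 8, b"c" * 8]      # stands in for the snapshot's blocks
clone_1, clone_2 = VirtualClone(base), VirtualClone(base)
print(clone_1.private_bytes())             # 0
clone_1.write(1, 0, b"zz")
print(clone_1.read(1), clone_2.read(1))    # b'zzbbbbbb' b'bbbbbbbb'
print(clone_1.private_bytes())             # 8
```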

// Compliance

LGPD

Compliance

Lei Geral de Proteção de Dados: Brazil's general data protection law, broadly comparable to the EU's GDPR.

LGPD requires a legal basis for processing (consent is one of several), purpose limitation, data minimization, and notification of security incidents. SOFI attaches LGPD evidence to every masking job: which fields were processed, which rules were applied, who approved, and when the result was used. Inquiries from the ANPD, Brazil's national data protection authority, can be answered with a single export.

See also: GDPR, Compliance Evidence, ANPD
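
Evidence of this kind is essentially one structured record per masking job. A minimal sketch, assuming a JSON export; the field names, rule names, and identifiers are hypothetical and do not describe SOFI's actual evidence format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldEvidence:
    table: str
    column: str
    rule: str            # masking rule applied to this column


@dataclass
class MaskingEvidence:
    job_id: str
    approver: str
    approved_at: str     # ISO 8601 timestamp
    fields: List[FieldEvidence]
    used_at: List[str] = field(default_factory=list)  # when the masked output was used


def export_evidence(records: List[MaskingEvidence]) -> str:
    """Serialize evidence records into one JSON document for an auditor."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


record = MaskingEvidence(
    job_id="mask-2024-05-01-001",
    approver="dpo@example.com",
    approved_at="2024-05-01T09:30:00Z",
    fields=[FieldEvidence("customers", "cpf", "cpf_generator"),
            FieldEvidence("customers", "email", "format_preserving_email")],
    used_at=["2024-05-02T14:00:00Z"],
)
print(export_evidence([record]))
```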

GDPR

Compliance

The EU's General Data Protection Regulation. SOFI's masking and audit trail support the requirements of GDPR Article 32 (security of processing).

GDPR fines can reach €20 million or 4% of worldwide annual turnover, whichever is higher. SOFI ensures lower environments hold no personal data of EU residents, with audit evidence for every refresh. Right-to-erasure requests propagate through the masking layer rather than requiring a full re-clone.

See also: LGPD, Compliance Evidence, Article 32

// Concepts

Test Data Management (TDM)

Concepts

The discipline of providing safe, realistic, fast database copies for non-production work.

TDM sits at the intersection of platform engineering, data privacy, and developer productivity. Done well, it removes a major bottleneck for software teams. Done poorly, it becomes a compliance liability. SOFI is opinionated TDM: VDB-first, masking-by-default, audit-mandatory.

See also: Virtual Database (VDB), Masking, Compliance

// Governance

Snapshot Lock

Governance

A lock on a snapshot that makes it immutable, preventing accidental or malicious deletion.

SOFI snapshots can be locked by an approver. Locked snapshots cannot be deleted, even by admins, until explicitly unlocked. This is the foundation for compliance evidence: an auditor can be confident that the snapshot they are reviewing is exactly the one the VDB was provisioned from.

See also: Audit Trail, Compliance Evidence, Immutability
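
The lock semantics reduce to a state check before deletion. A minimal sketch, assuming an in-memory object: the class, the audit-log tuples, and the SHA-256 fingerprint (one way an auditor could confirm the snapshot content is unchanged) are illustrative assumptions, not SOFI's API. Who is allowed to unlock is a policy decision this sketch leaves open.

```python
import hashlib
from typing import List, Optional, Tuple


class SnapshotLockedError(Exception):
    pass


class Snapshot:
    """Hypothetical snapshot with an approver-controlled deletion lock."""

    def __init__(self, snapshot_id: str, content: bytes) -> None:
        self.snapshot_id = snapshot_id
        self.fingerprint = hashlib.sha256(content).hexdigest()
        self.locked_by: Optional[str] = None
        self.deleted = False
        self.audit_log: List[Tuple[str, str]] = []  # (actor, action)

    def lock(self, approver: str) -> None:
        self.locked_by = approver
        self.audit_log.append((approver, "lock"))

    def unlock(self, approver: str) -> None:
        self.locked_by = None
        self.audit_log.append((approver, "unlock"))

    def delete(self, actor: str, is_admin: bool = False) -> None:
        # Admin rights deliberately do not bypass the lock; it must be removed first.
        if self.locked_by is not None:
            self.audit_log.append((actor, "delete_denied"))
            raise SnapshotLockedError(f"{self.snapshot_id} is locked by {self.locked_by}")
        self.deleted = True
        self.audit_log.append((actor, "delete"))

    def verify(self, content: bytes) -> bool:
        """True if `content` is byte-for-byte the snapshot that was fingerprinted."""
        return hashlib.sha256(content).hexdigest() == self.fingerprint


snap = Snapshot("snap-0142", b"snapshot-bytes")
snap.lock("alice")
try:
    snap.delete("root", is_admin=True)
except SnapshotLockedError as err:
    print(err)         # snap-0142 is locked by alice
snap.unlock("alice")
snap.delete("bob")
print(snap.deleted)    # True
```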

missing a term?

Ask us — we'll add it.