[ 200 OK ][ .SQL ][ MASKED ][ .VDB ]

Test data, masked at the source

SOFI creates masked, production-like databases inside your own network — for dev, QA, analytics and compliance — without copying sensitive data into an external SaaS. Runs in your VPC or on-premise.

private.sofi.local/vdbs/customer_360_qa
200 OK · 87 ms

// provisioning flow

1. snapshot: CoW · 1.2M rows
2. mask: pii.default
3. clone: thin · 0 copy

Ready: customer_360_qa
isolated VDB · masked · always-on

// sample row · masked VDB

customer_id: c_8421
email: ****@acme.com
ltv: 4823.50
last_order_at: 2026-04-29T18:11Z

// trusted lower-environment data

For teams that need realistic test data without exposing production PII in every tool.

Private deploy

VPC / on-prem

Governance

RBAC + masking

Audit

Every refresh logged

Automation

CI/CD + API
[ 01 / 08 ] How it works

// For teams who build //

From production to a masked clone in minutes

Connect a source, profile the PII, apply masking, and provision isolated test databases on demand.

1. Connect: prod.customers
2. Profile: pii.detected
3. Mask: pii.default
4. Provision: customer_360_qa

// provisioning flow

Lineage preserved from source to environment

Every VDB keeps its source, snapshot, masking policy and owner. The same definition feeds psql/pgwire, JDBC/ODBC, REST and your CI pipelines.
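As a rough illustration of the lineage described above, a VDB definition could be modelled as a small immutable record. The class name, field names, snapshot id and owner below are assumptions made for this sketch, not SOFI's actual schema; the other values come from the examples on this page.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VdbLineage:
    """Hypothetical lineage record: the facts every VDB keeps."""
    name: str
    source: str          # where the data came from
    snapshot: str        # snapshot the clone is based on (hypothetical id format)
    masking_policy: str  # masking policy applied before cloning
    owner: str           # who owns the environment (hypothetical value)


lineage = VdbLineage(
    name="customer_360_qa",
    source="prod.customers",
    snapshot="snap-0001",
    masking_policy="pii_default",
    owner="qa-team",
)

# The same record can be serialized for an API response or a CI log.
print(asdict(lineage))
```

Freezing the record reflects the idea that lineage is written once at provisioning time and then only read.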

// schema preview

customer_id · uuid · key
email · varchar · masked
lifetime_value · decimal · visible
last_order_at · timestamp · visible
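# pip install sofi
from sofi import Sofi

# Create a client with your API key
app = Sofi(api_key="YOUR_KEY")

# Provision a masked virtual database (VDB) from a production source
vdb = app.provision(
    source="prod.customers",   # source to snapshot
    masking="pii_default",     # masking policy to apply
    engine="postgres",         # database engine
)

# Connection string for the new VDB
print(vdb.connection_uri)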
[ 02 / 08 ] Capabilities

// what SOFI does //

Everything to run masked lower environments

Virtualize, mask, refresh, connect, snapshot and operate — one platform inside your network.

Virtualize

Create isolated VDBs from production snapshots — no full copy, no copy tickets, no waiting.

Mask

Replace PII with realistic values while keeping joins, constraints, formats and totals intact.

Refresh

Keep dev, QA, demo and analytics current without re-running slow export-and-sanitize jobs.

Connect

Postgres, MySQL, SQL Server, Oracle, MongoDB, ClickHouse, Snowflake, BigQuery and more.

Snapshot

Copy-on-write snapshots with block dedup — lock, rewind, undo and time-travel any VDB.

Operate

Run inside your VPC or on bare metal with SSO, RBAC, audit logs and private routing.

[ 03 / 08 ] Automation

// CI/CD ready //

Provision test data from your pipeline

The same masked VDB from the API, the CLI or a GitHub Action — RBAC, masking and audit on every call.

$ sofi vdb create
API-native

// automation

Provision from your terminal

Source: prod.customers
Masking: pii_default
Engine: postgres
$ sofi login --host private.sofi.local
$ sofi vdb create \
    --source prod.customers \
    --masking pii_default \
    --name customer_360_qa

✓ snapshot ready        1.2M rows
✓ masking applied       3 PII columns
✓ vdb provisioned       customer_360_qa
→ psql "host=private.sofi.local dbname=customer_360_qa"

// every environment

Four guarantees before a team sees a row

REST, CLI and CI share one plan: every environment inherits the same policy, with no duplicated logic.

1. Identity: JWT or API key resolves the requester and tenant.
2. RBAC: Policy by role, scope, purpose and environment.
3. Mask: PII masked per column before the snapshot is cloned.
4. Audit: Requester, source and rows recorded on every refresh.

avg provision · <90 s per environment

[ 04 / 08 ] Core

// production-ready //

Performance for real workloads

Copy-on-write snapshots, block dedup and hyperscale masking that handles 100M+ rows without OOM.
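One common way to mask very large tables without exhausting memory is to process them in bounded chunks and fan the chunks out to workers. The sketch below shows only that pattern: this page names Celery for the fan-out, while a standard-library thread pool stands in here, and the function names, chunk size and placeholder worker are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_rows: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_rows) into half-open (start, end) row ranges."""
    return [
        (start, min(start + chunk_size, total_rows))
        for start in range(0, total_rows, chunk_size)
    ]


def mask_chunk(bounds: tuple[int, int]) -> int:
    """Placeholder worker: a real one would read, mask and write these rows."""
    start, end = bounds
    return end - start  # number of rows handled


def mask_table(total_rows: int, chunk_size: int, workers: int = 4) -> int:
    """Fan chunks out to a pool and return the total number of rows processed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(mask_chunk, chunk_ranges(total_rows, chunk_size)))


# 1.2M rows in chunks of 50,000 (the chunk size is a hypothetical choice).
processed = mask_table(1_200_000, 50_000)
print(processed)
```

Because each worker only ever holds one chunk, peak memory depends on the chunk size and worker count rather than on the table size.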

new environment: < 90 seconds

storage per VDB: ~2% of source

data movement: 0 bytes leave network

audit + lineage: 100% coverage

// benchmark

Same 1.2M-row source, three ways to get a test DB

Standing up a masked dev database from prod.customers with PII rules applied.

Export + copy + sanitize script: 3 days
SOFI · first masked snapshot: 4 min
SOFI · virtual clone from snapshot: 87 s

// why it's fast

Four engineering choices

Copy-on-write

Virtual clones share blocks; only changed data costs storage.

Block dedup

SHA-256, 4MB blocks, zstd — snapshots stay small and fast.

Hyperscale masking

Tables split into chunks, fanned out via Celery for 100M+ rows.

Policy in the plan

RBAC, masking and audit live in the provisioning path, not a wrapper.
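Of the four choices above, block dedup is the easiest to sketch in a few lines. The example below is a minimal content-addressed store, assuming blocks are keyed by their SHA-256 digest and stored compressed; `BlockStore` and its methods are hypothetical names, and `zlib` stands in for zstd so the sketch needs only the standard library.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as stated above


class BlockStore:
    """Content-addressed store: identical blocks are kept only once."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}  # SHA-256 hex digest -> compressed block

    def put(self, data: bytes) -> list[str]:
        """Split data into blocks, store new ones, return the block digests."""
        manifest = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # zlib is a stand-in for zstd in this sketch.
                self.blocks[digest] = zlib.compress(block)
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Rebuild the original bytes from a manifest of digests."""
        return b"".join(zlib.decompress(self.blocks[d]) for d in manifest)


# Tiny 4-byte blocks keep the demo readable.
store = BlockStore(block_size=4)
first = store.put(b"aaaabbbbaaaa")   # blocks: aaaa, bbbb, aaaa
second = store.put(b"aaaacccc")      # only cccc is new
print(len(store.blocks))
```

Two snapshots that share most of their blocks therefore cost little more than one, which is the property the dedup claim relies on.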

[ 05 / 08 ] Governance

// governed by default //

Policy inside the snapshot

Masking, access, lineage, audit and lifecycle in a single operational layer.

SQL → VDB

// Sources to VDBs

Clone without copying

Virtual clones of one masked snapshot let many environments share a single footprint instead of keeping full duplicates.
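The idea of cloning without copying can be illustrated with a toy copy-on-write overlay. This is a simplified sketch under assumed names (`Snapshot`, `VirtualClone`), not SOFI's storage engine: reads fall through to the shared snapshot, and only blocks a clone writes take up private space.

```python
class Snapshot:
    """Immutable base image: block number -> block contents."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self._blocks = dict(blocks)

    def read(self, block_no: int) -> bytes:
        return self._blocks[block_no]


class VirtualClone:
    """Reads fall through to the snapshot; writes land in a private overlay."""

    def __init__(self, base: Snapshot) -> None:
        self._base = base
        self._overlay: dict[int, bytes] = {}

    def read(self, block_no: int) -> bytes:
        if block_no in self._overlay:
            return self._overlay[block_no]
        return self._base.read(block_no)

    def write(self, block_no: int, data: bytes) -> None:
        # The shared snapshot is never modified.
        self._overlay[block_no] = data

    def private_blocks(self) -> int:
        """Number of blocks this clone actually stores itself."""
        return len(self._overlay)


base = Snapshot({0: b"alice", 1: b"bob"})
dev, qa = VirtualClone(base), VirtualClone(base)
dev.write(1, b"carol")  # only dev sees this change
print(dev.read(1), qa.read(1), dev.private_blocks(), qa.private_blocks())
```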

email ****
cpf ****
phone ****

// Masked at the source

Mask before it lands

Deterministic, format-preserving masking applied to the snapshot before any team touches it.

consistency_key · tenant
same PII → same value

// Cross-source

Consistent everywhere

The same email maps to the same fake value across every database, so joins keep working.
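To make the "same PII, same value" property concrete, here is a minimal sketch of deterministic masking with a keyed hash. It assumes the consistency key is a per-tenant secret and uses HMAC-SHA256 for pseudonymization. That is an illustrative technique, not necessarily what SOFI uses, and it is not a standardized format-preserving encryption scheme such as NIST FF1. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac


def _digest(consistency_key: bytes, value: str) -> bytes:
    """Keyed hash: the same key and input always give the same bytes."""
    return hmac.new(consistency_key, value.encode("utf-8"), hashlib.sha256).digest()


def mask_email(consistency_key: bytes, email: str) -> str:
    """Replace the local part with a stable pseudonym and keep the domain."""
    _local, _, domain = email.partition("@")
    token = _digest(consistency_key, email.lower()).hex()[:10]
    return f"user_{token}@{domain}"


def mask_digits(consistency_key: bytes, value: str) -> str:
    """Replace each digit with a derived digit; keep length and punctuation."""
    stream = _digest(consistency_key, value)
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(stream[i % len(stream)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


key = b"tenant-a-demo-key"  # hypothetical per-tenant consistency key
masked_a = mask_email(key, "jane@acme.com")
masked_b = mask_email(key, "jane@acme.com")  # e.g. the same email in another database
masked_phone = mask_digits(key, "+55 11 91234-5678")
print(masked_a == masked_b, masked_phone)
```

Because the output depends only on the key and the input, a join on the masked email column matches the same rows as a join on the original column.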

dev allow
contractor mask-only
analytics allow

// RBAC by default

Access with controls

Role, tenant and purpose decide who can provision, refresh or export which environment.
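A policy check of this kind can be sketched as a small lookup. The role table mirrors the panel above (dev allow, contractor mask-only, analytics allow); the default-deny rule, the cross-tenant check and all names are assumptions added for illustration, and purpose is left out to keep the sketch short.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    MASK_ONLY = "mask-only"
    DENY = "deny"  # assumed default; not shown in the panel


# Role-to-decision table taken from the panel above.
ROLE_POLICY = {
    "dev": Decision.ALLOW,
    "contractor": Decision.MASK_ONLY,
    "analytics": Decision.ALLOW,
}


def decide(role: str, tenant: str, resource_tenant: str) -> Decision:
    """Deny across tenants, otherwise look the role up (unknown roles are denied)."""
    if tenant != resource_tenant:
        return Decision.DENY
    return ROLE_POLICY.get(role, Decision.DENY)


print(decide("contractor", "acme", "acme").value)
print(decide("dev", "acme", "globex").value)
```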

audit.tail()
action=refresh
result=masked

// Every refresh

Queryable audit

Each provision, refresh and access becomes a structured event with role, scope and rows.
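A structured audit event might look like the sketch below. The `action` and `result` values come from the panel above; the field names, the JSON-lines encoding and the sample requester are assumptions, not SOFI's actual event format.

```python
import json
from datetime import datetime, timezone


def audit_event(action: str, requester: str, role: str, scope: str,
                source: str, rows: int, result: str) -> str:
    """Serialize one audit event as a single JSON line."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),  # event time in UTC
        "action": action,
        "requester": requester,
        "role": role,
        "scope": scope,
        "source": source,
        "rows": rows,
        "result": result,
    }
    return json.dumps(event, sort_keys=True)


# "ci-bot" and the scope value are hypothetical examples.
line = audit_event("refresh", "ci-bot", "dev", "customer_360_qa",
                   "prod.customers", 1_200_000, "masked")
print(line)
```

Keeping every event as one flat record is what makes the trail easy to load into a table and query later.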

Lock · Rewind
Undo · Subset
V2P export

// One surface

Lock, rewind, revoke

Lock a snapshot, rewind a VDB, undo a refresh or revoke an environment without redeploys.

[ 06 / 08 ] Use cases

// use cases //

Turn production into safe environments

Ready-made patterns to deliver realistic data with control, reuse and traceability.

Masked copies for dev & QA, in minutes

Give every developer and tester an isolated, production-like database without copy tickets or exposed PII.

Prod

postgres · read-only

Snapshot

CoW · masked

VDBs

34 · isolated

// shared snapshot

customer_360_qa

One masked snapshot, many virtual clone environments for dev, QA and demos.

// status

customer_id · uuid
email · varchar
lifetime_value · decimal
last_order_at · timestamp

Dev & QA databases

Production-like, masked copies for every engineer and tester, on demand.

1 snapshot · N VDBs

CI / CD pipelines

Ephemeral per-PR databases that spin up fast and tear down on merge.

< 90 s · per-PR

Analytics & sandboxes

Refresh sandbox and demo data on a schedule without ETL round-trips.

nightly · scheduled

LGPD compliance

Mask PII, audit every refresh and prove lineage for lower environments.

100% audit coverage

Database migration

Validate Oracle → Postgres or version upgrades against masked clones first.

0 downtime

Training & support

Give partners and support realistic data that never exposes real customers.

safe by default

[ 07 / 08 ] Integrations

// connect everything //

Databases, warehouses and pipelines

38+ connectors with read-only source access, plus CI/CD, identity and observability.

Relational · 12

Postgres, MySQL, SQL Server, Oracle

postgresql
mysql
sqlserver
oracle

Distributed · 8

CockroachDB, TiDB, MariaDB, DB2

cockroachdb
tidb
mariadb
db2

Warehouses · 8

Snowflake, BigQuery, Redshift, Databricks

snowflake
bigquery
redshift
databricks

NoSQL, Graph & Search · 6

MongoDB, Cassandra, Elastic, Neo4j

mongodb
cassandra
elasticsearch
neo4j

Streaming & cache

Kafka, Redis, ClickHouse, Trino

kafka
redis
clickhouse
trino

Files, CI & automation

CSV, JSON, Parquet, Jenkins, GitHub Actions, Terraform

npx @sofi/cli init
sofi vdb create --source prod.customers
uses: sofi/sofi-provision@v1

[ 08 / 08 ] Customers & comparison

// data teams //

Masked environments in production

What engineering, QA and governance gain after the first SOFI deployment.

"We replaced a three-day copy-and-sanitize job with a masked snapshot that clones in seconds. Same governance, no nightly batch."
Lead Data Engineer · tier-1 bank

"Every pull request now gets its own masked database. QA stopped sharing one stale environment overnight."
Head of QA · healthcare

"Our LGPD audit went from a quarterly spreadsheet to a SQL query with evidence ready."
Data Governance Officer · public sector
// capability · SOFI · Legacy TDM · DIY scripts
Masked, production-like data in lower environments · Custom
Virtual clone VDBs instead of full copies · Partial
Deterministic, cross-source masking · Limited
Self-hosted private deployment · Add-on
Built-in LGPD / RBAC / audit trail · Partial
Ephemeral DBs in CI (REST + CLI + Action) · Custom
Time-to-first-environment · < 2 weeks · 3-6 months · 6-12 months
Storage footprint per environment · ~2% · 100% · 100%

// enterprise only //

One contract, one platform

The pricing conversation is about deployment, risk, governance and success criteria — not picking between smaller SaaS tiers.

// enterprise

SOFI Enterprise Platform

Custom

Annual contract for organizations that need masked, governed test data in regulated, private or large-scale environments.

Talk to sales →

Virtualization engine

Virtual clone VDBs from snapshots, no full copies to another store.

Masking layer

Deterministic PII masking, RBAC, audit and LGPD flows in the provisioning path.

Snapshots & lifecycle

Copy-on-write snapshots, lock, rewind, undo and subset.

Endpoints

REST, psql/pgwire, JDBC/ODBC, CLI and the CI GitHub Action.

Enterprise identity

SSO, SAML/OIDC, SCIM, custom domains and tenant controls.

Private deploy

Private VPC, on-prem, air-gapped or managed enterprise cloud.

[ VIRTUALIZE ][ MASK ][ REFRESH ][ AUDIT ]

// start now

Ready to mask your test data?

Stand up a masked, production-like database over your own data in under two weeks. Nothing leaves your environment and every refresh is auditable by design.

No credit card · Private deployment available

// FAQ //

Frequently asked questions

The essentials for evaluating the SOFI deployment model.

Does production data ever leave my network?

No. SOFI runs inside your VPC, in a private cloud or on bare metal. Source access is read-only and masked data never leaves your environment.

How does masking keep my app working?

Masking is deterministic and format-preserving: the same input always maps to the same realistic fake value, so joins, constraints, totals and formats stay intact across every database.

How fast is a new test database?

Most VDBs provision in under two minutes from an existing masked snapshot, using copy-on-write virtual clones that take about 2% of the source footprint.
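The storage claim can be checked with simple arithmetic. The sketch below assumes a hypothetical 500 GB source, 34 environments (the VDB count shown elsewhere on this page), a masked snapshot roughly the size of the source, and the 2% per-clone figure quoted above; real numbers depend on how much each clone changes.

```python
def full_copy_gb(source_gb: float, environments: int) -> float:
    """Every environment holds a complete copy of the source."""
    return source_gb * environments


def virtual_clone_gb(source_gb: float, environments: int,
                     clone_fraction: float = 0.02) -> float:
    """One shared snapshot plus a thin clone per environment."""
    return source_gb + source_gb * clone_fraction * environments


full = full_copy_gb(500, 34)
thin = virtual_clone_gb(500, 34)
print(f"full copies: {full:.0f} GB, virtual clones: {thin:.0f} GB")
```

Under these assumptions the clones add 10 GB each, so 34 environments need roughly 840 GB in total instead of 17,000 GB.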

Which databases are supported?

PostgreSQL, MySQL, MariaDB, SQL Server, Oracle, DB2, MongoDB, Cassandra, ClickHouse, Snowflake, BigQuery, Redshift, Databricks and more — plus CSV, JSON and Parquet files.

How do I use it in CI/CD?

Use the REST API, the sofi CLI or the GitHub Action to provision an ephemeral masked database per pull request and tear it down automatically on merge.
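For a per-pull-request database, a CI job mainly needs a unique VDB name and the create command. The sketch below builds that command using only the `sofi vdb create` flags shown in the terminal example on this page; the naming convention and helper names are assumptions, and no teardown command is shown because this page does not document one.

```python
import shlex


def vdb_name_for_pr(base: str, pr_number: int) -> str:
    """Hypothetical naming convention: one VDB per pull request."""
    return f"{base}_pr_{pr_number}"


def create_command(source: str, masking: str, name: str) -> list[str]:
    """Argument list for the `sofi vdb create` call shown in the terminal example."""
    return [
        "sofi", "vdb", "create",
        "--source", source,
        "--masking", masking,
        "--name", name,
    ]


# PR number 128 is an arbitrary example.
cmd = create_command("prod.customers", "pii_default",
                     vdb_name_for_pr("customer_360_qa", 128))
print(shlex.join(cmd))
```

In a pipeline, the argument list could be passed to `subprocess.run(cmd, check=True)` so a failed provision fails the job.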

How is sensitive data protected?

Column masking, row-level controls, RBAC, encrypted credentials and a queryable audit trail are applied before any environment is handed to a team.