Scan depth
The depth parameter controls how much metadata is extracted from files, folders, datasets, and databases. Set it once as a global default, or override it for any add entry.
depth: variable # global default
add:
- folder: ./data # inherits "variable"
- folder: ./big
depth: stat # override for this entry
- database: sqlite:///db.sqlite
depth: dataset| Feature | dataset | variable | stat | value (default) |
|---|---|---|---|---|
| Folders | ✓ | ✓ | ✓ | ✓ |
| Datasets (format, path, mtime) | ✓ | ✓ | ✓ | ✓ |
| Variables (names, types) | ✓ | ✓ | ✓ | |
| DB introspection (PK, FK, comments) | ✓ | ✓ | ✓ | |
| Row count, statistics | ✓ | ✓ | ||
| Enumerations, frequencies, patterns | ✓ | |||
| Auto-tagging (format, security, text) | ✓ |
Note: At
depth="variable", CSV and Excel files only extract column names (types require reading data, available fromdepth="stat"). All other formats provide types at this level.
Typical use cases:
dataset— quick inventory of available files/tables without reading datavariable— lightweight schema discovery (column names and types)stat— data profiling without enumeration detection (faster thanvalue)value— full catalog with frequency tables and enumeration assignment (default)
Auto-tagging
At depth="value" (default), string columns are automatically tagged by content type. Tags use a two-level hierarchy under the auto parent:
| Category | Tags |
|---|---|
| Format | auto---email, auto---phone, auto---uuid, auto---iban |
| Security | auto---bcrypt, auto---argon2, auto---jwt, auto---secret |
| Text | auto---structured, auto---semi-structured, auto---free-text → auto---natural-text |
Each variable receives at most one leaf tag. The frontend can use parent_id to filter by category (e.g. selecting auto---security shows all bcrypt/argon2/jwt/secret variables).
Security-tagged columns (bcrypt, argon2, jwt, secret) have their raw frequency values suppressed — only pattern frequencies are emitted, so no actual secrets appear in the exported catalog.
Policy tags
Policy tags let you manually control scan behavior for specific variables. Like auto-tags, they live under the scan hierarchy and are auto-created — no tag.csv entry needed.
| Tag | Effect |
|---|---|
policy---frequency-hidden | Suppress all frequency and enumeration data (stats remain visible) |
Assign the tag in your variable.csv metadata:
id,tag_ids
source---employees_csv---first_name,"policy---frequency-hidden"This is useful for sensitive columns that auto-tagging does not flag (e.g. first names, internal IDs, business codes).