Output & exports
Output
```yaml
# Complete standalone app
app_path: ./my-catalog
open_browser: true

# JSON metadata only (for an existing datannur instance)
output_dir: ./output
```

When exports are not run in quiet mode, datannurpy prints a per-table size report after writing the database. The report lists raw .json, raw .json.js, and estimated gzipped .json sizes with percentages, which helps identify the tables that dominate catalog weight.
Large exports
Large catalogs are usually dominated by the frequency and value tables, because those tables store repeated values for many variables. Check the export size report to see which tables matter before changing scan settings.
If the export is larger than expected, the main levers are scan depth, frequency generation, and sampling:
- depth: stat keeps variable statistics without writing frequency tables or enumerations
- depth: variable keeps only schema-level metadata
- freq_threshold controls when high-cardinality string columns switch from value frequencies to pattern frequencies
- sample_size limits the rows used for frequency counts and enumeration detection, while core statistics are still computed on the full dataset
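To make the freq_threshold and sample_size trade-off concrete, here is a minimal, hypothetical sketch of the decision for a single string column. It is not datannurpy's implementation: the function names, the pattern encoding (letters become "A", digits become "9"), the "distinct count above threshold" rule, and the random sampling strategy are all assumptions for illustration.

```python
import random
import re
from collections import Counter


def to_pattern(value: str) -> str:
    """Hypothetical pattern encoding: letters become 'A', digits become '9'."""
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", value)


def column_frequencies(values, freq_threshold=100, sample_size=None, seed=0):
    """Sketch of value vs. pattern frequencies for one string column.

    Core statistics (row and distinct counts) use every value; frequency
    counts use a random sample of at most `sample_size` values when set.
    """
    stats = {"rows": len(values), "distinct": len(set(values))}
    rows = values
    if sample_size is not None and sample_size < len(values):
        rows = random.Random(seed).sample(values, sample_size)
    if stats["distinct"] > freq_threshold:
        # High-cardinality column: count value shapes instead of raw values
        return stats, "pattern", Counter(to_pattern(v) for v in rows)
    return stats, "value", Counter(rows)


# 500 distinct product codes exceed the threshold, so patterns are counted
codes = [f"AB{i:03d}" for i in range(500)]
stats, kind, freq = column_frequencies(codes, freq_threshold=100, sample_size=200)
print(stats, kind, freq)  # {'rows': 500, 'distinct': 500} pattern Counter({'AA999': 200})
```

The point of the sketch is the split: statistics see every row, while the frequency table, which drives export size, is bounded by both the threshold and the sample.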
In the size report, the .json.js size reflects local or shared-folder usage (file://), the .json size reflects uncompressed HTTP serving, and the gzipped .json estimate reflects HTTP served with gzip compression.
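The three sizes in the report can be approximated with a short sketch like the one below. It assumes the JSON-JS wrapper has the form jsonjs.data['<table>'] = <json>; and uses Python's gzip module with default settings, so the numbers will not match datannurpy's report byte for byte. The function name size_report, the choice of raw .json as the percentage base, and the sample tables are hypothetical.

```python
import gzip
import json


def size_report(tables):
    """Sketch: per-table sizes for raw .json, .json.js, and gzipped .json.

    `tables` maps a table name to its rows (a list of dicts). The .json.js
    wrapper below is an assumed format, not necessarily datannurpy's output.
    """
    report = {}
    for name, rows in tables.items():
        raw = json.dumps(rows, ensure_ascii=False).encode("utf-8")
        js = f"jsonjs.data['{name}'] = ".encode("utf-8") + raw + b";"
        report[name] = {
            "json": len(raw),
            "json_js": len(js),
            "json_gz": len(gzip.compress(raw)),
        }
    total = sum(r["json"] for r in report.values()) or 1
    for r in report.values():
        # Share of the total raw .json weight
        r["pct"] = round(100 * r["json"] / total, 1)
    return report


tables = {
    "variable": [{"id": "v1", "name": "age"}],
    "frequency": [{"variable_id": "v1", "value": str(i), "freq": 1} for i in range(200)],
}
rep = size_report(tables)
for name, r in sorted(rep.items(), key=lambda kv: -kv[1]["json"]):
    print(f"{name:<10} {r['json']:>7} {r['json_js']:>7} {r['json_gz']:>7} {r['pct']:>5}%")
```

Sorting by size puts the heaviest tables first, which is usually where frequency and value show up in a large catalog.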
Dataset previews
At stat and value depth, datannurpy exports small dataset previews by default. Database-only exports write them to <output_dir>/preview/<dataset_id>.json and <output_dir>/preview/<dataset_id>.json.js; app exports place the same files under data/db/preview/. The JSON file is an array of row objects, and the JSON-JS file uses jsonjs.data['<dataset_id>'], matching the metadata table convention.
Use preview_rows to control the maximum number of rows per dataset. The default is 100; set preview_rows: 0 or preview_rows: false globally or on an individual add entry to disable previews for sensitive sources. Previews are not collected at dataset or variable depth, because those modes do not read data rows.
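A minimal sketch of the preview layout described above: it writes at most preview_rows rows of a dataset to <output_dir>/preview/<dataset_id>.json and a matching .json.js file, and writes nothing when preview_rows is 0 or false. The function name write_preview is hypothetical, and the exact JSON-JS statement (jsonjs.data['<dataset_id>'] = ...;) is an assumption based on the naming convention above; datannurpy's own writer may format the files differently.

```python
import json
import tempfile
from pathlib import Path


def write_preview(rows, dataset_id, output_dir, preview_rows=100):
    """Hypothetical sketch: write a dataset preview as .json and .json.js files.

    Returns the written paths, or an empty list when previews are disabled
    (preview_rows of 0 or False).
    """
    if not preview_rows:
        return []
    preview = list(rows)[: int(preview_rows)]  # array of row objects
    payload = json.dumps(preview, ensure_ascii=False)
    folder = Path(output_dir) / "preview"
    folder.mkdir(parents=True, exist_ok=True)
    json_path = folder / f"{dataset_id}.json"
    js_path = folder / f"{dataset_id}.json.js"
    json_path.write_text(payload, encoding="utf-8")
    # Assumed wrapper, keyed by dataset id like the metadata tables
    js_path.write_text(f"jsonjs.data['{dataset_id}'] = {payload};", encoding="utf-8")
    return [json_path, js_path]


rows = [{"id": i, "city": "Paris"} for i in range(250)]
with tempfile.TemporaryDirectory() as out:
    paths = write_preview(rows, "sales", out)
    saved = json.loads(paths[0].read_text(encoding="utf-8"))
print(len(saved))  # 100
```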
Incremental scan
Re-run with the same app_path to rescan only changed files (compared by mtime) or tables (compared by schema and row count):
```yaml
app_path: ./my-catalog
add:
  - folder: ./data  # skips unchanged files
```

Use refresh: true to force a full rescan.
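The change-detection rule above can be modeled as two small checks. This is an illustrative sketch only, assuming the previous run's state is kept as plain dicts; the function names and the state layout are hypothetical and do not describe datannurpy's internal cache.

```python
import os
import tempfile
import time
from pathlib import Path


def file_changed(path, previous_mtimes):
    """A file is rescanned when it is new or its mtime differs from the last run."""
    return previous_mtimes.get(str(path)) != os.stat(path).st_mtime


def table_changed(name, schema, row_count, previous_tables):
    """A table is rescanned when its schema or row count differs from the last run."""
    return previous_tables.get(name) != (tuple(schema), row_count)


with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "sales.csv"
    csv_path.write_text("id,amount\n1,10\n", encoding="utf-8")
    state = {str(csv_path): os.stat(csv_path).st_mtime}
    unchanged = file_changed(csv_path, state)  # False: same mtime as the last run
    later = time.time() + 60
    os.utime(csv_path, (later, later))
    touched = file_changed(csv_path, state)  # True: mtime moved forward

previous_tables = {"sales": (("id", "amount"), 2)}
print(table_changed("sales", ["id", "amount"], 2, previous_tables))  # False
print(table_changed("sales", ["id", "amount"], 3, previous_tables))  # True
```

Comparing cheap fingerprints (mtime for files, schema plus row count for tables) avoids reading data that has not changed since the last export.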
Existing app exports update data/db by default and preserve local app state under data/. To refresh the bundled front-end app files after upgrading datannurpy, set update_app: true or call catalog.export_app(update_app=True).
When app_path/data/db-ui exists, it is loaded automatically as the last metadata source before export. See Manual metadata for merge ordering and overlay instructions.
Evolution tracking
Changes between exports are automatically tracked in evolution.json:
- add: new folder, dataset, variable, enumeration, etc.
- update: modified field (shows old and new value)
- delete: removed entity
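To illustrate how add, update, and delete entries relate to two successive exports, here is a minimal sketch that diffs two snapshots keyed by (entity, id). The snapshot shape, the function name diff_snapshots, and the entry fields are assumptions for illustration; they are not the actual structure of evolution.json.

```python
def diff_snapshots(old, new):
    """Compare two {(entity, id): {field: value}} snapshots and list changes."""
    changes = []
    for key in sorted(new.keys() - old.keys()):
        changes.append({"type": "add", "entity": key[0], "id": key[1]})
    for key in sorted(old.keys() - new.keys()):
        changes.append({"type": "delete", "entity": key[0], "id": key[1]})
    for key in sorted(old.keys() & new.keys()):
        for field in sorted(old[key].keys() | new[key].keys()):
            before, after = old[key].get(field), new[key].get(field)
            if before != after:
                # Updates keep both values so the change can be displayed
                changes.append({"type": "update", "entity": key[0], "id": key[1],
                                "field": field, "old": before, "new": after})
    return changes


old = {("dataset", "sales"): {"name": "Sales"}, ("dataset", "legacy"): {"name": "Legacy"}}
new = {("dataset", "sales"): {"name": "Sales 2024"}, ("dataset", "stock"): {"name": "Stock"}}
for change in diff_snapshots(old, new):
    print(change)
```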
Cascade filtering: when a parent entity is added or deleted, its children are automatically filtered out to reduce noise. For example, adding a new dataset won't generate separate entries for each variable.
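Cascade filtering can be sketched as dropping any entry whose parent (or any ancestor) was itself added or deleted in the same run. The function name cascade_filter, the parent mapping, and the entry fields below are hypothetical; datannurpy's own rules for which entity types count as parents may differ.

```python
def cascade_filter(changes, parent_of):
    """Drop entries whose parent or ancestor was also added or deleted.

    `changes` is a list of {"type", "entity", "id"} dicts; `parent_of` maps
    (entity, id) to the parent's (entity, id) and omits root entities.
    """
    structural = {(c["entity"], c["id"]) for c in changes if c["type"] in ("add", "delete")}

    def has_changed_ancestor(key):
        parent = parent_of.get(key)
        while parent is not None:
            if parent in structural:
                return True
            parent = parent_of.get(parent)
        return False

    return [c for c in changes if not has_changed_ancestor((c["entity"], c["id"]))]


changes = [
    {"type": "add", "entity": "dataset", "id": "stock"},
    {"type": "add", "entity": "variable", "id": "stock.qty"},
    {"type": "add", "entity": "variable", "id": "stock.price"},
    {"type": "add", "entity": "variable", "id": "sales.region"},
]
parent_of = {
    ("variable", "stock.qty"): ("dataset", "stock"),
    ("variable", "stock.price"): ("dataset", "stock"),
    ("variable", "sales.region"): ("dataset", "sales"),
}
# Only the new dataset and the variable added to an existing dataset remain
print([c["id"] for c in cascade_filter(changes, parent_of)])
```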
Disable tracking:
```yaml
track_evolution: false
```
copy_assets
Copy local files or directories into the exported catalog after export:
```yaml
copy_assets:
  - from: ./staging/docs
    to: data/doc
    include: "*.pdf"
    clean: true
  - from: ./data
    to: data/source
```

Rules:
- from: resolved relative to the YAML config directory
- to: resolved relative to the export directory; must stay inside it
- Directories are copied recursively; single files are copied into the destination directory.
- include: optional; accepts a glob string or a list of globs
- clean: true removes destination files that are not present in the filtered source set
- Copies are incremental: a file is updated only when it is missing, its size has changed, or its source mtime is newer.
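The copy rules can be modeled with the standard library as below. This is a simplified sketch, not datannurpy's implementation: the function names are hypothetical, it only handles a directory source, matches include globs against paths relative to the source with fnmatch, leaves empty directories in place when clean is set, and needs Python 3.9+ for Path.is_relative_to.

```python
import fnmatch
import shutil
import tempfile
from pathlib import Path


def resolve_inside(export_dir, to):
    """Resolve `to` against the export directory and refuse paths that escape it."""
    root = Path(export_dir).resolve()
    target = (root / to).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"{to!r} escapes the export directory")
    return target


def needs_copy(src: Path, dst: Path) -> bool:
    """Copy when the destination is missing, sizes differ, or the source mtime is newer."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def sync_assets(src_dir, dst_dir, include=None, clean=False):
    """Hypothetical sketch of one copy_assets entry whose source is a directory."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    patterns = [include] if isinstance(include, str) else (include or ["*"])
    wanted = set()
    for p in src_dir.rglob("*"):
        rel = p.relative_to(src_dir)
        if p.is_file() and any(fnmatch.fnmatch(rel.as_posix(), g) for g in patterns):
            wanted.add(rel)
    copied = []
    for rel in sorted(wanted):
        src, dst = src_dir / rel, dst_dir / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the source mtime
            copied.append(rel.as_posix())
    if clean and dst_dir.exists():
        for p in list(dst_dir.rglob("*")):
            if p.is_file() and p.relative_to(dst_dir) not in wanted:
                p.unlink()  # not in the filtered source set
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "staging", "docs")
    src.mkdir(parents=True)
    (src / "guide.pdf").write_bytes(b"%PDF-1.4 demo")
    (src / "notes.txt").write_text("draft", encoding="utf-8")
    dst = resolve_inside(Path(tmp, "my-catalog"), "data/doc")
    dst.mkdir(parents=True)
    (dst / "old.pdf").write_bytes(b"stale")
    first = sync_assets(src, dst, include="*.pdf", clean=True)
    second = sync_assets(src, dst, include="*.pdf", clean=True)
    remaining = sorted(p.name for p in dst.iterdir())

print(first, second, remaining)  # ['guide.pdf'] [] ['guide.pdf']
```

Because shutil.copy2 preserves the source mtime, a second run finds nothing to update, which is what makes repeated exports cheap.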
Works with both app_path and output_dir exports.
post_export
Run Python scripts automatically after export:
```yaml
# Single script (bare name → python-scripts/start_app.py)
post_export: start_app

# Multiple scripts
post_export:
  - export_dcat
  - start_app
```

Script resolution:
| Format | Resolved path |
|---|---|
| start_app | {output}/python-scripts/start_app.py |
| hook.py | {config_dir}/hook.py |
| scripts/hook.py | {config_dir}/scripts/hook.py |
| /absolute/path.py | /absolute/path.py |
Explicit script paths are resolved relative to the YAML config file directory, like the other path-based options.
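The resolution table can be expressed as a small function. This sketch assumes that a bare name is any entry with no .py suffix and no directory part, which is how the examples in the table read; the function names resolve_script and resolve_post_export are hypothetical, and datannurpy may apply additional checks.

```python
from pathlib import Path


def resolve_script(entry: str, output_dir: str, config_dir: str) -> Path:
    """Map one post_export entry to a script path, mirroring the resolution table."""
    path = Path(entry)
    if path.is_absolute():
        return path
    if path.suffix != ".py" and len(path.parts) == 1:
        # Assumed bare-name rule: a script shipped in the export's python-scripts folder
        return Path(output_dir) / "python-scripts" / f"{entry}.py"
    # Explicit relative path: resolved against the YAML config directory
    return Path(config_dir) / path


def resolve_post_export(value, output_dir, config_dir):
    """post_export accepts a single name or a list of names."""
    entries = [value] if isinstance(value, str) else list(value)
    return [resolve_script(e, output_dir, config_dir) for e in entries]


for script in resolve_post_export(["export_dcat", "start_app", "scripts/hook.py"],
                                  "./my-catalog", "./config"):
    print(script)
```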
copy_assets runs before post_export, so custom scripts can consume copied files.
Works with both app_path and output_dir exports.