Metadata & configuration
Manual metadata
# Load from a folder containing metadata files
metadata_path: ./metadata
# Or from a database
metadata_path: sqlite:///metadata.db
# Or a list of sources applied in order (later overrides/extends earlier)
metadata_path:
- ./metadata-base # shared or generated metadata
- ./metadata-overlay # local overrides (e.g. add policy tags)
Can be used alone or combined with auto-scanned sources (add_folder, add_database). Metadata is applied automatically before export.
Expected structure: One file/table per entity, named after the entity type:
metadata/
├── variable.csv # Variables (descriptions, tags...)
├── dataset.xlsx # Datasets
├── organization.json # Organizations (owners, managers)
├── tag.csv # Tags
├── concept.csv # Business glossary concepts
├── enumeration.csv # Enumerations
├── value.csv # Enumeration values
├── config.csv # Web app config (see app_config)
└── ...
Supported formats: CSV, Excel (.xlsx), JSON, SAS (.sas7bdat), or database tables.
File format: Standard tabular structure following datannur schemas.
# variable.csv
id,description,tag_ids
source---employees_csv---salary,"Monthly gross salary in euros","finance,hr"
source---employees_csv---department,"Department code","hr"
The most common metadata columns are the same whether the source is CSV, Excel, JSON, SAS, or a database table:
| Entity/table | Useful columns |
|---|---|
| folder | id, parent_id, name, description, manager_id, owner_id, tag_ids, doc_ids, license, link, localisation |
| dataset | id, folder_id, name, description, manager_id, owner_id, tag_ids, doc_ids, license, data_path, link, localisation, start_date, end_date, updating_each |
| variable | id, name, dataset_id, description, tag_ids, enumeration_ids, concept_id, type, start_date, end_date |
| organization | id, parent_id, name, description, email, phone, tag_ids, doc_ids |
| tag | id, parent_id, name, description, doc_ids |
| doc | id, name, description, path, type, last_update |
| concept | id, parent_id, name, description, tag_ids, doc_ids |
| enumeration | id, folder_id, name, description, type |
| value | enumeration_id, value, description |
| config | id, value |
For folder-based metadata sources, name each file after the entity (folder.csv, folder.xlsx, folder.json, etc.). List fields such as tag_ids, doc_ids, and enumeration_ids can be written as comma-separated values in tabular files, or as arrays in JSON. For full schema details, use the linked datannur schemas as the reference.
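As an illustration of the two list-field notations described above, here is a minimal sketch that normalizes either form into a Python list. The helper name `parse_list_field` is hypothetical and not part of datannur; trimming whitespace around items is an assumption.

```python
def parse_list_field(raw):
    """Normalize a list field to a list of string ids.

    Accepts a JSON-style array (already a Python list) or a
    comma-separated string as found in CSV/Excel cells.
    Hypothetical helper, for illustration only.
    """
    if raw is None:
        return []
    if isinstance(raw, list):
        return [str(item) for item in raw]
    # Tabular form: split on commas and drop empty fragments
    return [part.strip() for part in str(raw).split(",") if part.strip()]


print(parse_list_field("finance,hr"))
print(parse_list_field(["finance", "hr"]))
```

Both calls yield the same list, which is why the tabular and JSON notations can be treated interchangeably in this sketch.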
Merge behavior:
- Existing entities are updated (manual values override auto-scanned values)
- New entities are created
- List fields (tag_ids, doc_ids, etc.) are merged
- Empty cells and JSON empty arrays leave existing values unchanged
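The merge rules above can be sketched in a few lines of Python. This is not datannur's implementation: the function names, the `LIST_FIELDS` set, and the choice to merge lists as an order-preserving union are assumptions made for illustration.

```python
# Assumed set of list fields, for illustration only
LIST_FIELDS = {"tag_ids", "doc_ids", "enumeration_ids"}


def merge_entity(existing: dict, incoming: dict) -> dict:
    """Merge one manual metadata row into an existing (auto-scanned) entity."""
    merged = dict(existing)
    for field, value in incoming.items():
        if value is None or value == "" or value == []:
            continue  # empty cell or empty JSON array: keep the existing value
        if field in LIST_FIELDS:
            current = list(merged.get(field, []))
            # Append new ids, keeping order and skipping duplicates
            merged[field] = current + [v for v in value if v not in current]
        else:
            merged[field] = value  # manual scalar overrides the scanned one
    return merged


def apply_metadata(catalog: dict, rows: list) -> dict:
    """Apply rows keyed by id: update known entities, create unknown ones."""
    result = dict(catalog)
    for row in rows:
        result[row["id"]] = merge_entity(result.get(row["id"], {}), row)
    return result
```

For example, applying a row `{"id": "a", "description": "manual", "tag_ids": ["hr"]}` to a scanned entity with `tag_ids: ["auto"]` would replace the description and leave both tags attached.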
Ordering: Metadata is automatically applied before export/finalization, after all add_folder, add_dataset, and add_database calls, so manual values take precedence. If app_path/data/db-ui exists, it is loaded automatically as the last metadata source, after configured metadata_path sources. This lets local app edits override scanned metadata without adding data/db-ui to the configuration. data/db-ui uses the same supported metadata formats as any other metadata source; data/db remains the generated export.
Overlay instructions
Metadata sources can include small instructions for clearing fields, removing individual relations, or deleting entities before export. These instructions are consumed by the builder and are not written to data/db.
Use ! as the exact value of a scalar field to clear it:
id,description
source---employees_csv,!
Use ! as the exact value of a relation field to clear all accumulated relations for that field:
id,tag_ids
source---employees_csv,!
Use !id inside a relation list to remove one accumulated relation while keeping or adding others:
[
{
"id": "source---employees_csv---salary",
"tag_ids": ["finance", "!auto---numeric"]
}
]
Relation removals are applied in metadata source order. If one row contains both id and !id, removal wins for that relation. Supported relation fields are tag_ids, doc_ids, enumeration_ids, and source_var_ids.
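To make the `!` and `!id` semantics concrete, here is a minimal sketch of how one overlay row could be applied to an entity. The function `apply_overlay` is hypothetical, relation fields are assumed to be already parsed into lists, and representing a cleared scalar as `None` is an assumption rather than documented behavior.

```python
RELATION_FIELDS = ("tag_ids", "doc_ids", "enumeration_ids", "source_var_ids")


def apply_overlay(entity: dict, row: dict) -> dict:
    """Apply one overlay row to an entity (illustrative sketch only)."""
    result = dict(entity)
    for field, value in row.items():
        if value == "!":
            # Exact "!" clears a scalar, or all accumulated relations
            result[field] = [] if field in RELATION_FIELDS else None
        elif field in RELATION_FIELDS:
            removals = {v[1:] for v in value if v.startswith("!")}
            additions = [v for v in value if not v.startswith("!")]
            current = list(result.get(field, []))
            combined = current + [v for v in additions if v not in current]
            # Removal wins when the same row lists both id and !id
            result[field] = [v for v in combined if v not in removals]
        elif value not in ("", None):
            result[field] = value
    return result


salary = {
    "id": "source---employees_csv---salary",
    "tag_ids": ["auto---numeric", "hr"],
}
row = {"id": salary["id"], "tag_ids": ["finance", "!auto---numeric"]}
print(apply_overlay(salary, row)["tag_ids"])
```

In this sketch the JSON example above removes `auto---numeric`, keeps `hr`, and adds `finance`.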
Use _delete: true to remove an entity from the final catalog:
[
{
"id": "source---old_dataset",
"_delete": true
}
]
Deletion is supported for all ID-keyed catalog entities. Composite tables (value and frequency) are not deleted directly; they are removed through cascades from their parent enumeration, variable, dataset, or folder.
Cascades are applied before export:
- deleting a folder removes its descendant folders, contained datasets, dataset variables, frequencies, previews, and contained enumerations;
- deleting a dataset removes its variables, frequencies, and previews;
- deleting a variable removes its frequencies;
- deleting enumerations, tags, docs, concepts, or organizations also cleans related references where needed.
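The folder cascade can be pictured as a walk down the hierarchy. The sketch below covers only folders, datasets, and variables (frequencies, previews, and enumerations would follow the same pattern); the function name and the in-memory row shapes are assumptions for illustration, not datannur's internals.

```python
def cascade_delete_folder(folder_id, folders, datasets, variables):
    """Return the ids removed when folder_id is deleted.

    Each argument after folder_id is a list of dict rows using the
    parent_id / folder_id / dataset_id columns from the table above.
    """
    doomed_folders = {folder_id}
    changed = True
    while changed:  # collect descendant folders until no new one is found
        changed = False
        for folder in folders:
            if folder.get("parent_id") in doomed_folders and folder["id"] not in doomed_folders:
                doomed_folders.add(folder["id"])
                changed = True
    doomed_datasets = {d["id"] for d in datasets if d.get("folder_id") in doomed_folders}
    doomed_variables = {v["id"] for v in variables if v.get("dataset_id") in doomed_datasets}
    return doomed_folders, doomed_datasets, doomed_variables
```

Deleting a parent folder in this sketch removes its subfolders, their datasets, and those datasets' variables, while sibling folders are left untouched.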
Metadata-first pattern
When the structure of the catalog (folders, dataset IDs, hierarchy) is defined entirely by metadata_path and the filesystem only provides files to scan for technical info (variables, stats, sizes, formats), use create_folders=False:
metadata_path: ./metadata
add:
  - folder: ./data/parquet
    create_folders: false
In this mode:
- No folder is created from the scanned directory; the hierarchy is taken from metadata/folder.csv.
- Each scanned file is matched to its metadata entry via the data_path column of metadata/dataset.csv. The scan reuses the metadata-defined id and folder_id.
- Files with no metadata match are reported according to on_unmatched: "warn" (default), "skip", or "error".
Matching: the scan computes an absolute path for each file and compares it to the resolved data_path from metadata/dataset.csv. Relative data_path values are resolved against the metadata source directory (the folder containing dataset.csv). URLs and missing files are not matched.
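The matching rule can be sketched with `pathlib`. This is an approximation of the behavior described above, not the library's code: the helper names are hypothetical, and detecting URLs by looking for `://` is a simplifying assumption.

```python
from pathlib import Path


def resolve_data_path(data_path: str, metadata_dir: Path):
    """Resolve a data_path from dataset.csv; return None for URLs."""
    if "://" in data_path:
        return None  # URLs are not matched (assumed detection rule)
    path = Path(data_path)
    if not path.is_absolute():
        # Relative paths are resolved against the folder containing dataset.csv
        path = metadata_dir / path
    return path.resolve()


def match_files(scanned_files, dataset_rows, metadata_dir: Path):
    """Map each scanned file to a dataset id, or None when unmatched."""
    index = {}
    for row in dataset_rows:
        resolved = resolve_data_path(row["data_path"], metadata_dir)
        if resolved is not None and resolved.exists():  # missing files are skipped
            index[resolved] = row["id"]
    return {f: index.get(Path(f).resolve()) for f in scanned_files}
```

Files mapped to `None` are the ones that would be handled according to `on_unmatched`.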
Example — ./metadata/dataset.csv:
id,name,folder_id,data_path
sales-2024,Sales 2024,finance,../data/parquet/sales-2024.parquet
hr-headcount,HR headcount,hr,../data/parquet/hr-headcount.parquet
Combined with add_folder('./data/parquet', create_folders=False), the scan attaches variables and stats to sales-2024 and hr-headcount (the IDs declared in metadata) without creating any folder from the disk layout.
Constraints:
- create_folders=False is incompatible with metadata= in add_folder() (no folder is created).
- The data_path you write in metadata/dataset.csv is also exported as-is in the output (it is the public link to the data, not a private match key). Use a URL or a deployment-relative path if you want it to remain valid in the front-end.
Environment variables
Environment variables ($VAR or ${VAR}) are expanded in all YAML values. All sources are loaded — env:, env_file, and .env next to the YAML file:
env:
  data_dir: /shared/data
  db_host: db.example.com
env_file: /secure/path/.env # secrets: DB_USER, DB_PASSWORD
add:
- folder: ${data_dir}/sales
- folder: ${data_dir}/hr
- database: oracle://${DB_USER}:${DB_PASSWORD}@${db_host}:1521/ORCL
env_file supports a list of paths (last overrides first):
env_file:
- /shared/common.env # defaults
- /secure/credentials.env # overrides common.env
Priority (first set wins): system env vars > env: YAML > env_file > .env local.
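The two ordering rules (later env_file entries override earlier ones; across sources, the first source that defines a name wins) can be sketched with the standard library. The function names are hypothetical, each source is assumed to be already parsed into a dict, and leaving unknown variables untouched is an assumption, since the behavior for unset names is not specified here.

```python
import os
from collections import ChainMap
from string import Template


def merge_env_files(files: list) -> dict:
    """Combine parsed env files; later files override earlier ones."""
    merged = {}
    for variables in files:
        merged.update(variables)
    return merged


def expand(value: str, yaml_env: dict, env_file_vars: dict,
           dotenv_vars: dict, system_env=None) -> str:
    """Expand $VAR and ${VAR} using the documented priority order."""
    system_env = os.environ if system_env is None else system_env
    # ChainMap searches left to right, so the first mapping defining a name wins
    lookup = ChainMap(dict(system_env), yaml_env, env_file_vars, dotenv_vars)
    # safe_substitute leaves unknown variables as-is (assumed behavior)
    return Template(value).safe_substitute(lookup)


print(expand("${data_dir}/sales", {"data_dir": "/shared/data"}, {}, {}, system_env={}))
```

`string.Template` is used because it natively understands both the `$VAR` and `${VAR}` spellings.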
app_config
Configure the web app with key-value entries (written as config.json):
app_path: ./my-catalog
app_config:
  contact_email: contact@example.com
  more_info: "Data from [open data portal](https://example.com)."
If app_config is not provided, config.csv/config.xlsx/config.json (columns id, value) from metadata_path is used instead. If neither is provided, no config.json is generated.