Column statistics

Column statistics are not templates. They describe tabular and database columns from sampled rows and appear in format-specific metadata blocks. Delimited files and columnar binaries use a compact shared shape; SQLite uses a richer per-column struct.

Compact `columns` array (`ColumnStat`)

Used by CSV/TSV/tab/psv and columnar formats (Parquet, Arrow IPC, Avro, ORC) via flattened ColumnarCommonFields:

json

{
  "row_count": 50000,
  "column_count": 12,
  "stats_rows_sampled": 2000,
  "columns": [
    {
      "i": 0,
      "name": "user_id",
      "t": "number",
      "pt": "Int64",
      "null_pct": 0.0,
      "num": {
        "min": 1,
        "max": 98421,
        "mean": 44102.5,
        "median": 43800,
        "range": 98420,
        "iqr": 22000,
        "stdev": 28104.2
      }
    },
    {
      "i": 3,
      "name": "status",
      "t": "string",
      "null_pct": 1.2,
      "uniq": 4
    },
    {
      "i": 7,
      "name": "created_at",
      "t": "date",
      "date": {
        "span_days": 412.5,
        "span_minutes": 594000,
        "min": "2024-01-02T00:00:00Z",
        "max": "2025-02-18T23:59:59Z"
      }
    },
    {
      "i": 9,
      "name": "active",
      "t": "boolean",
      "bool": { "true_pct": 73.4 }
    }
  ]
}

`ColumnStat` field	Meaning
`i`	Column index
`name`	Column name when known
`t`	Inferred logical type: `number`, `string`, `date`, `boolean`, …
`pt`	Physical / schema dtype when available (e.g. Arrow `Int32`, Parquet type)
`null_pct`	Null rate in the sample (omitted if not computed)
`uniq`	Distinct non-null string values (string columns only)
`num`	Numeric stats
`bool`	Boolean stats (`true_pct` in JSON)
`date`	Date stats

Table-level fields

Field	Meaning
`row_count`	Total rows when known
`column_count`	Number of columns
`stats_rows_sampled`	Rows used for stats (sample may be smaller than `row_count`)
`encoding`	Text encoding when relevant

CSV-specific fields on csv_metadata: delimiter, quote_character, escape_character, has_header.

Sampling scales with file size, table width, and bytes-per-row when row count is known early — wide tables cap how many columns or rows are fully profiled.

Typed stat objects

Numeric stats

min, max, mean, median, range, iqr, stdev — used for numeric CSV/columnar columns and SQLite INTEGER/REAL.

Date stats

span_days, span_minutes, min, max (ISO-style strings) — parsed date columns in CSV or TEXT SQLite columns.

Boolean stats

true_pct (alias true_percentage) — share of true-like values (true, yes, 1, y, …).

Text stats (SQLite)

min_length, max_length, avg_length — for TEXT columns.

Blob stats (SQLite)

min_size, max_size, avg_size (bytes) — for BLOB columns.

SQLite columns

sqlite_metadata adds schema-oriented data: tables, primary/foreign keys, indexes, row counts. Each column is a ColumnInfo object (verbose names, not the compact i/t form):

Field	Meaning
`name`, `type_name`	Column name and declared SQLite type
`not_null`, `default_value`	Constraints
`is_primary_key`, `is_foreign_key`	Key roles
`null_percentage`, `unique_count`	Sampled null % and distinct count
`numeric_stats`, `date_stats`, `boolean_stats`, `text_stats`, `blob_stats`	Typed stats (boolean vs numeric are mutually exclusive)

Column stats without `columns`

Some formats expose shapes and dtypes instead of a columns array — NumPy (.npy/.npz), HDF5, NetCDF, Zarr, Matrix Market, MATLAB, Tetration .tet, model files. See Metadata extraction.

UBLX display

UBLX renders compact columns JSON as typed tables in Metadata / Writing (when column stats are present). Config key typed_column_tables: none | abbrev (default, caps wide tables at 20 rows) | full.

Column statistics ​

Compact columns array (ColumnStat) ​

Table-level fields ​

Typed stat objects ​

Numeric stats ​

Date stats ​

Boolean stats ​

Text stats (SQLite) ​

Blob stats (SQLite) ​

SQLite columns ​

Column stats without columns ​

UBLX display ​