
ZahirScan

ZahirScan is a high-performance Rust tool for template-based content compression and metadata extraction across logs, documents, tabular data, media, archives, models, and more.

UBLX uses ZahirScan for deep previews, the Templates / Metadata / Writing panes, and flat JSON export. ZahirScan can also run standalone.

What it does

| Capability | Summary |
|---|---|
| Template mining | Turns repeated patterns in logs and text into templates with placeholders |
| Metadata | Per-format stats: media codecs, document properties, tabular schemas, SQLite schema, and more |
| Size reduction | Output JSON is typically 80–95% smaller than the raw input while preserving structure |
| Performance | Memory-mapped I/O, adaptive Rayon batching, and path batching that respects the fd limit |
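To make "templates with placeholders" concrete, here is a toy illustration that assumes the simplest possible approach: mask tokens that look variable (hex values, dotted IPv4-style addresses, plain integers) with a `<*>` placeholder, then treat identical masked lines as one template and count them. This is not ZahirScan's algorithm; the function name `mine_templates` and the masking rules are hypothetical.

```bash
#!/usr/bin/env bash
# Toy template miner (hypothetical, NOT ZahirScan's implementation).
# Reads log lines on stdin, masks variable-looking tokens with <*>,
# and prints each resulting template with its occurrence count.
set -euo pipefail

mine_templates() {
  sed -E \
    -e 's/0x[0-9a-fA-F]+/<*>/g' \
    -e 's/[0-9]+(\.[0-9]+){3}/<*>/g' \
    -e 's/[0-9]+/<*>/g' \
  | sort | uniq -c | sort -rn
  # Order matters: hex and IPv4-style tokens are masked before bare
  # integers, so "10.0.0.7" becomes one <*> rather than four.
}

printf '%s\n' \
  'user 42 logged in from 10.0.0.7' \
  'user 7 logged in from 10.0.0.9' \
  'cache miss for key 0x1f3a' \
  | mine_templates
```

Here the two login lines collapse into one template, `user <*> logged in from <*>`, with a count of 2, and the cache line becomes `cache miss for key <*>` with a count of 1. Masking only catches values that look numeric; a fuller miner would also need to group lines whose non-numeric tokens vary.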

Explore

Use the sidebar to jump between topics:

| Page | Contents |
|---|---|
| Install | cargo install, optional features, system deps |
| CLI | Flags, output modes, init |
| Supported formats | Full format list by category |
| Metadata extraction | What each format type returns |
| Template mining | Overview, writing footprint, column stats |
| Architecture | Phase 1 / Phase 2, batching, streaming sinks |
| Configuration | zahirscan.toml, filters, adaptive batching |
| Library | Rust API overview → docs.rs |
| UBLX integration | Batch vs. on-demand enhance in the catalog stack |

Quick start

```bash
cargo install zahirscan
zahirscan -i /path/to/file.log -o ./out
```

Output is templates-only by default; for full metadata extraction, see CLI.
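To check the size-reduction figure against your own data, a sketch that compares the input file's byte count with the total bytes written under the output directory. It assumes, as in the quick-start command, that results land as files under `./out`; the helper name `size_reduction_pct` is hypothetical and not part of ZahirScan.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ZahirScan): print how much smaller the
# output directory is than the input file, as a whole-number percentage.
set -euo pipefail

size_reduction_pct() {
  local input="$1" outdir="$2" in_bytes out_bytes
  # Strip whitespace: some wc implementations pad the count with spaces.
  in_bytes=$(wc -c < "$input" | tr -d '[:space:]')
  # Sum the bytes of every regular file under the output directory.
  out_bytes=$(find "$outdir" -type f -exec cat {} + | wc -c | tr -d '[:space:]')
  if [ "$in_bytes" -eq 0 ]; then
    echo "input is empty" >&2
    return 1
  fi
  # reduction = 100 * (1 - out/in); integer division truncates.
  echo $(( 100 - (100 * out_bytes) / in_bytes ))
}

# Usage (after the quick-start run):
#   size_reduction_pct /path/to/file.log ./out
```

A 1,000-byte input with 150 bytes of output reports 85. The result goes negative if the output is larger than the input, which can happen on very small files.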

Tetration .tet files

Tensor store layout and the .tet format are documented on tetration-docs. ZahirScan extracts catalog, dataset, and column stats from .tet files for UBLX and standalone runs.
