ZahirScan

ZahirScan is a high-performance Rust tool for template-based content compression and metadata extraction across logs, documents, tabular data, media, archives, models, and more.

UBLX uses ZahirScan for deep previews, Templates / Metadata / Writing panes, and flat JSON export. You can also run it standalone.

What it does

Capability	Summary
Template mining	Repeated patterns in logs and text → templates with placeholders
Metadata	Per-format stats — media codecs, document properties, tabular schemas, SQLite schema, etc.
Size reduction	Output JSON is typically 80–95% smaller than the raw input while preserving its structure
Performance	Memory-mapped I/O, adaptive Rayon batching, fd-limit path batching
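To make "templates with placeholders" concrete, here is a crude shell approximation. It masks every run of digits with a `<*>` placeholder, then groups and counts identical masked lines. This is only an illustration under simplifying assumptions (digits are the only variable tokens, and whole lines are the unit of comparison). It is not ZahirScan's mining algorithm, and `mine_templates` is a hypothetical helper name.

```bash
# Hypothetical helper illustrating the idea of template mining.
# NOT ZahirScan's algorithm: it assumes digits are the only variable parts.
mine_templates() {
  # 1. Replace each run of digits with a <*> placeholder.
  # 2. Group identical masked lines and count them.
  # 3. Sort so the most frequent template comes first.
  # Reads the files given as arguments, or stdin if none are given.
  sed -E 's/[0-9]+/<*>/g' "$@" \
    | sort \
    | uniq -c \
    | sort -rn
}
```

For example, `mine_templates /path/to/file.log | head` lists the most frequent line shapes first, so `user 42 logged in` and `user 7 logged in` collapse into one `user <*> logged in` template with a count of 2. Dedicated template miners typically infer which token positions vary across lines instead of relying on a single fixed regex like this one.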

Explore

Use the sidebar to jump between topics:

Page	Contents
Install	`cargo install`, optional features, system deps
CLI	Flags, output modes, `init`
Supported formats	Full format list by category
Metadata extraction	Per-format `*_metadata` blocks and column statistics
Template mining	Repeated patterns, compression, placeholders
Writing footprint	Prose style metrics for text-like files
Architecture	Phase 1 / Phase 2, batching, streaming sinks
Configuration	`zahirscan.toml`, filters, adaptive batching
Library	Rust API overview → docs.rs
UBLX integration	Batch vs on-demand enhance in the catalog stack

Quick start

```bash
cargo install zahirscan
zahirscan -i /path/to/file.log -o ./out
```

Output is templates-only by default; for full metadata extraction, see CLI.
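After a run like the one above, you can roughly check the size-reduction figure against your own data. The sketch below assumes that every regular file ZahirScan writes under the output directory counts as output (the exact output file names are not specified on this page). `size_reduction_pct` is a hypothetical helper, not part of ZahirScan.

```bash
# Hypothetical helper: prints the integer percentage by which the total size
# of OUT_DIR is smaller than RAW_FILE.
# Assumption: every regular file under OUT_DIR is ZahirScan output.
size_reduction_pct() {
  local raw_file=$1 out_dir=$2
  local raw_bytes out_bytes
  raw_bytes=$(wc -c < "$raw_file")
  # Concatenate all output files and count their bytes (0 if there are none).
  out_bytes=$(find "$out_dir" -type f -exec cat {} + | wc -c)
  if (( raw_bytes == 0 )); then
    echo "size_reduction_pct: $raw_file is empty" >&2
    return 1
  fi
  # Integer arithmetic: 100 minus output size as a percentage of input size.
  echo $(( 100 - out_bytes * 100 / raw_bytes ))
}
```

Usage: `size_reduction_pct /path/to/file.log ./out`. For instance, a 1,000-byte input that produces 120 bytes of output prints `88`.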

Tetration .tet files

Tensor store layout and the .tet format are documented on tetration-docs. ZahirScan extracts catalog, dataset, and column stats from .tet files for UBLX and standalone runs.