ZahirScan
ZahirScan is a high-performance Rust tool for template-based content compression and metadata extraction across logs, documents, tabular data, media, archives, models, and more.
UBLX uses ZahirScan for deep previews, Templates / Metadata / Writing panes, and flat JSON export. You can also run it standalone.
What it does
| Capability | Summary |
|---|---|
| Template mining | Repeated patterns in logs and text → templates with placeholders |
| Metadata | Per-format stats — media codecs, document properties, tabular schemas, SQLite schema, etc. |
| Size reduction | Output JSON is typically 80–95% smaller than the raw input while preserving structure |
| Performance | Memory-mapped I/O, adaptive Rayon batching, path batching sized to the file-descriptor limit |
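
To make the "templates with placeholders" idea concrete, here is a minimal sketch of template mining in Rust. It is not ZahirScan's actual algorithm or API. The function names, the `<*>` placeholder, and the "any token containing a digit is variable" heuristic are all assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Placeholder substituted for variable tokens (hypothetical choice,
/// not necessarily what ZahirScan emits).
const PLACEHOLDER: &str = "<*>";

/// Turn one line into a template. Assumed heuristic: any
/// whitespace-separated token containing an ASCII digit is treated as
/// variable and replaced by the placeholder.
fn to_template(line: &str) -> String {
    line.split_whitespace()
        .map(|tok| {
            if tok.chars().any(|c| c.is_ascii_digit()) {
                PLACEHOLDER
            } else {
                tok
            }
        })
        .collect::<Vec<&str>>()
        .join(" ")
}

/// Count how many input lines collapse onto each template.
fn mine_templates(lines: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for line in lines {
        *counts.entry(to_template(line)).or_insert(0) += 1;
    }
    counts
}
```

Calling `mine_templates` on the lines `job 17 finished`, `job 18 finished`, and `disk full` yields two templates: `job <*> finished` with a count of 2 and `disk full` with a count of 1. Storing each template once with a count, rather than every raw line, is where the size reduction for repetitive input comes from.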
Explore
Use the sidebar to jump between topics:
| Page | Contents |
|---|---|
| Install | cargo install, optional features, system deps |
| CLI | Flags, output modes, init |
| Supported formats | Full format list by category |
| Metadata extraction | What each format type returns |
| Template mining | Overview, writing footprint, column stats |
| Architecture | Phase 1 / Phase 2, batching, streaming sinks |
| Configuration | zahirscan.toml, filters, adaptive batching |
| Library | Rust API overview → docs.rs |
| UBLX integration | Batch vs on-demand enhance in the catalog stack |
Quick start
```bash
cargo install zahirscan
zahirscan -i /path/to/file.log -o ./out
```

Templates-only (default) vs. full metadata: see CLI.
Tetration .tet files
Tensor store layout and the .tet format are documented on tetration-docs. ZahirScan extracts catalog, dataset, and column stats from .tet files for UBLX and standalone runs.