Native Filesystem Intelligence Engine
A native C extension for Python that recursively scans directories, categorizes every file by type, and returns a complete structured breakdown — in a single function call.
Overview
Three lines of Python. Any filesystem.
Anscom is designed for zero-configuration usage. Point it at any path and it handles everything: spawning worker threads, issuing OS syscalls, categorizing results, and returning a clean Python dict.
Unlike os.walk or pathlib.Path.rglob, Anscom performs the traversal in C and calls the operating system's directory APIs directly. Memory use is bounded regardless of filesystem size: scanning 50 million files uses no more memory than scanning 50,000.
Installation
Available on PyPI
Pre-compiled wheels for Windows, Linux, and macOS. No build tools required on any supported platform.
API Reference
anscom.scan()
Scanning is exposed through a single function. All parameters have sensible defaults; only path is required.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Target directory. Relative (.) or absolute (/data). |
| max_depth | int | 6 | Recursion depth limit. Clamped to [0, 64]. |
| show_tree | bool | False | Print DFS-ordered directory tree to stdout. Forces workers=1. |
| workers | int | 0 | Thread count. 0 = OS core count. Overridden to 1 when show_tree=True. |
| min_size | int | 0 | Skip files smaller than this byte count. 0 = no filter. |
| extensions | list[str] | None | Whitelist. Only matching extensions counted; all others excluded. |
| callback | callable | None | Called roughly once per second with the current file count. The GIL is acquired before each call. |
| silent | bool | False | Suppress summary report and progress counter. |
| ignore_junk | bool | False | Skip directories in the hardcoded exclusion list entirely. |
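The clamping and override rules in the table can be summarized in a short pure-Python sketch. `normalize_options` is a hypothetical helper written for illustration and is not part of Anscom; it only mirrors the documented behavior of max_depth, workers, and show_tree. Treating negative workers values like 0 is an assumption.

```python
import os


def normalize_options(max_depth=6, workers=0, show_tree=False):
    """Mirror the documented rules for max_depth, workers, and show_tree.

    Hypothetical helper for illustration only; not part of Anscom's API.
    """
    # max_depth is documented as clamped to the range [0, 64].
    depth = max(0, min(64, max_depth))

    # workers=0 means "use the OS core count"; fall back to 1 if unknown.
    # Assumption: negative values are treated the same as 0.
    threads = workers if workers > 0 else (os.cpu_count() or 1)

    # show_tree=True forces a single worker for depth-first output.
    if show_tree:
        threads = 1

    return {"max_depth": depth, "workers": threads, "show_tree": show_tree}
```

For example, `normalize_options(max_depth=100, workers=8, show_tree=True)` yields a depth of 64 and a single worker.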
Return Value
A consistent dict, every time
All nine category keys are always present, even when their count is zero. Extension keys appear only when their count is greater than zero.
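The exact key names are not shown in this section, so the sketch below uses hypothetical ones (`categories`, `extensions`, and three made-up category labels) purely to illustrate the two rules: every category is seeded with zero, while extension keys are created only on first match. It is a pure-Python model, not Anscom's implementation.

```python
from collections import Counter

# Hypothetical category labels; Anscom defines nine of its own.
CATEGORIES = ("Images", "Documents", "Other")

# Hypothetical extension-to-category mapping for this sketch.
EXT_TO_CATEGORY = {"png": "Images", "jpg": "Images", "pdf": "Documents"}


def tally(filenames):
    """Count files per category and per known extension."""
    categories = {name: 0 for name in CATEGORIES}  # always present, even at 0
    extensions = Counter()                         # keys appear only when seen

    for filename in filenames:
        _, dot, ext = filename.rpartition(".")
        ext = ext.lower() if dot else ""           # no dot means no extension
        categories[EXT_TO_CATEGORY.get(ext, "Other")] += 1
        if ext in EXT_TO_CATEGORY:
            extensions[ext] += 1

    return {"categories": categories, "extensions": dict(extensions)}
```

Because the category keys are guaranteed, downstream code can index them directly without `dict.get` fallbacks.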
Usage Examples
Common patterns
Five production-ready patterns.
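As a minimal sketch of one such pattern, the wrapper below calls `anscom.scan()` with parameters from the API reference to count only Python and C sources without printing a report. The wrapper name `scan_sources`, the injectable `scan` argument, and the dot-less extension spelling are assumptions made for illustration.

```python
def scan_sources(path, scan=None):
    """Count Python and C source files under `path` without console output.

    `scan` defaults to anscom.scan; it is injectable so the wrapper can be
    exercised without the extension installed (an assumption of this sketch).
    """
    if scan is None:
        import anscom  # imported lazily so this module loads without it
        scan = anscom.scan

    return scan(
        path,
        max_depth=12,                 # documented as clamped to [0, 64]
        extensions=["py", "c", "h"],  # whitelist; dot-less spelling is assumed
        ignore_junk=True,             # skip the hardcoded exclusion list
        silent=True,                  # no summary report or progress counter
    )
```

With Anscom installed, `scan_sources(".")` would return the usual result dict for the current directory.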
Tree Output
DFS-ordered filesystem topology
When show_tree=True, workers is forced to 1 for strict depth-first traversal. Each line is written as soon as it is produced, so no output buffer accumulates. A filesystem with 4 million files produces 4 million lines without memory pressure.
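The streaming idea can be modeled in pure Python with a generator that yields one line per entry in depth-first order. This is a sketch of the technique, not Anscom's C code: the name `iter_tree`, the two-space indentation, and the alphabetical ordering are assumptions, and the sketch holds one directory's entries in memory at a time in order to sort them.

```python
import os


def iter_tree(root, max_depth=6, _depth=0):
    """Yield one indented line per entry in depth-first order."""
    if _depth > max_depth:
        return
    try:
        with os.scandir(root) as it:
            # Only this directory's entries are held, never the whole tree.
            entries = sorted(it, key=lambda entry: entry.name)
    except OSError:
        return  # unreadable directory: skip it and keep going
    for entry in entries:
        yield "  " * _depth + entry.name
        if entry.is_dir(follow_symlinks=False):
            # Descend before moving to the next sibling (depth-first).
            yield from iter_tree(entry.path, max_depth, _depth + 1)
```

Consuming it with `for line in iter_tree("."): print(line, flush=True)` writes each line as it is produced, so output size does not affect memory use.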
Exclusion Filter
ignore_junk=True
When enabled, directories on the hardcoded exclusion list are skipped entirely: no opendir, no further syscalls, and no recursion. Matching is a case-insensitive basename check applied at any depth.
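The pruning behavior can be approximated in pure Python with `os.walk`, whose directory list may be edited in place to prevent descent. The three names in `JUNK_DIRS` are placeholders chosen for illustration, not Anscom's actual exclusion list, and `is_junk` and `count_files` are hypothetical helpers.

```python
import os

# Placeholder names for illustration; Anscom's real list is hardcoded in C.
JUNK_DIRS = frozenset({"node_modules", ".git", "__pycache__"})


def is_junk(dirname):
    """Case-insensitive basename match against the exclusion set."""
    return os.path.basename(dirname.rstrip("/\\")).lower() in JUNK_DIRS


def count_files(root, ignore_junk=False):
    """Count non-directory entries, pruning junk directories before descent."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        if ignore_junk:
            # Editing dirnames in place stops os.walk from opening these.
            dirnames[:] = [d for d in dirnames if not is_junk(d)]
        total += len(filenames)
    return total
```

Pruning before descent is what makes the filter cheap: an excluded directory costs one name comparison rather than a full subtree walk.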
Report Format
Two tables, zero configuration
When silent=False (default), Anscom prints a Summary Report and a Detailed Extension Breakdown on completion.
File Categories & Extensions
170+ extensions across 9 categories
The built-in table is sorted lexicographically and validated at module initialization. If the ordering is violated, PyInit_anscom raises RuntimeError before the module loads.
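A sorted table that is checked once at load time is the usual setup for binary-search lookups; that motive is an inference, since this section does not state how Anscom searches the table. The sketch below models the pattern in pure Python with a four-entry hypothetical excerpt and made-up category labels.

```python
from bisect import bisect_left

# Tiny hypothetical excerpt; the real table has 170+ entries.
EXTENSION_TABLE = [
    ("c", "Code"),
    ("jpg", "Images"),
    ("pdf", "Documents"),
    ("py", "Code"),
]


def validate_table(table):
    """Raise RuntimeError unless the keys are strictly ascending."""
    for (prev, _), (cur, _) in zip(table, table[1:]):
        if prev >= cur:
            raise RuntimeError(
                f"extension table out of order: {prev!r} >= {cur!r}"
            )


validate_table(EXTENSION_TABLE)  # runs once, when the module is imported
_KEYS = [ext for ext, _ in EXTENSION_TABLE]


def categorize(ext, default="Other"):
    """Binary-search the sorted keys; O(log n) comparisons per lookup."""
    key = ext.lower()
    i = bisect_left(_KEYS, key)
    if i < len(_KEYS) and _KEYS[i] == key:
        return EXTENSION_TABLE[i][1]
    return default
```

Failing at import time means a misordered table can never produce silently wrong lookups at scan time.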
Architecture
Engineering decisions that matter
Six choices determine correctness and throughput. Each eliminates a specific bottleneck in naive Python filesystem traversal.
How It Works
8-stage pipeline
When you call anscom.scan("/some/path"), the following pipeline executes in strict sequence.
Security & Compliance
Safe by construction
Every security property is structural — enforced by the implementation, not by runtime checks that can be bypassed.
Export Results
Three export formats, one call each
Every scan result is a plain Python dict and is fully serializable. Anscom ships with built-in export helpers for the three most-requested formats: CSV, Excel (XLSX), and JSON. CSV and JSON need no extra dependencies; Excel export requires openpyxl.
Export API
| Function | Output | Extra dependency | Description |
|---|---|---|---|
| anscom.export_csv(result, path) | str | none | Writes category summary + extension breakdown as two CSV sections. |
| anscom.export_excel(result, path) | str | openpyxl | Creates a formatted .xlsx workbook with two named sheets. |
| anscom.export_json(result, path) | str | none | Serializes the full result dict to indented JSON. |
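Because the result is a plain dict, the JSON and CSV outputs can also be approximated with the standard library alone. The sketch below is not Anscom's exporter: `dump_json` and `dump_csv` are hypothetical names, the `categories` and `extensions` keys are an assumed result shape, and returning the path is one reading of the `str` output column.

```python
import csv
import json


def dump_json(result, path):
    """Write the dict as indented JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(result, fh, indent=2)
    return path


def dump_csv(result, path):
    """Write categories, a blank row, then extensions as two CSV sections."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["category", "count"])
        writer.writerows(result["categories"].items())
        writer.writerow([])  # blank row separates the two sections
        writer.writerow(["extension", "count"])
        writer.writerows(result["extensions"].items())
    return path
```

Opening the CSV file with `newline=""` follows the csv module's guidance and avoids doubled line endings on Windows.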
Enterprise Use Cases
Production-ready patterns
The returned dict is a plain Python object — directly serializable to JSON, integrable into any pipeline, dashboard, or audit system.