INTRODUCTION
High-performance binary deconstruction for Node.js. Extract data from Parquet, NPY, and NPZ files seamlessly with zero native dependencies.
Cervid Decomposer provides a single interface for reading Apache Parquet and NumPy NPY/NPZ files. It routes each file to the appropriate engine based on its file extension.
Under the hood, it uses NPYEngine, NPZEngine, and ParquetReader to read files in chunks, keeping memory overhead low while decoding columnar and multidimensional binary data directly into JavaScript TypedArrays and objects.
INSTALLATION
Decomposer requires Node.js 18+ for native fetch and Worker Threads support.
npm i @cervid/decomposer
import { Decomposer } from '@cervid/decomposer'
QUICKSTART
A quick example showing how to read a NumPy file and a Parquet file.
import { Decomposer } from '@cervid/decomposer'

// 1. Read a NumPy (.npy) file
const npyResult = await Decomposer.read('data/tensor.npy')
console.log(npyResult.data)  // Raw TypedArray
console.log(npyResult.shape) // Array dimensions

// 2. Read specific columns from a Parquet file
const parquetResult = await Decomposer.read('data/dataset.parquet', {
  columns: ['id', 'name', 'score']
})
console.log(parquetResult)

// 3. Inspect a Parquet file without reading rows
const inspection = await Decomposer.read('data/huge.parquet', { inspect: true })
console.log(inspection.metadata)
DECOMPOSER
The primary wrapper class for processing binary files. It acts as a router for the underlying engines.
Decomposer.read() reads and decomposes a binary file. The target engine is selected from the file extension (.npy, .npz, or .parquet). Its optional second argument is a config object with the following properties:
| Config Property | Type | Description |
|---|---|---|
| workers | number | Number of worker threads used to process chunks (NPY/NPZ engines). |
| chunkSizeMB | number | Memory limit per processed chunk, in megabytes. |
| columns | string[] | Names of the columns to extract (Parquet only). |
| sample | number | Maximum number of rows to read (Parquet only). |
| inspect | boolean | Read only the metadata/schema without extracting row data (Parquet only). |
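The extension-based routing and the config table above can be pictured with a short TypeScript sketch. This is not the library's source: the `DecomposerConfig` interface only mirrors the table (treating every property as optional is an assumption), and `pickEngine` is a hypothetical helper. Whether the real router matches extensions case-insensitively is also an assumption.

```typescript
// Hypothetical type mirroring the config table; optionality is assumed.
interface DecomposerConfig {
  workers?: number;      // NPY/NPZ engines
  chunkSizeMB?: number;
  columns?: string[];    // Parquet only
  sample?: number;       // Parquet only
  inspect?: boolean;     // Parquet only
}

type EngineName = "npy" | "npz" | "parquet";

// Hypothetical helper: choose an engine from the file extension.
function pickEngine(filePath: string): EngineName {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot + 1).toLowerCase() : "";
  switch (ext) {
    case "npy":
      return "npy";
    case "npz":
      return "npz";
    case "parquet":
      return "parquet";
    default:
      throw new Error(`Unsupported file extension: "${ext}"`);
  }
}

// Example config combining two Parquet-only options from the table.
const exampleConfig: DecomposerConfig = { columns: ["id", "score"], sample: 100 };
```

A path with any other extension fails fast in this sketch, so an unsupported file never reaches an engine.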
NUMPY (NPY/NPZ)
High-performance engines for parsing NumPy's binary array formats.
NPYEngine parses raw .npy files, decoding the dictionary header to understand the tensor shape, data type (e.g. <f8 for Float64, |u1 for Uint8), and byte order before streaming the raw data block into a native JavaScript TypedArray.
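To make the header-decoding step concrete, here is a minimal sketch of parsing an NPY header as described in the public NPY format specification. It is an illustration only, not NPYEngine's actual code: `parseNpyHeader` and `decodeNpy` are hypothetical names, only the `<f8` and `|u1` dtypes are handled, structured dtypes are ignored, and a little-endian host is assumed.

```typescript
interface NpyHeader {
  descr: string;         // dtype string, e.g. "<f8"
  fortranOrder: boolean; // column-major flag from the header
  shape: number[];
  dataOffset: number;    // byte offset where the raw data begins
}

function parseNpyHeader(bytes: Uint8Array): NpyHeader {
  const magic = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59]; // "\x93NUMPY"
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic[i]) throw new Error("not an NPY file");
  }
  const major = bytes[6];
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Format 1.x stores the header length in 2 bytes, 2.x and 3.x in 4 bytes (little-endian).
  const lenBytes = major === 1 ? 2 : 4;
  const headerLen = major === 1 ? view.getUint16(8, true) : view.getUint32(8, true);
  const start = 8 + lenBytes;
  let text = "";
  for (let i = 0; i < headerLen; i++) text += String.fromCharCode(bytes[start + i]);

  // The header is a Python dict literal; pull out the three standard keys.
  const descr = /'descr':\s*'([^']+)'/.exec(text);
  const fortran = /'fortran_order':\s*(True|False)/.exec(text);
  const shape = /'shape':\s*\(([^)]*)\)/.exec(text);
  if (!descr || !fortran || !shape) throw new Error("malformed NPY header");

  return {
    descr: descr[1],
    fortranOrder: fortran[1] === "True",
    shape: shape[1].split(",").map((s) => s.trim()).filter((s) => s.length > 0).map(Number),
    dataOffset: start + headerLen,
  };
}

function decodeNpy(bytes: Uint8Array): { shape: number[]; data: Float64Array | Uint8Array } {
  const header = parseNpyHeader(bytes);
  const count = header.shape.reduce((a, b) => a * b, 1);
  // slice() copies, so the new buffer starts at offset 0 and is safely aligned.
  const payload = bytes.slice(header.dataOffset);
  switch (header.descr) {
    case "<f8":
      return { shape: header.shape, data: new Float64Array(payload.buffer, 0, count) };
    case "|u1":
      return { shape: header.shape, data: payload.subarray(0, count) };
    default:
      throw new Error(`dtype not handled in this sketch: ${header.descr}`);
  }
}
```

The sketch copies the data block for simplicity; a streaming engine would presumably avoid that copy for large arrays.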
NPZEngine unpacks .npz files, which are standard ZIP archives containing one .npy file per array. It returns an object mapping each array name to its NPYProcessResult.
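Because an .npz file is an ordinary ZIP archive, the array names can be recovered from the ZIP central directory: NumPy stores each array as an entry named `<array name>.npy`. The sketch below lists those names. It is an assumption-laden illustration rather than NPZEngine's implementation: `listZipEntries` and `npzArrayNames` are hypothetical helpers, ZIP64 archives are not handled, and entry data is not decompressed.

```typescript
// Walk the ZIP central directory and return the entry names.
function listZipEntries(bytes: Uint8Array): string[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The End of Central Directory record is at least 22 bytes; scan backwards for it.
  let eocd = -1;
  for (let i = bytes.length - 22; i >= 0; i--) {
    if (view.getUint32(i, true) === 0x06054b50) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("not a ZIP archive");

  const count = view.getUint16(eocd + 10, true); // total number of entries
  let offset = view.getUint32(eocd + 16, true);  // start of the central directory
  const names: string[] = [];

  for (let n = 0; n < count; n++) {
    if (view.getUint32(offset, true) !== 0x02014b50) {
      throw new Error("bad central directory header");
    }
    const nameLen = view.getUint16(offset + 28, true);
    const extraLen = view.getUint16(offset + 30, true);
    const commentLen = view.getUint16(offset + 32, true);
    let name = "";
    for (let k = 0; k < nameLen; k++) {
      name += String.fromCharCode(bytes[offset + 46 + k]);
    }
    names.push(name);
    // Each record is 46 fixed bytes plus three variable-length fields.
    offset += 46 + nameLen + extraLen + commentLen;
  }
  return names;
}

// Map "weights.npy" -> "weights", skipping anything that is not an .npy entry.
function npzArrayNames(bytes: Uint8Array): string[] {
  return listZipEntries(bytes)
    .filter((name) => name.endsWith(".npy"))
    .map((name) => name.slice(0, -4));
}
```

Each listed entry would then be decompressed and handed to the NPY decoding path, which is how a name-to-result mapping like the one described above could be assembled.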
APACHE PARQUET
Columnar extraction engine for the Apache Parquet specification.
The ParquetReader engine implements custom byte-level parsing to decode Parquet headers, footers, and Thrift-encoded metadata.
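As an illustration of what byte-level footer parsing involves, the Parquet file format places the 4-byte magic `PAR1` at both ends of the file, with the Thrift-encoded file metadata followed by its 4-byte little-endian length just before the trailing magic. The sketch below locates that metadata block. `locateParquetFooter` is a hypothetical name, not part of ParquetReader's API, and encrypted-footer files (which use a different magic) are not handled.

```typescript
interface FooterLocation {
  footerStart: number;  // byte offset of the Thrift-encoded file metadata
  footerLength: number; // length of that metadata in bytes
}

function locateParquetFooter(bytes: Uint8Array): FooterLocation {
  // "PAR1" in ASCII
  const isMagic = (at: number): boolean =>
    bytes[at] === 0x50 && bytes[at + 1] === 0x41 && bytes[at + 2] === 0x52 && bytes[at + 3] === 0x31;

  const n = bytes.length;
  // Smallest possible file: leading magic + footer length + trailing magic.
  if (n < 12 || !isMagic(0) || !isMagic(n - 4)) {
    throw new Error("not a Parquet file");
  }

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const footerLength = view.getUint32(n - 8, true); // little-endian
  const footerStart = n - 8 - footerLength;
  if (footerStart < 4) throw new Error("corrupt footer length");

  return { footerStart, footerLength };
}
```

Because the schema and row-group layout live in this footer, a reader only needs the tail of the file to describe it, which is consistent with an inspect-style option that returns metadata without reading rows.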
Because Parquet is a columnar format, passing a columns array in the configuration (for example, columns: ['id', 'score']) can substantially reduce read time, as the reader skips the data pages of every column that was not requested.