0 Dependencies | NPY/NPZ NumPy Support | Parquet Columnar Extract | MIT License

How it works

Cervid Decomposer provides a unified interface for reading complex binary formats such as Apache Parquet and NumPy's NPY/NPZ. It routes each file to the appropriate engine based on its file extension.

Under the hood, it uses NPYEngine, NPZEngine, and ParquetReader to read files in chunks and stream them through memory, keeping overhead low while decoding columnar and multidimensional binary data directly into JavaScript TypedArrays and Objects.
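
The extension-based routing described above can be pictured with a small sketch. The helper name `pickEngine` and the lookup table are hypothetical and only illustrate the idea; the engine names come from the text, and case-insensitive matching is an assumption rather than documented behavior.

```javascript
// Hypothetical sketch of extension-based routing; not the library's internals.
const ENGINE_BY_EXTENSION = new Map([
  ['.npy', 'NPYEngine'],
  ['.npz', 'NPZEngine'],
  ['.parquet', 'ParquetReader'],
]);

function pickEngine(filePath) {
  // Take everything from the last dot onward as the extension.
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot).toLowerCase(); // assumption: case-insensitive
  const engine = ENGINE_BY_EXTENSION.get(ext);
  if (!engine) {
    throw new Error(`Unsupported file extension: "${ext}"`);
  }
  return engine;
}
```

Keeping the mapping in one table means supporting a new format only requires a new entry, and unsupported files fail early with a clear message.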

Install

npm i @cervid/decomposer

Import (ESM, recommended)

import { Decomposer } from '@cervid/decomposer'

Complete example

import { Decomposer } from '@cervid/decomposer'

// 1. Read a NumPy (.npy) file
const npyResult = await Decomposer.read('data/tensor.npy')
console.log(npyResult.data) // Raw TypedArray
console.log(npyResult.shape) // Array dimensions

// 2. Read specific columns from a Parquet file
const parquetResult = await Decomposer.read('data/dataset.parquet', {
  columns: ['id', 'name', 'score']
})
console.log(parquetResult)

// 3. Inspect a Parquet file without reading rows
const inspection = await Decomposer.read('data/huge.parquet', {
  inspect: true
})
console.log(inspection.metadata)

static async Decomposer.read(filePath: string, config?: DecomposerConfig) → Promise<DecomposerReadResult>

Reads and decomposes binary files. The target engine is selected based on the file extension (.npy, .npz, .parquet).

Config Property | Type     | Description
----------------|----------|------------------------------------------------------------
workers         | number   | Worker threads for processing chunks (NPY/NPZ engines).
chunkSizeMB     | number   | Memory footprint limit per processed chunk, in megabytes.
columns         | string[] | Array of column names to extract (Parquet only).
sample          | number   | Limit on the number of rows to read (Parquet only).
inspect         | boolean  | Read only the metadata/schema without extracting row data (Parquet only).
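
As a rough illustration of how these options group by engine, the objects below use the property names from the table with made-up values. Whether `columns` and `sample` can be combined in one call is an assumption; the table does not say.

```javascript
// Illustrative values only; the property names come from the config table.
const npyConfig = { workers: 4, chunkSizeMB: 64 };               // NPY/NPZ engines
const parquetConfig = { columns: ['id', 'score'], sample: 1000 }; // Parquet only (combination assumed valid)
const inspectConfig = { inspect: true };                          // Parquet only, metadata without rows
```

Each object would be passed as the second argument to Decomposer.read.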

NPYEngine parses raw .npy files, decoding the dictionary header to understand the tensor shape, data type (e.g. <f8 for Float64, |u1 for Uint8), and byte order before streaming the raw data block into a native JavaScript TypedArray.
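
To make the header-decoding step concrete, here is a minimal sketch of an NPY parser based on the public NPY file format. `parseNpy` and `TYPED_ARRAYS` are hypothetical names, not the library's NPYEngine. The sketch assumes a little-endian host, handles only simple (non-structured) dtypes, and reads the whole file from one in-memory Uint8Array instead of streaming it.

```javascript
// NPY dtype descriptors mapped to TypedArray constructors (little-endian and
// single-byte types only). Hypothetical table for this sketch.
const TYPED_ARRAYS = {
  '<f8': Float64Array, '<f4': Float32Array,
  '<i8': BigInt64Array, '<i4': Int32Array, '<i2': Int16Array, '|i1': Int8Array,
  '<u8': BigUint64Array, '<u4': Uint32Array, '<u2': Uint16Array, '|u1': Uint8Array,
};

// Hypothetical helper: decode an NPY file held in a Uint8Array.
function parseNpy(bytes) {
  const MAGIC = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59]; // "\x93NUMPY"
  if (bytes.length < 10 || !MAGIC.every((b, i) => bytes[i] === b)) {
    throw new Error('Not an NPY file');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const major = bytes[6];
  // Format version 1.x stores the header length as uint16; later versions use uint32.
  const headerStart = major === 1 ? 10 : 12;
  const headerLen = major === 1 ? view.getUint16(8, true) : view.getUint32(8, true);
  const header = new TextDecoder().decode(
    bytes.subarray(headerStart, headerStart + headerLen)
  );

  // The header is a Python dict literal; pull out the three required keys.
  const descr = /'descr':\s*'([^']+)'/.exec(header);
  const order = /'fortran_order':\s*(True|False)/.exec(header);
  const shapeMatch = /'shape':\s*\(([^)]*)\)/.exec(header);
  if (!descr || !order || !shapeMatch) throw new Error('Malformed NPY header');

  const shape = shapeMatch[1]
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean)
    .map(Number);
  const Ctor = TYPED_ARRAYS[descr[1]];
  if (!Ctor) throw new Error(`Unsupported dtype in this sketch: ${descr[1]}`);

  const count = shape.reduce((a, b) => a * b, 1);
  const dataStart = headerStart + headerLen;
  const dataEnd = dataStart + count * Ctor.BYTES_PER_ELEMENT;
  if (dataEnd > bytes.length) throw new Error('Data block is truncated');
  // slice() copies into a fresh buffer, so the TypedArray view is always aligned.
  const data = new Ctor(bytes.slice(dataStart, dataEnd).buffer);

  return { descr: descr[1], fortranOrder: order[1] === 'True', shape, data };
}
```

Copying the data block trades some memory for simplicity; a streaming engine would instead fill the TypedArray chunk by chunk.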

NPZEngine unpacks .npz files, which are standard ZIP archives containing multiple .npy arrays. It returns an object mapping array names to their respective NPYProcessResult.
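
Because an .npz file is a ZIP archive, the array names can be recovered from the ZIP central directory alone. The sketch below uses hypothetical helpers (`listZipEntries`, `arrayNamesFromNpz`), not the library's NPZEngine; it ignores ZIP64 archives and does not decompress any data. It relies on NumPy's convention of storing each array as `<name>.npy` inside the archive.

```javascript
// Hypothetical helper: list entry names from a ZIP central directory.
function listZipEntries(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Scan backwards for the end-of-central-directory (EOCD) signature.
  let eocd = -1;
  for (let i = bytes.length - 22; i >= 0; i--) {
    if (view.getUint32(i, true) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd === -1) throw new Error('Not a ZIP archive: EOCD record not found');

  const entryCount = view.getUint16(eocd + 10, true);
  let offset = view.getUint32(eocd + 16, true); // start of the central directory
  const decoder = new TextDecoder();
  const names = [];
  for (let n = 0; n < entryCount; n++) {
    if (view.getUint32(offset, true) !== 0x02014b50) {
      throw new Error('Corrupt central directory entry');
    }
    const nameLen = view.getUint16(offset + 28, true);
    const extraLen = view.getUint16(offset + 30, true);
    const commentLen = view.getUint16(offset + 32, true);
    names.push(decoder.decode(bytes.subarray(offset + 46, offset + 46 + nameLen)));
    offset += 46 + nameLen + extraLen + commentLen;
  }
  return names;
}

// Hypothetical helper: map archive entries such as "weights.npy" to "weights".
function arrayNamesFromNpz(bytes) {
  return listZipEntries(bytes)
    .filter((name) => name.endsWith('.npy'))
    .map((name) => name.slice(0, -4));
}
```

A full engine would go on to read each entry's local header, inflate its payload if it is compressed, and hand the resulting bytes to the NPY parser.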

The ParquetReader engine implements custom byte-level parsing to decode the Parquet header, footer, and Thrift-encoded metadata.
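
As a sketch of the first step in such a parser, the function below locates the footer metadata using the Parquet file layout: the file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the Thrift-encoded metadata. `locateParquetFooter` is a hypothetical name, not part of the library, and the sketch does not decode the Thrift payload or handle encrypted footers.

```javascript
// Hypothetical helper: find the byte range of the Parquet footer metadata.
function locateParquetFooter(bytes) {
  const MAGIC = 'PAR1';
  // Smallest plausible file: leading magic + footer length + trailing magic.
  if (bytes.length < 12) throw new Error('File too small to be Parquet');

  const decoder = new TextDecoder();
  const head = decoder.decode(bytes.subarray(0, 4));
  const tail = decoder.decode(bytes.subarray(bytes.length - 4));
  if (head !== MAGIC || tail !== MAGIC) {
    throw new Error('Missing PAR1 magic bytes');
  }

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const footerLength = view.getUint32(bytes.length - 8, true);
  const footerStart = bytes.length - 8 - footerLength;
  if (footerStart < 4) throw new Error('Invalid footer length');

  // bytes.subarray(footerStart, footerStart + footerLength) is the Thrift metadata.
  return { footerStart, footerLength };
}
```

Reading only the last 8 bytes plus the footer is what makes metadata inspection cheap: no row data needs to be touched.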

Because Parquet is a columnar format, passing a columns array that lists only the fields you need can substantially improve performance: the reader skips the data pages of every other column without decoding them.
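
The effect of column selection can be illustrated with a toy read plan. The metadata shape here (a list of `{ name, byteLength }` column chunks) and the function `planColumnReads` are hypothetical simplifications; real Parquet metadata carries many more fields, and the sizes in the usage example are invented.

```javascript
// Hypothetical sketch: decide which column chunks to decode and which to skip.
function planColumnReads(columnChunks, columns) {
  const wanted = new Set(columns);
  const plan = { read: [], bytesRead: 0, bytesSkipped: 0 };
  for (const chunk of columnChunks) {
    if (wanted.has(chunk.name)) {
      plan.read.push(chunk.name);
      plan.bytesRead += chunk.byteLength;
    } else {
      // Unrequested columns are never decoded; only their size is noted here.
      plan.bytesSkipped += chunk.byteLength;
    }
  }
  return plan;
}

// Usage with invented sizes: a wide text column dominates the file.
const examplePlan = planColumnReads(
  [
    { name: 'id', byteLength: 4000 },
    { name: 'name', byteLength: 12000 },
    { name: 'description', byteLength: 900000 },
    { name: 'score', byteLength: 8000 },
  ],
  ['id', 'name', 'score']
);
```

In this made-up case the plan reads 24,000 bytes and skips 900,000, which is why narrow selections on wide tables pay off.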