@cervid/decomposer: Documentation

0 Dependencies | NPY/NPZ NumPy Support | Parquet Columnar Extract | MIT License

How it works

Cervid Decomposer provides a unified interface for reading binary formats such as Apache Parquet and NumPy's NPY/NPZ. It routes each file to the appropriate engine based on its file extension.

Under the hood, it uses NPYEngine, NPZEngine, and ParquetReader to read data in chunks and stream it, keeping memory overhead low while decoding columnar and multidimensional binary data directly into JavaScript TypedArrays and objects.
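
The extension-based routing described above could be sketched roughly as follows. This is an illustration only: the pickEngine function, the lookup table, and the case-insensitive matching are assumptions, not the library's internals.

```javascript
// Hypothetical sketch of extension-based routing; not the library's actual code.
const ENGINE_BY_EXTENSION = new Map([
  ['.npy', 'NPYEngine'],
  ['.npz', 'NPZEngine'],
  ['.parquet', 'ParquetReader'],
])

function pickEngine(filePath) {
  // Keep only the file name, then take everything from its last dot onward.
  const fileName = filePath.split(/[\\/]/).pop()
  const dot = fileName.lastIndexOf('.')
  // Lower-casing makes the match case-insensitive (an assumption of this sketch).
  const ext = dot === -1 ? '' : fileName.slice(dot).toLowerCase()

  const engine = ENGINE_BY_EXTENSION.get(ext)
  if (!engine) {
    throw new Error(`Unsupported file extension: "${ext}"`)
  }
  return engine
}

console.log(pickEngine('data/tensor.npy'))
console.log(pickEngine('data/dataset.parquet'))
```

Keeping the mapping in a single table means an unsupported extension fails early, before any bytes are read.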

Install with npm

npm i @cervid/decomposer

Import

ESM (recommended)

import { Decomposer } from '@cervid/decomposer'

Complete example

import { Decomposer } from '@cervid/decomposer'

// 1. Read a NumPy (.npy) file
const npyResult = await Decomposer.read('data/tensor.npy')
console.log(npyResult.data) // Raw TypedArray
console.log(npyResult.shape) // Array dimensions

// 2. Read specific columns from a Parquet file
const parquetResult = await Decomposer.read('data/dataset.parquet', {
  columns: ['id', 'name', 'score']
})
console.log(parquetResult)

// 3. Inspect a Parquet file without reading rows
const inspection = await Decomposer.read('data/huge.parquet', {
  inspect: true
})
console.log(inspection.metadata)

static async Decomposer.read(filePath: string, config?: DecomposerConfig) → Promise<DecomposerReadResult>

Reads and decomposes binary files. The target engine is selected based on the file extension (.npy, .npz, .parquet).

Config Property	Type	Description
workers	number	Number of worker threads used to process chunks (NPY/NPZ engines).
chunkSizeMB	number	Upper limit, in megabytes, on the memory used to process each chunk.
columns	string[]	Array of column names to extract (Parquet only).
sample	number	Limit the number of rows to read (Parquet only).
inspect	boolean	Read only the metadata/schema without extracting row data (Parquet only).
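
As a rough illustration of the option table, the sketch below checks a config object against the listed names and types before it is passed to Decomposer.read. The validateConfig helper is hypothetical; the source does not say whether or how the library validates its configuration.

```javascript
// Hypothetical helper: option names and types are taken from the table above,
// but the checks themselves are assumptions, not the library's behavior.
const CONFIG_TYPES = new Map([
  ['workers', 'number'],
  ['chunkSizeMB', 'number'],
  ['columns', 'string[]'],
  ['sample', 'number'],
  ['inspect', 'boolean'],
])

function validateConfig(config = {}) {
  const problems = []
  for (const [key, value] of Object.entries(config)) {
    const expected = CONFIG_TYPES.get(key)
    if (expected === undefined) {
      problems.push(`unknown option "${key}"`)
      continue
    }
    // "string[]" needs an element-wise check; the rest map directly onto typeof.
    const ok = expected === 'string[]'
      ? Array.isArray(value) && value.every((item) => typeof item === 'string')
      : typeof value === expected
    if (!ok) problems.push(`"${key}" should be of type ${expected}`)
  }
  return problems
}

console.log(validateConfig({ columns: ['id', 'name'], sample: 100 }))
console.log(validateConfig({ workers: '4', colums: ['id'] }))
```

The first call returns an empty list. The second reports the string-valued workers and the misspelled colums key.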

NPYEngine parses raw .npy files. It decodes the dictionary header to determine the tensor shape, data type (e.g. <f8 for little-endian Float64, |u1 for Uint8), and byte order, then streams the raw data block into a native JavaScript TypedArray.
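
To make the header decoding concrete, here is a minimal sketch of parsing an NPY buffer held in memory. It follows the published NPY layout (magic bytes, version, header length, Python-dict header, raw data) but it is not NPYEngine's code: the parseNpy name is an assumption, it reads the whole buffer instead of streaming, and it handles only little-endian or single-byte numeric dtypes.

```javascript
// Minimal NPY parser sketch; not the library's implementation.
const TYPED_ARRAYS = new Map([
  ['f4', Float32Array], ['f8', Float64Array],
  ['i1', Int8Array], ['i2', Int16Array], ['i4', Int32Array], ['i8', BigInt64Array],
  ['u1', Uint8Array], ['u2', Uint16Array], ['u4', Uint32Array], ['u8', BigUint64Array],
])

function parseNpy(arrayBuffer) {
  const bytes = new Uint8Array(arrayBuffer)
  const view = new DataView(arrayBuffer)

  // Magic string: byte 0x93 followed by the ASCII letters "NUMPY".
  const magic = [0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59]
  if (!magic.every((b, i) => bytes[i] === b)) throw new Error('Not an NPY file')

  // Version 1.x stores the header length as uint16, later versions as uint32 (little-endian).
  const major = bytes[6]
  const headerLength = major === 1 ? view.getUint16(8, true) : view.getUint32(8, true)
  const headerStart = major === 1 ? 10 : 12
  const dataStart = headerStart + headerLength

  // The header is a Python dict literal, for example:
  // {'descr': '<f8', 'fortran_order': False, 'shape': (2, 3), }
  const header = new TextDecoder().decode(bytes.subarray(headerStart, dataStart))
  const descr = /'descr':\s*'([^']+)'/.exec(header)?.[1]
  const shapeText = /'shape':\s*\(([^)]*)\)/.exec(header)?.[1]
  if (descr === undefined || shapeText === undefined) throw new Error('Malformed NPY header')
  const fortranOrder = /'fortran_order':\s*True/.test(header)
  const shape = shapeText.split(',').map((s) => s.trim()).filter(Boolean).map(Number)

  // descr is a byte-order character ('<', '>', '|' or '=') plus a type code such as "f8".
  if (descr[0] === '>') throw new Error('Big-endian data is not handled in this sketch')
  const TypedArrayCtor = TYPED_ARRAYS.get(descr.slice(1))
  if (!TypedArrayCtor) throw new Error(`Unsupported dtype: ${descr}`)

  // Copy the data block so the TypedArray starts at an aligned offset.
  const count = shape.reduce((a, b) => a * b, 1)
  const dataEnd = dataStart + count * TypedArrayCtor.BYTES_PER_ELEMENT
  const data = new TypedArrayCtor(arrayBuffer.slice(dataStart, dataEnd))

  return { descr, fortranOrder, shape, data }
}
```

The function expects a plain ArrayBuffer. A Node.js Buffer can be converted first with buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength).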

NPZEngine unpacks .npz files, which are standard ZIP archives containing multiple .npy arrays. It returns an object mapping array names to their respective NPYProcessResult.
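
Because an .npz file is an ordinary ZIP archive, the arrays it contains can be discovered from the ZIP central directory alone. The sketch below lists them without decompressing anything. It assumes a single-disk, non-ZIP64 archive, and the listNpzArrays name and its return shape are illustrative rather than NPZEngine's actual interface.

```javascript
// Sketch: list the arrays in an NPZ archive via the ZIP central directory.
// Not the library's implementation; ZIP64 and multi-disk archives are not handled.
function listNpzArrays(arrayBuffer) {
  const view = new DataView(arrayBuffer)
  const decoder = new TextDecoder()

  // Scan backwards for the end-of-central-directory record (signature 0x06054b50).
  let eocd = -1
  for (let i = arrayBuffer.byteLength - 22; i >= 0; i--) {
    if (view.getUint32(i, true) === 0x06054b50) { eocd = i; break }
  }
  if (eocd === -1) throw new Error('Not a ZIP archive')

  const entryCount = view.getUint16(eocd + 10, true)
  let offset = view.getUint32(eocd + 16, true) // start of the central directory

  const arrays = []
  for (let n = 0; n < entryCount; n++) {
    if (view.getUint32(offset, true) !== 0x02014b50) throw new Error('Corrupt central directory')
    const compression = view.getUint16(offset + 10, true) // 0 = stored, 8 = deflate
    const compressedSize = view.getUint32(offset + 20, true)
    const nameLength = view.getUint16(offset + 28, true)
    const extraLength = view.getUint16(offset + 30, true)
    const commentLength = view.getUint16(offset + 32, true)
    const localHeaderOffset = view.getUint32(offset + 42, true)
    const entryName = decoder.decode(new Uint8Array(arrayBuffer, offset + 46, nameLength))

    // Each member is stored as "<array name>.npy"; the key drops the extension.
    arrays.push({
      name: entryName.replace(/\.npy$/, ''),
      compression,
      compressedSize,
      localHeaderOffset,
    })
    offset += 46 + nameLength + extraLength + commentLength
  }
  return arrays
}
```

Each listed member would then be located through localHeaderOffset, inflated if compression is 8, and handed to an NPY parser.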

The ParquetReader engine implements custom byte-level parsing to decode Parquet headers, footers, and Thrift metadata.
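
As an illustration of the byte-level work involved, the sketch below locates the footer metadata in a Parquet file held in memory. It relies only on the published file layout (the PAR1 magic at both ends and a 4-byte little-endian metadata length before the trailing magic). The locateParquetFooter name is an assumption, and decoding the Thrift structure itself is left out.

```javascript
// Sketch: find the Thrift-encoded file metadata of a Parquet file.
// Layout: "PAR1" | data | file metadata | 4-byte metadata length (LE) | "PAR1".
// This is not ParquetReader's code.
function locateParquetFooter(arrayBuffer) {
  const bytes = new Uint8Array(arrayBuffer)
  const size = bytes.length
  if (size < 12) throw new Error('File too small to be Parquet')

  // "PAR1" in ASCII is 0x50 0x41 0x52 0x31.
  const isMagic = (offset) =>
    bytes[offset] === 0x50 && bytes[offset + 1] === 0x41 &&
    bytes[offset + 2] === 0x52 && bytes[offset + 3] === 0x31
  if (!isMagic(0) || !isMagic(size - 4)) throw new Error('Missing PAR1 magic bytes')

  // The metadata length sits just before the trailing magic.
  const metadataLength = new DataView(arrayBuffer).getUint32(size - 8, true)
  const metadataStart = size - 8 - metadataLength
  if (metadataStart < 4) throw new Error('Footer length exceeds file size')

  return { metadataStart, metadataLength }
}
```

Only the returned byte range needs to be read to learn the schema, which is what makes a metadata-only mode such as inspect: true cheap even for large files.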

Because Parquet is a columnar format, passing a columns array (for example columns: ['id', 'score']) in the configuration can substantially improve performance: the reader skips the data pages of every column that was not requested instead of decoding them.
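
The effect of column selection can be sketched with a toy model. The metadata shape below (rowGroups, columnChunks, offset, length) is hypothetical and far simpler than real Parquet file metadata, and planColumnReads is not part of the library. It only shows why unrequested columns cost nothing to skip.

```javascript
// Sketch of column projection over a simplified, hypothetical metadata shape.
function planColumnReads(metadata, columns) {
  const wanted = new Set(columns)
  const reads = []
  for (const [rowGroupIndex, rowGroup] of metadata.rowGroups.entries()) {
    for (const chunk of rowGroup.columnChunks) {
      // Chunks of columns that were not requested are never read or decoded.
      if (wanted.has(chunk.name)) {
        reads.push({ rowGroupIndex, name: chunk.name, offset: chunk.offset, length: chunk.length })
      }
    }
  }
  return reads
}

// Made-up byte ranges for one row group with three columns.
const metadata = {
  rowGroups: [
    {
      columnChunks: [
        { name: 'id', offset: 4, length: 800 },
        { name: 'name', offset: 804, length: 5200 },
        { name: 'score', offset: 6004, length: 800 },
      ],
    },
  ],
}

const reads = planColumnReads(metadata, ['id', 'score'])
const bytesToRead = reads.reduce((sum, read) => sum + read.length, 0)
console.log(reads.length, bytesToRead) // prints: 2 1600
```

In this made-up layout, requesting two of the three columns touches 1600 of 6800 bytes, because the large name column is skipped entirely.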
