This document provides a detailed reference for all public methods available in the Preprocessor class.
The main entry point for the SDK.
- options.engine: Custom engine instance (optional).
- options.logger: Custom logger configuration (optional).
Checks if the current environment supports WebGPU.
- Returns: Promise<boolean>
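For orientation, here is a minimal sketch of how a WebGPU capability check is commonly written against the standard `navigator.gpu` entry point. The helper name `supportsWebGPU` and the `GpuLike` type are hypothetical, and the SDK's own check may work differently.

```typescript
// Hypothetical helper for illustration; not part of the SDK's documented API.
// Minimal structural type for the one WebGPU call used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

async function supportsWebGPU(
  // Defaults to the global navigator when one exists (browsers, recent Node).
  nav: { gpu?: GpuLike } | undefined = (globalThis as any).navigator,
): Promise<boolean> {
  const gpu = nav?.gpu;
  if (!gpu) return false; // The WebGPU API is not exposed at all.
  try {
    // requestAdapter() resolves to null when no suitable adapter is available.
    const adapter = await gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

Requesting an adapter matters because a browser can expose `navigator.gpu` yet still have no usable GPU adapter.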
Loads an LLM model into memory.
- modelName: String identifier for the model (e.g., 'Llama-3.2-1B-Instruct-q4f16_1-MLC').
- options: MLC-AI configuration options.
- Throws: ConfigurationError if model loading fails.
Cleans text using rules or LLM.
- text: The string to clean.
- options:
  - removeHtml (bool): Default false.
  - removeUrls (bool): Default false.
  - removeExtraWhitespace (bool): Default false.
  - removeLineBreaks (bool): Default false.
  - removeSpecialChars (bool): Default false.
  - decodeHtmlEntities (bool): Default false.
  - useLLM (bool): Force LLM cleaning.
  - customInstructions (string): Specific semantic instructions for the LLM.
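The rule-based flags can be pictured as independent regex passes applied in a fixed order. The sketch below covers four of them. The function name `cleanWithRules`, the regular expressions, and the order of the passes are all assumptions made for illustration, not the SDK's actual implementation.

```typescript
// Hypothetical sketch; option names mirror the list above, everything else is assumed.
interface RuleOptions {
  removeHtml?: boolean;
  removeUrls?: boolean;
  removeLineBreaks?: boolean;
  removeExtraWhitespace?: boolean;
}

function cleanWithRules(text: string, options: RuleOptions = {}): string {
  let result = text;
  // Strip anything that looks like an HTML tag.
  if (options.removeHtml) result = result.replace(/<[^>]*>/g, "");
  // Strip http(s) URLs up to the next whitespace character.
  if (options.removeUrls) result = result.replace(/https?:\/\/\S+/g, "");
  // Replace each run of line breaks with a single space.
  if (options.removeLineBreaks) result = result.replace(/(\r?\n)+/g, " ");
  // Collapse runs of spaces and tabs, then trim both ends.
  if (options.removeExtraWhitespace) result = result.replace(/[ \t]+/g, " ").trim();
  return result;
}
```

Every flag is off by default, matching the defaults listed above, so calling the function with no options returns the input unchanged.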
Extracts structured information from text.
- text: The source text.
- options:
  - format: 'json' (the only format currently supported).
  - fields: Array of field names to extract.
  - strict (bool): If true, throws an error on validation failure.
- Returns: Promise<Object> (extracted JSON data).
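One plausible reading of `fields` and `strict` is a post-processing step that keeps only the requested keys of the model's JSON output and either fails or fills in `null` when a key is missing. The sketch below illustrates that reading only. The helper `pickFields` is hypothetical, and the SDK's validation rules are not documented here and may differ.

```typescript
// Hypothetical sketch of field validation; not the SDK's documented behavior.
function pickFields(
  raw: string,
  fields: string[],
  strict = false,
): Record<string, unknown> {
  const parsed: unknown = JSON.parse(raw); // Throws SyntaxError on malformed JSON.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Expected a JSON object");
  }
  const source = parsed as Record<string, unknown>;
  const result: Record<string, unknown> = {};
  for (const field of fields) {
    if (field in source) {
      result[field] = source[field];
    } else if (strict) {
      // Assumed strict behavior: a missing field is a validation failure.
      throw new Error(`Missing field: ${field}`);
    } else {
      // Assumed lenient behavior: a missing field is reported as null.
      result[field] = null;
    }
  }
  return result;
}
```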
Splits text into smaller segments.
- text: The source text.
- options:
  - size: Max characters per chunk (default 500).
  - overlap: Characters to overlap between chunks (default 0).
  - strategy: 'character', 'sentence', or 'word'.
- Returns: string[]
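To make the interaction between `size` and `overlap` concrete, here is a minimal sketch of the 'character' strategy. It assumes each chunk starts `size - overlap` characters after the previous one. The function name `chunkByCharacter` and the argument checks are illustrative, not taken from the SDK.

```typescript
// Hypothetical sketch of character-based chunking with overlap.
function chunkByCharacter(text: string, size = 500, overlap = 0): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be at least 0 and less than size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // Distance between consecutive chunk starts.
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk reaches the end, to avoid a trailing chunk made only of overlap.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Under these assumptions, `chunkByCharacter("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`, where each chunk repeats the last character of the previous one.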
Runs multiple operations in sequence.
- text: Initial input.
- steps: Array of strings or objects (e.g., ['clean', { chunk: { size: 100 } }]).
- Returns: Result of the final step.
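The mixed `steps` array can be read as follows: a bare string names an operation with default options, and a single-key object names an operation and supplies its options. The sketch below shows one way to normalize and run such steps. The names `runPipeline`, `normalizeStep`, and the `operations` registry are assumptions. The sketch is also synchronous for brevity, whereas the SDK's operations may well be asynchronous.

```typescript
// Hypothetical sketch of sequential step execution; not the SDK's implementation.
type StepOptions = Record<string, unknown>;
type Step = string | Record<string, StepOptions>;
type Operation = (input: unknown, options: StepOptions) => unknown;

// Turn 'clean' or { chunk: { size: 100 } } into a [name, options] pair.
function normalizeStep(step: Step): [string, StepOptions] {
  if (typeof step === "string") return [step, {}];
  const names = Object.keys(step);
  if (names.length !== 1) {
    throw new Error("Each step object must have exactly one key");
  }
  const name = names[0] as string;
  return [name, step[name] ?? {}];
}

function runPipeline(
  input: unknown,
  steps: Step[],
  operations: Record<string, Operation>,
): unknown {
  let current = input;
  for (const step of steps) {
    const [name, options] = normalizeStep(step);
    const operation = operations[name];
    if (!operation) throw new Error(`Unknown step: ${name}`);
    // The output of each step becomes the input of the next one.
    current = operation(current, options);
  }
  return current;
}
```

Because each step receives the previous step's output, the return type depends on the last step: a pipeline ending in a chunking step returns an array rather than a string.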
Access the internal logger to retrieve logs or performance stats.
- Returns: InternalLogger