Preprocessors Gallery

Pre-built functions to prepare data for analysis

Before sending harvested data to a model or other analytic, it may be necessary to prepare that data so it can be analyzed properly. This could include, for example, breaking a string of text into shorter elements, changing the color scheme of an image, or normalizing a column of tabular values. This section describes the available preprocessing operations for text, images, and tabular data.

Tokenize

This function splits text into smaller chunks called tokens. Text can be split by sentences, by words, or with a custom token pattern.
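The three splitting modes can be illustrated with Python's standard `re` module. This is only a sketch of the idea, not the aisquared or airJS implementation: the function name `tokenize`, its `mode` and `pattern` parameters, and the specific regular expressions are assumptions made for illustration.

```python
import re

def tokenize(text, mode="words", pattern=None):
    """Split text into tokens by sentence, by word, or with a custom regex.

    Hypothetical helper for illustration; not the aisquared API.
    """
    if pattern is not None:
        # Custom pattern: every non-overlapping regex match becomes a token.
        return re.findall(pattern, text)
    if mode == "sentences":
        # Naive rule: split on whitespace that follows '.', '!' or '?'.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if mode == "words":
        # Runs of letters, digits, and underscores count as words.
        return re.findall(r"\w+", text)
    raise ValueError(f"unknown mode: {mode!r}")

sample = "Data is harvested. Then it is analyzed!"
word_tokens = tokenize(sample)
sentence_tokens = tokenize(sample, mode="sentences")
custom_tokens = tokenize("a-b c-d", pattern=r"[a-z]-[a-z]")
```

The sentence rule here is deliberately simple and will mis-split abbreviations such as "e.g."; a production tokenizer would need a more careful rule.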

Remove Characters

This function removes special characters from a body of text. It can remove all numerals, all punctuation, or both.
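A minimal sketch of this kind of character removal, using only the Python standard library. The function name and its two boolean flags are hypothetical, and "punctuation" is assumed here to mean the ASCII set in `string.punctuation`.

```python
import string

def remove_characters(text, remove_digits=True, remove_punctuation=True):
    """Strip digits and/or ASCII punctuation from text (illustrative helper)."""
    to_remove = ""
    if remove_digits:
        to_remove += string.digits
    if remove_punctuation:
        to_remove += string.punctuation
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", to_remove))

cleaned = remove_characters("Room 101, please!")
digits_only = remove_characters("Room 101, please!", remove_punctuation=False)
```

Note that removal leaves the surrounding whitespace untouched, so `cleaned` contains two consecutive spaces where "101," used to be.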

Convert to Case

Converts all text to lowercase or uppercase.

Convert to Vocabulary

Converts tokens to integer vocabularies.
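One common way to map tokens to integers is to assign each distinct token the next free index in order of first appearance. The sketch below assumes that scheme, and also assumes that index 0 is reserved for tokens not seen when the vocabulary was built; both choices and all names are illustrative rather than taken from the aisquared package.

```python
def build_vocabulary(token_lists, start_index=1):
    """Assign an integer to each distinct token, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + start_index
    return vocab

def encode(tokens, vocab, unknown_index=0):
    """Replace each token with its integer; unseen tokens get unknown_index."""
    return [vocab.get(token, unknown_index) for token in tokens]

vocab = build_vocabulary([["the", "cat", "sat"], ["the", "dog"]])
encoded = encode(["the", "dog", "ran"], vocab)
```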

Pad Sequences

Pads integer-encoded sequences of text to a common length using a padding value.
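Padding is typically applied after tokens have been converted to integers, so that every sequence has the same length. The following is a sketch under that assumption; the parameter names (`length`, `pad_value`, `pad_at_end`) and the choice to truncate over-long sequences from the end are hypothetical.

```python
def pad_sequences(sequences, length, pad_value=0, pad_at_end=True):
    """Pad (or truncate) each integer sequence to exactly `length` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[:length]  # drop anything beyond the target length
        fill = [pad_value] * (length - len(seq))
        padded.append(seq + fill if pad_at_end else fill + seq)
    return padded

padded = pad_sequences([[1, 4], [1, 2, 3, 4, 5]], length=4)
front_padded = pad_sequences([[7]], length=3, pad_at_end=False)
```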

Add Value

Adds a value to all pixels in an image.

Subtract Value

Subtracts a value from all pixels in an image.

Multiply Value

Multiplies all pixels in an image by a value.

Divide Value

Divides all pixels in an image by a value.
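The four arithmetic functions above all apply the same operation to every pixel. As a rough illustration, the sketch below represents a grayscale image as a nested list of numbers and uses the standard `operator` module; the helper name `apply_to_pixels` is hypothetical, and a real pipeline would more likely operate on array types.

```python
import operator

def apply_to_pixels(image, op, value):
    """Return a new image with op(pixel, value) applied to every pixel."""
    return [[op(pixel, value) for pixel in row] for row in image]

image = [[0, 128], [255, 64]]

brighter = apply_to_pixels(image, operator.add, 10)     # Add Value
darker = apply_to_pixels(image, operator.sub, 10)       # Subtract Value
doubled = apply_to_pixels(image, operator.mul, 2)       # Multiply Value
scaled = apply_to_pixels(image, operator.truediv, 255)  # Divide Value
```

Dividing by 255 is a common way to rescale 8-bit pixel values into the range 0 to 1. This sketch does no clipping, so results can fall outside the original 0 to 255 range.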

Convert to Color

Converts an image to a color scheme - either RGB or Black & White.
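A sketch of what such a conversion can look like, assuming "Black & White" means single-channel grayscale. The weights used are the widely used ITU-R BT.601 luma coefficients; whether the actual function uses these weights is not stated here, and the function names are hypothetical.

```python
def rgb_to_grayscale(image):
    """Collapse (r, g, b) pixels to one luminance value per pixel."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def grayscale_to_rgb(image):
    """Repeat each grayscale value across three channels."""
    return [[(p, p, p) for p in row] for row in image]

rgb = [[(255, 0, 0), (255, 255, 255)]]
gray = rgb_to_grayscale(rgb)
back_to_rgb = grayscale_to_rgb(gray)
```

Converting grayscale back to RGB restores the three-channel shape but not the original colors.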

Resize

Sets the dimensions of an image.
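Resizing requires some rule for deciding which source pixel feeds each output pixel. The sketch below uses nearest-neighbor sampling, which is the simplest such rule; the interpolation method used by the actual function is not specified here, so treat this purely as an illustration.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D image (nested lists) by nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            # Map each output coordinate back to a source coordinate.
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]

small = [[1, 2], [3, 4]]
big = resize_nearest(small, 4, 4)
small_again = resize_nearest(big, 2, 2)
```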

The tabular preprocessing functions are included in the aisquared Python package, but are not currently supported in airJS.

Drop Columns

Drops a column from tabular data.

Min-Max Scaling

Rescales the values in each selected column relative to that column's minimum and maximum values in the training data.
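The usual min-max formula is (value - min) / (max - min), where min and max come from the training data and are then reused on new data. The sketch below assumes that formula and a single column stored as a list of floats; the function names are hypothetical.

```python
def fit_min_max(training_values):
    """Record the minimum and maximum of a training column."""
    return min(training_values), max(training_values)

def min_max_scale(values, minimum, maximum):
    """Scale values so the training minimum maps to 0 and the maximum to 1."""
    span = maximum - minimum
    if span == 0:
        raise ValueError("minimum and maximum are equal; cannot scale")
    return [(v - minimum) / span for v in values]

train = [10.0, 20.0, 30.0]
lo, hi = fit_min_max(train)
scaled_new = min_max_scale([15.0, 30.0, 40.0], lo, hi)
```

Because the bounds come from the training data, new values outside that range (40.0 here) map outside the interval from 0 to 1.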

One Hot Encoding

Encodes categorical features as a one-hot numeric array.
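In a one-hot encoding, each category gets its own position in a vector, and a value is represented by a 1 in its category's position and 0 everywhere else. A minimal sketch for a single categorical column follows; the function name and the requirement to pass the category list explicitly are assumptions for illustration.

```python
def one_hot(values, categories):
    """Encode each value as a 0/1 list with a single 1 at its category index."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for an unlisted category
        encoded.append(row)
    return encoded

colors = one_hot(["red", "blue", "red"], categories=["blue", "green", "red"])
```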

Z-Score Normalization

Takes each user-supplied column value, subtracts that column's provided mean, and divides by the provided standard deviation.
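The description above corresponds to the formula z = (value - mean) / standard deviation, with the mean and standard deviation supplied by the caller rather than computed from the input. A sketch for one column, with a hypothetical function name:

```python
def z_score(values, mean, std):
    """Standardize values using a provided mean and standard deviation."""
    if std == 0:
        raise ValueError("standard deviation must be non-zero")
    return [(v - mean) / std for v in values]

z = z_score([50.0, 60.0, 40.0], mean=50.0, std=5.0)
```

A value equal to the mean maps to 0, and each unit of the result represents one standard deviation of distance from the mean.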
