Preprocessors Gallery
Pre-built functions to prepare data for analysis
Before sending harvested data to a model or other analytic, it might be necessary to prepare that data so it can be analyzed properly. This could include, for example, breaking a string of text into shorter elements, changing the color scheme of an image, or normalizing a column of tabular values. This section describes the available preprocessing operations for text, images, and tabular data.
Tokenize
Splits text into smaller chunks called tokens. Text can be split by sentences, by words, or with a custom token pattern.
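The three splitting modes can be sketched in plain Python as follows. This is a minimal illustration, not the aisquared or airJS API: the function name `tokenize`, its parameters, and the regular expressions are all assumptions chosen for the example.

```python
import re

def tokenize(text, split_by="words", pattern=None):
    """Split text into tokens by sentence, by word, or with a custom regex.

    Hypothetical helper for illustration only.
    """
    if pattern is not None:
        # Custom pattern: every non-overlapping match becomes a token.
        return re.findall(pattern, text)
    if split_by == "sentences":
        # Naive sentence split: break on whitespace that follows ., ! or ?
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Default: each run of word characters is a token.
    return re.findall(r"\w+", text)

words = tokenize("Hello world. How are you?")
sentences = tokenize("Hello world. How are you?", split_by="sentences")
capitalized = tokenize("Hello world. How are you?", pattern=r"[A-Z]\w*")
```

The sentence rule here is deliberately simple and would mis-split abbreviations such as "e.g."; a production tokenizer would need a more careful rule.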
Remove Characters
Removes special characters from a body of text, optionally including all numerals, all punctuation, or both.
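As a rough sketch of the idea, the snippet below strips digits and punctuation with the standard library. The function name and its two flags are hypothetical and do not reflect the actual preprocessor configuration.

```python
import string

def remove_characters(text, remove_digits=True, remove_punctuation=True):
    """Delete digits and/or ASCII punctuation from text (illustrative only)."""
    to_remove = ""
    if remove_digits:
        to_remove += string.digits
    if remove_punctuation:
        to_remove += string.punctuation
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", to_remove))

cleaned = remove_characters("Hi, room 42!")
keep_digits = remove_characters("Hi, room 42!", remove_digits=False)
```

Note that whitespace is left untouched, so removing characters can leave trailing or doubled spaces.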
Convert to Case
Converts all text to lowercase or uppercase.
Convert to Vocabulary
Converts tokens to their integer indices in a vocabulary.
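A minimal sketch of a token-to-integer mapping is shown below. The helper names, the starting index of 1, and the use of 0 for out-of-vocabulary tokens are assumptions made for the example, not documented behavior of the preprocessor.

```python
def build_vocabulary(tokens, start_index=1):
    """Assign each distinct token an integer in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = start_index + len(vocab)
    return vocab

def convert_to_vocabulary(tokens, vocab, oov_index=0):
    """Replace each token with its integer; unknown tokens get oov_index."""
    return [vocab.get(token, oov_index) for token in tokens]

vocab = build_vocabulary(["the", "cat", "the", "dog"])
encoded = convert_to_vocabulary(["the", "dog", "bird"], vocab)
```

Reserving one index for unknown tokens keeps the output length equal to the input length even when new words appear at prediction time.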
Pad Sequences
Pads integer-encoded text sequences with a fill value so that they share a common length.
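Padding is typically applied after tokens have been converted to integers, so that every sequence has the same length. The sketch below assumes a hypothetical `pad_sequences` helper with a target length, a pad value, and a choice of padding before or after the sequence; longer sequences are truncated from the end.

```python
def pad_sequences(sequences, length, pad_value=0, pad_location="post"):
    """Pad (or truncate) each integer sequence to exactly `length` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[:length]                 # truncate if too long
        fill = [pad_value] * (length - len(seq))  # empty if already long enough
        padded.append(seq + fill if pad_location == "post" else fill + seq)
    return padded

post_padded = pad_sequences([[1, 2], [3, 4, 5, 6]], length=3)
pre_padded = pad_sequences([[1, 2]], length=4, pad_location="pre")
```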
Add Value
Adds a value to all pixels in an image.
Subtract Value
Subtracts a value from all pixels in an image.
Multiply Value
Multiplies all pixels in an image by a value.
Divide Value
Divides all pixels in an image by a value.
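The four arithmetic operations above all apply one value to every pixel. A minimal sketch, assuming a grayscale image stored as a nested list of numbers and a hypothetical `apply_to_pixels` helper:

```python
import operator

# Map operation names to the corresponding arithmetic function.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def apply_to_pixels(image, operation, value):
    """Apply `operation` with `value` to every pixel of a 2D image."""
    fn = OPERATIONS[operation]
    return [[fn(pixel, value) for pixel in row] for row in image]

image = [[0, 255], [51, 102]]
scaled = apply_to_pixels(image, "divide", 255)    # rescale 0-255 to 0-1
brightened = apply_to_pixels(image, "add", 10)
```

Dividing by 255 is a common way to bring 8-bit pixel values into the range a model expects. The sketch does not clip results, so adding to a bright pixel can exceed 255.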
Convert to Color
Converts an image to a color scheme, either RGB or black and white.
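The conversion can be sketched in both directions as below. This assumes that "black and white" means single-channel grayscale, that pixels are 8-bit values in nested lists, and that the common ITU-R BT.601 luminance weights are used; the preprocessor's actual conversion may differ, and both function names are hypothetical.

```python
def to_grayscale(image):
    """Collapse (r, g, b) pixels to one luminance value per pixel."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def to_rgb(image):
    """Expand single-channel pixels to (r, g, b) by repeating the value."""
    return [[(v, v, v) for v in row] for row in image]

gray = to_grayscale([[(255, 255, 255), (0, 0, 0)]])
rgb = to_rgb([[7, 200]])
```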
Resize
Sets the dimensions of an image.
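One simple way to change an image's dimensions is nearest-neighbor sampling, sketched below for an image stored as a nested list. The interpolation method is an assumption made for illustration; the source does not state which method the preprocessor uses, and `resize` here is a hypothetical helper.

```python
def resize(image, new_height, new_width):
    """Resize a 2D image by nearest-neighbor sampling (illustrative only)."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            # Map each output coordinate back to a source pixel.
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]

enlarged = resize([[1, 2], [3, 4]], 4, 4)
shrunk = resize([[1, 2, 3, 4]], 1, 2)
```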
The tabular preprocessing functions are included in the aisquared Python package, but are not currently supported in airJS.
Drop Columns
Drops a column from tabular data.
Min-Max Scaling
Takes the specified columns and maps each value to its position relative to the minimum and maximum values of the training data.
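For a single column, the mapping is (value - min) / (max - min), where the minimum and maximum come from the training data rather than from the values being scaled. A minimal sketch with a hypothetical `min_max_scale` helper:

```python
def min_max_scale(values, train_min, train_max):
    """Scale values to [0, 1] relative to the training minimum and maximum."""
    span = train_max - train_min
    if span == 0:
        # A constant training column carries no scale; return zeros.
        return [0.0 for _ in values]
    return [(v - train_min) / span for v in values]

scaled = min_max_scale([10, 15, 20], train_min=10, train_max=20)
outside = min_max_scale([25], train_min=10, train_max=20)
```

Because the bounds are fixed at training time, new values outside the training range map outside [0, 1], as the second call shows.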
One Hot Encoding
Encodes categorical features as a one-hot numeric array.
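One-hot encoding turns each category into a vector with a single 1 at that category's position. The sketch below assumes the category list is known in advance and that unseen values are encoded as all zeros; both choices, and the function name, are illustrative assumptions.

```python
def one_hot_encode(values, categories):
    """Encode each value as a 0/1 list with one slot per known category."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        if value in index:          # unseen categories stay all-zero
            row[index[value]] = 1
        encoded.append(row)
    return encoded

encoded = one_hot_encode(["red", "blue", "green"], ["red", "green", "blue"])
unknown = one_hot_encode(["purple"], ["red", "green", "blue"])
```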
Z-Score Normalization
Takes each user-supplied column value, subtracts that column's provided mean, and divides by the provided standard deviation.
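In formula form this is z = (x - mean) / std for each value x in the column. A minimal sketch, with a hypothetical `z_score` helper that takes the mean and standard deviation as arguments rather than computing them:

```python
def z_score(values, mean, std):
    """Standardize values using a provided mean and standard deviation."""
    if std == 0:
        raise ValueError("standard deviation must be non-zero")
    return [(v - mean) / std for v in values]

normalized = z_score([2, 4, 6], mean=4, std=2)
```

Supplying the statistics explicitly means the same transformation is applied at prediction time as during training.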