Pre-built functions to prepare data for analysis
Before sending harvested data to a model or other analytic, you may need to prepare that data so it can be analyzed properly. This could include, for example, breaking a string of text into shorter elements, changing the color scheme of an image, or normalizing a column of tabular values. This section describes the preprocessing operations available for text, images, and tabular data.
This function splits text into smaller chunks called tokens. Text can be split by sentence, by word, or with a custom token pattern.
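The three splitting modes can be illustrated with a minimal sketch using Python's standard `re` module. The function name `tokenize`, its `mode` and `token_pattern` parameters, and the regular expressions are assumptions made for illustration; they are not the aisquared API.

```python
import re

# Hypothetical helper for illustration only; not part of the aisquared package.
def tokenize(text, mode="words", token_pattern=None):
    """Split text into tokens by sentence, by word, or by a custom regex."""
    if token_pattern is not None:
        # A custom pattern takes precedence over the built-in modes.
        return re.findall(token_pattern, text)
    if mode == "sentences":
        # Split on whitespace that follows sentence-ending punctuation.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if mode == "words":
        # A word is taken to be a run of letters, digits, or underscores.
        return re.findall(r"\w+", text)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `tokenize("Hello world. How are you?", mode="sentences")` yields two sentence tokens, while the default word mode yields five word tokens with the punctuation dropped.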
This function removes special characters from a body of text, including numerals, punctuation, or both.
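A rough sketch of this kind of character removal is shown here, using only the standard library. The name `remove_characters` and its two flags are hypothetical, and "punctuation" is assumed to mean the ASCII set in `string.punctuation`.

```python
import re
import string

# Hypothetical helper for illustration only; not part of the aisquared package.
def remove_characters(text, remove_digits=True, remove_punctuation=True):
    """Strip numerals and/or ASCII punctuation from text."""
    if remove_digits:
        text = re.sub(r"\d", "", text)
    if remove_punctuation:
        # Delete every character listed in string.punctuation.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text
```

Note that removed characters are deleted rather than replaced, so surrounding whitespace is left as it was.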
Converts all text to lowercase or uppercase.
Converts tokens to integers using a vocabulary.
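One common way to map tokens to integers is sketched here, with the text lowercased first. The helper names, the choice to number tokens in order of first appearance starting at 1, and the use of 0 for unknown tokens are all assumptions for illustration, not the behavior of the aisquared package.

```python
# Hypothetical helpers for illustration only; not part of the aisquared package.
def build_vocabulary(tokens, start_index=1):
    """Assign each distinct token an integer in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + start_index
    return vocab

def encode(tokens, vocab, unknown_index=0):
    """Replace each token with its vocabulary index (unknown tokens map to 0)."""
    return [vocab.get(token, unknown_index) for token in tokens]

# Lowercase before building the vocabulary so "The" and "the" share an index.
training_tokens = "The cat saw the dog".lower().split()
vocab = build_vocabulary(training_tokens)
```

With this vocabulary, `encode(["the", "dog", "ran"], vocab)` gives `[1, 4, 0]`, since "ran" was never seen.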
Pads sequences of tokenized text with an integer value.
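Padding is typically used to bring integer-encoded sequences to a common length. The sketch below assumes a fixed target length, a pad value of 0, and options to pad or truncate at either end; these parameter names and defaults are hypothetical and not taken from the aisquared package.

```python
# Hypothetical helper for illustration only; not part of the aisquared package.
def pad_sequence(seq, length, pad_value=0, pad_location="post",
                 truncate_location="post"):
    """Pad or truncate a list of integers to exactly `length` items."""
    if length <= 0:
        raise ValueError("length must be positive")
    seq = list(seq)
    if len(seq) > length:
        # "post" drops items from the end, anything else drops from the start.
        return seq[:length] if truncate_location == "post" else seq[-length:]
    padding = [pad_value] * (length - len(seq))
    return seq + padding if pad_location == "post" else padding + seq
```

For example, `pad_sequence([1, 4], 4)` returns `[1, 4, 0, 0]`, and `pad_sequence([1, 2, 3, 4, 5], 3)` returns `[1, 2, 3]`.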
Adds a value to all pixels in an image.
Subtracts a value from all pixels in an image.
Multiplies all pixels in an image by a value.
Divides all pixels in an image by a value.
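The four pixel-wise arithmetic operations above can be sketched in a few lines. The example assumes a single-channel image stored as a list of rows; the function name `apply_to_pixels` and the operation names are hypothetical.

```python
import operator

# Hypothetical helper for illustration only; not part of the aisquared package.
_OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def apply_to_pixels(image, operation, value):
    """Apply one arithmetic operation with `value` to every pixel."""
    fn = _OPERATIONS[operation]
    return [[fn(pixel, value) for pixel in row] for row in image]

image = [[0, 128], [255, 64]]
# Dividing 8-bit pixel values by 255 rescales them to the range [0, 1].
scaled = apply_to_pixels(image, "divide", 255)
```

Operations like these are often chained, for instance dividing by 255 and then subtracting a mean value before passing the image to a model.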
Converts an image to a given color scheme, either RGB or black and white.
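As an illustration of color scheme conversion, the sketch below converts between RGB and a single-channel grayscale image. It assumes that "black and white" means grayscale and uses the widely used ITU-R BT.601 luminance weights; the actual conversion used by the package may differ, and the function names are hypothetical.

```python
# Hypothetical helpers for illustration only; not part of the aisquared package.
def rgb_to_grayscale(image):
    """Convert rows of (r, g, b) pixels to rows of single luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def grayscale_to_rgb(image):
    """Repeat each luminance value across the three color channels."""
    return [[(pixel, pixel, pixel) for pixel in row] for row in image]
```

Converting grayscale to RGB in this way does not restore any color information; it only gives the image the three-channel shape that some models expect.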
Sets the dimensions of an image.
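Setting an image's dimensions requires some form of resampling. The sketch below uses nearest-neighbor sampling, chosen here only because it is the simplest method to show; the interpolation the package actually uses is not stated in this section, and `resize_nearest` is a hypothetical name.

```python
# Hypothetical helper for illustration only; not part of the aisquared package.
def resize_nearest(image, new_height, new_width):
    """Resize a list-of-rows image using nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [
            # Map each output coordinate back to a source pixel.
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]

small = [[1, 2], [3, 4]]
large = resize_nearest(small, 4, 4)
```

Here each source pixel of `small` becomes a 2x2 block in `large`, and resizing `large` back to 2x2 recovers the original.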
The tabular preprocessing functions are included in the aisquared Python package, but are not currently supported in airJS.
Drops a column from tabular data.
Takes the specified columns and scales their values relative to the minimum and maximum values of the training data.
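Min-max scaling is conventionally computed as (value - min) / (max - min), which maps the training range onto [0, 1]. The sketch below assumes that convention and a hypothetical function name; note that values outside the training range fall outside [0, 1].

```python
# Hypothetical helper for illustration only; not part of the aisquared package.
def min_max_scale(values, train_min, train_max):
    """Scale values using the minimum and maximum seen in the training data."""
    span = train_max - train_min
    if span == 0:
        raise ValueError("train_min and train_max must differ")
    return [(value - train_min) / span for value in values]
```

For example, with a training minimum of 10 and maximum of 20, the inputs `[10, 15, 20, 25]` become `[0.0, 0.5, 1.0, 1.5]`.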
Encodes categorical features as a one-hot numeric array.
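One-hot encoding can be sketched as follows. The category order is assumed to come from the training data, and encoding an unseen category as a row of zeros is a choice made for this example rather than documented package behavior; `one_hot` is a hypothetical name.

```python
# Hypothetical helper for illustration only; not part of the aisquared package.
def one_hot(values, categories):
    """Encode each value as a 0/1 row with one position per known category."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        # Unseen categories are left as all zeros (an assumption of this sketch).
        rows.append(row)
    return rows
```

For example, with categories `["red", "green", "blue"]`, the value `"green"` becomes `[0, 1, 0]`.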
Takes each user-supplied column value, subtracts that column's provided mean, and divides by the provided standard deviation.
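The calculation described above, (value - mean) / standard deviation, can be written as a short sketch. The function name `standardize` is hypothetical, and the mean and standard deviation are supplied by the caller, as the description states.

```python
# Hypothetical helper for illustration only; not part of the aisquared package.
def standardize(values, mean, std):
    """Subtract the provided mean and divide by the provided standard deviation."""
    if std == 0:
        raise ValueError("std must be non-zero")
    return [(value - mean) / std for value in values]
```

For example, `standardize([2.0, 4.0, 6.0], mean=4.0, std=2.0)` returns `[-1.0, 0.0, 1.0]`.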