Cleaning

Drop Column

Description

Drop Column Node drops selected column(s) in a data table.

Hint

For a detailed walkthrough see the step-by-step guide.

Parameters

The Drop Column Node requires three parameters: an input dataframe, the column(s) to be deleted, and a name for the new variable. The Dataframe entry expects a variable rectangle holding a Dataframe, the columns are selected from the combobox, and the new variable expects a string.

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Column(s)	Combobox option	A name of the column or name of columns to be deleted. More than one column can be selected from the combobox.
New variable	String entry	A name for the new Dataframe variable
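
As a rough illustration, the behaviour described above resembles dropping columns in pandas. This is a sketch under the assumption that the node works like `DataFrame.drop`; the source does not state the implementation, and the data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45], "city": ["Brno", "Oslo"]})

# "Column(s)" parameter: one or more columns to delete.
columns_to_drop = ["age", "city"]

# "New variable" parameter: the result is stored under a new name,
# so the input dataframe stays unchanged.
df_new = df.drop(columns=columns_to_drop)
print(df_new)
```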

Step-by-step guide

Rename Column

Description

Rename Column Node sets a new name (header) to the selected column.

Hint

For a detailed walkthrough see the step-by-step guide.

Parameters

The Rename Column Node requires two parameters (in addition to the input dataframe and the new dataframe name): the column whose name is to be changed and the new name.

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Column	Comboentry	The column to be renamed. The name of the column can be either written or selected from the combobox.
New name	String entry	A new name which will be used as a header for the selected column, e.g. “New_column_name”.
New variable	String entry	A name for the new Dataframe variable
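
A minimal sketch of the same operation in pandas, assuming the node behaves like `DataFrame.rename` (an assumption, not confirmed by the source). The column names are hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({"old_name": [1, 2], "other": [3, 4]})

column = "old_name"           # "Column" parameter
new_name = "New_column_name"  # "New name" parameter

# Only the header changes; the values and the column order are preserved.
df_new = df.rename(columns={column: new_name})
print(df_new.columns.tolist())
```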

Step-by-step guide

Select Columns

Description

Select Column Node extracts the selected column(s) from a data table.

Hint

For a detailed walkthrough see the step-by-step guide.

Parameters

The Select Column Node requires at least one parameter: the column(s) to be selected from the data table.

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Columns to select	Combobox option	A name of the column or name of columns to be selected from the data. More than one column can be chosen in the combobox.
New variable	String entry	A name for the new Dataframe variable
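
In pandas terms, selecting columns could look like the sketch below. It assumes the node keeps only the chosen columns in the new dataframe; the data is hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45], "city": ["Brno", "Oslo"]})

# "Columns to select" parameter: one or more column names.
columns_to_select = ["name", "city"]

# Indexing with a list of names keeps only those columns; copy() makes
# the new dataframe independent of the original.
df_new = df[columns_to_select].copy()
print(df_new)
```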

Step-by-step guide

Add Constant Column

Description

Add Constant Column Node creates a new constant column of length equal to that of the data frame.

Hint

For a detailed walkthrough see the step-by-step guide.

Parameters

Add Constant Column Node requires two parameters: a value, i.e. a number that fills the constant column, and a name for the new column.

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Value	Integer/Float entry	A constant value filling the new column.
Column name	String entry	A name (header) of the newly created constant column, e.g. “A_new_constant_column”.
New variable	String entry	A name for the new Dataframe variable
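
The idea can be sketched in pandas as follows, assuming the node broadcasts a single value over all rows (the source does not describe the implementation). The data is hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"]})

value = 0.5                            # "Value" parameter
column_name = "A_new_constant_column"  # "Column name" parameter

# A scalar is broadcast to every row, so the new column has the same
# length as the dataframe.
df_new = df.assign(**{column_name: value})
print(df_new)
```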

Step-by-step guide

Round To Higher Frequency

Column Math Operation (Outdated)

Description

Column Math Operation Node performs a mathematical operation on the dataframe's columns, e.g. sums the values of two or more columns.

Parameters

The number of parameters depends on the chosen operation. The user chooses the operation in the first combobox, which reveals the other combobox(es). All operations contain the Result name parameter, which defines the name of the resulting column.

Parameter	Type	Description
Choose math operation	combobox	A desired math operation.
Result name	string	A name of the resulting column which will hold the values obtained from the performed operation.

Search String (Outdated)

Description

Search String Node returns all occurrences of strings satisfying a given pattern.

Parameters

Search String Node requires 2 parameters:

Parameter	Type	Description
Column	combobox	The column selected for a string search.
Pattern	string	A string pattern used for search.

Replace String

Description

Replace String Node replaces all strings (or substrings) in the selected column(s) that either contain a given pattern or exactly match it.

Parameters

Replace String Node takes the following parameters:

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Replace in columns	Combobox option	Column(s) selected for the string replacement.
Match type	Combobox option	pattern → all strings containing the pattern will be replaced; exact → only strings exactly matching the pattern will be replaced
Replace substring	Combobox option	Enables replacement of substrings.
Pattern	String entry	The pattern used in the replacement process.
Replacement	String entry	The string that replaces all matched values.
New variable	String entry	A name for the new Dataframe variable
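
The difference between the match types can be sketched in pandas. How the node combines Match type and Replace substring is not stated in the source, so the three variants below are one interpretation, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({"country": ["Germany", "East Germany", "France"]})
pattern, replacement = "Germany", "DE"

# Match type "exact": only cells equal to the pattern are replaced.
exact = df["country"].replace(pattern, replacement)

# Match type "pattern" with substring replacement: only the matching
# part of each cell is replaced.
substring = df["country"].str.replace(pattern, replacement, regex=False)

# Match type "pattern" without substring replacement: every cell that
# contains the pattern is replaced as a whole.
contains = df["country"].str.contains(pattern, regex=False)
whole = df["country"].mask(contains, replacement)

print(exact.tolist(), substring.tolist(), whole.tolist())
```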

Filter Data

Description

Takes a dataset and removes or selects the data that satisfies a given expression. For example, suppose we have a phonebook of all employees working in an international company and want to select only the contacts of those who work in Germany. We would then pass an expression such as `country` == 'Germany'.

Parameters

Filter Data Node requires at least 1 parameter:

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Column(s)	Combobox option	Column(s) selected for the string filtering.
Filter by string	String entry	The string to filter by.
Keep matched or drop	Combobox option	Set the icon to either keep or drop the rows satisfying the expression.
New variable	String entry	A name for the new Dataframe variable
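
The phonebook example can be sketched in pandas as below. It assumes an equality match, as in the expression given in the description; the node's actual matching rule is not specified, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical phonebook, for illustration only.
df = pd.DataFrame({
    "name": ["Anna", "Ben", "Carl"],
    "country": ["Germany", "France", "Germany"],
})

# Boolean mask marking the rows that satisfy the expression.
matched = df["country"] == "Germany"

kept = df[matched]      # "keep": rows that satisfy the expression
dropped = df[~matched]  # "drop": rows that do not satisfy it

print(kept)
```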

Split String

Description

Splits the strings in a given column on a given substring and keeps only the element at a given position of the resulting split list.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Column	Combobox option	Column selected for the string splitting.
Split on	String entry	String on which the icon splits the strings in a given column.
Select index	String entry	Index (starting from 0) of the element in the resulting split list that is stored as the result.
Keep old column	Combobox option	Decide whether the old column is kept or dropped.
New column	String entry	A name for the new Dataframe column
New variable	String entry	A name for the new Dataframe variable

Suppose we have a datetime column with dates in the dd/mm/yyyy format. Splitting on ‘/’ with select index 0 then gives the dd value in a column named {new_column}.
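
The date example can be sketched in pandas as follows, assuming the node behaves like `Series.str.split` followed by picking one element. The column names `date` and `day` are hypothetical.

```python
import pandas as pd

# Hypothetical input data with dates stored as dd/mm/yyyy strings.
df = pd.DataFrame({"date": ["24/12/2021", "01/02/2022"]})

split_on = "/"      # "Split on" parameter
select_index = 0    # "Select index" parameter
new_column = "day"  # "New column" parameter

df_new = df.copy()
# Split every string into a list and keep the element at select_index.
df_new[new_column] = df_new["date"].str.split(split_on).str[select_index]

# "Keep old column" set to drop: remove the source column afterwards.
without_old = df_new.drop(columns=["date"])
print(df_new)
```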

Sort Data

Description

Sorts the data by the selected columns in either ascending or descending order.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle
Sort column (2x)	Comboentry	Names of the columns to sort by, e.g. given a dataframe with the columns Name, Surname, Age, Salary that should be sorted in ascending order by Age and Salary, enter: Age, Salary
Ascending (2x)	Checkbox	Tick the checkbox for ascending sort.
New variable	String entry	A name for the new Dataframe variable
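
A sketch of the Age and Salary example in pandas, assuming the node behaves like `DataFrame.sort_values` with one ascending flag per sort column. The data is hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [30, 25, 30],
    "Salary": [500, 400, 300],
})

# Sort by Age first and break ties by Salary, both ascending.
df_new = df.sort_values(by=["Age", "Salary"], ascending=[True, True])
print(df_new)
```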

Detect Or Remove Outliers

Description

Detect or remove a given number or percentage of outliers in the given column(s).

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle.
Outlier mode	Combobox option	Decide whether to detect or remove outliers.
Columns	Combobox option	Choose in which columns the outliers should be detected.
Ratio as outliers	Float entry	Choose percentage of outliers to be found. Choose either ratio or top n.
Top N as outliers	Integer entry	Choose number of outliers to be found. Choose either ratio or top n.
New variable	String entry	A name for the new Dataframe variable.

Remove Duplicates

Description

Removes all duplicate values in the selected columns while keeping the first occurrence, the last occurrence, or none.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle.
Keep value	Combobox option	Decide handler behaviour. Keep only the first, only the last, or none of the duplicates.
Considered Columns	Comboentry	Choose in which columns the duplicates should be detected.
New variable	String entry	A name for the new Dataframe variable.
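
The three Keep value options can be sketched in pandas, assuming they map to the `keep` argument of `DataFrame.drop_duplicates` (an assumption; the source does not name the implementation). The data is hypothetical.

```python
import pandas as pd

# Hypothetical input data: id 1 appears twice.
df = pd.DataFrame({"id": [1, 1, 2], "value": ["a", "b", "c"]})

considered_columns = ["id"]  # "Considered Columns" parameter

keep_first = df.drop_duplicates(subset=considered_columns, keep="first")
keep_last = df.drop_duplicates(subset=considered_columns, keep="last")
# keep=False drops every row that has a duplicate.
keep_none = df.drop_duplicates(subset=considered_columns, keep=False)

print(keep_first, keep_last, keep_none, sep="\n\n")
```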

Remove Empty Rows

Description

Removes empty rows. If some ID columns are filled, but other columns are empty, the filled columns can be ignored.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle.
Mode	Combobox option	Decide whether to detect or remove empty rows.
ID Columns	Comboentry	Choose the columns that hold ID values. These columns are ignored when deciding whether a row is empty.
New variable	String entry	A name for the new Dataframe variable.
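
One way to picture this in pandas is sketched below. It assumes a row counts as empty when every column other than the ID columns is missing; the data is hypothetical.

```python
import pandas as pd

# Hypothetical input data: the row with id 2 has an ID but no other values.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "a": [1.0, None, 3.0],
    "b": ["x", None, None],
})

id_columns = ["id"]  # "ID Columns" parameter

# Columns that are checked for emptiness (everything except the IDs).
data_columns = df.columns.difference(id_columns)
is_empty = df[data_columns].isna().all(axis=1)

detected = df[is_empty]  # mode "detect": the empty rows
removed = df[~is_empty]  # mode "remove": the dataframe without them

print(removed)
```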

Find Difference in Data

Description

Compare two dataframes with similar columns and return their difference.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	First dataframe variable rectangle.
Subtract Dataframe	Dataframe entry	Variable rectangle storing subtracted dataframe.
New variable	String entry	A name for the new Dataframe variable.
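
The source does not define "difference" precisely. The sketch below assumes it means the rows of the first dataframe that do not appear in the subtracted one, and uses a pandas merge with `indicator=True` to find them. The data is hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
subtract = pd.DataFrame({"id": [2], "name": ["b"]})

# A left merge on all shared columns; the "_merge" column records whether
# each row was found in both dataframes or only in the first one.
merged = df.merge(subtract, how="left", indicator=True)

# Keep the rows that exist only in the first dataframe.
diff = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(diff)
```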

Column Wise Shift

Description

Inspects two dataframe columns and shifts their values so that corresponding values (e.g. name and domain) are located on the same row. Inserts an empty cell or deletes a filled cell where necessary.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	First dataframe variable rectangle.
Mode	Combobox option	Choose whether rows that do not match should be removed, or kept with the other column shifted and filled with NaN.
Reference column	Combobox option	Choose the first column for comparison
Incomplete column	Combobox option	Choose the second column for comparison
New variable	String entry	A name for the new Dataframe variable.

KNN Imputation

Description

Apply numeric KNN Imputation on selected column.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle.
Column	Comboentry	Choose in which column values should be imputed.
New variable	String entry	A name for the new Dataframe variable.

Imputation

Description

Apply numeric or categorical imputation on selected column.

Parameters

Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle.
Imputed Column	Comboentry	Choose in which columns values should be imputed.
Impute choice	Comboentry	Choose how the values should be generated. Either choose function (or zero) from combobox, or write value in entry.
New variable	String entry	A name for the new Dataframe variable.
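
A sketch of numeric and categorical imputation in pandas, assuming the Impute choice maps to a fill function such as the mean or the mode, or to a fixed value. The source does not list the available functions, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical input data with one missing value per column.
df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Brno", None, "Brno"],
})

df_new = df.copy()

# Numeric imputation: fill missing values with the column mean.
df_new["age"] = df["age"].fillna(df["age"].mean())

# Categorical imputation: fill missing values with the most frequent value.
df_new["city"] = df["city"].fillna(df["city"].mode().iloc[0])

# Written value: fill missing values with a constant such as zero.
age_zero = df["age"].fillna(0)

print(df_new)
```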

Concatenate

Description

Concatenates values in the two selected dataframes into a new one.

Parameters

Common parameter
Parameter	Type	Description
Dataframe (2x)	Dataframe entry	Dataframe variable rectangle.
Append	Combobox option	Choose whether rows or columns should be appended
Join	Combobox option	Choose whether join executed should be inner or outer
New variable	String entry	A name for the new Dataframe variable.
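
The Append and Join options resemble the `axis` and `join` arguments of `pandas.concat`. The sketch below relies on that assumption (the source does not confirm it), and the data is hypothetical.

```python
import pandas as pd

# Hypothetical input data: the dataframes share only the column "x".
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5], "z": [6]})

# Append rows, outer join: all columns are kept, gaps are filled with NaN.
rows_outer = pd.concat([a, b], axis=0, join="outer", ignore_index=True)

# Append rows, inner join: only the columns present in both are kept.
rows_inner = pd.concat([a, b], axis=0, join="inner", ignore_index=True)

# Append columns, inner join: only the row labels present in both are kept.
cols_inner = pd.concat([a, b], axis=1, join="inner")

print(rows_outer)
```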

Join Dataframes

Description

Joins two dataframes on one or more columns.

Parameters

Common parameter
Parameter	Type	Description
Dataframe (2x)	Dataframe entry	Dataframe variable rectangle.
On columns	Combobox option	Choose columns on which join should be executed
How	Combobox option	Choose join mode - left, right, outer, inner, cross
New variable	String entry	A name for the new Dataframe variable.
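
The join modes listed above match the `how` options of `DataFrame.merge` in pandas. The sketch below assumes the node behaves the same way, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical input data: ids 2 and 3 appear in both dataframes.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
right = pd.DataFrame({"id": [2, 3, 4], "salary": [400, 300, 500]})

on_columns = ["id"]  # "On columns" parameter

# One result per join mode that uses key columns.
results = {
    how: left.merge(right, on=on_columns, how=how)
    for how in ["left", "right", "inner", "outer"]
}

# A cross join pairs every left row with every right row and takes no keys.
cross = left.merge(right, how="cross")

for how, result in results.items():
    print(how, result["id"].tolist())
```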

Apply Mapping

Aggregate Groups

Description

Optionally group the data and execute numerical and/or categorical aggregations on the given column(s).

Parameters

Common parameter
Parameter	Type	Description
Dataframe	Dataframe entry	Dataframe variable rectangle.
Columns to Group by	Combobox option	Choose columns on which group by should be executed
Columns to Aggregate	Combobox option	Choose columns on which the aggregations should be executed
Numeric aggregations	Combobox option	Choose numerical aggregations that should be executed on numerical columns - sum, mean, median, max, min, count, mode
Categorical aggregations	Combobox option	Choose categorical aggregations that should be executed on categorical columns - mode
New variable	String entry	A name for the new Dataframe variable.
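
Grouping with numeric and categorical aggregations can be sketched in pandas as below. It assumes the node behaves like `groupby` followed by `agg`; the result column names and the data are hypothetical.

```python
import pandas as pd

# Hypothetical input data, for illustration only.
df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "salary": [10, 20, 30],
    "city": ["Brno", "Brno", "Oslo"],
})

result = (
    df.groupby("team")  # "Columns to Group by"
    .agg(
        salary_sum=("salary", "sum"),    # numeric aggregation
        salary_mean=("salary", "mean"),  # numeric aggregation
        # Categorical aggregation: the most frequent value in each group.
        city_mode=("city", lambda s: s.mode().iloc[0]),
    )
    .reset_index()
)
print(result)
```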