Cleaning
Drop Column
Description
The Drop Column Node drops the selected column(s) from a data table.
Hint
For a detailed walkthrough see the step-by-step guide.
Parameters
The Drop Column Node requires three parameters: an input dataframe, the column(s) to be deleted, and a name for the new variable. The Dataframe entry expects a variable rectangle holding a Dataframe, the columns are selected from the combobox, and the new variable expects a string.
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Column(s) | Combobox option | The name of the column or columns to be deleted. More than one column can be selected from the combobox. |
New variable | String entry | A name for the new Dataframe variable |
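As a rough illustration of what the node does, the sketch below assumes pandas-like behaviour (the backend is not stated here, so this is an assumption). The dataframe contents and the names `df` and `new_df` are hypothetical.

```python
import pandas as pd

# Hypothetical input dataframe.
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 45], "City": ["Oslo", "Rome"]})

# Drop one or more columns; the result is stored under a new variable
# and the input dataframe is left unchanged.
new_df = df.drop(columns=["Age", "City"])
```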
Step-by-step guide
Rename Column
Description
The Rename Column Node assigns a new name (header) to the selected column.
Hint
For a detailed walkthrough see the step-by-step guide.
Parameters
The Rename Column Node requires 2 parameters (in addition to the input dataframe and the new dataframe name): the column whose name is to be changed and its new name.
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Column | Comboentry | The column to be renamed. The name of the column can be either typed in or selected from the combobox. |
New name | String entry | A new name which will be used as the header of the selected column, e.g. “New_column_name”. |
New variable | String entry | A name for the new Dataframe variable |
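A minimal sketch of the equivalent operation, assuming a pandas-style dataframe (an assumption, not something this page confirms). The column names and data are made up for illustration.

```python
import pandas as pd

# Hypothetical input dataframe.
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 45]})

# Map the old header to the new one; other columns keep their names.
new_df = df.rename(columns={"Age": "New_column_name"})
```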
Step-by-step guide
Select Columns
Description
The Select Columns Node extracts the selected column(s) from a data table.
Hint
For a detailed walkthrough see the step-by-step guide.
Parameters
The Select Columns Node requires at least one parameter: the column(s) to be selected from the data table.
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Columns to select | Combobox option | The name of the column or columns to be selected from the data. More than one column can be chosen in the combobox. |
New variable | String entry | A name for the new Dataframe variable |
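Under the assumption that the dataframe behaves like a pandas `DataFrame`, selecting columns could be sketched as follows. All names and values are hypothetical.

```python
import pandas as pd

# Hypothetical input dataframe.
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 45], "City": ["Oslo", "Rome"]})

# Keep only the listed columns, in the listed order.
new_df = df[["Name", "City"]]
```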
Step-by-step guide
Add Constant Column
Description
Add Constant Column Node creates a new constant column of length equal to that of the data frame.
Hint
For a detailed walkthrough see the step-by-step guide.
Parameters
The Add Constant Column Node requires 2 parameters: a value, i.e. a number which will fill the constant column, and a name for the new column.
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Value | Integer/Float entry | A constant value filling the new column. |
Column name | String entry | A name (header) of the newly created constant column, e.g. “A_new_constant_column”. |
New variable | String entry | A name for the new Dataframe variable |
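For orientation, here is how a constant column might be added in pandas. This assumes pandas-like semantics, which the page does not confirm, and uses hypothetical data.

```python
import pandas as pd

# Hypothetical input dataframe with two rows.
df = pd.DataFrame({"Name": ["Ann", "Bob"]})

# The scalar is broadcast to every row, so the new column has the
# same length as the dataframe.
new_df = df.assign(A_new_constant_column=5)
```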
Step-by-step guide
Round To Higher Frequency
Column Math Operation (Outdated)
Description
The Column Math Operation Node performs mathematical operations on the dataframe's columns, e.g. sums the values of two or more columns.
Parameters
The number of parameters depends on the chosen operation. The user chooses the operation in the first combobox, which reveals the other combobox(es). All operations include the Result name parameter, which defines the name of the resulting column.
Parameter | Type | Description |
---|---|---|
Choose math operation | combobox | A desired math operation. |
Result name | string | A name of the resulting column which will hold the values obtained from the performed operation. |
Search String (Outdated)
Description
The Search String Node returns all occurrences of strings satisfying a given pattern.
Parameters
Search String Node requires 2 parameters:
Parameter | Type | Description |
---|---|---|
Column | combobox | The column selected for a string search. |
Pattern | string | A string pattern used for search. |
Replace String
Description
The Replace String Node replaces all strings (or substrings) in the selected column(s) that either contain a pattern or match it exactly.
Parameters
The Replace String Node takes the following parameters:
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Replace in columns | Combobox option | Column(s) selected for the string replacement. |
Match type | Combobox option | pattern → all strings containing the pattern will be replaced; exact → only strings exactly matching the pattern will be replaced |
Replace substring | Combobox option | Enables replacing only the matching substring. |
Pattern | String entry | The pattern used in the replacement process. |
Replacement | String entry | The string that will be substituted for all matched values. |
New variable | String entry | A name for the new Dataframe variable |
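The difference between the match types can be sketched as below. This is one possible reading of the options, written with pandas as an assumed backend. The data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical input column.
df = pd.DataFrame({"city": ["New York", "York", "Paris"]})

# "exact": only cells equal to the pattern are replaced.
exact = df["city"].replace("York", "Leeds")

# "pattern": every cell containing the pattern is replaced as a whole.
contains = df["city"].str.contains("York", regex=False)
whole = df["city"].where(~contains, "Leeds")

# "pattern" with substring replacement: only the matching part changes.
substring = df["city"].str.replace("York", "Leeds", regex=False)
```

With this input, the exact match changes only the second cell, the pattern match overwrites the first two cells entirely, and the substring variant turns "New York" into "New Leeds".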
Filter Data
Description
Takes a dataset and removes or selects the rows that satisfy a given expression. For example, suppose we have a phonebook of all employees working in an international company and want to select only the contacts of those who work in Germany. We would then pass an expression such as `country` == ‘Germany’.
Parameters
Filter Data Node requires at least 1 parameter:
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Column(s) | Combobox option | Column(s) selected for the string filtering. |
Filter by string | String entry | The string to filter by. |
Keep matched or drop | Combobox option | Set the node to either keep or drop rows satisfying the expression |
New variable | String entry | A name for the new Dataframe variable |
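The phonebook example from the description can be sketched as follows, assuming pandas-like dataframes (an assumption about the backend). The names and rows are invented for illustration.

```python
import pandas as pd

# Hypothetical phonebook.
phonebook = pd.DataFrame({
    "name": ["Anna", "Ben", "Carla"],
    "country": ["Germany", "France", "Germany"],
})

# Boolean mask marking rows that satisfy the expression.
mask = phonebook["country"] == "Germany"

kept = phonebook[mask]      # "keep matched": rows satisfying the expression
dropped = phonebook[~mask]  # "drop": everything except those rows
```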
Split String
Description
Splits the strings in a given column on a given substring and keeps only the element at a given position of the resulting list.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Column | Combobox option | Column selected for the string splitting. |
Split on | String entry | String on which the node splits the strings in the given column. |
Select index | String entry | Index of the element in the resulting split list to keep, starting from 0. |
Keep old column | Combobox option | Decide whether the old column is kept or dropped |
New column | String entry | A name for the new Dataframe column |
New variable | String entry | A name for the new Dataframe variable |
Suppose we have a datetime column with dates in the dd/mm/yyyy format. Splitting on ‘/’ with select index 0 then gives the dd value in a column named {new_column}.
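The date example can be written out as a short sketch. It assumes the dates are stored as strings and that the backend behaves like pandas. The column names `date` and `new_column` are hypothetical.

```python
import pandas as pd

# Hypothetical dates stored as dd/mm/yyyy strings.
df = pd.DataFrame({"date": ["24/12/2021", "01/02/2022"]})

# Split on "/" and keep the element at index 0 (the dd part).
new_df = df.assign(new_column=df["date"].str.split("/").str[0])

# With "Keep old column" set to drop, the original column is removed.
new_df_dropped = new_df.drop(columns=["date"])
```

Note that the extracted values are still strings ("24", "01"), not numbers.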
Sort Data
Description
Sorts the data by the selected columns in either ascending or descending order.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle |
Sort column (2x) | Comboentry | Names of the columns to sort by. For example, for a dataframe with columns Name, Surname, Age, Salary that should be sorted in ascending order by Age and Salary, enter: Age, Salary |
Ascending (2x) | Checkbox | Tick the checkbox for ascending sort. |
New variable | String entry | A name for the new Dataframe variable |
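Sorting by two columns, each with its own direction, might look like this in pandas (assumed backend, hypothetical data). The second column only decides the order among rows that tie on the first.

```python
import pandas as pd

# Hypothetical input dataframe.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [30, 25, 30],
    "Salary": [5000, 4000, 4500],
})

# One ascending flag per sort column, matching the two checkboxes.
new_df = df.sort_values(by=["Age", "Salary"], ascending=[True, True])
```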
Detect Or Remove Outliers
Description
Detects or removes a given number or percentage of outliers in the given column(s).
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle. |
Outlier mode | Combobox option | Decide whether to detect or remove outliers. |
Columns | Combobox option | Choose in which columns the outliers should be detected. |
Ratio as outliers | Float entry | Choose percentage of outliers to be found. Choose either ratio or top n. |
Top N as outliers | Integer entry | Choose number of outliers to be found. Choose either ratio or top n. |
New variable | String entry | A name for the new Dataframe variable. |
Remove Duplicates
Description
Removes all duplicate values in the selected columns while keeping the first occurrence, the last occurrence, or none.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle. |
Keep value | Combobox option | Decide the handler behaviour: keep only the first, only the last, or none of the duplicates. |
Considered Columns | Comboentry | Choose in which columns the duplicates should be detected. |
New variable | String entry | A name for the new Dataframe variable. |
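The three keep options correspond closely to the `keep` argument of pandas `drop_duplicates`. The sketch below assumes that mapping and uses hypothetical data.

```python
import pandas as pd

# Hypothetical data with a repeated email address.
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "visit": [1, 2, 3],
})

# Duplicates are detected only in the considered column(s).
keep_first = df.drop_duplicates(subset=["email"], keep="first")
keep_last = df.drop_duplicates(subset=["email"], keep="last")
keep_none = df.drop_duplicates(subset=["email"], keep=False)
```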
Remove Empty Rows
Description
Removes empty rows. If some ID columns are filled, but other columns are empty, the filled columns can be ignored.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle. |
Mode | Combobox option | Decide whether to detect or remove empty rows. |
ID Columns | Comboentry | Choose the columns holding ID values. These columns are ignored when checking whether a row is empty. |
New variable | String entry | A name for the new Dataframe variable. |
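One way to read the ID column option is that a row counts as empty when every non-ID column is missing. The sketch below implements that reading with pandas, which is an assumption about both the semantics and the backend. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data: the row with id 2 is empty apart from its ID.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ann", None, "Cid"],
    "age": [31.0, None, None],
})

id_columns = ["id"]
value_columns = [c for c in df.columns if c not in id_columns]

# "Detect": flag rows whose non-ID columns are all missing.
empty_mask = df[value_columns].isna().all(axis=1)

# "Remove": drop those rows.
new_df = df.dropna(how="all", subset=value_columns)
```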
Find Difference in Data
Description
Compare two dataframes with similar columns and return their difference.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | First dataframe variable rectangle. |
Subtract Dataframe | Dataframe entry | Variable rectangle storing subtracted dataframe. |
New variable | String entry | A name for the new Dataframe variable. |
Column Wise Shift
Description
Inspects two dataframe columns and shifts their values so that corresponding values (e.g. name and domain) are located on the same row. Inserts an empty cell or deletes a filled cell where necessary.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | First dataframe variable rectangle. |
Mode | Combobox option | Choose whether rows that do not match should be removed, or kept with the other column shifted and filled with NaN |
Reference column | Combobox option | Choose the first column for comparison |
Incomplete column | Combobox option | Choose the second column for comparison |
New variable | String entry | A name for the new Dataframe variable. |
KNN Imputation
Description
Apply numeric KNN Imputation on selected column.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle. |
Column | Comboentry | Choose in which column values should be imputed. |
New variable | String entry | A name for the new Dataframe variable. |
Imputation
Description
Apply numeric or categorical imputation on selected column.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle. |
Imputed Column | Comboentry | Choose in which columns values should be imputed. |
Impute choice | Comboentry | Choose how the values should be generated. Either choose function (or zero) from combobox, or write value in entry. |
New variable | String entry | A name for the new Dataframe variable. |
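To make the impute choices concrete, here is a sketch with pandas as an assumed backend. The mean and mode are used only as examples of functions that might be offered in the combobox, since the actual list is not given here. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Oslo", None, "Oslo"],
})

new_df = df.assign(
    # Numeric imputation with a function of the column (here the mean).
    age=df["age"].fillna(df["age"].mean()),
    # Categorical imputation with the most frequent value.
    city=df["city"].fillna(df["city"].mode().iloc[0]),
)

# Imputation with a value written directly into the entry (here zero).
zero_filled = df["age"].fillna(0)
```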
Concatenate
Description
Concatenates values in the two selected dataframes into a new one.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe (2x) | Dataframe entry | Dataframe variable rectangle. |
Append | Combobox option | Choose whether rows or columns should be appended |
Join | Combobox option | Choose whether join executed should be inner or outer |
New variable | String entry | A name for the new Dataframe variable. |
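The Append and Join options resemble the `axis` and `join` arguments of pandas `concat`. The sketch below assumes that correspondence. Resetting the row index with `ignore_index=True` is a choice made for this example, not something the page specifies.

```python
import pandas as pd

# Two hypothetical dataframes sharing only column "b".
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# Append rows, outer join: all columns are kept, gaps become NaN.
rows_outer = pd.concat([left, right], axis=0, join="outer", ignore_index=True)

# Append rows, inner join: only columns present in both are kept.
rows_inner = pd.concat([left, right], axis=0, join="inner", ignore_index=True)

# Append columns: the dataframes are placed side by side.
cols = pd.concat([left, right], axis=1)
```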
Join Dataframes
Description
Joins two dataframes on (possibly multiple) columns.
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe (2x) | Dataframe entry | Dataframe variable rectangle. |
On columns | Combobox option | Choose columns on which join should be executed |
How | Combobox option | Choose join mode - left, right, outer, inner, cross |
New variable | String entry | A name for the new Dataframe variable. |
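The listed join modes match the `how` values of pandas `DataFrame.merge`, so the sketch below uses that method as an assumed stand-in. The tables and column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables sharing the key column "dept_id".
employees = pd.DataFrame({"dept_id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
departments = pd.DataFrame({"dept_id": [1, 2, 4], "dept": ["Sales", "IT", "HR"]})

# inner: only keys present in both tables.
inner = employees.merge(departments, on=["dept_id"], how="inner")

# left: every row of the first table, NaN where no match exists.
left = employees.merge(departments, on=["dept_id"], how="left")

# outer: keys from either table.
outer = employees.merge(departments, on=["dept_id"], how="outer")

# cross: every pairing of rows; no key columns are used.
cross = employees.merge(departments, how="cross")
```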
Apply Mapping
Aggregate Groups
Description
Groups the data (optional) and executes numerical and/or categorical aggregations on the given column(s).
Parameters
Parameter | Type | Description |
---|---|---|
Dataframe | Dataframe entry | Dataframe variable rectangle. |
Columns to Group by | Combobox option | Choose columns on which group by should be executed |
Columns to Aggregate | Combobox option | Choose columns on which the aggregations should be executed |
Numeric aggregations | Combobox option | Choose numerical aggregations that should be executed on numerical columns - sum, mean, median, max, min, count, mode |
Categorical aggregations | Combobox option | Choose categorical aggregations that should be executed on categorical columns - mode |
New variable | String entry | A name for the new Dataframe variable. |
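A grouped aggregation of this kind can be sketched with pandas `groupby` (an assumed backend). The output column names such as `salary_sum` are invented for the example, and the naming the node actually uses may differ.

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical column per team.
df = pd.DataFrame({
    "team": ["A", "A", "B"],
    "salary": [100, 300, 200],
    "city": ["Oslo", "Oslo", "Rome"],
})

new_df = df.groupby("team").agg(
    # Numeric aggregations on the numeric column.
    salary_sum=("salary", "sum"),
    salary_mean=("salary", "mean"),
    # Categorical aggregation: most frequent value in each group.
    city_mode=("city", lambda s: s.mode().iloc[0]),
).reset_index()
```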