========
Cleaning
========

.. role:: raw-html(raw)
   :format: html

|cleaning_icon| Drop Column
===============================

.. |cleaning_icon| image:: ../../png/clean.png
   :width: 60

Description
-----------

Drop Column Node drops the selected column(s) from a data table.

.. hint:: For a detailed walkthrough see the :ref:`step-by-step guide <step_by_step_drop_column>`.

Parameters
----------

The Drop Column Node requires three parameters: an input dataframe, the column(s) to be deleted, and a name for the new variable. The Dataframe entry expects a variable rectangle holding a Dataframe, the columns can be selected from the combobox, and the new variable expects a string.

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Column(s)
     - Combobox option
     - The name of the column or columns to be deleted. More than one column can be selected from the combobox.
   * - New variable
     - String entry
     - A name for the new Dataframe variable

.. _step_by_step_drop_column:

Step-by-step guide
------------------

|cleaning_icon| Rename Column
===============================

Description
-----------

Rename Column Node sets a new name (header) for the selected column.

.. hint:: For a detailed walkthrough see the :ref:`step-by-step guide <step_by_step_rename_column>`.

Parameters
----------

The Rename Column Node requires 2 parameters (other than the input dataframe and the new dataframe name): the *column* whose name is to be changed and the *new name*.

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Column
     - Comboentry
     - The column to be renamed. The name of the column can be either typed in or selected from the combobox.
   * - New name
     - String entry
     - A new name which will be used as the header of the selected column, e.g. "*New_column_name*".
   * - New variable
     - String entry
     - A name for the new Dataframe variable

.. _step_by_step_rename_column:

Step-by-step guide
------------------

|cleaning_icon| Select Columns
===============================

Description
-----------

Select Columns Node extracts the selected column(s) from a data table.

.. hint:: For a detailed walkthrough see the :ref:`step-by-step guide <step_by_step_select_column>`.

Parameters
----------

The Select Columns Node requires at least one parameter: the column(s) to be selected from the data table.

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Columns to select
     - Combobox option
     - The name of the column or columns to be selected from the data. More than one column can be chosen in the combobox.
   * - New variable
     - String entry
     - A name for the new Dataframe variable

.. _step_by_step_select_column:

Step-by-step guide
------------------

|cleaning_icon| Add Constant Column
====================================

Description
-----------

Add Constant Column Node creates a new constant column whose length equals that of the dataframe.

.. hint:: For a detailed walkthrough see the :ref:`step-by-step guide <step_by_step_constant_column>`.

Parameters
----------

Add Constant Column Node requires 2 parameters: a *value*, i.e. a number which will fill the constant column, and a new *column name*.

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Value
     - Integer/Float entry
     - A constant value filling the new column.
   * - Column name
     - String entry
     - A name (header) for the newly created constant column, e.g. "*A_new_constant_column*".
   * - New variable
     - String entry
     - A name for the new Dataframe variable

.. _step_by_step_constant_column:

Step-by-step guide
------------------

|cleaning_icon| Round To Higher Frequency
=========================================

|cleaning_icon| Column Math Operation (Outdated)
================================================

Description
-----------

Column Math Operation Node performs mathematical operations on the dataframe's columns, e.g. sums the values of two or more columns.

Parameters
----------

The number of parameters depends on the chosen operation. The user chooses the operation in the first combobox, which reveals the other combobox(es). All operations contain the *Result name* parameter, which defines the name of the resulting column.

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Choose math operation
     - combobox
     - The desired math operation.
   * - Result name
     - string
     - A name for the resulting column, which will hold the values obtained from the performed operation.

|cleaning_icon| Search String (Outdated)
========================================

Description
-----------

Search String Node returns all occurrences of strings satisfying a given pattern.

Parameters
----------

Search String Node requires 2 parameters:

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Column
     - combobox
     - The column selected for the string search.
   * - Pattern
     - string
     - A string pattern used for the search.

|cleaning_icon| Replace String
===============================

Description
-----------

Replace String Node replaces all strings (or substrings) in the selected column(s) which either contain a given pattern or exactly match it.

Parameters
----------

Replace String Node takes the following parameters:

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Replace in columns
     - Combobox option
     - Column(s) selected for the string replacement.
   * - Match type
     - Combobox option
     - **pattern** :raw-html:`→` all strings containing the pattern will be replaced; **exact** :raw-html:`→` only strings exactly matching the pattern will be replaced
   * - Replace substring
     - Combobox option
     - Enables the replacement of substrings.
   * - Pattern
     - String entry
     - The pattern used in the replacement process.
   * - Replacement
     - String entry
     - The string that will replace all matched values.
   * - New variable
     - String entry
     - A name for the new Dataframe variable

|cleaning_icon| Filter Data
============================

Description
-----------

Takes a dataset and removes or selects the data that satisfies a given expression. For example, suppose we have a phonebook of all employees working in an international company and we want to select only the contacts of those who work in Germany. We would then pass an expression looking something like *`country` == 'Germany'*.

Parameters
----------

Filter Data Node requires at least 1 parameter:

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Column(s)
     - Combobox option
     - Column(s) selected for the string filtering.
   * - Filter by string
     - String entry
     - The string to filter by.
   * - Keep matched or drop
     - Combobox option
     - Choose whether to keep or drop the rows satisfying the expression
   * - New variable
     - String entry
     - A name for the new Dataframe variable

|cleaning_icon| Split String
============================

Description
-----------

Splits the strings in a given column on a given substring and keeps only the element at a given position of the resulting split list.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Column
     - Combobox option
     - Column selected for the string splitting.
   * - Split on
     - String entry
     - String on which the node splits the strings in the given column.
   * - Select index
     - String entry
     - Index position in the resulting split list of the element to keep, starting from 0.
   * - Keep old column
     - Combobox option
     - Decide whether the old column is kept or dropped
   * - New column
     - String entry
     - A name for the new Dataframe column
   * - New variable
     - String entry
     - A name for the new Dataframe variable

Suppose we have a column with dates in the *dd/mm/yyyy* format. Splitting on '/' with select index 0 then gives us the *dd* value in a column named {new_column}.

|cleaning_icon| Sort Data
=========================

Description
-----------

Sorts the data by the selected columns in either ascending or descending order.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle
   * - Sort column (2x)
     - Comboentry
     - Names of the columns to sort by, e.g. we have a dataframe containing the columns *Name, Surname, Age, Salary* and want to sort it in ascending order by Age and Salary :raw-html:`→` we will enter: *Age, Salary*
   * - Ascending (2x)
     - Checkbox
     - Tick the checkbox for an ascending sort.
   * - New variable
     - String entry
     - A name for the new Dataframe variable

|cleaning_icon| Detect Or Remove Outliers
=========================================

Description
-----------

Detects or removes a given number or percentage of outliers in the given column(s).

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Outlier mode
     - Combobox option
     - Decide whether to detect or remove outliers.
   * - Columns
     - Combobox option
     - Choose in which columns the outliers should be detected.
   * - Ratio as outliers
     - Float entry
     - Choose the percentage of outliers to be found. Choose either ratio or top n.
   * - Top N as outliers
     - Integer entry
     - Choose the number of outliers to be found. Choose either ratio or top n.
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| Remove Duplicates
====================================

Description
-----------

Removes all duplicate values in the selected columns while keeping the first occurrence, the last occurrence, or none.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Keep value
     - Combobox option
     - Decide the handler behaviour: keep only the **first**, the **last**, or **none** of the duplicates.
   * - Considered Columns
     - Comboentry
     - Choose in which columns the duplicates should be detected.
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| Remove Empty Rows
====================================

Description
-----------

Removes empty rows. If some ID columns are filled but the other columns are empty, the filled columns can be ignored.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Mode
     - Combobox option
     - Decide whether to detect or remove empty rows.
   * - ID Columns
     - Comboentry
     - Choose the columns in which ID values are filled (these columns can be ignored).
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| Find Difference in Data
========================================

Description
-----------

Compares two dataframes with **similar columns** and returns their difference.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - First dataframe variable rectangle.
   * - Subtract Dataframe
     - Dataframe entry
     - Variable rectangle storing the subtracted dataframe.
   * - New variable
     - String entry
     - A name for the new Dataframe variable.
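
For readers who think in code, the difference of two dataframes can be approximated in pandas. This is a minimal sketch and not the node's actual implementation: it assumes that "difference" means the rows of the first dataframe that do not appear in the subtracted one, compared on the columns both share. The function name ``dataframe_difference`` and the sample data are hypothetical.

.. code-block:: python

   import pandas as pd


   def dataframe_difference(df, subtract_df):
       """Return rows of df that have no match in subtract_df (assumed semantics)."""
       # Compare only on the columns present in both dataframes.
       shared = [c for c in df.columns if c in subtract_df.columns]
       # A left merge with indicator=True marks rows found only in df as "left_only".
       merged = df.merge(
           subtract_df[shared].drop_duplicates(),
           on=shared,
           how="left",
           indicator=True,
       )
       only_left = merged["_merge"] == "left_only"
       return merged.loc[only_left, df.columns].reset_index(drop=True)


   # Hypothetical sample data.
   employees = pd.DataFrame(
       {"name": ["Ann", "Bob", "Cid"], "country": ["DE", "FR", "DE"]}
   )
   former = pd.DataFrame({"name": ["Bob"], "country": ["FR"]})

   difference = dataframe_difference(employees, former)
   print(difference)

Deduplicating the subtracted dataframe before the merge keeps repeated rows in it from multiplying rows of the first dataframe.
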

|cleaning_icon| Column Wise Shift
=================================

Description
-----------

Inspects two dataframe columns and shifts their values so that corresponding values (e.g. name and domain) are located on the same row. Inserts an empty cell or deletes a filled cell where necessary.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - First dataframe variable rectangle.
   * - Mode
     - Combobox option
     - Choose whether rows that do not match should be **removed** or **kept**, with the other column being shifted and filled with nan
   * - Reference column
     - Combobox option
     - Choose the first column for comparison
   * - Incomplete column
     - Combobox option
     - Choose the second column for comparison
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| KNN Imputation
==============================

Description
-----------

Applies numeric KNN imputation to the selected column.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Column
     - Comboentry
     - Choose in which column values should be imputed.
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| Imputation
==========================

Description
-----------

Applies numeric or categorical imputation to the selected column.

Parameters
----------

.. list-table::
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Imputed Column
     - Comboentry
     - Choose in which columns values should be imputed.
   * - Impute choice
     - Comboentry
     - Choose how the values should be generated. Either choose a function (or zero) from the combobox, or type a value into the entry.
   * - New variable
     - String entry
     - A name for the new Dataframe variable.
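
The idea behind the impute choice can be illustrated with pandas. The sketch below is an assumption about the behaviour, not the node's actual code: it treats ``"mean"``, ``"median"`` and ``"mode"`` as the selectable functions and any other value as a literal to fill in. The function name ``impute_column`` and the sample data are hypothetical.

.. code-block:: python

   import pandas as pd


   def impute_column(df, column, choice):
       """Fill missing values in one column, returning a new dataframe."""
       result = df.copy()
       if choice == "mean":
           fill = result[column].mean()
       elif choice == "median":
           fill = result[column].median()
       elif choice == "mode":
           # mode() ignores missing values and may return several entries;
           # take the first one.
           fill = result[column].mode().iloc[0]
       else:
           # Anything else is treated as a literal value, e.g. 0.
           fill = choice
       result[column] = result[column].fillna(fill)
       return result


   # Hypothetical sample data with one missing value per column.
   df = pd.DataFrame(
       {"age": [20.0, None, 40.0], "city": ["Brno", None, "Brno"]}
   )

   imputed = impute_column(df, "age", "mean")       # numeric imputation
   imputed = impute_column(imputed, "city", "mode")  # categorical imputation
   print(imputed)

Working on a copy mirrors the *New variable* parameter: the input dataframe stays unchanged and the result is stored under a new name.
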

|cleaning_icon| Concatenate
===========================

Description
-----------

Concatenates the values of the two selected dataframes into a new one.

Parameters
----------

.. list-table:: Common parameter
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe (2x)
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Append
     - Combobox option
     - Choose whether rows or columns should be appended
   * - Join
     - Combobox option
     - Choose whether the executed join should be inner or outer
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| Join Dataframes
===============================

Description
-----------

Joins two dataframes on one or more columns.

Parameters
----------

.. list-table:: Common parameter
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe (2x)
     - Dataframe entry
     - Dataframe variable rectangle.
   * - On columns
     - Combobox option
     - Choose the columns on which the join should be executed
   * - How
     - Combobox option
     - Choose the join mode: **left**, **right**, **outer**, **inner**, **cross**
   * - New variable
     - String entry
     - A name for the new Dataframe variable.

|cleaning_icon| Apply Mapping
=============================

|cleaning_icon| Aggregate Groups
================================

Description
-----------

Groups the data (optional) and executes numerical and/or categorical aggregations on the given column(s).

Parameters
----------

.. list-table:: Common parameter
   :header-rows: 1

   * - Parameter
     - Type
     - Description
   * - Dataframe
     - Dataframe entry
     - Dataframe variable rectangle.
   * - Columns to Group by
     - Combobox option
     - Choose the columns on which the group by should be executed
   * - Columns to Aggregate
     - Combobox option
     - Choose the columns on which the aggregations should be executed
   * - Numeric aggregations
     - Combobox option
     - Choose the numerical aggregations that should be executed on numerical columns: **sum**, **mean**, **median**, **max**, **min**, **count**, **mode**
   * - Categorical aggregations
     - Combobox option
     - Choose the categorical aggregations that should be executed on categorical columns: **mode**
   * - New variable
     - String entry
     - A name for the new Dataframe variable.
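
A comparable grouping and aggregation can be written directly in pandas. This is only an illustrative sketch under assumptions, not the node's implementation: the column names and data are hypothetical, and the categorical **mode** is computed here with a small lambda that takes the first most frequent value.

.. code-block:: python

   import pandas as pd

   # Hypothetical sample data: one categorical and one numerical column.
   sales = pd.DataFrame(
       {
           "region": ["north", "north", "north", "south", "south"],
           "product": ["pen", "pen", "ink", "ink", "ink"],
           "amount": [10, 30, 20, 5, 15],
       }
   )

   # Group by "region", then apply numerical aggregations to "amount"
   # and a categorical aggregation (mode) to "product".
   summary = (
       sales.groupby("region")
       .agg(
           amount_sum=("amount", "sum"),
           amount_mean=("amount", "mean"),
           top_product=("product", lambda s: s.mode().iloc[0]),
       )
       .reset_index()
   )
   print(summary)

Named aggregation keeps the output column names explicit, which is convenient when several aggregations are applied to the same column.
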