Machine Learning Nodes

Complete machine learning toolkit with 8+ algorithms, model management, evaluation metrics, and clustering capabilities for data science workflows.

Core Machine Learning Nodes

Essential nodes for machine learning workflows from data preparation to model deployment.

Train Test Split

Split datasets into training and testing sets

Type: ml_train_test_split
Category: Data Preparation

Key Features

  • Stratified sampling for classification
  • Random state for reproducibility
  • Train/test ratio customization
  • Automatic feature/target separation
  • Data validation

Input Ports

  • data (data): Input DataFrame
  • target_column (string): Target variable column

Output Ports

  • X_train (data): Training features
  • X_test (data): Testing features
  • y_train (data): Training target
  • y_test (data): Testing target
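
The split described above can be sketched with scikit-learn and pandas. This is an illustration only: it assumes the node behaves like `sklearn.model_selection.train_test_split`, and the helper name `split_dataset`, the column names, and the default ratio are hypothetical, not taken from Bioshift.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_dataset(df, target_column, test_size=0.25, random_state=42, stratify=True):
    """Hypothetical helper: separate features from the target, then split."""
    # Basic data validation: the target column must exist.
    if target_column not in df.columns:
        raise ValueError(f"target column {target_column!r} not found")
    X = df.drop(columns=[target_column])  # automatic feature/target separation
    y = df[target_column]
    # stratify=y keeps class proportions equal in both splits (classification only).
    return train_test_split(
        X,
        y,
        test_size=test_size,
        random_state=random_state,  # fixed seed for reproducibility
        stratify=y if stratify else None,
    )


# Toy data: 20 rows, two features, a balanced binary label.
df = pd.DataFrame({
    "feature_a": range(20),
    "feature_b": [v * 0.5 for v in range(20)],
    "label": [0, 1] * 10,
})
X_train, X_test, y_train, y_test = split_dataset(df, "label", test_size=0.2)
```

The four returned objects correspond to the four output ports, in the order X_train, X_test, y_train, y_test.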

Logistic Regression

Binary and multiclass classification using logistic regression

Type: ml_logistic_regression
Category: Classification

Key Features

  • L1/L2 regularization
  • Multiclass support
  • Feature importance extraction
  • Model persistence
  • Cross-validation support

Input Ports

  • X_train (data): Training features
  • y_train (data): Training target
  • X_test (data): Testing features

Output Ports

  • model (model): Trained logistic regression model
  • predictions (data): Predicted labels
  • probabilities (data): Prediction probabilities
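
As a rough illustration of how the three inputs map to the three outputs, here is a sketch using scikit-learn's `LogisticRegression`. Whether the node wraps this class is an assumption, and the synthetic data and parameter values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for real inputs.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# scikit-learn applies L2 regularization by default; C is the inverse
# regularization strength (smaller C means stronger regularization).
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)          # predicted labels
probabilities = model.predict_proba(X_test)  # one column per class

# Coefficient magnitudes give a simple view of feature influence.
feature_weights = model.coef_[0]
```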

Random Forest Classifier

Ensemble classification using random forest algorithm

Type: ml_random_forest_classifier
Category: Classification

Key Features

  • Configurable number of trees
  • Maximum depth control
  • Feature importance ranking
  • Out-of-bag error estimation
  • Parallel processing support

Input Ports

  • X_train (data): Training features
  • y_train (data): Training target
  • X_test (data): Testing features

Output Ports

  • model (model): Trained random forest model
  • predictions (data): Predicted labels
  • feature_importance (data): Feature importance scores
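
The features listed for this node line up with parameters of scikit-learn's `RandomForestClassifier`. The sketch below assumes that mapping, which the source does not confirm. The data is synthetic and the parameter values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train = X[:225], X[225:], y[:225]

model = RandomForestClassifier(
    n_estimators=100,  # number of trees
    max_depth=5,       # maximum depth of each tree
    oob_score=True,    # estimate accuracy from out-of-bag samples
    n_jobs=2,          # build trees in parallel
    random_state=0,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)

# Rank features by importance, highest first, as (feature_index, score) pairs.
feature_importance = sorted(
    enumerate(model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
oob_error = 1.0 - model.oob_score_  # out-of-bag error estimate
```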

Linear Regression

Linear regression for continuous target prediction

Type: ml_linear_regression
Category: Regression

Key Features

  • Ordinary least squares
  • Coefficient interpretation
  • R-squared calculation
  • Residual analysis
  • Feature scaling support

Input Ports

  • X_train (data): Training features
  • y_train (data): Training target
  • X_test (data): Testing features

Output Ports

  • model (model): Trained linear regression model
  • predictions (data): Predicted values
  • coefficients (data): Model coefficients
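
A minimal ordinary least squares sketch, assuming the node behaves like scikit-learn's `LinearRegression`. The data is generated from a known noise-free linear relationship so the recovered coefficients are easy to check. The feature names `x1` and `x2` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free data generated from y = 3*x1 - 2*x2 + 5.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + 5.0
X_test = np.array([[1.0, 1.0], [0.0, 2.0]])

model = LinearRegression().fit(X_train, y_train)

predictions = model.predict(X_test)
coefficients = dict(zip(["x1", "x2"], model.coef_))  # one weight per feature
r_squared = model.score(X_train, y_train)            # R-squared on the training data
residuals = y_train - model.predict(X_train)         # input to residual analysis
```

Each coefficient is the change in the predicted value for a one-unit change in that feature, holding the other features fixed.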

Model Load

Load a saved model from disk (.pkl file)

Type: ml_model_load
Category: Model Operations

Key Features

  • Load models saved by Model Save node
  • Support for scikit-learn style models
  • Automatic model validation
  • Error handling for corrupted files
  • File path validation

Input Ports

  • file (file): Path to .pkl model file

Output Ports

  • model (model): Loaded model object

Model Save

Save a trained model to disk (.pkl file)

Type: ml_model_save
Category: Model Operations

Key Features

  • Save models as pickle files
  • Automatic filename generation
  • Custom file path support
  • Model metadata preservation
  • Error handling for save failures

Input Ports

  • model (model): Trained model to save
  • file (file): Destination file path (optional)

Output Ports

  • file (file): Path to saved model file
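
Model Save and Model Load together form a round trip through a .pkl file. The sketch below shows one way such a round trip could look with Python's standard `pickle` module. The helpers `save_model` and `load_model` are hypothetical names, and the validation steps are guesses at what "file path validation" and "automatic model validation" might involve.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import LinearRegression


def save_model(model, path):
    """Hypothetical helper: pickle a model and return the path written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Hypothetical helper: load a pickled model with basic checks."""
    path = Path(path)
    if path.suffix != ".pkl":
        raise ValueError(f"expected a .pkl file, got {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"no model file at {path}")
    try:
        with path.open("rb") as fh:
            model = pickle.load(fh)
    except (pickle.UnpicklingError, EOFError) as exc:
        # Truncated or corrupted files commonly fail with one of these.
        raise ValueError(f"model file {path} appears to be corrupted") from exc
    # Minimal check for a scikit-learn style model.
    if not hasattr(model, "predict"):
        raise TypeError("loaded object does not expose predict()")
    return model


model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_model(model, Path(tmp) / "model.pkl")
    restored = load_model(saved_path)
    restored_prediction = float(restored.predict([[3.0]])[0])
```

Unpickling can execute arbitrary code, so .pkl files should only be loaded from trusted sources.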

Model Test

Test trained models on new data

Type: ml_model_test
Category: Model Operations

Key Features

  • Batch prediction support
  • Confidence score calculation
  • Automatic metric calculation
  • Prediction caching for performance
  • Robust error handling for edge cases

Input Ports

  • model (model): Trained model
  • X_test (data): Testing features

Output Ports

  • predictions (data): Model predictions
  • confidence (data): Prediction confidence scores
  • performance_metrics (data): Model performance metrics
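
One plausible reading of the three outputs is sketched below for a classifier. Two assumptions are made here that the source does not state: confidence is taken to be the highest class probability per row, and true labels (`y_test`) are available for computing metrics even though they are not listed as an input port. The function name `run_model_test` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def run_model_test(model, X_test, y_test):
    """Hypothetical helper returning predictions, confidence, and metrics."""
    predictions = model.predict(X_test)  # batch prediction over all rows
    # Assumed definition of confidence: the top class probability per row.
    confidence = model.predict_proba(X_test).max(axis=1)
    metrics = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }
    return predictions, confidence, metrics


X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions, confidence, performance_metrics = run_model_test(model, X_test, y_test)
```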

Model Predict

Make predictions using trained models

Type: ml_model_predict
Category: Model Operations

Key Features

  • Single and batch predictions
  • Input validation
  • Result formatting
  • Probability scores for classification
  • Confidence intervals for regression

Input Ports

  • model (model): Trained model
  • new_data (data): New data for prediction

Output Ports

  • predictions (data): Model predictions
  • prediction_details (data): Detailed prediction results
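
The sketch below illustrates single and batch prediction with input validation and per-class probability scores. It assumes a scikit-learn style model. The helper `predict_with_details` and the shape of its detail records are hypothetical, and confidence intervals for regression are not covered.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def predict_with_details(model, new_data):
    """Hypothetical helper: accept one row or many, return labels and details."""
    rows = np.atleast_2d(np.asarray(new_data, dtype=float))  # one row becomes a batch of 1
    # Input validation: the feature count must match what the model was trained on.
    if rows.shape[1] != model.n_features_in_:
        raise ValueError(
            f"expected {model.n_features_in_} features, got {rows.shape[1]}"
        )
    labels = model.predict(rows)
    details = [{"prediction": label.item()} for label in labels]
    # Classifiers that expose predict_proba also get per-class probabilities.
    if hasattr(model, "predict_proba"):
        for detail, probs in zip(details, model.predict_proba(rows)):
            detail["probabilities"] = dict(zip(model.classes_.tolist(), probs.tolist()))
    return labels, details


# One-feature toy problem: small values are class 0, large values are class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

single_labels, single_details = predict_with_details(model, [5.0])
batch_labels, batch_details = predict_with_details(model, [[0.0], [5.0]])
```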

Clustering & Dimensionality Reduction

Unsupervised learning algorithms for data exploration and feature extraction.

K-Means Clustering

Partition data into k clusters using K-means algorithm

Type: cluster_kmeans

Unsupervised data segmentation
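
For illustration, K-means on a tiny two-group dataset using scikit-learn's `KMeans`. That the node wraps this class is an assumption, and the points and parameters are made up.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points.
points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.1],
])

# k must be chosen up front; n_init restarts guard against poor initial centers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_            # cluster index for each point
centers = kmeans.cluster_centers_  # one centroid per cluster
```

Cluster indices are arbitrary: which group is labeled 0 and which is labeled 1 can differ between runs or seeds.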

Agglomerative Clustering

Hierarchical clustering with customizable linkage methods

Type: cluster_agglomerative

Hierarchical data organization
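
The linkage method controls how the distance between two clusters is measured when deciding which pair to merge next. A short sketch with scikit-learn's `AgglomerativeClustering`, assumed here as a stand-in for the node, runs the four linkage options that class supports on the same toy data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.1],
])

labels_by_linkage = {}
for linkage in ("ward", "complete", "average", "single"):
    # ward minimizes within-cluster variance; the others use the maximum,
    # mean, or minimum pairwise distance between clusters.
    clustering = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    labels_by_linkage[linkage] = clustering.fit_predict(points)
```

On data this cleanly separated every linkage finds the same two groups. On noisier or elongated clusters the results usually differ.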

PCA

Principal Component Analysis for dimensionality reduction

Type: dim_pca

Feature extraction and visualization
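
A small sketch of dimensionality reduction with scikit-learn's `PCA`, assumed here as the underlying technique. The synthetic data has three columns, but almost all of its variance lies along one direction.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Column 2 is nearly a multiple of column 1; column 3 is low-variance noise.
X = np.column_stack([
    t,
    2.0 * t + rng.normal(scale=0.05, size=200),
    rng.normal(scale=0.05, size=200),
])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)          # 3 features projected down to 2
explained = pca.explained_variance_ratio_  # share of variance per component
```

Components are ordered by explained variance, so the first columns of the reduced data are the natural choice for a 2-D scatter plot.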

Model Evaluation Nodes

Comprehensive evaluation tools for assessing model performance and generating insights.

Confusion Matrix

Create confusion matrix for classification evaluation

Type: eval_confusion_matrix

Classification model evaluation
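
As an illustration of what a confusion matrix contains, the sketch below uses scikit-learn's `confusion_matrix` on hand-written labels. The node's actual output format is not documented here, so the layout shown is scikit-learn's: rows are true classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0, 1]

# labels fixes the row/column order: index 0 is class 0, index 1 is class 1.
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For a binary problem the four cells unpack in this order.
tn, fp, fn, tp = matrix.ravel()
```

Here two of the three class-0 samples and three of the four class-1 samples are classified correctly, giving `[[2, 1], [1, 3]]`.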

Classification Report

Generate detailed classification performance report

Type: eval_classification_report

Comprehensive classification metrics
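
A classification report typically lists precision, recall, F1, and support per class, plus overall accuracy. The sketch assumes scikit-learn's `classification_report`. The class names `negative` and `positive` are illustrative.

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0, 1]

# output_dict=True returns nested dicts instead of a formatted string.
report = classification_report(
    y_true,
    y_pred,
    target_names=["negative", "positive"],
    output_dict=True,
    zero_division=0,  # report 0 rather than warn when a class is never predicted
)

positive_precision = report["positive"]["precision"]  # TP / (TP + FP)
positive_recall = report["positive"]["recall"]        # TP / (TP + FN)
accuracy = report["accuracy"]
```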

Regression Metrics

Calculate regression performance metrics

Type: eval_regression_metrics

Regression model evaluation
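
The source does not list which metrics this node reports. The sketch below computes four common ones with scikit-learn as an example of what such a node might produce.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error squared
rmse = mse ** 0.5                          # same units as the target
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot
```

For this data the errors are 0.5, 0, -0.5, and -1, so MAE is 0.5, MSE is 0.375, and R-squared is 1 - 1.5/20 = 0.925.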

Machine Learning Workflow

A typical machine learning pipeline built from Bioshift ML nodes, in five stages.

1. Data Preparation

  • CSV Reader
  • Select Columns
  • Train Test Split

2. Model Training

  • Logistic Regression
  • Random Forest Classifier
  • Linear Regression

3. Model Testing

  • Model Test
  • Model Predict
  • Confusion Matrix

4. Evaluation

  • Classification Report
  • Regression Metrics
  • Plot Results

5. Deployment

  • Model Save
  • Save DataFrame
  • Generate Report
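
The five stages can be sketched end to end in plain Python. This is a scikit-learn approximation of the workflow, not Bioshift's own API: the built-in iris dataset stands in for the CSV Reader, a random forest stands in for the training stage, and plotting and report generation are left out.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# 1. Data preparation: load, select columns, split.
frame = load_iris(as_frame=True).frame  # stand-in for CSV Reader
selected = frame[["petal length (cm)", "petal width (cm)", "target"]]
X = selected.drop(columns=["target"])
y = selected["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 2. Model training.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# 3. Model testing.
predictions = model.predict(X_test)
matrix = confusion_matrix(y_test, predictions)

# 4. Evaluation.
report = classification_report(y_test, predictions, output_dict=True)

# 5. Deployment: persist the model and confirm it reloads unchanged.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "iris_model.pkl"
    model_path.write_bytes(pickle.dumps(model))
    reloaded = pickle.loads(model_path.read_bytes())
    reloaded_predictions = reloaded.predict(X_test)
```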