Training Process

This document explains the code in the provided Jupyter notebook for implementing a Federated Learning (FL) workflow for a Collaborative Intrusion Detection System (CIDS) with Non-IID data.

1. Utility Functions

1.1. Calculate Model Size

Purpose: Estimate the memory footprint of a model to quantify communication overhead.

def calculate_model_size(model):
    total_params = np.sum([np.prod(weights.shape) for weights in model.get_weights()])
    size_in_bytes = total_params * 4  # 32-bit float (4 bytes per parameter)
    return size_in_bytes

Details:

  • Sums all trainable parameters in the model.

  • Assumes each parameter is stored as a 32-bit float (4 bytes).

  • Used to measure communication costs during FL aggregation.

1.2. Calculate F1-Score

Purpose: Compute the harmonic mean of precision and recall for classification evaluation.

def calculate_f1_score(precision, recall):
    f1_score = 2 * (precision * recall) / (precision + recall + 1e-10)
    return f1_score

Details:

  • Adds a small epsilon (1e-10) to avoid division by zero.

  • Used to evaluate both local and global model performance.

2. Federated Training Function: cids_federated_training()

Purpose:

Simulate federated learning across distributed nodes with Non-IID data, aggregating model updates while tracking performance metrics and resource usage.

def cids_federated_training(num_nodes=5, num_rounds=5):
    # Initialization, training loops, aggregation, and evaluation logic

Parameters:

  • num_nodes: Number of clients/nodes (default: 5).

  • num_rounds: Federated training iterations (default: 5).

2.1. Workflow Overview

  • Step 1: Initialize a global model and trackers for metrics (accuracy, precision, recall, F1-score).

  • Step 2: For each round:

    • Distribute global model weights to nodes.

    • Train local models on node-specific Non-IID data.

    • Aggregate updated weights using federated averaging.

    • Evaluate global model performance on a test set.

  • Step 3: Log metrics (training time, CPU/memory usage, communication overhead, variance in performance).

2.2. Key Components

  • Global Model Initialization:

    global_model = create_model(input_shape=X_df_scl.shape[1])
    
  • Local Training:

    • Data loading using load_data(node) (Non-IID splits).

    • Model training with fixed hyperparameters (10 epochs, batch size=1000).

    • Resource monitoring via psutil (CPU, memory usage).

  • Weight Aggregation:

    new_weights = [np.mean([weight[layer] for weight in local_weights], axis=0) for layer in range(len(global_weights))]
    
  • Communication Overhead:

    • Tracks total data transferred between nodes and the server.

    • Updated after each round:

      communication_overhead += num_nodes * model_size  # Server-to-node broadcast
      

2.3. Tracked Metrics

  • Performance Metrics:

    • Accuracy, precision, recall, F1-score (local and global).

    • Variance and standard deviation across nodes to measure consistency.

  • Resource Utilization:

    • Training/prediction time per node.

    • CPU and memory usage during training.

  • Communication Costs:

    • Model size (in MB) and cumulative overhead across rounds.

3. Simulation Execution

Purpose: Run the federated training process and display results.

print("Simulation for CIDS with Non-IID Data\\n")
fl_model, fl_global_accuracies, ..., f1_stds = cids_federated_training()

Output:

  • Prints round-wise metrics (training time, prediction time, F1-score).

  • Final communication overhead (e.g., Final Communication Overhead: 12.34 MB).

4. Dependencies and Customization

  • Libraries:

    • numpy, tensorflow/keras, psutil, time, sklearn.model_selection.train_test_split.

  • Adjustable Parameters:

    • num_nodes: Increase to simulate more clients.

    • num_rounds: Extend for longer training.

    • fraction in load_data(): Modify client data allocation.

  • Model Customization:

    • Replace create_model() with alternative architectures.

    • Adjust training hyperparameters (epochs, batch size).

5. Notes

  • Non-IID Assumption: Data splits are client-specific and non-uniform, mimicking real-world edge device scenarios.

  • Scalability: The code supports varying numbers of nodes and rounds but may require optimization for large-scale deployments.

  • Evaluation: Test data is loaded using load_data(num_nodes + 1), assuming a reserved client ID for validation.