Training Process

This document explains the code in the accompanying Jupyter notebook, which implements a Federated Learning (FL) workflow for a Collaborative Intrusion Detection System (CIDS) trained on Non-IID data.

1. Utility Functions

1.1. Calculate Model Size

Purpose: Estimate the memory footprint of a model to quantify communication overhead.

import numpy as np

def calculate_model_size(model):
    # Count the elements of every weight array returned by the model
    total_params = np.sum([np.prod(weights.shape) for weights in model.get_weights()])
    size_in_bytes = total_params * 4  # 32-bit float (4 bytes per parameter)
    return size_in_bytes

Details:

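Sums the element counts of all arrays returned by model.get_weights(). In Keras this covers non-trainable weights (for example, batch-normalization statistics) as well as trainable ones.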
Assumes each parameter is stored as a 32-bit float (4 bytes).
Used to measure communication costs during FL aggregation.
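
As a usage sketch, the byte count can be converted to megabytes for reporting. The FakeModel class below is a hypothetical stand-in that exposes only get_weights(), so the example runs without TensorFlow; the layer shapes are illustrative and not taken from the notebook.

```python
import numpy as np

def calculate_model_size(model):
    # Same logic as above: element count across all weight arrays, 4 bytes each.
    total_params = sum(int(np.prod(w.shape)) for w in model.get_weights())
    return total_params * 4

class FakeModel:
    """Hypothetical stand-in for a Keras model; only get_weights() is provided."""
    def __init__(self, shapes):
        self._weights = [np.zeros(shape, dtype=np.float32) for shape in shapes]

    def get_weights(self):
        return self._weights

# Illustrative shapes: 20 input features, one hidden layer of 64 units, one output unit.
model = FakeModel([(20, 64), (64,), (64, 1), (1,)])
size_bytes = calculate_model_size(model)
size_mb = size_bytes / (1024 ** 2)
print(f"{size_bytes} bytes = {size_mb:.4f} MB")
```

For float32 weights the result matches the sum of each array's nbytes; for other dtypes the 4-byte assumption would need adjusting.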

—

1.2. Calculate F1-Score

Purpose: Compute the harmonic mean of precision and recall for classification evaluation.

def calculate_f1_score(precision, recall):
    f1_score = 2 * (precision * recall) / (precision + recall + 1e-10)
    return f1_score

Details:

Adds a small epsilon (1e-10) to the denominator so that the result is 0 rather than a division-by-zero error when precision and recall are both 0.
Used to evaluate both local and global model performance.
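
A short worked example may help here. The input values are illustrative only; the function body repeats the definition above.

```python
def calculate_f1_score(precision, recall):
    return 2 * (precision * recall) / (precision + recall + 1e-10)

# Typical case: 2 * (0.8 * 0.5) / (0.8 + 0.5), roughly 0.615.
f1_typical = calculate_f1_score(0.8, 0.5)

# Degenerate case: both inputs are 0, so the epsilon keeps the denominator nonzero.
f1_degenerate = calculate_f1_score(0.0, 0.0)

print(round(f1_typical, 4), f1_degenerate)
```

The epsilon also lowers every score by a negligible amount, which is usually acceptable for reporting purposes.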

—

2. Federated Training Function: cids_federated_training()

Purpose:

Simulate federated learning across distributed nodes with Non-IID data, aggregating model updates while tracking performance metrics and resource usage.

def cids_federated_training(num_nodes=5, num_rounds=5):
    # Initialization, training loops, aggregation, and evaluation logic
    ...

Parameters:

num_nodes: Number of clients/nodes taking part in training (default: 5).
num_rounds: Number of federated training rounds (default: 5).

—

2.1. Workflow Overview

Step 1: Initialize a global model and trackers for metrics (accuracy, precision, recall, F1-score).
Step 2: For each round:
- Distribute global model weights to nodes.
- Train local models on node-specific Non-IID data.
- Aggregate updated weights using federated averaging.
- Evaluate global model performance on a test set.
Step 3: Log metrics (training time, CPU/memory usage, communication overhead, variance in performance).
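
The three steps above can be sketched as a minimal round loop. This is not the notebook's code: local_train is a hypothetical stand-in for the local model.fit() call, the weight shapes are arbitrary, and the logged quantity is a placeholder for the real test metrics.

```python
import numpy as np

def local_train(weights, node_id, rng):
    """Hypothetical stand-in for local training: nudges each weight array."""
    return [w + rng.normal(scale=0.01, size=w.shape) for w in weights]

def federated_average(local_weights):
    """Unweighted layer-wise mean of the nodes' weight lists."""
    num_layers = len(local_weights[0])
    return [
        np.mean([w[layer] for w in local_weights], axis=0)
        for layer in range(num_layers)
    ]

def run_rounds(num_nodes=5, num_rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize the global weights and a metric tracker.
    global_weights = [np.zeros((4, 3)), np.zeros(3)]
    history = []
    # Step 2: one pass per federated round.
    for _ in range(num_rounds):
        local_weights = []
        for node in range(1, num_nodes + 1):
            start = [w.copy() for w in global_weights]            # distribute
            local_weights.append(local_train(start, node, rng))   # train locally
        global_weights = federated_average(local_weights)         # aggregate
        # Step 3: log a per-round quantity (placeholder for test-set metrics).
        history.append(float(np.linalg.norm(global_weights[0])))
    return global_weights, history

final_weights, history = run_rounds()
```

Copying the global weights before each local update keeps the nodes independent of one another within a round.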

—

2.2. Key Components

Global Model Initialization:

global_model = create_model(input_shape=X_df_scl.shape[1])

Local Training:
- Data loading using load_data(node) (Non-IID splits).
- Model training with fixed hyperparameters (10 epochs, batch size=1000).
- Resource monitoring via psutil (CPU, memory usage).
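
One possible way to collect these measurements is sketched below. The measure helper is hypothetical and not taken from the notebook; it falls back to wall-clock timing alone when psutil is not installed.

```python
import time

try:
    import psutil  # third-party; listed among the notebook's dependencies
except ImportError:
    psutil = None  # fall back to timing only

def measure(fn, *args, **kwargs):
    """Run fn and return (result, stats): wall time plus process stats if available."""
    process = psutil.Process() if psutil is not None else None
    if process is not None:
        # Prime the CPU counter; the first non-blocking call returns a meaningless 0.0.
        process.cpu_percent(interval=None)
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    stats = {"seconds": time.perf_counter() - start}
    if process is not None:
        stats["cpu_percent"] = process.cpu_percent(interval=None)
        stats["rss_mb"] = process.memory_info().rss / (1024 ** 2)
    return result, stats

# Stand-in workload; in the notebook this would wrap the local training call.
result, stats = measure(sum, range(1_000_000))
print(stats)
```

The reported memory figure is the resident set size of the whole Python process after the call, not the memory attributable to training alone.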

Weight Aggregation:

new_weights = [np.mean([weight[layer] for weight in local_weights], axis=0) for layer in range(len(global_weights))]
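
The expression above is an unweighted, layer-wise mean: every node counts equally regardless of how much data it holds. The sketch below applies it to two hypothetical nodes and, for comparison, shows a sample-weighted variant. The weighted form and the sample counts are assumptions for illustration, not part of the notebook.

```python
import numpy as np

# Two hypothetical nodes, each holding a two-layer weight list.
local_weights = [
    [np.array([[1.0, 2.0]]), np.array([0.0])],
    [np.array([[3.0, 6.0]]), np.array([1.0])],
]
global_weights = local_weights[0]  # only its length (the layer count) is used here

# Unweighted mean, as in the line above.
new_weights = [
    np.mean([weight[layer] for weight in local_weights], axis=0)
    for layer in range(len(global_weights))
]

# Sample-weighted variant (an assumption, not from the notebook):
# nodes with more training samples contribute proportionally more.
num_samples = [100, 300]
weighted = [
    np.average([weight[layer] for weight in local_weights], axis=0, weights=num_samples)
    for layer in range(len(global_weights))
]

print(new_weights, weighted)
```

With Non-IID data and unequal node sizes the two variants can diverge noticeably, so the choice is worth stating explicitly when reporting results.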

Communication Overhead:
- Tracks the cumulative volume of model weights transferred between the server and the nodes.
- Updated after each round; the line shown here accounts for the server-to-node broadcast:
```
communication_overhead += num_nodes * model_size  # Server-to-node broadcast
```
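
The cumulative figure can be worked through with a small calculation. The helper below is hypothetical; its count_uploads switch is an assumption added to show how the total changes if node-to-server updates are included alongside the broadcast, and the model size is an illustrative value.

```python
def total_overhead_bytes(model_size, num_nodes, num_rounds, count_uploads=False):
    """Cumulative bytes transferred; count_uploads is a hypothetical switch."""
    per_round = num_nodes * model_size        # server-to-node broadcast
    if count_uploads:
        per_round += num_nodes * model_size   # node-to-server updates
    return num_rounds * per_round

model_size = 5636  # bytes; illustrative, not a measurement from the notebook
download_only = total_overhead_bytes(model_size, num_nodes=5, num_rounds=5)
both_ways = total_overhead_bytes(model_size, num_nodes=5, num_rounds=5, count_uploads=True)

print(f"{download_only / 1024**2:.4f} MB, {both_ways / 1024**2:.4f} MB")
```

Because every node sends back a full set of weights each round, counting both directions doubles the total.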

—

2.3. Tracked Metrics

Performance Metrics:
- Accuracy, precision, recall, F1-score (local and global).
- Variance and standard deviation across nodes to measure consistency.
Resource Utilization:
- Training/prediction time per node.
- CPU and memory usage during training.
Communication Costs:
- Model size (in MB) and cumulative overhead across rounds.
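
As an illustration of the consistency measures, the variance and standard deviation of per-node scores can be computed as follows. The F1 values are made up for the example, and using NumPy's default population variance is an assumption about how the notebook computes it.

```python
import numpy as np

# Hypothetical per-node F1-scores for a single round.
node_f1 = np.array([0.90, 0.85, 0.70, 0.95, 0.80])

f1_mean = node_f1.mean()
f1_var = node_f1.var()   # population variance (ddof=0), NumPy's default
f1_std = node_f1.std()   # square root of the variance above

print(f"mean={f1_mean:.3f} var={f1_var:.4f} std={f1_std:.4f}")
```

A low standard deviation indicates that nodes perform similarly despite their different data distributions; passing ddof=1 would give the sample estimate instead.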

—

3. Simulation Execution

Purpose: Run the federated training process and display results.

print("Simulation for CIDS with Non-IID Data\n")
fl_model, fl_global_accuracies, ..., f1_stds = cids_federated_training()

Output:

Prints round-wise metrics (training time, prediction time, F1-score).
Prints the final communication overhead (e.g., Final Communication Overhead: 12.34 MB).

—

4. Dependencies and Customization

Libraries:
- numpy, tensorflow/keras, psutil, time, sklearn.model_selection.train_test_split.
Adjustable Parameters:
- num_nodes: Increase to simulate more clients.
- num_rounds: Extend for longer training.
- fraction in load_data(): Modify client data allocation.
Model Customization:
- Replace create_model() with alternative architectures.
- Adjust training hyperparameters (epochs, batch size).

—

5. Notes

Non-IID Assumption: Data splits are client-specific and non-uniform, mimicking real-world edge device scenarios.
Scalability: The code supports varying numbers of nodes and rounds but may require optimization for large-scale deployments.
Evaluation: Test data is loaded using load_data(num_nodes + 1), which assumes that the ID one past the last node is reserved for held-out evaluation data.
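
The Non-IID note above can be illustrated with a label-skewed partition. The implementation of load_data() is not shown in this document, so the helper below is a hypothetical sketch of one common approach (Dirichlet-distributed class shares), not the notebook's method; the labels are synthetic.

```python
import numpy as np

def make_label_skewed_split(labels, num_nodes, alpha=0.5, seed=0):
    """Hypothetical helper: give each node a different class mix."""
    rng = np.random.default_rng(seed)
    node_indices = [[] for _ in range(num_nodes)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Draw this class's share per node; a smaller alpha gives stronger skew.
        shares = rng.dirichlet(np.full(num_nodes, alpha))
        cut_points = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, cut_points)):
            node_indices[node].extend(part.tolist())
    return node_indices

# Synthetic binary labels, e.g. 0 = benign traffic, 1 = attack traffic.
labels = np.array([0] * 600 + [1] * 400)
splits = make_label_skewed_split(labels, num_nodes=5)

for node, idx in enumerate(splits, start=1):
    attack_share = labels[idx].mean() if idx else float("nan")
    print(f"node {node}: {len(idx)} samples, attack share {attack_share:.2f}")
```

Each sample is assigned to exactly one node, while both the node sizes and the class proportions differ across nodes.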