Training Process
This document explains the code in the provided Jupyter notebook for implementing a Federated Learning (FL) workflow for a Collaborative Intrusion Detection System (CIDS) with Non-IID data.
1. Utility Functions
1.1. Calculate Model Size
Purpose: Estimate the memory footprint of a model to quantify communication overhead.
def calculate_model_size(model):
    total_params = np.sum([np.prod(weights.shape) for weights in model.get_weights()])
    size_in_bytes = total_params * 4  # 32-bit float (4 bytes per parameter)
    return size_in_bytes
Details:
Sums every parameter returned by model.get_weights(), which includes non-trainable weights as well as trainable ones.
Assumes each parameter is stored as a 32-bit float (4 bytes).
Used to measure communication costs during FL aggregation.
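As a rough illustration, the sketch below applies the same formula to a stand-in object that exposes get_weights() the way a Keras model does. The FakeModel class and its layer shapes are hypothetical, chosen only so the arithmetic can be checked by hand. The MB conversion assumes 1 MB = 1024 * 1024 bytes.

```python
import numpy as np

class FakeModel:
    """Hypothetical stand-in that exposes get_weights() like a Keras model."""

    def __init__(self, shapes):
        self._weights = [np.zeros(shape, dtype=np.float32) for shape in shapes]

    def get_weights(self):
        return self._weights

def calculate_model_size(model):
    # Count the entries of every array from get_weights(), 4 bytes each.
    total_params = sum(int(np.prod(w.shape)) for w in model.get_weights())
    return total_params * 4

# Kernel and bias for a 20 -> 8 dense layer, then for an 8 -> 1 dense layer.
model = FakeModel([(20, 8), (8,), (8, 1), (1,)])
size_bytes = calculate_model_size(model)   # (160 + 8 + 8 + 1) * 4 = 708
size_mb = size_bytes / (1024 ** 2)         # assumed MB convention
print(f"{size_bytes} bytes = {size_mb:.6f} MB")
```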
—
1.2. Calculate F1-Score
Purpose: Compute the harmonic mean of precision and recall for classification evaluation.
def calculate_f1_score(precision, recall):
    f1_score = 2 * (precision * recall) / (precision + recall + 1e-10)
    return f1_score
Details:
Adds a small epsilon (1e-10) to the denominator to avoid division by zero.
Used to evaluate both local and global model performance.
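A small worked example may help here. The confusion counts below are hypothetical. The sketch derives precision and recall from them, then shows that the epsilon turns the degenerate case (both inputs zero) into 0.0 instead of raising an error.

```python
def calculate_f1_score(precision, recall):
    # Same formula as above; the epsilon keeps the denominator non-zero.
    return 2 * (precision * recall) / (precision + recall + 1e-10)

# Hypothetical confusion counts for the attack class.
tp, fp, fn = 80, 20, 40
precision = tp / (tp + fp)   # 80 / 100 = 0.8
recall = tp / (tp + fn)      # 80 / 120, about 0.667
f1 = calculate_f1_score(precision, recall)   # about 0.727

# With no true positives, precision and recall are both 0.
f1_degenerate = calculate_f1_score(0.0, 0.0)  # 0.0, no ZeroDivisionError
```

The result agrees with the equivalent count-based form 2*TP / (2*TP + FP + FN) = 160 / 220, up to the negligible effect of the epsilon.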
—
2. Federated Training Function: cids_federated_training()
Purpose:
Simulate federated learning across distributed nodes with Non-IID data, aggregating model updates while tracking performance metrics and resource usage.
def cids_federated_training(num_nodes=5, num_rounds=5):
    # Initialization, training loops, aggregation, and evaluation logic
Parameters:
num_nodes: Number of clients/nodes (default: 5).
num_rounds: Federated training iterations (default: 5).
—
2.1. Workflow Overview
Step 1: Initialize a global model and trackers for metrics (accuracy, precision, recall, F1-score).
Step 2: For each round:
Distribute global model weights to nodes.
Train local models on node-specific Non-IID data.
Aggregate updated weights using federated averaging.
Evaluate global model performance on a test set.
Step 3: Log metrics (training time, CPU/memory usage, communication overhead, variance in performance).
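The steps above can be sketched end to end with a toy model. This is not the notebook's code: a NumPy logistic regression stands in for the Keras model, and make_node_data and local_train are hypothetical helpers. Only the control flow (broadcast, local training, averaging, evaluation) is meant to mirror the workflow.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_node_data(node, n=200, dim=4):
    # Hypothetical Non-IID data: each node's features have a different mean.
    X = rng.normal(loc=0.5 * node, scale=1.0, size=(n, dim))
    y = (X.sum(axis=1) > 0.5 * node * dim).astype(float)
    return X, y

def local_train(weights, X, y, epochs=10, lr=0.1):
    # Toy logistic-regression gradient steps standing in for model.fit().
    w, b = weights[0].copy(), weights[1].copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return [w, b]

num_nodes, num_rounds, dim = 5, 5, 4

# Step 1: initialize global weights and a metric tracker.
global_weights = [np.zeros(dim), np.zeros(1)]
global_accuracies = []
node_data = [make_node_data(node, dim=dim) for node in range(1, num_nodes + 1)]
X_test, y_test = make_node_data(num_nodes + 1, dim=dim)  # reserved ID for testing

# Step 2: one iteration per federated round.
for round_idx in range(num_rounds):
    # Every node starts from the current global weights and trains locally.
    local_weights = [local_train(global_weights, X, y) for X, y in node_data]
    # Federated averaging: element-wise mean of each layer across nodes.
    global_weights = [
        np.mean([w[layer] for w in local_weights], axis=0)
        for layer in range(len(global_weights))
    ]
    # Evaluate the aggregated model on the held-out split.
    p_test = 1.0 / (1.0 + np.exp(-(X_test @ global_weights[0] + global_weights[1])))
    global_accuracies.append(float(np.mean((p_test > 0.5) == y_test)))

# Step 3: log the tracked metric.
print(global_accuracies)
```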
—
2.2. Key Components
Global Model Initialization:
global_model = create_model(input_shape=X_df_scl.shape[1])
Local Training:
Data loading using load_data(node) (Non-IID splits).
Model training with fixed hyperparameters (10 epochs, batch size=1000).
Resource monitoring via psutil (CPU, memory usage).
Weight Aggregation:
new_weights = [np.mean([weight[layer] for weight in local_weights], axis=0) for layer in range(len(global_weights))]
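To make the aggregation expression concrete, the sketch below runs it on two hypothetical nodes whose weights are small enough to average by hand. The second half shows a sample-count-weighted variant. That variant is an assumption added for comparison and is not taken from the notebook, which uses the plain mean.

```python
import numpy as np

# Two hypothetical nodes, each holding a 2x2 kernel and a length-2 bias.
local_weights = [
    [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 1.0])],
    [np.array([[3.0, 4.0], [5.0, 6.0]]), np.array([2.0, 3.0])],
]
num_layers = len(local_weights[0])

# Unweighted mean per layer, matching the expression above.
new_weights = [
    np.mean([weights[layer] for weights in local_weights], axis=0)
    for layer in range(num_layers)
]
# new_weights[0] is [[2, 3], [4, 5]] and new_weights[1] is [1, 2].

# Assumption: weight each node by its (hypothetical) number of samples.
samples_per_node = [100, 300]
weighted_weights = [
    np.average([weights[layer] for weights in local_weights],
               axis=0, weights=samples_per_node)
    for layer in range(num_layers)
]
# weighted_weights[1] is [1.5, 2.5], pulled toward the larger node.
```

With equal node sizes the two variants coincide. They differ only when nodes hold different amounts of data, which is common with Non-IID splits.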
Communication Overhead:
Tracks total data transferred between nodes and the server.
Updated after each round:
communication_overhead += num_nodes * model_size # Server-to-node broadcast
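The accumulation can be checked with simple arithmetic. In the sketch below the model size is a hypothetical value (one million float32 parameters), and the MB conversion assumes 1 MB = 1024 * 1024 bytes. The notebook may use a different convention.

```python
num_nodes, num_rounds = 5, 5
model_size = 1_000_000 * 4  # bytes; hypothetical model with 1M float32 parameters

communication_overhead = 0
for _ in range(num_rounds):
    # One copy of the model is sent to each node per round.
    communication_overhead += num_nodes * model_size  # server-to-node broadcast

# 5 nodes * 5 rounds * 4,000,000 bytes = 100,000,000 bytes
overhead_mb = communication_overhead / (1024 ** 2)
print(f"Final Communication Overhead: {overhead_mb:.2f} MB")
```

Note that this update counts only the broadcast direction. If node-to-server uploads should also be counted, the per-round increment would double under the same assumptions.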
—
2.3. Tracked Metrics
Performance Metrics:
Accuracy, precision, recall, F1-score (local and global).
Variance and standard deviation across nodes to measure consistency.
Resource Utilization:
Training/prediction time per node.
CPU and memory usage during training.
Communication Costs:
Model size (in MB) and cumulative overhead across rounds.
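As an illustration of the consistency metrics, the sketch below computes the mean, variance, and standard deviation of per-node F1-scores for a single round. The scores are hypothetical. np.var and np.std use the population form (ddof=0) by default, and the notebook is assumed, not confirmed, to rely on those defaults.

```python
import numpy as np

# Hypothetical per-node F1-scores for one round.
node_f1_scores = [0.90, 0.80, 0.70, 0.85, 0.75]

f1_mean = float(np.mean(node_f1_scores))      # 0.8
f1_variance = float(np.var(node_f1_scores))   # population variance: 0.005
f1_std = float(np.std(node_f1_scores))        # square root of the variance

print(f"mean={f1_mean:.3f} var={f1_variance:.4f} std={f1_std:.4f}")
```

A low standard deviation means the nodes perform similarly. A high one suggests that some nodes' local data distributions are poorly served by the shared model.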
—
3. Simulation Execution
Purpose: Run the federated training process and display results.
print("Simulation for CIDS with Non-IID Data\n")
fl_model, fl_global_accuracies, ..., f1_stds = cids_federated_training()
Output:
Prints round-wise metrics (training time, prediction time, F1-score).
Final communication overhead (e.g., Final Communication Overhead: 12.34 MB).
—
4. Dependencies and Customization
Libraries:
numpy, tensorflow/keras, psutil, time, sklearn.model_selection.train_test_split.
Adjustable Parameters:
num_nodes: Increase to simulate more clients.
num_rounds: Extend for longer training.
fraction in load_data(): Modify client data allocation.
Model Customization:
Replace create_model() with alternative architectures.
Adjust training hyperparameters (epochs, batch size).
—
5. Notes
Non-IID Assumption: Data splits are client-specific and non-uniform, mimicking real-world edge device scenarios.
Scalability: The code supports varying numbers of nodes and rounds but may require optimization for large-scale deployments.
Evaluation: Test data is loaded using load_data(num_nodes + 1), assuming a reserved client ID for validation.
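To show what a Non-IID split can look like, the sketch below builds an extreme label-skewed partition: samples are sorted by label and cut into contiguous shards, so each node sees mostly one class. The split_non_iid function and the label counts are hypothetical. The notebook's load_data() may partition the data quite differently.

```python
import numpy as np

def split_non_iid(labels, num_nodes):
    """Hypothetical label-skewed split: sort by label, cut into contiguous shards."""
    order = np.argsort(labels, kind="stable")
    return np.array_split(order, num_nodes)

# Hypothetical dataset: 600 benign (0) and 400 attack (1) samples, shuffled.
rng = np.random.default_rng(0)
labels = rng.permutation(np.array([0] * 600 + [1] * 400))

shards = split_non_iid(labels, num_nodes=5)

# Fraction of attack samples held by each node.
attack_share = [float(labels[idx].mean()) for idx in shards]
print(attack_share)  # the first three nodes hold only benign traffic
```

An IID split would give every node roughly the global 40% attack share. Here the share is 0 or 1 on each node, which is the kind of skew that makes the per-node variance metrics informative.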