.. _configuration:

Configuration
=============

Setup & Configuration
---------------------

Environment Setup
~~~~~~~~~~~~~~~~~

1. Install the dependencies. See the :ref:`installation` section for detailed instructions.
   
   .. code-block:: bash

      pip install -r requirements.txt

2. Place your dataset (e.g., ``CoAt-Set.csv``) in the ``./dataset`` directory.
3. If you store the dataset in a different directory or use a different dataset, update the dataset path in the code.
  
   .. code-block:: python

      import os

      file_path = os.path.join("your_dataset_path", "your_dataset.csv")  # Adjust path as needed
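
A mistyped path only shows up when the file is read, so it can help to validate the path first. The sketch below is not part of the simulator: ``resolve_dataset_path`` is a hypothetical helper, and the directory and file names in the commented call are the examples from the steps above.

.. code-block:: python

   import os


   def resolve_dataset_path(directory, filename):
       """Return the dataset path, or raise if the file does not exist.

       Hypothetical helper for illustration; not part of the simulator.
       """
       file_path = os.path.join(directory, filename)
       if not os.path.isfile(file_path):
           raise FileNotFoundError(f"Dataset not found: {file_path}")
       return file_path


   # Example call; adjust both arguments to match your setup
   # file_path = resolve_dataset_path("./dataset", "CoAt-Set.csv")

Failing early with the full path in the message makes a wrong directory easier to spot than a later error from the CSV reader.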

Dataset Configuration
~~~~~~~~~~~~~~~~~~~~~

1. By default, the simulator runs **Binary Classification**. The target label is the ``Attack`` column of the dataset.

   .. code-block:: python

     df = df.drop(columns=['Label'])  # Drops multi-class labels
     y_df = df['Attack']              # Target: 0 (benign) or 1 (attack)

2. To run **Multi-Class Classification**, use the ``Label`` column of the dataset as the target label instead.

   .. code-block:: python

     df = df.drop(columns=['Attack'])  # Drops binary labels
     y_df = df['Label']                # Retain 'Label' column for multi-class targets

3. Remember to adapt the neural network architecture in ``create_model()`` as well; for example, the output layer and loss function must suit more than two classes.
4. **Multi-Class Classification** also requires additional code that this version of the simulator does not yet provide. Support may be added in a future release.
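
As a rough illustration of the extra code that multi-class classification needs, the sketch below maps the string values of a ``Label`` column to integer class indices. It is not part of the simulator: ``encode_labels`` and the sample label names are hypothetical. The closing comments describe the matching Keras changes one would typically make, which is an assumption about how ``create_model()`` could be adapted.

.. code-block:: python

   def encode_labels(labels):
       """Map each distinct label to an integer index (sorted order)."""
       classes = sorted(set(labels))
       index = {name: i for i, name in enumerate(classes)}
       return [index[name] for name in labels], classes


   # Hypothetical label values; real names depend on your dataset
   labels = ["benign", "dos", "probe", "dos", "benign"]
   y, classes = encode_labels(labels)
   num_classes = len(classes)
   print(y, classes, num_classes)

   # With integer targets, the output layer would typically become
   # layers.Dense(num_classes, activation='softmax') and the loss
   # 'sparse_categorical_crossentropy' (assumed, not taken from the simulator).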

Federated Learning Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Configure these parameters in ``cids_federated_training()``:

.. code-block:: python

   # Example settings
   num_nodes = 10     # Number of clients
   num_rounds = 20    # Training rounds
   epochs = 10        # Local epochs per round
   batch_size = 1000  # Batch size per client
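
Together, these four values determine how much work each client does. The sketch below is not simulator code: it groups the settings in a hypothetical ``FederatedConfig`` dataclass and derives the number of batches per local epoch and the total local epochs per client, assuming every client trains in every round. The per-client sample count is an assumed example value.

.. code-block:: python

   import math
   from dataclasses import dataclass


   @dataclass(frozen=True)
   class FederatedConfig:
       """Hypothetical container for the settings shown above."""
       num_nodes: int = 10
       num_rounds: int = 20
       epochs: int = 10
       batch_size: int = 1000

       def steps_per_epoch(self, samples_per_client):
           # Batches one client processes in a single local epoch
           return math.ceil(samples_per_client / self.batch_size)

       def total_local_epochs(self):
           # Local epochs one client runs over the whole simulation
           return self.num_rounds * self.epochs


   config = FederatedConfig()
   samples_per_client = 25_000  # assumed size of one client's partition
   print(config.steps_per_epoch(samples_per_client))  # 25
   print(config.total_local_epochs())                 # 200

Doubling ``epochs`` or ``num_rounds`` doubles the total local training, which is worth keeping in mind before raising both at once.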

Data Distribution Settings
~~~~~~~~~~~~~~~~~~~~~~~~~~

Modify the data partitioning in ``load_data()``:

.. code-block:: python

   # Non-IID partitioning example
   def load_data():
       # Each client receives 2% of the dataset
       fraction = 0.02
       # Customize shuffling/sampling logic here
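
One simple way to give each client a fixed fraction of the data is to draw row indices at random per client. The sketch below assumes independent sampling without replacement within each client, so different clients may share rows. The simulator's actual logic in ``load_data()`` may differ, and ``sample_client_indices`` is a hypothetical name. The draw here is uniform; a non-IID split would replace it with label-dependent sampling.

.. code-block:: python

   import random


   def sample_client_indices(num_samples, num_nodes, fraction=0.02, seed=42):
       """Draw a random subset of row indices for each client."""
       rng = random.Random(seed)  # seeded for reproducible partitions
       per_client = round(num_samples * fraction)
       return [
           sorted(rng.sample(range(num_samples), per_client))
           for _ in range(num_nodes)
       ]


   partitions = sample_client_indices(num_samples=10_000, num_nodes=10)
   print(len(partitions), len(partitions[0]))  # 10 clients, 200 rows each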

Model Customization
-------------------

Neural Network Architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Edit ``create_model()``:

.. code-block:: python

   # Assumed imports (TensorFlow's bundled Keras); adjust to your installation
   from tensorflow import keras
   from tensorflow.keras import layers
   from tensorflow.keras.metrics import Precision, Recall

   def create_model(input_shape):
       model = keras.Sequential([
           layers.Dense(20, activation='relu', input_shape=(input_shape,)),
           # Change the activation function if needed; see
           # https://keras.io/api/layers/activations/#available-activations
           layers.Dense(10, activation='relu'),
           # Change the number of neurons per layer (e.g., replace 5 with 1000)
           layers.Dense(5, activation='relu'),
           layers.Dense(3, activation='relu'),
           layers.Dense(1, activation='sigmoid'),
       ])
       # Other losses: https://keras.io/api/losses/#available-losses
       # Other optimizers: https://keras.io/api/optimizers/#available-optimizers
       model.compile(loss='mean_squared_error', optimizer='sgd',
                     metrics=['accuracy', Recall(), Precision()])
       return model
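
When changing the number of layers or neurons, it helps to know how many trainable parameters the resulting model has. Each fully connected layer contributes ``(inputs + 1) * units`` parameters. The sketch below applies that formula to the layer sizes above; ``dense_param_count`` is a hypothetical helper and the input width of 30 features is an assumed example value.

.. code-block:: python

   def dense_param_count(input_dim, layer_units):
       """Count weights and biases in a stack of Dense layers."""
       total = 0
       fan_in = input_dim
       for units in layer_units:
           total += (fan_in + 1) * units  # weights plus one bias per unit
           fan_in = units
       return total


   # Layer sizes from create_model(); 30 input features is an assumed value
   print(dense_param_count(30, [20, 10, 5, 3, 1]))     # 907
   # Replacing the 5-unit layer with 1000 units, as suggested in the comment
   print(dense_param_count(30, [20, 10, 1000, 3, 1]))  # 14837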

Preprocessing Adjustments
~~~~~~~~~~~~~~~~~~~~~~~~~

Replace scalers in the preprocessing pipeline:

.. code-block:: python

   from sklearn.preprocessing import StandardScaler

   # Use StandardScaler in place of QuantileTransformer
   preprocessor = StandardScaler()
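
Whichever scaler is used, it is usually fitted on the training data only and then applied unchanged to the test data. The sketch below shows that pattern with ``StandardScaler`` on a small made-up array. The variable names are illustrative, and the simulator's own pipeline may be organized differently.

.. code-block:: python

   import numpy as np
   from sklearn.preprocessing import StandardScaler

   # Made-up feature matrices for illustration only
   X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
   X_test = np.array([[2.0, 500.0]])

   preprocessor = StandardScaler()
   X_train_scaled = preprocessor.fit_transform(X_train)  # learn mean and std
   X_test_scaled = preprocessor.transform(X_test)        # reuse training statistics

   print(X_train_scaled.mean(axis=0))  # approximately 0 for each feature
   print(X_test_scaled)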

Execution & Outputs
-------------------

Run the Simulation
~~~~~~~~~~~~~~~~~~

Start Jupyter Notebook first:

.. code-block:: bash

   jupyter notebook

Then open ``CIDS-Sim_Non-IID.ipynb`` or ``CIDS-Sim_Heterogeneous.ipynb`` in Jupyter Notebook.

Outputs Generated:

- **Logs**: Real-time metrics (accuracy, F1-score, etc.) in the console.
- **Visualizations**: Plots of each metric across rounds.
- **CSV Files**: Detailed per-round metrics saved to files (e.g., ``global_metrics.csv``).
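
The CSV output can be inspected after a run with the standard library alone. In the sketch below, the column names ``round`` and ``accuracy`` are assumptions, so check the header of your own file first; ``best_round`` is a hypothetical helper. The example writes a tiny sample file so that it runs on its own.

.. code-block:: python

   import csv
   import os
   import tempfile


   def best_round(csv_path, metric):
       """Return the row with the highest value in the given metric column."""
       with open(csv_path, newline="") as handle:
           rows = list(csv.DictReader(handle))
       return max(rows, key=lambda row: float(row[metric]))


   # Sample data with assumed column names; replace with your real file
   sample_path = os.path.join(tempfile.mkdtemp(), "global_metrics.csv")
   with open(sample_path, "w", newline="") as handle:
       writer = csv.writer(handle)
       writer.writerows([["round", "accuracy"], [1, 0.81], [2, 0.93], [3, 0.90]])

   print(best_round(sample_path, "accuracy"))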

Troubleshooting
---------------

Common Issues:

- **Dataset Not Found**:

  - Verify that ``file_path`` points to the correct dataset file.
  - Check filesystem permissions.

- **Poor Model Performance**:

  - Increase ``num_rounds`` or ``epochs``.
  - Add more layers to ``create_model()``.

- **High Memory Usage**:

  - Reduce ``batch_size`` or ``num_nodes``.
  - Disable resource tracking in the code.

Support
-------
For further assistance, open an issue on the `GitHub repository <https://github.com/your-repo>`_.