FairLLM: Mitigating Age-Related Bias in Large Language Models

This archive is distributed in association with the INFORMS Journal on Computing under the MIT License.

The software and data in this repository are a snapshot of the software and data that were used in the research reported on in the paper Mitigating Age-Related Bias in Large Language Models: Strategies for Responsible AI Development by Zhuang Liu, Shiyao Qian, Shuirong Cao, and Tianyu Shi.

Cite

To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.

https://doi.org/10.1287/ijoc.2024.0645

https://doi.org/10.1287/ijoc.2024.0645.cd

Below is the BibTeX for citing this snapshot of the repository.

@misc{liu2024ijocCode,
  author =        {Zhuang Liu and Shiyao Qian and Shuirong Cao and Tianyu Shi},
  publisher =     {INFORMS Journal on Computing},
  title =         {{Mitigating Age-Related Bias in Large Language Models: Strategies for Responsible AI Development}},
  year =          {2025},
  doi =           {10.1287/ijoc.2024.0645.cd},
  url =           {https://github.com/INFORMSJoC/2024.0645},
  note =          {Available for download at https://github.com/INFORMSJoC/2024.0645},
}  

Overview

FairLLM is a project aimed at reducing age-related bias in large language models (LLMs). As LLMs continue to be widely applied across various domains, ensuring their fairness and inclusivity has become crucial. FairLLM introduces two innovative bias mitigation strategies: Self-BMIL (Self-Bias Mitigation in-the-loop) and Coop-BMIL (Cooperative Bias Mitigation in-the-loop), along with an Empathetic Perspective Exchange strategy. These approaches reduce bias in model outputs through self-reflection, collaborative debate, and perspective transformation, thereby enhancing the fairness and inclusivity of the models.
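
To make the idea concrete, the snippet below is a minimal, illustrative sketch of a self-critique loop in the spirit of Self-BMIL; it is not the repository's implementation. The generate function is a placeholder for whatever LLM backend you use, and the prompts and stopping rule are hypothetical.

    # Illustrative sketch only: a generic self-critique loop in the spirit of Self-BMIL.
    # `generate` is a placeholder for any chat/completion backend; the prompts and the
    # stopping rule are hypothetical, not the repository's actual implementation.

    def generate(prompt: str) -> str:
        """Placeholder for an LLM call (e.g., a local model or an API client)."""
        raise NotImplementedError

    def self_bias_mitigation(question: str, max_rounds: int = 3) -> str:
        answer = generate(question)
        for _ in range(max_rounds):
            critique = generate(
                "Review the following answer for age-related stereotypes or "
                f"exclusionary language.\n\nQuestion: {question}\nAnswer: {answer}\n\n"
                "Reply with 'OK' if the answer is unbiased, otherwise list the issues."
            )
            if critique.strip().upper().startswith("OK"):
                break
            answer = generate(
                "Rewrite the answer so it addresses these issues while staying "
                f"factually equivalent.\n\nIssues: {critique}\nOriginal answer: {answer}"
            )
        return answer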

FairLLM Framework

Project Structure

FairLLM/
├── configs/              # Configuration files for experiments
│   ├── base_config.yaml  # Base configuration settings
│   ├── experiment_configs/ # Experiment-specific configurations
│   ├── model_configs/    # Model-specific parameter settings
│   └── training_configs/ # Training parameter settings
├── data/                 # Dataset storage
│   ├── bias_datasets/    # Datasets related to bias
│   └── synthetic_data/   # Tools for generating synthetic data
├── docs/                 # Project documentation
│   ├── API_REFERENCE.md  # API documentation
│   ├── ARCHITECTURE.md   # System architecture overview
│   └── FAIRNESS_PROTOCOL.md # Fairness protocol guidelines
├── requirements/         # Project dependencies and environment setup
│   ├── docker/           # Docker-related files
│   │   └── k8s/          # Kubernetes configurations
│   └── requirements-dev.txt # Development environment dependencies
├── results/              # Experiment results and analysis
│   ├── analyzer.py       # Result analysis tools
│   └── experiment_controller.py # Experiment management tools
├── scripts/              # Various scripts for project tasks
│   ├── evaluation_pipeline.py # Evaluation pipeline script
│   ├── model_service.sh  # Script for deploying model services
│   └── train_selfbmil.sh # Script for training with Self-BMIL
└── src/                  # Source code
    ├── agents/           # Agent modules for different strategies
    ├── data/             # Data processing utilities
    ├── evaluation/       # Evaluation metrics and tools
    ├── models/           # Model definitions and architectures
    ├── training/         # Model training definitions
    └── utils/            # General-purpose utilities

Installation

Prerequisites

  • Hardware Requirements:
    • NVIDIA A100 GPUs with at least 40GB of VRAM.
    • Multi-node GPU cluster with high-speed interconnects (recommended).
    • Minimum of 128GB RAM per node (recommended).
  • Software Requirements:
    • Python 3.8 or later.
    • Docker 24.0.0 or later and Kubernetes 1.28 or later (with kubectl).
    • CUDA toolkit version 12.1.1.
    • A proprietary, optimized build of PyTorch.
    • The Transformers library from Hugging Face.
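
A quick way to sanity-check this software stack from a Python session is sketched below; it only uses standard PyTorch and Transformers introspection and assumes the packages are already installed.

    # Quick environment sanity check (illustrative; adjust expected versions as needed).
    import sys
    import torch
    import transformers

    print("Python:", sys.version.split()[0])                        # expect 3.8+
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA toolkit (as seen by PyTorch):", torch.version.cuda)  # expect 12.1
    print("Transformers:", transformers.__version__)
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print("GPU:", props.name, f"{props.total_memory / 1024**3:.0f} GB VRAM")  # expect A100, 40+ GB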

Using Docker

To build and run the FairLLM service using Docker:

  1. Build the Docker image:
    docker build -t fairllm-service -f requirements/docker/Dockerfile.prod .
  2. Run the Docker container:
    docker run -d --name fairllm-container -p 8080:8080 fairllm-service
  3. Verify the deployment:
    docker ps
    curl http://localhost:8080/health
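
If you prefer to script the verification step, a minimal polling check against the same /health endpoint might look like the sketch below. It assumes the requests package is installed; the endpoint path comes from the curl command above, and no response format is assumed.

    # Poll the container's health endpoint until it responds (illustrative sketch).
    import time
    import requests

    def wait_for_health(url: str = "http://localhost:8080/health", timeout: int = 60) -> bool:
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                if requests.get(url, timeout=2).status_code == 200:
                    return True
            except requests.RequestException:
                pass  # service not up yet; retry
            time.sleep(2)
        return False

    if __name__ == "__main__":
        print("healthy" if wait_for_health() else "service did not become healthy in time")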

Using Kubernetes

To deploy FairLLM to a Kubernetes cluster:

  1. Build and tag the Docker image:
    docker build -t registry.internal/fairllm-service:latest -f requirements/docker/Dockerfile.prod .
    docker push registry.internal/fairllm-service:latest
  2. Deploy to Kubernetes:
    kubectl apply -f requirements/docker/k8s/deployment.yaml
    kubectl apply -f requirements/docker/k8s/service.yaml
  3. Verify the deployment:
    kubectl get pods
    kubectl get services
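
The same verification can be done programmatically with the official Kubernetes Python client (pip install kubernetes). This sketch simply lists pod phases; it assumes your kubeconfig already points at the cluster, and the default namespace is an assumption.

    # List FairLLM pods and their phases via the Kubernetes Python client (illustrative).
    from kubernetes import client, config

    config.load_kube_config()          # uses the same kubeconfig as kubectl
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod(namespace="default").items:
        if "fairllm" in pod.metadata.name:
            print(pod.metadata.name, pod.status.phase)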

Using Prometheus for Monitoring

To monitor FairLLM using Prometheus:

  1. Deploy Prometheus:
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
    kubectl apply -f requirements/docker/k8s/Prometheus_metric.yaml
  2. Access Prometheus Dashboard:
    kubectl port-forward svc/prometheus-server 9090:9090
  3. Query Metrics:
    • Access the Prometheus dashboard at http://localhost:9090
    • Query metrics like fairllm_bias_score, fairllm_inference_latency_seconds, etc.
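
Beyond the dashboard, metrics can also be pulled through Prometheus's standard HTTP API. The sketch below queries fairllm_bias_score through the port-forward set up above; the metric name is taken from the list above, and the label layout of the result is whatever your exporter emits.

    # Query a metric from Prometheus's HTTP API (illustrative sketch).
    import requests

    PROMETHEUS = "http://localhost:9090"

    def instant_query(expr: str):
        resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": expr}, timeout=5)
        resp.raise_for_status()
        return resp.json()["data"]["result"]

    for sample in instant_query("fairllm_bias_score"):
        print(sample["metric"], "=>", sample["value"][1])  # value is [timestamp, string_value]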

Usage

Training

To train the FairLLM model using the Self-BMIL strategy:

bash scripts/train_selfbmil.sh --config configs/training_configs/rlhf_selfbmil_train.yaml

To train using the Coop-BMIL strategy:

bash scripts/train_coopbmil.sh --config configs/training_configs/adversarial_coopmil.yaml

Evaluation

To evaluate the model's performance and fairness metrics:

python scripts/evaluation_pipeline.py --config configs/experiment_configs/exp_bias_mitigation.yaml
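
For intuition about what such an evaluation measures, here is a minimal, generic example of a group fairness gap (the difference in positive-outcome rates between age groups). The repository's own metrics are defined in src/evaluation and may differ.

    # Generic demographic-parity-style gap between age groups (illustrative only;
    # the repository's actual fairness metrics live in src/evaluation).
    from collections import defaultdict

    def positive_rate_gap(records):
        """records: iterable of (age_group, positive_outcome: bool)."""
        counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
        for group, positive in records:
            counts[group][0] += int(positive)
            counts[group][1] += 1
        rates = {g: pos / total for g, (pos, total) in counts.items()}
        return max(rates.values()) - min(rates.values()), rates

    gap, rates = positive_rate_gap([
        ("young", True), ("young", True), ("young", False),
        ("older", True), ("older", False), ("older", False),
    ])
    print(rates, "gap:", gap)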

Deployment

To deploy the model as a service:

bash scripts/model_service.sh --model_path /path/to/your/model --config_path configs/model_configs/llama3_8b_bmil.yaml
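
Once the service is up, requests can be sent over HTTP. The exact inference route and payload schema are defined by the service code, so the /generate path and JSON fields below are hypothetical placeholders; only the port and the /health endpoint appear elsewhere in this README.

    # Example client call to the deployed service (the /generate route and payload
    # fields are hypothetical placeholders; check the service code for the real API).
    import requests

    BASE_URL = "http://localhost:8080"

    assert requests.get(f"{BASE_URL}/health", timeout=5).status_code == 200

    resp = requests.post(
        f"{BASE_URL}/generate",                                        # hypothetical route
        json={"prompt": "Describe a 70-year-old software engineer."},  # hypothetical schema
        timeout=30,
    )
    print(resp.status_code, resp.text)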

Results

The project includes results from various experiments demonstrating the effectiveness of the proposed bias mitigation strategies. Key results include:

  • Figure 1: Comparison of Accuracy, Bias Scores, and Fairness Metrics across different models and configurations (results/result_tl.png).

  • Figure 2: Cross-Dataset Fairness Metric Analysis (results/sum-test.png).

  • Figure 3: Impact of Different BMIL Methods on Fairness Metrics (results/impact-diff-bmil.png).

Contributing

We welcome contributions to FairLLM! Please see our contribution guidelines for details on how to contribute.

Support

For any issues or questions related to FairLLM, please open an issue in the GitHub repository.
