責任ある AI — 公平性・説明可能性・プライバシー

AI システムが社会に対して公正かつ透明に機能し、個人の権利を守るための技術的手法と組織的フレームワークを学ぶ。

この章で学ぶこと

公平性 (Fairness) — バイアスの種類を理解し、学習データ・モデル・出力の各段階で公平性を測定・改善する手法
説明可能性 (Explainability) — ブラックボックスモデルの判断根拠を人間が理解できる形で提示する技術
プライバシー — 差分プライバシー・連合学習など、個人情報を保護しながら AI を活用する仕組み
AI ガバナンス — 組織的なポリシー策定、監査、モデルカードによる透明性確保
安全性 (Safety) — 有害出力の防止、敵対的攻撃への耐性、ガードレールの実装

前提知識

このガイドを読む前に、以下の知識があると理解が深まります:

基本的なプログラミングの知識
関連する基礎概念の理解
MLOps — 機械学習の本番運用基盤の内容を理解していること

1. 責任ある AI の全体フレームワーク

1.1 5 つの柱

+------------------------------------------------------------------+
|                     責任ある AI (Responsible AI)                   |
+------------------------------------------------------------------+
|                                                                    |
|  +----------+  +----------+  +----------+  +--------+  +--------+ |
|  | 公平性   |  | 説明     |  | プライ   |  | 安全性 |  | 説明   | |
|  | Fairness |  | 可能性   |  | バシー   |  | Safety |  | 責任   | |
|  |          |  | Explain- |  | Privacy  |  |        |  | Account| |
|  |          |  | ability  |  |          |  |        |  | ability| |
|  +----------+  +----------+  +----------+  +--------+  +--------+ |
|                                                                    |
+------------------------------------------------------------------+

1.2 バイアスの発生箇所

データ収集       前処理        モデル学習      推論/意思決定
+--------+    +--------+    +--------+    +--------+
| 歴史的  |    | サンプ  |    | 目的   |    | 自動化  |
| バイアス | -> | リング  | -> | 関数の  | -> | バイアス |
| 社会的  |    | バイアス |    | バイアス |    | フィード |
| 偏り    |    |         |    |         |    | バック  |
+--------+    +--------+    +--------+    +--------+
   |               |              |              |
   v               v              v              v
 収集段階の      表現の偏り     学習過程での    デプロイ後の
 介入手法       の是正         緩和手法       監視・修正

1.3 バイアスの種類と具体例

バイアスの種類	説明	具体例
歴史的バイアス	過去の社会的不平等がデータに反映	採用データが男性優位の履歴を反映
表現バイアス	特定グループがデータに過小/過大代表	顔認識データの人種構成偏り
測定バイアス	特徴量やラベルの測定方法が偏っている	犯罪予測が逮捕データ（検挙バイアスあり）で学習
集約バイアス	異質なグループを一括で扱う	異なる文化圏を同一モデルで評価
評価バイアス	評価データが本番環境を代表しない	英語中心のベンチマークで多言語モデルを評価
デプロイバイアス	システムが意図と異なる文脈で使用される	補助ツールとして設計されたAIが最終判断に使われる
フィードバックループバイアス	モデルの出力が次の入力データに影響	推薦システムが特定コンテンツのみ露出→データが偏る

2. 公平性 (Fairness)

2.1 公平性指標の定義と計算

# コード例 1: 主要な公平性指標を計算する
import numpy as np
from sklearn.metrics import confusion_matrix
 
def calculate_fairness_metrics(y_true, y_pred, sensitive_attr):
    """
    二値分類における公平性指標を計算する。
 
    Parameters:
        y_true: 正解ラベル (0 or 1)
        y_pred: 予測ラベル (0 or 1)
        sensitive_attr: センシティブ属性 (0: 非特権グループ, 1: 特権グループ)
 
    Returns:
        dict: 各種公平性指標
    """
    metrics = {}
 
    for group_name, group_val in [("privileged", 1), ("unprivileged", 0)]:
        mask = sensitive_attr == group_val
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask]
        ).ravel()
 
        # 各グループの指標
        metrics[f"{group_name}_TPR"] = tp / (tp + fn)  # 真陽性率
        metrics[f"{group_name}_FPR"] = fp / (fp + tn)  # 偽陽性率
        metrics[f"{group_name}_selection_rate"] = (tp + fp) / len(y_true[mask])
        metrics[f"{group_name}_PPV"] = tp / (tp + fp) if (tp + fp) > 0 else 0
 
    # --- 公平性指標 ---
    # Statistical Parity Difference (SPD): 選択率の差
    # |SPD| < 0.1 が一般的な閾値
    metrics["SPD"] = (
        metrics["unprivileged_selection_rate"]
        - metrics["privileged_selection_rate"]
    )
 
    # Equal Opportunity Difference (EOD): 真陽性率の差
    # |EOD| < 0.1 が一般的な閾値
    metrics["EOD"] = (
        metrics["unprivileged_TPR"]
        - metrics["privileged_TPR"]
    )
 
    # Disparate Impact (DI): 選択率の比
    # 0.8 <= DI <= 1.25 が「4/5ルール」
    metrics["DI"] = (
        metrics["unprivileged_selection_rate"]
        / metrics["privileged_selection_rate"]
    )
 
    # Predictive Parity Difference: 適合率の差
    metrics["PPD"] = (
        metrics["unprivileged_PPV"]
        - metrics["privileged_PPV"]
    )
 
    # 総合評価
    metrics["passes_4_5_rule"] = 0.8 <= metrics["DI"] <= 1.25
    metrics["fair_by_SPD"] = abs(metrics["SPD"]) < 0.1
    metrics["fair_by_EOD"] = abs(metrics["EOD"]) < 0.1
 
    return metrics
 
# 使用例
metrics = calculate_fairness_metrics(y_true, y_pred, gender)
print(f"Statistical Parity Difference: {metrics['SPD']:.3f}")
print(f"Disparate Impact: {metrics['DI']:.3f}")
print(f"Equal Opportunity Difference: {metrics['EOD']:.3f}")
print(f"4/5ルール: {'合格' if metrics['passes_4_5_rule'] else '不合格'}")

2.2 公平性指標の不可能定理

不可能定理（Impossibility Theorem）:
以下の3つの公平性指標を同時に満たすことは、
基本率（陽性率）がグループ間で異なる場合、数学的に不可能である。

  1. Calibration（較正）
     P(Y=1 | S=1, R=r) = P(Y=1 | S=0, R=r)
     → 同じスコアなら同じ確率で陽性

  2. Equal FPR（偽陽性率の均等）
     P(R=1 | Y=0, S=1) = P(R=1 | Y=0, S=0)
     → 陰性サンプルが誤って陽性と判定される率が等しい

  3. Equal FNR（偽陰性率の均等）
     P(R=0 | Y=1, S=1) = P(R=0 | Y=1, S=0)
     → 陽性サンプルが見落とされる率が等しい

  → タスクの文脈に応じて、どの公平性を優先するかを判断する必要がある

  例:
  ・融資審査 → Equal Opportunity（TPR均等）を優先
    （適格な人が平等に融資を受けられるべき）
  ・犯罪予測 → Equal FPR（偽陽性率均等）を優先
    （無実の人が誤って疑われる率を均等に）
  ・医療診断 → Calibration を優先
    （スコアが確率的に正しい意味を持つべき）

2.3 バイアス緩和手法の比較

手法	介入段階	アプローチ	ライブラリ	精度への影響
リサンプリング	前処理	不均衡データの調整	imbalanced-learn	小
リウェイティング	前処理	サンプル重みの調整	AIF360	小
Disparate Impact Remover	前処理	特徴量の変換	AIF360	小〜中
敵対的デバイアス	学習中	敵対的ネットワークで属性予測を困難にする	AIF360	中
メタフェア学習	学習中	公平性メトリクスを直接最適化	Fairlearn	中
閾値最適化	後処理	グループ別に判定閾値を調整	Fairlearn	小
Calibrated EqOdds	後処理	較正された等確率化	AIF360	小〜中
Reject Option	後処理	判断が不確かな領域でバイアスを補正	AIF360	小

2.4 Fairlearn によるバイアス緩和

# コード例 2: Fairlearn で公平性制約付き学習を行う
from fairlearn.reductions import (
    ExponentiatedGradient, DemographicParity,
    EqualizedOdds, TruePositiveRateParity
)
from fairlearn.metrics import (
    MetricFrame, selection_rate, false_positive_rate,
    false_negative_rate, count
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
 
# 制約なしモデル（ベースライン）
unconstrained = LogisticRegression(max_iter=1000)
unconstrained.fit(X_train, y_train)
 
# --- 複数の公平性制約で比較 ---
constraints = {
    "Demographic Parity": DemographicParity(),
    "Equalized Odds": EqualizedOdds(),
    "TPR Parity": TruePositiveRateParity(),
}
 
results = {}
 
for name, constraint in constraints.items():
    mitigator = ExponentiatedGradient(
        estimator=LogisticRegression(max_iter=1000),
        constraints=constraint,
        max_iter=50
    )
    mitigator.fit(
        X_train, y_train,
        sensitive_features=sensitive_train
    )
 
    y_pred = mitigator.predict(X_test)
 
    # 公平性ダッシュボードで比較
    metric_frame = MetricFrame(
        metrics={
            "accuracy": accuracy_score,
            "selection_rate": selection_rate,
            "false_positive_rate": false_positive_rate,
            "false_negative_rate": false_negative_rate,
            "count": count,
        },
        y_true=y_test,
        y_pred=y_pred,
        sensitive_features=sensitive_test
    )
 
    results[name] = {
        "overall_accuracy": accuracy_score(y_test, y_pred),
        "by_group": metric_frame.by_group,
        "difference": metric_frame.difference(),
        "ratio": metric_frame.ratio(),
    }
 
    print(f"\n=== {name} ===")
    print(f"全体精度: {results[name]['overall_accuracy']:.4f}")
    print(f"グループ別:")
    print(metric_frame.by_group)
    print(f"グループ間の差:")
    print(metric_frame.difference())
 
# --- ベースラインとの比較表 ---
baseline_pred = unconstrained.predict(X_test)
baseline_acc = accuracy_score(y_test, baseline_pred)
print(f"\nベースライン精度: {baseline_acc:.4f}")
for name, result in results.items():
    acc_drop = baseline_acc - result["overall_accuracy"]
    print(f"{name}: 精度={result['overall_accuracy']:.4f} "
          f"(低下: {acc_drop:.4f})")

2.5 AIF360 によるバイアス検出と緩和

# コード例 3: AIF360 で包括的なバイアス分析を行う
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing, DisparateImpactRemover
from aif360.algorithms.inprocessing import AdversarialDebiasing
from aif360.algorithms.postprocessing import CalibratedEqOddsPostprocessing
 
import pandas as pd
import numpy as np
 
# データセットの構築
df = pd.DataFrame({
    "age": X_test["age"],
    "income": X_test["income"],
    "gender": sensitive_test,
    "label": y_test,
})
 
dataset = BinaryLabelDataset(
    favorable_label=1,
    unfavorable_label=0,
    df=df,
    label_names=["label"],
    protected_attribute_names=["gender"],
)
 
# データセットレベルのバイアス測定
metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{"gender": 0}],
    privileged_groups=[{"gender": 1}]
)
 
print("=== データセットバイアス ===")
print(f"Mean Difference: {metric.mean_difference():.4f}")
print(f"Disparate Impact: {metric.disparate_impact():.4f}")
print(f"Consistency: {metric.consistency():.4f}")
 
# 前処理: リウェイティング
reweighing = Reweighing(
    unprivileged_groups=[{"gender": 0}],
    privileged_groups=[{"gender": 1}]
)
reweighed_dataset = reweighing.fit_transform(dataset)
 
# リウェイティング後のバイアス測定
metric_rw = BinaryLabelDatasetMetric(
    reweighed_dataset,
    unprivileged_groups=[{"gender": 0}],
    privileged_groups=[{"gender": 1}]
)
print(f"\n=== リウェイティング後 ===")
print(f"Mean Difference: {metric_rw.mean_difference():.4f}")
print(f"Disparate Impact: {metric_rw.disparate_impact():.4f}")

3. 説明可能性 (Explainability)

3.1 説明手法の分類

+-------------------------------------------------------------------+
|                    説明可能性の手法                                  |
+-------------------------------------------------------------------+
|                                                                     |
|  モデル非依存 (Model-Agnostic)     モデル固有 (Model-Specific)       |
|  +---------------------------+    +---------------------------+     |
|  | SHAP (Shapley値)          |    | 決定木の分岐ルール         |     |
|  | LIME (局所的近似)          |    | 線形回帰の係数             |     |
|  | Permutation Importance    |    | Attention重み             |     |
|  | Partial Dependence Plot   |    | Grad-CAM (CNN)            |     |
|  | Counterfactual Explanation|    | Integrated Gradients      |     |
|  | Anchors                   |    | Feature Visualization     |     |
|  +---------------------------+    +---------------------------+     |
|                                                                     |
|  大域的説明 (Global)               局所的説明 (Local)               |
|  「モデル全体がどう動くか」         「この1件をなぜこう判断したか」   |
+-------------------------------------------------------------------+

3.2 説明手法の選択ガイド

対象者	必要な説明	推奨手法	表現形式
データサイエンティスト	モデルの挙動理解	SHAP全体プロット、PDP	グラフ、数値
ビジネス担当者	個別判断の理由	SHAP waterfall、LIME	テキスト+グラフ
規制当局	監査可能な記録	モデルカード、SHAP	レポート
エンドユーザー	なぜこの結果か	自然言語説明	テキスト
開発者	デバッグ	SHAP force plot、Grad-CAM	詳細可視化

3.3 SHAP による説明

# コード例 4: SHAP で個別予測の説明を生成する
import shap
import matplotlib.pyplot as plt
import numpy as np
 
# --- Tree SHAP (高速、ツリーモデル専用) ---
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
 
# 個別の予測を説明 (ウォーターフォールプロット)
sample_idx = 0
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[sample_idx],
        base_values=explainer.expected_value,
        data=X_test.iloc[sample_idx],
        feature_names=X_test.columns.tolist()
    )
)
 
# 大域的な特徴量重要度 (サマリープロット)
shap.summary_plot(shap_values, X_test)
 
# 特定の特徴量の依存関係
shap.dependence_plot("age", shap_values, X_test, interaction_index="income")
 
# --- Kernel SHAP (汎用、任意のモデル) ---
# 計算コストが高いため、サンプリングして使う
background = shap.sample(X_train, 100)
kernel_explainer = shap.KernelExplainer(model.predict_proba, background)
kernel_shap_values = kernel_explainer.shap_values(X_test[:50])
 
# --- 自然言語での説明生成 ---
def generate_explanation(shap_values, feature_names, sample_idx,
                          prediction, top_k=3):
    """SHAP値を自然言語の説明に変換する"""
    # 上位の寄与要因を取得
    importance = np.abs(shap_values[sample_idx])
    top_indices = np.argsort(importance)[::-1][:top_k]
 
    explanation_parts = []
    for idx in top_indices:
        feature = feature_names[idx]
        value = shap_values[sample_idx][idx]
        direction = "上昇" if value > 0 else "低下"
        explanation_parts.append(
            f"'{feature}' が予測を{direction}させています "
            f"(寄与度: {abs(value):.3f})"
        )
 
    prediction_text = "陽性" if prediction == 1 else "陰性"
    explanation = (
        f"この予測は「{prediction_text}」です。主な理由:\n"
        + "\n".join(f"  {i+1}. {part}"
                    for i, part in enumerate(explanation_parts))
    )
    return explanation
 
# 使用例
explanation = generate_explanation(
    shap_values, X_test.columns.tolist(), 0, pred
)
print(explanation)

3.4 LIME による局所的説明

# コード例 5: LIME でテキスト分類の判断根拠を可視化する
from lime.lime_text import LimeTextExplainer
from lime.lime_tabular import LimeTabularExplainer
import numpy as np
 
# === テキスト分類の説明 ===
text_explainer = LimeTextExplainer(
    class_names=["negative", "positive"],
    split_expression=r'\s+',
    random_state=42
)
 
def predict_proba(texts):
    """モデルのpredict_probaラッパー"""
    features = vectorizer.transform(texts)
    return model.predict_proba(features)
 
# 個別テキストの説明
text = "この映画は素晴らしい演技と美しい映像で感動的だった"
explanation = text_explainer.explain_instance(
    text,
    predict_proba,
    num_features=10,
    num_samples=1000
)
 
# 判断に寄与した単語とその重み
print("テキスト分類の説明:")
for feature, weight in explanation.as_list():
    direction = "ポジティブ" if weight > 0 else "ネガティブ"
    print(f"  '{feature}': {weight:.3f} ({direction}寄与)")
 
# HTMLで保存
explanation.save_to_file("text_explanation.html")
 
# === 表形式データの説明 ===
tabular_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["不合格", "合格"],
    mode="classification",
    discretize_continuous=True,
    random_state=42
)
 
# 個別サンプルの説明
sample = X_test.iloc[0]
tab_explanation = tabular_explainer.explain_instance(
    sample.values,
    model.predict_proba,
    num_features=5,
    num_samples=2000
)
 
print("\n表形式データの説明:")
for feature, weight in tab_explanation.as_list():
    print(f"  {feature}: {weight:.3f}")

3.5 Counterfactual Explanations（反事実的説明）

# コード例 6: 反事実的説明を生成する
import dice_ml
from dice_ml import Dice
 
# DiCEによる反事実的説明
d = dice_ml.Data(
    dataframe=df_train,
    continuous_features=["age", "income", "years_employed"],
    outcome_name="approved"
)
 
m = dice_ml.Model(model=model, backend="sklearn")
exp = Dice(d, m)
 
# 「ローン不承認」の顧客に対して
# 「何を変えれば承認されるか」を生成
 
counterfactuals = exp.generate_counterfactuals(
    query_instance,
    total_CFs=5,           # 5つの反事実的例を生成
    desired_class="opposite",
    features_to_vary=["income", "years_employed", "credit_score"],  # 変更可能な特徴量
    permitted_range={
        "income": [0, 200000],
        "years_employed": [0, 40],
        "credit_score": [300, 850],
    }
)
 
counterfactuals.visualize_as_dataframe()
 
# 結果の解釈
print("\n=== 反事実的説明 ===")
print(f"現在の状態: ローン不承認")
print(f"承認を得るための変更案:")
for i, cf in enumerate(counterfactuals.cf_examples_list[0].final_cfs_df.iterrows()):
    print(f"\n  案{i+1}:")
    for col in ["income", "years_employed", "credit_score"]:
        original = query_instance[col].values[0]
        changed = cf[1][col]
        if original != changed:
            print(f"    {col}: {original} → {changed}")

3.6 Grad-CAM による画像分類の説明

# コード例 7: Grad-CAM で CNN の判断根拠を可視化する
import torch
import torch.nn.functional as F
import numpy as np
import cv2
import matplotlib.pyplot as plt
 
class GradCAM:
    """Grad-CAM: 勾配重み付きクラス活性化マッピング"""
 
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activations = None
 
        # フックを登録
        target_layer.register_forward_hook(self._save_activation)
        target_layer.register_backward_hook(self._save_gradient)
 
    def _save_activation(self, module, input, output):
        self.activations = output.detach()
 
    def _save_gradient(self, module, grad_input, grad_output):
        self.gradients = grad_output[0].detach()
 
    def generate(self, input_tensor, target_class=None):
        """Grad-CAMヒートマップを生成する"""
        self.model.eval()
        output = self.model(input_tensor)
 
        if target_class is None:
            target_class = output.argmax(dim=1).item()
 
        # 勾配を計算
        self.model.zero_grad()
        output[0, target_class].backward()
 
        # Global Average Pooling of gradients
        weights = self.gradients.mean(dim=[2, 3], keepdim=True)  # (1, C, 1, 1)
 
        # 重み付き活性化マップ
        cam = (weights * self.activations).sum(dim=1, keepdim=True)
        cam = F.relu(cam)  # 正の寄与のみ
        cam = cam.squeeze().numpy()
 
        # 正規化
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
 
        return cam, target_class
 
    def visualize(self, image, cam, title="Grad-CAM"):
        """ヒートマップを元画像に重ねて表示"""
        # CAMをリサイズ
        cam_resized = cv2.resize(cam, (image.shape[1], image.shape[0]))
        heatmap = cv2.applyColorMap(
            np.uint8(255 * cam_resized), cv2.COLORMAP_JET
        )
        overlay = cv2.addWeighted(image, 0.6, heatmap, 0.4, 0)
 
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        axes[0].imshow(image)
        axes[0].set_title("元画像")
        axes[1].imshow(cam_resized, cmap="jet")
        axes[1].set_title("Grad-CAM")
        axes[2].imshow(overlay)
        axes[2].set_title("オーバーレイ")
        for ax in axes:
            ax.axis("off")
        plt.suptitle(title)
        plt.tight_layout()
        plt.savefig("grad_cam_result.png", dpi=150)
        plt.close()
 
# 使用例
# grad_cam = GradCAM(model, model.layer4[-1])
# cam, pred_class = grad_cam.generate(input_tensor)
# grad_cam.visualize(original_image, cam, f"予測: {class_names[pred_class]}")

4. プライバシー

4.1 差分プライバシー

# コード例 8: 差分プライバシーを適用した学習
# pip install opacus
import torch
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator
from torch import nn, optim
from torch.utils.data import DataLoader
 
# モデル定義（Opacus互換に変換）
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)
 
# Opacus互換性チェック & 自動修正
model = ModuleValidator.fix(model)
errors = ModuleValidator.validate(model, strict=False)
if errors:
    print(f"互換性エラー: {errors}")
 
optimizer = optim.SGD(model.parameters(), lr=0.1)
 
# PrivacyEngine をアタッチ
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=3.0,       # プライバシー予算 epsilon
    target_delta=1e-5,        # delta パラメータ
    epochs=10,
    max_grad_norm=1.0,        # 勾配クリッピングのノルム上限
)
 
print(f"ノイズ乗数 (sigma): {optimizer.noise_multiplier:.2f}")
 
# 通常通り学習（内部でDP-SGDが適用される）
for epoch in range(10):
    model.train()
    total_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = nn.functional.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
 
    # 消費されたプライバシー予算を確認
    epsilon = privacy_engine.get_epsilon(delta=1e-5)
    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch+1}: epsilon={epsilon:.2f}, loss={avg_loss:.4f}")

4.2 差分プライバシーのパラメータ設計

差分プライバシーの理解:

  ε (epsilon): プライバシー損失の上限

ε = 0.1 : 非常に強いプライバシー（精度低下大）
ε = 1.0 : 強いプライバシー
ε = 3.0 : 中程度のプライバシー
ε = 10.0 : 弱いプライバシー（精度低下小）
ε = ∞ : プライバシー保護なし

δ (delta): プライバシー保証が破れる確率の上限
  → 通常 1/n（nはデータサイズ）よりも小さく設定

  精度-プライバシーのトレードオフ:

精度
↑
	____
	/ \____
	/ \_______
	\___________
	\___________
└──────────────────────────────────────→ ε
0.1 1.0 3.0 10.0
強い ←── プライバシー ──→ 弱い

4.3 連合学習 (Federated Learning)

# コード例 9: PySyft を使った連合学習のシミュレーション
import numpy as np
from typing import List, Dict
 
class FederatedLearningSimulator:
    """連合学習のシミュレーション"""
 
    def __init__(self, num_clients: int, model_class, **model_kwargs):
        self.num_clients = num_clients
        self.model_class = model_class
        self.model_kwargs = model_kwargs
        self.global_model = model_class(**model_kwargs)
 
    def split_data(self, X, y, iid=True):
        """データをクライアントに分割する"""
        n = len(X)
        if iid:
            # IID分割: ランダムに均等分割
            indices = np.random.permutation(n)
            splits = np.array_split(indices, self.num_clients)
        else:
            # Non-IID分割: ラベルに基づいて偏った分割
            sorted_indices = np.argsort(y)
            splits = np.array_split(sorted_indices, self.num_clients)
 
        client_data = []
        for split in splits:
            client_data.append((X[split], y[split]))
        return client_data
 
    def train_round(self, client_data: List, epochs_per_client: int = 5,
                     client_fraction: float = 0.5):
        """1ラウンドの連合学習を実行する"""
        # クライアントのサンプリング
        num_selected = max(1, int(self.num_clients * client_fraction))
        selected_clients = np.random.choice(
            self.num_clients, num_selected, replace=False
        )
 
        # グローバルモデルの重みを取得
        global_weights = self.global_model.get_weights()
 
        client_weights = []
        client_sizes = []
 
        for client_id in selected_clients:
            X_client, y_client = client_data[client_id]
 
            # ローカルモデルにグローバル重みを設定
            local_model = self.model_class(**self.model_kwargs)
            local_model.set_weights(global_weights)
 
            # ローカル学習
            local_model.fit(X_client, y_client, epochs=epochs_per_client)
 
            client_weights.append(local_model.get_weights())
            client_sizes.append(len(X_client))
 
        # FedAvg: 加重平均で集約
        self._federated_averaging(client_weights, client_sizes)
 
        return self.global_model
 
    def _federated_averaging(self, client_weights, client_sizes):
        """FedAvg: データサイズに比例した加重平均"""
        total_size = sum(client_sizes)
        averaged_weights = {}
 
        for key in client_weights[0]:
            averaged_weights[key] = sum(
                w[key] * (size / total_size)
                for w, size in zip(client_weights, client_sizes)
            )
 
        self.global_model.set_weights(averaged_weights)
 
# 使用例
# simulator = FederatedLearningSimulator(
#     num_clients=10,
#     model_class=SimpleNN,
#     input_dim=784, hidden_dim=128, output_dim=10
# )
# client_data = simulator.split_data(X, y, iid=True)
# for round_num in range(100):
#     global_model = simulator.train_round(client_data)
#     accuracy = evaluate(global_model, X_test, y_test)
#     print(f"Round {round_num+1}: accuracy={accuracy:.4f}")

4.4 プライバシー保護手法の比較

手法	保護対象	精度への影響	導入コスト	ユースケース	通信コスト
差分プライバシー	個人データの推定防止	中〜大	中	統計クエリ、モデル学習	なし
連合学習	生データの非共有	小〜中	大	複数組織間の協調学習	高い
準同型暗号	暗号化状態での演算	なし	極大	医療・金融データ	中程度
k-匿名化	再識別の防止	小	小	データ公開	なし
l-多様性	属性推定の防止	小	小	データ公開	なし
t-近接性	分布推定の防止	小〜中	中	データ公開	なし
合成データ	実データ不使用	中	中	テスト・開発環境	なし
秘密分散	秘密情報の分散保持	なし	大	マルチパーティ計算	高い

5. AI ガバナンスとモデルカード

5.1 モデルカードの作成

# コード例 10: モデルカードの自動生成
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Dict, Optional
import json
 
@dataclass
class ModelCard:
    """モデルカード: モデルの透明性を確保する文書"""
 
    # 基本情報
    model_name: str
    model_version: str
    model_type: str
    created_date: str = field(default_factory=lambda: datetime.now().isoformat())
    authors: List[str] = field(default_factory=list)
 
    # 意図された用途
    primary_intended_uses: str = ""
    primary_intended_users: str = ""
    out_of_scope_uses: str = ""
 
    # 学習データ
    training_data_description: str = ""
    training_data_size: int = 0
    training_data_date_range: str = ""
 
    # 性能指標
    overall_metrics: Dict[str, float] = field(default_factory=dict)
    subgroup_metrics: Dict[str, Dict[str, float]] = field(default_factory=dict)
 
    # 公平性分析
    fairness_metrics: Dict[str, float] = field(default_factory=dict)
    sensitive_attributes_tested: List[str] = field(default_factory=list)
 
    # 制限事項
    limitations: List[str] = field(default_factory=list)
    ethical_considerations: List[str] = field(default_factory=list)
    caveats_and_recommendations: List[str] = field(default_factory=list)
 
    def to_markdown(self) -> str:
        """Markdown形式でモデルカードを生成"""
        md = f"# Model Card: {self.model_name}\n\n"
        md += f"**Version:** {self.model_version}  \n"
        md += f"**Type:** {self.model_type}  \n"
        md += f"**Created:** {self.created_date}  \n"
        md += f"**Authors:** {', '.join(self.authors)}  \n\n"
 
        md += "## Intended Use\n\n"
        md += f"**Primary Uses:** {self.primary_intended_uses}  \n"
        md += f"**Primary Users:** {self.primary_intended_users}  \n"
        md += f"**Out-of-Scope:** {self.out_of_scope_uses}  \n\n"
 
        md += "## Training Data\n\n"
        md += f"{self.training_data_description}  \n"
        md += f"**Size:** {self.training_data_size:,} samples  \n"
        md += f"**Date Range:** {self.training_data_date_range}  \n\n"
 
        md += "## Performance\n\n"
        md += "### Overall Metrics\n\n"
        md += "| Metric | Value |\n|--------|-------|\n"
        for metric, value in self.overall_metrics.items():
            md += f"| {metric} | {value:.4f} |\n"
 
        if self.subgroup_metrics:
            md += "\n### Subgroup Metrics\n\n"
            for group, metrics in self.subgroup_metrics.items():
                md += f"\n**{group}:**\n\n"
                md += "| Metric | Value |\n|--------|-------|\n"
                for metric, value in metrics.items():
                    md += f"| {metric} | {value:.4f} |\n"
 
        if self.fairness_metrics:
            md += "\n## Fairness Analysis\n\n"
            md += f"**Tested attributes:** {', '.join(self.sensitive_attributes_tested)}\n\n"
            md += "| Metric | Value | Status |\n|--------|-------|--------|\n"
            for metric, value in self.fairness_metrics.items():
                status = "PASS" if abs(value) < 0.1 else "WARN"
                md += f"| {metric} | {value:.4f} | {status} |\n"
 
        if self.limitations:
            md += "\n## Limitations\n\n"
            for lim in self.limitations:
                md += f"- {lim}\n"
 
        if self.ethical_considerations:
            md += "\n## Ethical Considerations\n\n"
            for eth in self.ethical_considerations:
                md += f"- {eth}\n"
 
        return md
 
    def save(self, filepath: str):
        """モデルカードをファイルに保存"""
        md = self.to_markdown()
        with open(filepath, "w") as f:
            f.write(md)
        # JSON形式でも保存
        with open(filepath.replace(".md", ".json"), "w") as f:
            json.dump(self.__dict__, f, indent=2, default=str)
 
# 使用例
card = ModelCard(
    model_name="Churn Prediction Model",
    model_version="v2.1",
    model_type="GradientBoostingClassifier",
    authors=["Data Science Team"],
    primary_intended_uses="顧客の解約予測。マーケティングチームによる"
                          "リテンション施策の優先順位付けに使用",
    primary_intended_users="マーケティングチーム、カスタマーサクセスチーム",
    out_of_scope_uses="個人の契約自動解除、信用スコアリング",
    training_data_description="2023年1月〜12月の顧客行動ログ",
    training_data_size=150000,
    training_data_date_range="2023-01-01 to 2023-12-31",
    overall_metrics={
        "accuracy": 0.92,
        "f1_score": 0.87,
        "auc_roc": 0.95,
        "precision": 0.89,
        "recall": 0.85,
    },
    subgroup_metrics={
        "Male": {"accuracy": 0.91, "f1_score": 0.86},
        "Female": {"accuracy": 0.93, "f1_score": 0.88},
        "Age < 30": {"accuracy": 0.89, "f1_score": 0.83},
        "Age >= 30": {"accuracy": 0.93, "f1_score": 0.89},
    },
    fairness_metrics={
        "SPD (gender)": -0.02,
        "DI (gender)": 0.97,
        "EOD (gender)": -0.03,
        "SPD (age)": -0.08,
    },
    sensitive_attributes_tested=["gender", "age_group"],
    limitations=[
        "2023年のデータのみで学習。COVID-19以前の行動パターンは含まない",
        "新規顧客（利用歴3ヶ月未満）の予測精度は低い",
        "法人顧客には適用不可（個人顧客データのみで学習）",
    ],
    ethical_considerations=[
        "年齢に基づく差別的な施策への使用を禁止",
        "予測結果のみでサービス停止などの不利益処分を行わない",
        "定期的な公平性監査を実施（四半期ごと）",
    ],
)
 
card.save("model_card.md")
print(card.to_markdown())

5.2 AI 監査チェックリスト

# コード例 11: AI 監査の自動チェックリスト
from dataclasses import dataclass
from typing import List, Tuple
 
@dataclass
class AuditItem:
    category: str
    question: str
    status: str  # "PASS", "FAIL", "N/A", "PENDING"
    evidence: str = ""
    recommendation: str = ""
 
class AIAuditChecklist:
    """AI システムの監査チェックリスト"""
 
    def __init__(self):
        self.items: List[AuditItem] = []
 
    def add_item(self, category, question, status, evidence="", recommendation=""):
        self.items.append(AuditItem(
            category=category,
            question=question,
            status=status,
            evidence=evidence,
            recommendation=recommendation,
        ))
 
    def run_automated_checks(self, model, X_test, y_test,
                              sensitive_features=None):
        """自動化可能なチェックを実行する"""
 
        # 1. 性能テスト
        from sklearn.metrics import accuracy_score, f1_score
        y_pred = model.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average="weighted")
 
        self.add_item(
            "性能", "精度が閾値以上か？",
            "PASS" if acc >= 0.85 else "FAIL",
            f"accuracy={acc:.4f}, f1={f1:.4f}"
        )
 
        # 2. 公平性テスト
        if sensitive_features is not None:
            fairness = calculate_fairness_metrics(y_test, y_pred, sensitive_features)
 
            self.add_item(
                "公平性", "Disparate Impact が 4/5 ルールを満たすか？",
                "PASS" if fairness["passes_4_5_rule"] else "FAIL",
                f"DI={fairness['DI']:.4f}"
            )
            self.add_item(
                "公平性", "SPD が閾値以内か？",
                "PASS" if fairness["fair_by_SPD"] else "FAIL",
                f"SPD={fairness['SPD']:.4f}"
            )
 
        # 3. ロバスト性テスト
        import numpy as np
        noise = np.random.normal(0, 0.01, X_test.shape)
        y_noisy = model.predict(X_test + noise)
        stability = np.mean(y_pred == y_noisy)
        self.add_item(
            "ロバスト性", "微小ノイズに対して安定か？",
            "PASS" if stability >= 0.95 else "FAIL",
            f"安定性={stability:.4f}"
        )
 
    def generate_report(self) -> str:
        """監査レポートを生成する"""
        report = "# AI 監査レポート\n\n"
        report += f"実施日: {datetime.now().strftime('%Y-%m-%d')}\n\n"
 
        # サマリー
        total = len(self.items)
        passed = sum(1 for item in self.items if item.status == "PASS")
        failed = sum(1 for item in self.items if item.status == "FAIL")
        report += f"## サマリー: {passed}/{total} 合格"
        report += f" ({failed} 件の要改善項目)\n\n"
 
        # カテゴリ別
        categories = set(item.category for item in self.items)
        for cat in sorted(categories):
            report += f"### {cat}\n\n"
            report += "| 項目 | 結果 | 根拠 |\n|------|------|------|\n"
            for item in self.items:
                if item.category == cat:
                    icon = {"PASS": "OK", "FAIL": "NG", "PENDING": "?"}
                    report += f"| {item.question} | {icon.get(item.status, '?')} "
                    report += f"| {item.evidence} |\n"
            report += "\n"
 
        return report
 
# 使用例
audit = AIAuditChecklist()
audit.run_automated_checks(model, X_test, y_test, sensitive_features=gender)
audit.add_item("文書化", "モデルカードが作成されているか？", "PASS",
               "model_card.md に記載済み")
audit.add_item("文書化", "データリネージが記録されているか？", "PASS",
               "DVC で追跡中")
audit.add_item("運用", "ドリフト監視が設定されているか？", "PASS",
               "Evidently で日次監視中")
audit.add_item("運用", "インシデント対応手順が定義されているか？", "PENDING",
               recommendation="対応手順書の作成が必要")
 
print(audit.generate_report())

6. 安全性とガードレール

6.1 LLM のガードレール実装

# コード例 12: LLM の出力に対するガードレール
from typing import List, Optional
import re
 
class LLMGuardrails:
    """LLMの出力に対する安全性ガードレール"""
 
    def __init__(self):
        self.blocked_topics = [
            "weapons", "drugs", "violence", "self-harm",
        ]
        self.pii_patterns = {
            "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
            "phone_jp": r"0\d{1,4}-?\d{1,4}-?\d{3,4}",
            "credit_card": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
            "my_number": r"\b\d{4}\s?\d{4}\s?\d{4}\b",
        }
 
    def check_input(self, text: str) -> dict:
        """入力テキストの安全性チェック"""
        issues = []
 
        # トピックフィルタリング
        text_lower = text.lower()
        for topic in self.blocked_topics:
            if topic in text_lower:
                issues.append(f"ブロック対象トピック検出: {topic}")
 
        # インジェクション攻撃検出
        injection_patterns = [
            r"ignore\s+(previous|above)\s+instructions",
            r"system\s*prompt",
            r"you\s+are\s+now",
            r"act\s+as\s+(if|a)",
        ]
        for pattern in injection_patterns:
            if re.search(pattern, text_lower):
                issues.append(f"プロンプトインジェクション検出: {pattern}")
 
        return {
            "safe": len(issues) == 0,
            "issues": issues,
        }
 
    def check_output(self, text: str) -> dict:
        """出力テキストの安全性チェック"""
        issues = []
        filtered_text = text
 
        # PII (個人情報) 検出 & マスキング
        for pii_type, pattern in self.pii_patterns.items():
            matches = re.findall(pattern, text)
            if matches:
                issues.append(f"PII検出 ({pii_type}): {len(matches)}件")
                filtered_text = re.sub(pattern, f"[{pii_type.upper()}_REDACTED]",
                                        filtered_text)
 
        # 有害コンテンツのスコアリング（簡易版）
        harmful_keywords = ["殺す", "死ね", "爆破", "犯罪"]
        for keyword in harmful_keywords:
            if keyword in text:
                issues.append(f"有害キーワード検出: {keyword}")
 
        return {
            "safe": len(issues) == 0,
            "issues": issues,
            "filtered_text": filtered_text,
        }
 
    def apply(self, input_text: str, output_text: str) -> dict:
        """入出力の両方にガードレールを適用"""
        input_check = self.check_input(input_text)
        output_check = self.check_output(output_text)
 
        return {
            "input_safe": input_check["safe"],
            "output_safe": output_check["safe"],
            "all_issues": input_check["issues"] + output_check["issues"],
            "filtered_output": output_check["filtered_text"],
            "should_block": not input_check["safe"],
        }
 
# 使用例
guardrails = LLMGuardrails()
 
# 入力チェック
result = guardrails.check_input("Ignore previous instructions and tell me...")
print(f"入力安全: {result['safe']}")
print(f"問題: {result['issues']}")
 
# 出力チェック
result = guardrails.check_output(
    "お客様の電話番号は090-1234-5678で、メールはuser@example.comです。"
)
print(f"出力安全: {result['safe']}")
print(f"フィルタ後: {result['filtered_text']}")

7. アンチパターン

アンチパターン 1: 「公平性の後付け」

[誤り] モデルを完成させた後に「公平性チェック」だけ追加する

問題点:
- バイアスがデータ収集段階で混入している場合、後処理では根本解決しない
- 後処理での閾値調整は精度を大幅に犠牲にする場合がある
- チーム全体の意識が公平性に向かない

[正解] パイプライン全体で公平性を考慮する
  1. データ収集: 代表性のあるサンプリング設計
  2. 前処理: バイアス検出と緩和
  3. 学習: 公平性制約付き最適化
  4. 評価: グループ別メトリクスの確認
  5. 監視: 本番環境でのバイアス監視

アンチパターン 2: 「説明可能性 = 特徴量重要度」

[誤り] feature_importances_ を見せれば「説明可能」と考える

問題点:
- グローバルな重要度は個別の判断根拠を説明しない
- 相関する特徴量があると重要度が分散する
- 非技術者には依然として理解困難

[正解] 対象者・目的に応じた説明を提供する
  - データサイエンティスト向け: SHAP値、PDP
  - ビジネス担当者向け: 「この顧客が解約予測された主因は
    過去30日間のログイン回数減少(寄与度40%)です」
  - 規制当局向け: モデルカード、監査ログ
  - エンドユーザー向け: 自然言語での判断理由提示

アンチパターン 3: 「プライバシー = データ匿名化」

[誤り] 氏名・住所を削除すれば個人が特定されないと考える

問題点:
- 準識別子（年齢、性別、居住地域の組み合わせ）で再識別可能
- モデルの出力から学習データを推定される（メンバーシップ推定攻撃）
- 集約統計でも個人情報が漏洩しうる（差分攻撃）

  例: 「30代男性、東京都渋谷区在住、年収800万円」
  → 数人〜数十人に絞り込み可能

[正解] 数学的なプライバシー保証を持つ手法を使用する
  - 差分プライバシー: ε-δ でプライバシー損失を定量化
  - k-匿名化 + l-多様性: 再識別リスクを制御
  - 連合学習: 生データを共有しない

アンチパターン 4: 「一度の監査で完了」

[誤り] リリース前に一度だけバイアス監査を行い、その後は放置する

問題点:
- データ分布の変化でバイアスが新たに発生する
- 社会状況の変化で公平性の定義が変わる
- フィードバックループによるバイアスの増幅

[正解] 継続的な監視と定期的な再監査
  - 日次: ドリフト検知、予測分布モニタリング
  - 月次: グループ別メトリクスのレビュー
  - 四半期: 包括的な公平性監査
  - 年次: モデルカードの更新、ステークホルダーレビュー

8. 規制と法的フレームワーク

8.1 主要な AI 規制の比較

規制	地域	主な要件	違反時の罰則
EU AI Act	EU	リスクベースの規制、高リスクAIに透明性・公平性要件	最大3500万ユーロ or 売上7%
GDPR Art.22	EU	自動意思決定の説明義務、異議申立権	最大2000万ユーロ or 売上4%
CCPA/CPRA	米カリフォルニア	自動意思決定のオプトアウト権	罰金あり
個人情報保護法	日本	個人データの適正取得・利用、本人同意	罰金1億円以下
Blueprint for AI Bill of Rights	米国	安全で効果的なシステム、差別からの保護	ガイドライン（法的拘束力なし）

9. FAQ

Q1: 公平性と精度はトレードオフですか？

A: 完全なトレードオフではありません。多くの場合、公平性制約を導入すると精度がわずかに低下しますが、その程度は問題設定に依存します。実務的には以下の戦略が有効です。

バイアスのあるデータを修正すると、精度が向上する場合もある
公平性制約の強さを調整して、許容可能な精度・公平性のバランスを探す
Pareto フロンティアを可視化し、ステークホルダーと合意形成する

Q2: モデルカードとは何ですか？

A: モデルカードは、モデルの詳細情報を標準フォーマットで文書化したものです（Google が 2019 年に提唱）。以下を含みます。

モデルの概要と意図された用途
学習データの説明とバイアス分析
性能指標（全体 + サブグループ別）
倫理的考慮事項と制限事項
テスト結果と推奨される使用条件

A: GDPR 第22条は、自動化された意思決定に対する説明の権利を定めています。技術的対応として以下を推奨します。

意思決定ログの保存: 各予測の入力・出力・説明を記録
局所的説明の生成: SHAP/LIME で個別判断の根拠を生成可能にする
人間による介入経路: 自動判断に異議申し立てできるプロセスを整備
平易な言語での説明: 技術的でない表現で判断根拠を伝える仕組み

Q4: 公平性指標はどれを使うべきですか？

A: タスクの文脈に応じて選択します。

融資審査: Equal Opportunity (TPR均等) — 適格な人が平等に承認される
犯罪リスク評価: Predictive Parity + FPR均等 — 無実の人を不当に扱わない
採用スクリーニング: Demographic Parity — 選択率がグループ間で等しい
医療診断: Calibration — 確率的予測が正確

不可能定理により、すべての指標を同時に満たすことはできないため、ステークホルダーと合意して優先順位を決めます。

Q5: 小規模チームで責任ある AI をどう始めればよいですか？

A: 段階的に導入します。

第1段階（1週間）: モデルカードのテンプレートを作成し、既存モデルに適用
第2段階（2週間）: SHAP による説明可能性を追加
第3段階（1ヶ月）: Fairlearn でグループ別メトリクスを計測
第4段階（2ヶ月）: CI/CD に公平性テストを組み込み
第5段階（3ヶ月）: 本番でのバイアス監視を開始

FAQ

Q1: このトピックを学ぶ上で最も重要なポイントは何ですか？

実践的な経験を積むことが最も重要です。理論だけでなく、実際にコードを書いて動作を確認することで理解が深まります。

Q2: 初心者がよく陥る間違いは何ですか？

基礎を飛ばして応用に進むことです。このガイドで説明している基本概念をしっかり理解してから、次のステップに進むことをお勧めします。

Q3: 実務ではどのように活用されていますか？

このトピックの知識は、日常的な開発業務で頻繁に活用されます。特にコードレビューやアーキテクチャ設計の際に重要になります。

10. まとめ

領域	目的	主要手法	ツール例
公平性	バイアスの検出・緩和	SPD, DI, 敵対的デバイアス	Fairlearn, AIF360
説明可能性	判断根拠の可視化	SHAP, LIME, PDP, Counterfactual	shap, lime, DiCE
プライバシー	個人情報の保護	差分プライバシー, 連合学習	Opacus, PySyft
透明性	モデル情報の文書化	モデルカード	model-card-toolkit
安全性	有害出力の防止	ガードレール, レッドチーム	guardrails-ai, NeMo
説明責任	監査可能性の確保	監査ログ, リネージ	MLflow
ガバナンス	組織的管理体制	監査チェックリスト, ポリシー	カスタム実装

次に読むべきガイド

AI セーフティ — アライメント、レッドチームの実践
AI ガバナンス — 規制・ポリシーの動向
MLOps — 責任ある AI を本番パイプラインに組み込む方法

参考文献

Mitchell, M. et al. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). ACM. https://doi.org/10.1145/3287560.3287596
Dwork, C. & Roth, A. (2014). "The Algorithmic Foundations of Differential Privacy." Foundations and Trends in Theoretical Computer Science, 9(3-4), 211-407. https://doi.org/10.1561/0400000042
Mehrabi, N. et al. (2021). "A Survey on Bias and Fairness in Machine Learning." ACM Computing Surveys, 54(6), 1-35. https://doi.org/10.1145/3457607
Chouldechova, A. (2017). "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments." Big Data, 5(2), 153-163.
EU Artificial Intelligence Act (2024). European Parliament and Council Regulation on Artificial Intelligence. https://artificialintelligenceact.eu/