Gemma 3 Fine-Tuning Guide

Google has released Gemma 3 270M, a compact instruction-tuned model that runs locally. With 4-bit loading, inference needs only a few hundred MB of memory (yes, under 0.5 GB), so tinkering on an ordinary machine is finally painless. We'll fine-tune it on a "fill in the missing chess move" task, evaluate it, and export it for local use. (Google AI for Developers)

Why this model is special (and great for hobby projects)

  • Small but capable. Gemma 3 spans 270M to 27B parameters; the 270M variant is text-only and meant to be specialized through fine-tuning. (Google AI for Developers)
  • Tiny memory footprint. The official docs list inference loading sizes of roughly 400 MB (bf16) and 240 MB (Q4); add some headroom for the KV cache and runtime and it is still small. (Google AI for Developers)
  • Built for fine-tuning. Google's announcement explicitly positions the 270M as a compact, instruction-following foundation and highlights QAT checkpoints plus community tooling (Unsloth, Ollama, llama.cpp). (Google Developers Blog)

⚠️ Note: training (even LoRA) needs more memory than inference. Keep expectations modest on a CPU-only machine; even a small GPU helps a lot. (Google AI for Developers)

What we're building:

  • Load Gemma 3 270M-IT locally
  • Prepare a small chess "missing move" dataset
  • LoRA tuning with Unsloth (fast adapters, small VRAM)
  • A quick evaluator to check accuracy on positions with a blanked-out move
  • Export to GGUF (optional) for Ollama / llama.cpp

Stack: Unsloth + Hugging Face Transformers + TRL + Datasets. If you want the full walkthrough, Unsloth publishes a step-by-step Gemma 3 guide. (Unsloth Docs)

0. Environment and installation

# Python 3.10+ recommended
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate

pip install -U pip
pip install "unsloth>=2025.8.0" "transformers>=4.43.3" "datasets" \
            "trl>=0.9.6" "accelerate" "peft" "bitsandbytes" "evaluate" "scikit-learn"

If bitsandbytes misbehaves in a CPU-only environment, set load_in_4bit=False and continue. You can still do LoRA tuning, just more slowly.
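
If you want the script to degrade gracefully on such machines, here is a minimal sketch (same model ID as in step 1; the exception types are an assumption about how bitsandbytes failures typically surface):

# cpu_fallback.py — try 4-bit first, fall back to full precision
from unsloth import FastLanguageModel

def load_model(model_id="unsloth/gemma-3-270m-it", max_seq_length=2048):
    try:
        return FastLanguageModel.from_pretrained(
            model_name=model_id, max_seq_length=max_seq_length, load_in_4bit=True)
    except (ImportError, RuntimeError):
        # no working bitsandbytes: load full-precision weights instead;
        # LoRA tuning still works, just slower and with more RAM
        return FastLanguageModel.from_pretrained(
            model_name=model_id, max_seq_length=max_seq_length, load_in_4bit=False)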

1. Load Gemma 3 270M (instruction-tuned)

# 1_load.py
from unsloth import FastLanguageModel
import torch

MODEL_ID = "unsloth/gemma-3-270m-it"  # instruction-tuned 270M
CONTEXT = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name           = MODEL_ID,
    max_seq_length       = CONTEXT,
    dtype                = None,        # auto -> bf16/fp16 where possible
    load_in_4bit         = True,        # QLoRA path (saves VRAM)
    full_finetuning      = False,       # we'll do LoRA adapters
)
tokenizer.pad_token = tokenizer.eos_token
print("Loaded:", MODEL_ID)

2. Dataset: "Which move is missing?"

We'll use ChessInstruct (a community dataset) and adapt it to chat-style supervised fine-tuning (SFT). (Hugging Face)

# 2_data.py
from datasets import load_dataset

# Small slice for a fast demo; scale up later.
raw = load_dataset("Thytu/ChessInstruct", split="train[:8000]")

def to_chat(example):
    # Expect fields like: {"task": "...", "input": "...", "expected_output": "..."}
    sys = "You are a chess assistant. Given a list of moves with one missing move as '?', produce the exact missing move in SAN or coordinate form."
    usr = str(example.get("input", ""))  # the partial game / board context
    ans = str(example.get("expected_output", "")).strip()
    return {
        "messages": [
            {"role": "system", "content": sys},
            {"role": "user",   "content": usr},
            {"role": "assistant", "content": ans}
        ]
    }

chat_ds = raw.map(to_chat, remove_columns=raw.column_names)
chat_ds = chat_ds.train_test_split(test_size=0.05, seed=3407)
train_ds, val_ds = chat_ds["train"], chat_ds["test"]

print(train_ds[0]["messages"])
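
Before training, it's worth verifying that the chat template renders cleanly and the examples fit the context window; a quick check, assuming the tokenizer from step 1 is in scope:

# render one example through the chat template and eyeball it
print(tokenizer.apply_chat_template(train_ds[0]["messages"], tokenize=False)[:500])

# token counts on a small sample; everything should stay well under CONTEXT (2048)
lengths = [len(tokenizer.apply_chat_template(ex["messages"]))
           for ex in train_ds.select(range(200))]
print("max tokens in first 200 examples:", max(lengths))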

3. LoRA configuration (small adapters, big wins)

# 3_lora.py
from unsloth import FastModel

LORA_R = 96
TARGETS = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]

model = FastModel.get_peft_model(
    model,
    r                         = LORA_R,
    target_modules            = TARGETS,
    lora_alpha                = 128,
    lora_dropout              = 0.05,
    bias                      = "none",
    use_gradient_checkpointing= "unsloth",
    random_state              = 3407,
)
print("LoRA attached.")

4. Supervised fine-tuning (SFT) with TRL

# 4_train.py
from trl import SFTTrainer, SFTConfig

cfg = SFTConfig(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    learning_rate              = 5e-5,
    lr_scheduler_type          = "cosine",
    weight_decay               = 0.01,
    max_steps                  = 150,      # quick demo; increase for quality
    logging_steps              = 10,
    save_steps                 = 75,
    optim                      = "adamw_8bit",
    fp16                       = True,
)

trainer = SFTTrainer(
    model          = model,
    tokenizer      = tokenizer,
    train_dataset  = train_ds,
    eval_dataset   = val_ds.select(range(200)),  # mini dev set
    args           = cfg,
    # No dataset_text_field here: recent TRL applies the chat template
    # automatically when the dataset has a "messages" column.
)

train_out = trainer.train()
print(train_out)
trainer.save_model("gemma3-270m-chess-lora")
tokenizer.save_pretrained("gemma3-270m-chess-lora")
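
If you come back in a later session, the saved adapter directory can be reloaded directly (a sketch; Unsloth resolves the base model from the adapter config):

# reload base + adapters later without retraining
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "gemma3-270m-chess-lora",
    max_seq_length = 2048,
    load_in_4bit   = True,
)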

Tips for a smoother loop

  • Set max_steps to 100-300 first, confirm the loop works, then scale up.
  • If VRAM is tight, lower CONTEXT and the batch size, and raise gradient_accumulation_steps (see the sketch below).
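
A minimal low-VRAM variant of the config from step 4 (values are illustrative, not tuned):

# low-memory settings: shorter context, tiny batch, more accumulation
from trl import SFTConfig

CONTEXT = 1024                           # also pass this at model load time
cfg = SFTConfig(
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 16,    # effective batch size of 1 x 16 = 16
    learning_rate               = 5e-5,
    max_steps                   = 150,
    optim                       = "adamw_8bit",
    fp16                        = True,
)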

5. Quick evaluation (did we actually learn anything?)

# 5_eval.py
# For a fast eval, keep the base model + LoRA adapters already in memory
# from training; merging into a standalone model is optional (see step 6).
eval_model = model
eval_tok   = tokenizer

def ask(moves: str, max_new_tokens=32):
    prompt = f"Moves so far (with one missing as '?'):\n{moves}\n\nMissing move:"
    inputs = eval_tok.apply_chat_template(
        [{"role":"system","content":"You output ONLY the missing move."},
         {"role":"user","content":prompt}],
        tokenize=False, add_generation_prompt=True
    )
    out = eval_model.generate(**eval_tok(inputs, return_tensors="pt").to(eval_model.device),
                              do_sample=False,  # greedy; temperature is ignored without sampling
                              max_new_tokens=max_new_tokens)
    text = eval_tok.decode(out[0], skip_special_tokens=True)
    # Extract the tail after our prompt
    return text.split("Missing move:")[-1].strip().splitlines()[0]

samples = [
    "c2c4, g8f6, b1c3, e7e6, d2d4, d7d5, c4d5, e6d5, c1g5, ?, result: 1/2-1/2",
    "e2e4, c7c5, g1f3, d7d6, d2d4, c5d4, f3d4, g8f6, b1c3, ?, result: 1-0",
]
for s in samples:
    print(s, "→", ask(s))

For proper metrics, parse the ground truth out of the dataset and compute exact match on the held-out split.
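
A minimal exact-match sketch over the held-out split (assuming eval_model/eval_tok from above; it reuses the training chat format rather than the ad-hoc ask() prompt):

# exact-match accuracy on the first 100 validation examples
n, correct = 100, 0
for ex in val_ds.select(range(n)):
    msgs = ex["messages"][:2]            # system + user, answer held out
    gold = ex["messages"][2]["content"].strip()
    text = eval_tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    ids  = eval_tok(text, return_tensors="pt").to(eval_model.device)
    out  = eval_model.generate(**ids, do_sample=False, max_new_tokens=16)
    pred = eval_tok.decode(out[0][ids["input_ids"].shape[1]:],
                           skip_special_tokens=True).strip()
    correct += int(pred == gold)
print(f"exact match: {correct}/{n} = {correct/n:.1%}")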

6. Merge and export (GGUF / Ollama)

# 6_export.py
# Option A: merge LoRA into the base weights (creates a standalone model).
# Unsloth exposes this directly on the PEFT model (see the Unsloth docs):
merged_dir = "gemma3-270m-chess-merged"
model.save_pretrained_merged(merged_dir, tokenizer, save_method="merged_16bit")
print("Merged model saved to:", merged_dir)

# Option B (optional): export GGUF for llama.cpp / Ollama
# (Unsloth doc shows saver utilities & quantization suggestions)
# See: https://docs.unsloth.ai/basics/gemma-3-how-to-run-and-fine-tune
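
# A hedged sketch of Option B using Unsloth's GGUF saver (method name per the
# Unsloth docs; quantization_method="q8_0" is a safe, widely supported default):
model.save_pretrained_gguf("gemma3-270m-chess-gguf", tokenizer,
                           quantization_method="q8_0")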

You can now run it locally in Ollama or llama.cpp as a small chess-puzzle helper. Unsloth's Gemma 3 guide covers export and recommended inference parameters. (Unsloth Docs)


Original article: Laptop-Only LLM: Tune Google Gemma 3 in Minutes (Code Inside)

Translated and compiled by 汇智网; please credit the source when reposting.