A Gemma 3 Fine-Tuning Guide

Google has released Gemma 3 270M, a compact instruction-tuned model you can run locally. With 4-bit loading, inference needs only a few hundred MB of memory (yes, under 0.5 GB), so tinkering on an ordinary machine is finally painless. We'll fine-tune it on a "fill in the missing chess move" task, evaluate it, and then export it for local use. (Google AI for Developers)
Why this model is special (and great for hobby projects)
- Small but capable. Gemma 3 spans 270M to 27B parameters; the 270M variant is text-only and designed to be specialized through fine-tuning. (Google AI for Developers)
- Tiny memory footprint. The official docs list roughly 400 MB (bf16) and 240 MB (Q4) to load the model for inference; budget some headroom for tokens and runtime, but it is still small. (Google AI for Developers)
- Built for fine-tuning. Google's announcement explicitly positions 270M as a "compact, instruction-following foundation" and highlights QAT checkpoints plus community tooling (Unsloth, Ollama, llama.cpp). (Google Developers Blog)
⚠️ Note: training (even LoRA) needs more memory than inference. Keep expectations realistic on CPU-only machines; a small GPU helps a lot. (Google AI for Developers)
What we'll build:
- Load Gemma 3 270M-IT locally
- Prepare a small "missing chess move" dataset
- LoRA-tune it with Unsloth (fast adapters, small VRAM)
- A quick evaluator that checks accuracy on the blanked-out move
- Export to GGUF (optional) for Ollama / llama.cpp
Stack: Unsloth + Hugging Face Transformers + TRL + Datasets. Unsloth has a step-by-step Gemma 3 guide if you want the full walkthrough. (Unsloth docs)
0. Environment and installation
# Python 3.10+ recommended
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -U pip
pip install "unsloth>=2025.8.0" "transformers>=4.43.3" "datasets" \
"trl>=0.9.6" "accelerate" "peft" "bitsandbytes" "evaluate" "scikit-learn"
If bitsandbytes misbehaves on a CPU-only setup, set load_in_4bit=False and move on. You can still do LoRA tuning, just more slowly.
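A small sketch of that fallback (the import check uses only the standard library; in practice you would also confirm a CUDA device is present, which this sketch deliberately skips):

```python
# Pick the 4-bit flag up front instead of letting model loading crash:
# if bitsandbytes cannot even be imported, fall back to full-precision LoRA.
from importlib.util import find_spec

def can_use_4bit() -> bool:
    # Only checks that bitsandbytes is installed, which is the common
    # CPU-only failure mode; a GPU/driver check would go here too.
    return find_spec("bitsandbytes") is not None

load_in_4bit = can_use_4bit()
print("load_in_4bit =", load_in_4bit)
```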
1. Load Gemma 3 270M (instruction-tuned)
# 1_load.py
from unsloth import FastLanguageModel

MODEL_ID = "unsloth/gemma-3-270m-it"  # instruction-tuned 270M
CONTEXT = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name      = MODEL_ID,
    max_seq_length  = CONTEXT,
    dtype           = None,   # auto -> bf16/fp16 where possible
    load_in_4bit    = True,   # QLoRA path (saves VRAM)
    full_finetuning = False,  # we'll do LoRA adapters
)
tokenizer.pad_token = tokenizer.eos_token
print("Loaded:", MODEL_ID)
2. Dataset: "Which move is missing?"
We'll use ChessInstruct (a community dataset) and adapt it for chat-style supervised fine-tuning (SFT). (Hugging Face)
# 2_data.py
from datasets import load_dataset

# Small slice for a fast demo; scale up later.
raw = load_dataset("Thytu/ChessInstruct", split="train[:8000]")

def to_chat(example):
    # Expect fields like: {"task": "...", "input": "...", "expected_output": "..."}
    sys = "You are a chess assistant. Given a list of moves with one missing move as '?', produce the exact missing move in SAN or coordinate form."
    usr = str(example.get("input", ""))  # the partial game / board context
    ans = str(example.get("expected_output", "")).strip()
    return {
        "messages": [
            {"role": "system", "content": sys},
            {"role": "user", "content": usr},
            {"role": "assistant", "content": ans},
        ]
    }

chat_ds = raw.map(to_chat, remove_columns=raw.column_names)
chat_ds = chat_ds.train_test_split(test_size=0.05, seed=3407)
train_ds, val_ds = chat_ds["train"], chat_ds["test"]
print(train_ds[0]["messages"])
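To sanity-check the mapping before touching the real dataset, you can run the same transformation on a mock record (the field names mirror what to_chat reads; the record itself is made up):

```python
# Mock ChessInstruct-style record; mirrors the fields to_chat() expects.
example = {
    "task": "find the missing move",
    "input": "e2e4, e7e5, g1f3, ?, f1b5",
    "expected_output": "b8c6",
}

SYS = "You are a chess assistant. Given a list of moves with one missing move as '?', produce the exact missing move in SAN or coordinate form."

def to_chat(example):
    return {
        "messages": [
            {"role": "system", "content": SYS},
            {"role": "user", "content": str(example.get("input", ""))},
            {"role": "assistant", "content": str(example.get("expected_output", "")).strip()},
        ]
    }

msgs = to_chat(example)["messages"]
assert [m["role"] for m in msgs] == ["system", "user", "assistant"]
print(msgs[1]["content"])  # → e2e4, e7e5, g1f3, ?, f1b5
```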
3. LoRA config (small adapters, big wins)
# 3_lora.py
from unsloth import FastModel

LORA_R = 96
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

model = FastModel.get_peft_model(
    model,
    r = LORA_R,
    target_modules = TARGETS,
    lora_alpha = 128,
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)
print("LoRA attached.")
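For intuition on adapter size: LoRA gives each adapted weight matrix W (d_out x d_in) two low-rank factors, A (r x d_in) and B (d_out x r), so the trainable parameters per matrix are r * (d_in + d_out). A quick calculator (the layer width below is a placeholder for illustration, not Gemma's real dimension):

```python
# LoRA adds A (r x d_in) and B (d_out x r) per target matrix,
# so trainable params per matrix = r * (d_in + d_out).
def lora_params(r: int, d_in: int, d_out: int) -> int:
    return r * (d_in + d_out)

# Hypothetical square projection of width 640 at r = 96:
per_matrix = lora_params(96, 640, 640)
print(per_matrix)  # → 122880 trainable params for that one matrix
```

Multiply by the number of targeted matrices across all layers to estimate the full adapter; it stays a small fraction of the base model.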
4. Supervised fine-tuning (SFT) with TRL
# 4_train.py
from trl import SFTTrainer, SFTConfig

cfg = SFTConfig(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    learning_rate = 5e-5,
    lr_scheduler_type = "cosine",
    weight_decay = 0.01,
    max_steps = 150,       # quick demo; increase for quality
    logging_steps = 10,
    save_steps = 75,
    optim = "adamw_8bit",
    fp16 = True,           # use bf16=True instead on GPUs that support it; drop on CPU
)
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_ds,
    eval_dataset = val_ds.select(range(200)),  # mini dev set
    args = cfg,
    # Recent TRL detects the "messages" column and applies the chat template
    # automatically, so no dataset_text_field is needed for chat-format data.
)
train_out = trainer.train()
print(train_out)
trainer.save_model("gemma3-270m-chess-lora")
tokenizer.save_pretrained("gemma3-270m-chess-lora")
Tips for a smoother loop
- Start with max_steps at 100-300 to confirm the loop works, then scale up.
- If VRAM is tight, lower CONTEXT and the batch size, and raise gradient_accumulation_steps.
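The second tip works because the optimizer sees the effective batch, per-device batch size times accumulation steps: gradients are summed over the accumulation steps before each update, so trading one knob for the other keeps training dynamics roughly stable while cutting peak memory. A tiny sanity check:

```python
def effective_batch(per_device: int, accum: int, n_devices: int = 1) -> int:
    # Gradients accumulate across `accum` micro-batches before each
    # optimizer step, so these configs are roughly equivalent.
    return per_device * accum * n_devices

# batch 2 x accum 4 and batch 1 x accum 8 both give an effective batch of 8:
assert effective_batch(2, 4) == effective_batch(1, 8) == 8
print("effective batch:", effective_batch(2, 4))  # → effective batch: 8
```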
5. Quick eval (did it actually learn anything?)
# 5_eval.py
from unsloth import FastLanguageModel

# For a fast eval, keep the model as base + LoRA adapters (merging is shown later).
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode
eval_model, eval_tok = model, tokenizer

def ask(moves: str, max_new_tokens=32):
    prompt = f"Moves so far (with one missing as '?'):\n{moves}\n\nMissing move:"
    inputs = eval_tok.apply_chat_template(
        [{"role": "system", "content": "You output ONLY the missing move."},
         {"role": "user", "content": prompt}],
        tokenize=False, add_generation_prompt=True,
    )
    out = eval_model.generate(
        **eval_tok(inputs, return_tensors="pt").to(eval_model.device),
        do_sample=False,  # greedy decoding, so no temperature is needed
        max_new_tokens=max_new_tokens,
    )
    text = eval_tok.decode(out[0], skip_special_tokens=True)
    # Keep only the first line after our "Missing move:" marker
    return text.split("Missing move:")[-1].strip().splitlines()[0]

samples = [
    "c2c4, g8f6, b1c3, e7e6, d2d4, d7d5, c4d5, e6d5, c1g5, ?, result: 1/2-1/2",
    "e2e4, c7c5, g1f3, d7d6, d2d4, c5d4, f3d4, g8f6, b1c3, ?, result: 1-0",
]
for s in samples:
    print(s, "→", ask(s))
For a proper metric, parse the ground truth out of the dataset and compute exact match on the held-out split.
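A minimal exact-match scorer along those lines (pure Python; the normalization strips whitespace and punctuation noise that would otherwise deflate the score, and the predictions below are placeholders for whatever ask() returns):

```python
def normalize(move: str) -> str:
    # Strip stray punctuation and lowercase so "E2E4." and "e2e4" match.
    # NOTE: lowercasing suits coordinate notation; SAN is case-sensitive,
    # so drop the .lower() if your labels are SAN.
    return move.strip().strip(".,;").lower()

def exact_match(preds, golds) -> float:
    assert len(preds) == len(golds)
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / max(len(golds), 1)

# Placeholder predictions vs. ground truth:
preds = ["e2e4", "G1F3.", "d7d5"]
golds = ["e2e4", "g1f3", "d7d6"]
print(f"exact match: {exact_match(preds, golds):.2f}")  # → exact match: 0.67
```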
6. Merge and export (GGUF / Ollama)
# 6_export.py
# Option A: merge LoRA into the base weights (creates a standalone model).
# Unsloth attaches its saver to the PEFT model itself.
merged_dir = "gemma3-270m-chess-merged"
model.save_pretrained_merged(merged_dir, tokenizer, save_method = "merged_16bit")
print("Merged model saved to:", merged_dir)

# Option B (optional): export GGUF for llama.cpp / Ollama, e.g.
#   model.save_pretrained_gguf(merged_dir, tokenizer, quantization_method = "q8_0")
# (The Unsloth docs cover the saver utilities and quantization choices.)
# See: https://docs.unsloth.ai/basics/gemma-3-how-to-run-and-fine-tune
You can now run it locally with Ollama or llama.cpp as a little chess-puzzle helper. Unsloth's Gemma 3 guide covers export and recommended inference parameters. (Unsloth docs)
Original article: Laptop-Only LLM: Tune Google Gemma 3 in Minutes (Code Inside)
Translated and compiled by 汇智网; please credit the source when reposting.
