IBM Granite 4.0 Nano Edge AI Models
IBM has introduced the Granite 4.0 Nano model family, an effort to build powerful yet practical large language models (LLMs) optimized specifically for edge and on-device applications. The models range from roughly 350 million to 1.5 billion parameters and deliver noticeably stronger results than similarly sized competitors on standard benchmarks covering general knowledge, math, code, and safety.
1. Overview
The release comprises four main variants: models built on the new, efficient hybrid-SSM architecture (Granite 4.0 H 1B and H 350M), plus traditional transformer versions that ensure compatibility with a wide range of runtimes such as llama.cpp. Crucially, the Granite 4.0 Nano models are built on the same robust training pipeline and more than 15 trillion tokens of training data as the larger Granite 4.0 family.
For broad applicability and confidence, all Nano models are released under the Apache 2.0 license and are covered by IBM's ISO 42001 certification for responsible model development and governance, so users can deploy them in line with global standards.
2. How to Try Granite 4.0 Nano?
These state-of-the-art models are conveniently available through two main channels: Ollama, for streamlined deployment, and the repositories maintained on Hugging Face. Direct links to both are collected in the dedicated "Links" section for easy access. To show how simple and fast integration is, I have adapted and extended the base code examples, as shown in the next section.
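If you only want a quick taste through the Ollama route before the Transformers walkthrough below, the official ollama Python client keeps it to a few lines. This is a minimal sketch of mine, not part of IBM's example: the model tag is a placeholder, so check the Ollama model library for the exact tags under which the Nano variants are published.
# Quick test via Ollama (pip install ollama; a local Ollama server must be running).
# The tag "granite4:350m-h" below is a placeholder of mine -- look up the actual
# Granite 4.0 Nano tags in the Ollama model library before running.
import ollama

response = ollama.chat(
    model="granite4:350m-h",  # placeholder tag, adjust to the variant you pulled
    messages=[{"role": "user", "content": "Please list one IBM Research laboratory located in the United States."}],
)
# On older client versions use response["message"]["content"] instead.
print(response.message.content)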
- Prepare the environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
- Install the required packages 📦
# requirements.txt
huggingface_hub
torch
transformers
accelerate
torchvision
torchaudio
pip install -r requirements.txt
- Once the environment is ready, just copy and run these two simple applications!
The first script uses the tokenizer's chat template to build a prompt that enables tool-calling behavior when the user asks about the weather in Boston; after the prepared input is tokenized and moved to the selected device, the model generates the output sequence. The second script runs a complete inference pipeline with the pretrained LLM: it loads the IBM Granite 4.0 (350M parameter) model and tokenizer, prepares a user prompt asking for an IBM Research laboratory location, and generates a response.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import json
import os # Import os for file system operations
# --- Device Detection and Selection ---
# Automatically determine the best device available (CUDA > MPS > CPU)
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    # MPS (Metal Performance Shaders) is the accelerator for Apple Silicon (M1/M2/M3)
    device = "mps"
else:
    device = "cpu"
print(f"Selected device: {device}")
# --- End Device Detection ---
model_path = "ibm-granite/granite-4.0-350M"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Pass the automatically determined device to device_map
# The model will now load onto the CPU (or MPS/CUDA if available)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]
# change input text as desired
chat = [
    { "role": "user", "content": "What's the weather like in Boston right now?" },
]
chat = tokenizer.apply_chat_template(chat,
                                     tokenize=False,
                                     tools=tools,
                                     add_generation_prompt=True)
# tokenize the text and move to the selected device
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# --- Save output to file in Markdown format ---
output_dir = "./output"
output_file = os.path.join(output_dir, "output.md")
# Create the output directory if it doesn't exist.
# The exist_ok=True argument prevents an error if the directory already exists.
try:
    os.makedirs(output_dir, exist_ok=True)
    # Write the output to the Markdown file
    with open(output_file, "w", encoding="utf-8") as f:
        # The output from batch_decode is a list; we take the first item (the generated text)
        f.write(output[0])
    # Confirmation message
    print(f"\nModel output saved successfully to {output_file}")
    # Optionally print the content to the console for immediate review
    print("\n--- Generated Content ---\n")
    print(output[0])
    print("\n-------------------------")
except Exception as e:
    print(f"\nAn error occurred while trying to save the output file: {e}")
# --- End File Saving Logic ---
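As the sample output further below shows, the assistant answers the first script with a <tool_call> block rather than a final sentence. To take the example one step further, the following continuation of mine (appended after the first script, reusing its output variable; the local get_current_weather stub is hypothetical) extracts that call and dispatches it to a Python function.
# --- Optional follow-up: parse and execute the tool call (continuation of the
# first script; reuses its `output` and the `json` import) ---
import re

# Hypothetical local implementation of the declared tool; a real application
# would query a weather API here.
def get_current_weather(city: str) -> dict:
    return {"city": city, "temperature_c": 7, "condition": "cloudy"}

TOOLS = {"get_current_weather": get_current_weather}

# The model wraps its request in <tool_call>...</tool_call> tags.
match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", output[0], re.DOTALL)
if match:
    call = json.loads(match.group(1))
    result = TOOLS[call["name"]](**call["arguments"])
    print(f"Tool result: {result}")
    # In a full loop the result would be appended to the chat as a tool message
    # and fed back to the model for a final natural-language answer.
else:
    print("No tool call found in the model output.")
The second script, a plain question-answering run without tools, follows.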
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os # Import os for file system operations
# --- Device Detection and Selection ---
# Automatically determine the best device available (CUDA > MPS > CPU)
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    # MPS (Metal Performance Shaders) is the accelerator for Apple Silicon (M1/M2/M3)
    device = "mps"
else:
    device = "cpu"
print(f"Selected device: {device}")
# --- End Device Detection ---
model_path = "ibm-granite/granite-4.0-350M"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Pass the automatically determined device to device_map
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text and move to the selected device
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# --- Save output to file in Markdown format ---
output_dir = "./output"
output_file = os.path.join(output_dir, "output.md")
# Create the output directory if it doesn't exist.
try:
    os.makedirs(output_dir, exist_ok=True)
    # Write the output to the Markdown file
    with open(output_file, "w", encoding="utf-8") as f:
        # The output from batch_decode is a list; we take the first item (the generated text)
        f.write(output[0])
    # Confirmation message
    print(f"\nModel output saved successfully to {output_file}")
    # Optionally print the content to the console for immediate review
    print("\n--- Generated Content ---\n")
    print(output[0])
    print("\n-------------------------")
except Exception as e:
    print(f"\nAn error occurred while trying to save the output file: {e}")
# --- End File Saving Logic ---
You will get outputs like these 📄
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>
For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
======
<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Research Laboratory: Cambridge Research Laboratory<|end_of_text|>
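Note that tokenizer.batch_decode returns the full prompt plus the continuation, including the special role tokens visible above. If you only want the newly generated text, a small tweak of my own (applied to the raw tensor returned by model.generate, in place of the batch_decode step) looks like this:
# Decode only the newly generated tokens, dropping the echoed prompt and the
# special role markers. `input_tokens` is the tokenized prompt from either
# script; `generated_ids` is the raw tensor returned by model.generate.
generated_ids = model.generate(**input_tokens, max_new_tokens=100)
prompt_length = input_tokens["input_ids"].shape[1]
answer = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)
print(answer)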
That's it 🥇
3. Closing Thoughts
The Granite 4.0 model family marks an important shift toward efficient, enterprise-grade AI, redefining performance by focusing on accessibility rather than scale. Its key advantage is the innovative hybrid Mamba/Transformer architecture, which cuts memory requirements substantially (often by more than 70%) and makes powerful inference possible on modest, affordable hardware, including consumer GPUs and edge devices. Just as importantly, as open-source models under the Apache 2.0 license, Granite 4.0 gives developers full operational autonomy: deep customization, local deployment for stronger data privacy, and complete transparency. This combination of efficiency and open governance lowers the barrier to entry, democratizing sophisticated workflows such as RAG and function calling while providing the control and trust that real-world business adoption requires.
Original article: IBM Granite 4.0-Nano
Translated and compiled by 汇智网. Please credit the source when reposting.