IBM Granite 4.0-Nano Edge AI Models

IBM has released the Granite 4.0 Nano model family, an effort to build capable, practical large language models (LLMs) optimized specifically for edge and on-device applications. The models range from roughly 350 million to 1.5 billion parameters and deliver notably stronger results than similarly sized competing models on standard benchmarks covering general knowledge, math, code, and safety.

1. Overview

The release comprises four main variants: models built on the new, efficient hybrid-SSM architecture (such as Granite 4.0 H 1B and H 350M) and traditional transformer versions that ensure compatibility with a wide range of runtimes (such as llama.cpp). Crucially, the Granite 4.0 Nano models are built on the same robust training pipeline and more than 15 trillion tokens of data as the larger Granite 4.0 family.
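As an illustration of that runtime compatibility, the transformer variants can be loaded through llama.cpp's Python bindings. The following is only a minimal sketch: the GGUF repository id and filename pattern are assumptions for illustration, not confirmed release names, so substitute the actual GGUF artifacts once you have located them.

from llama_cpp import Llama  # pip install llama-cpp-python

# Download and load a (hypothetical) GGUF build of the 350M model from Hugging Face.
llm = Llama.from_pretrained(
    repo_id="ibm-granite/granite-4.0-350M-GGUF",  # placeholder repository id
    filename="*Q4_K_M.gguf",                      # placeholder quantization file pattern
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly introduce the Granite 4.0 Nano family."}],
    max_tokens=100,
)
print(out["choices"][0]["message"]["content"])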

To support broad adoption with confidence, all Nano models are released under the Apache 2.0 license and are covered by IBM's ISO 42001 certification for responsible model development and governance, so users can deploy them knowing they align with global standards.

2. How to Try Granite 4 Nano?

These state-of-the-art models are readily accessible through two main channels: Ollama, a streamlined deployment platform, and the comprehensive repositories maintained on Hugging Face. Direct links to both resources are collected in the dedicated "Links" section for convenience. To show just how simple and fast integration is, I have adapted and extended the base code examples, as shown in the next section.
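Before that walkthrough, here is a minimal sketch of the Ollama route, assuming the Ollama daemon is already running on its default local port. The model tag below is a placeholder, not a confirmed name; substitute whatever Granite 4.0 Nano tag the Ollama library actually publishes.

import requests

# Query a locally served Granite model through Ollama's REST chat endpoint.
# "granite4:350m" is a hypothetical tag used purely for illustration.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "granite4:350m",
        "messages": [{"role": "user", "content": "What's the weather like in Boston right now?"}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])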

  • Prepare the environment
python3 -m venv venv  
source venv/bin/activate  

pip install --upgrade pip
  • Install the required packages 📦
# requirements.txt  
huggingface_hub
torch  
transformers  
accelerate  
torchvision  
torchaudio

pip install -r requirements.txt
  • Once the environment is ready, just copy and run these two simple applications!
The first script uses the tokenizer's chat template to build a prompt that enables tool-calling behavior when the user asks about the weather in Boston. After tokenizing the prepared input and moving it to the selected device, the model generates the output sequence. The second script runs a complete inference pipeline with the pretrained LLM: it loads the IBM Granite 4.0 (350M-parameter) model and tokenizer, prepares a user prompt asking about a research laboratory location, and generates a response from the model.
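# ===================================================================
# Script 1: tool-calling chat (weather query in Boston)
# ===================================================================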
import torch  
from transformers import AutoModelForCausalLM, AutoTokenizer  
import json  
import os # Import os for file system operations  

# --- Device Detection and Selection ---  
# Automatically determine the best device available (CUDA > MPS > CPU)  
if torch.cuda.is_available():  
    device = "cuda"  
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():  
    # MPS (Metal Performance Shaders) is the accelerator for Apple Silicon (M1/M2/M3)  
    device = "mps"  
else:  
    device = "cpu"  

print(f"Selected device: {device}")  
# --- End Device Detection ---  

model_path = "ibm-granite/granite-4.0-350M"  
tokenizer = AutoTokenizer.from_pretrained(model_path)  

# Pass the automatically determined device to device_map  
# The model will now load onto the CPU (or MPS/CUDA if available)  
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)  
model.eval()  

tools = [  
    {  
        "type": "function",  
        "function": {  
            "name": "get_current_weather",  
            "description": "Get the current weather for a specified city.",  
            "parameters": {  
                "type": "object",  
                "properties": {  
                    "city": {  
                        "type": "string",  
                        "description": "Name of the city"  
                    }  
                },  
                "required": ["city"]  
            }  
        }  
    }  
]  

# change input text as desired  
chat = [  
    { "role": "user", "content": "What's the weather like in Boston right now?" },  
]  

chat = tokenizer.apply_chat_template(chat,
                                     tokenize=False,
                                     tools=tools,
                                     add_generation_prompt=True)

# tokenize the text and move to the selected device  
input_tokens = tokenizer(chat, return_tensors="pt").to(device)  

# generate output tokens  
output = model.generate(**input_tokens,   
                        max_new_tokens=100)  

# decode output tokens into text  
output = tokenizer.batch_decode(output)  

# --- Save output to file in Markdown format ---  
output_dir = "./output"  
output_file = os.path.join(output_dir, "output.md")  

# Create the output directory if it doesn't exist.  
# The exist_ok=True argument prevents an error if the directory already exists.  
try:  
    os.makedirs(output_dir, exist_ok=True)  

    # Write the output to the Markdown file  
    with open(output_file, "w", encoding="utf-8") as f:  
        # The output from batch_decode is a list, we take the first item (the generated text)  
        f.write(output[0])  

    # Confirmation message  
    print(f"\nModel output saved successfully to {output_file}")  

    # Optionally print the content to the console for immediate review  
    print("\n--- Generated Content ---\n")  
    print(output[0])  
    print("\n-------------------------")  

except Exception as e:  
    print(f"\nAn error occurred while trying to save the output file: {e}")  
# --- End File Saving Logic ---
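
# ===================================================================
# Script 2: plain chat query (IBM Research lab location), no tools
# ===================================================================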
import torch  
from transformers import AutoModelForCausalLM, AutoTokenizer  
import os # Import os for file system operations  

# --- Device Detection and Selection ---  
# Automatically determine the best device available (CUDA > MPS > CPU)  
if torch.cuda.is_available():  
    device = "cuda"  
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():  
    # MPS (Metal Performance Shaders) is the accelerator for Apple Silicon (M1/M2/M3)  
    device = "mps"  
else:  
    device = "cpu"  

print(f"Selected device: {device}")  
# --- End Device Detection ---  

model_path = "ibm-granite/granite-4.0-350M"  
tokenizer = AutoTokenizer.from_pretrained(model_path)  

# Pass the automatically determined device to device_map  
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)  
model.eval()  

# change input text as desired  
chat = [  
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },  
]  

chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)  

# tokenize the text and move to the selected device  
input_tokens = tokenizer(chat, return_tensors="pt").to(device)  

# generate output tokens  
output = model.generate(**input_tokens,   
                        max_new_tokens=100)  

# decode output tokens into text  
output = tokenizer.batch_decode(output)  

# --- Save output to file in Markdown format ---  
output_dir = "./output"  
output_file = os.path.join(output_dir, "output.md")  

# Create the output directory if it doesn't exist.  
try:  
    os.makedirs(output_dir, exist_ok=True)  

    # Write the output to the Markdown file  
    with open(output_file, "w", encoding="utf-8") as f:  
        # The output from batch_decode is a list, we take the first item (the generated text)  
        f.write(output[0])  

    # Confirmation message  
    print(f"\nModel output saved successfully to {output_file}")  

    # Optionally print the content to the console for immediate review  
    print("\n--- Generated Content ---\n")  
    print(output[0])  
    print("\n-------------------------")  

except Exception as e:  
    print(f"\nAn error occurred while trying to save the output file: {e}")  
# --- End File Saving Logic ---

You will get these outputs 📄

<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.  

You are provided with function signatures within <tools></tools> XML tags:  
<tools>  
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}  
</tools>  

For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:  
<tool_call>  
{"name": <function-name>, "arguments": <args-json-object>}  
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>  
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>  
<|start_of_role|>assistant<|end_of_role|><tool_call>  
{"name": "get_current_weather", "arguments": {"city": "Boston"}}  
</tool_call><|end_of_text|>  

======  

<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>  
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>  
<|start_of_role|>assistant<|end_of_role|>IBM Research Laboratory: Cambridge Research Laboratory<|end_of_text|>
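
Note that the first script stops once the model has emitted its <tool_call> block; in a real application you would parse that block and execute the function yourself. Here is a minimal sketch of that step, using a stand-in get_current_weather implementation instead of a real weather API:

import json
import re

def get_current_weather(city: str) -> dict:
    # Stand-in implementation for illustration; a real application would call an actual weather API here.
    return {"city": city, "temperature_c": 8, "condition": "cloudy"}

# `generated` stands for the decoded model output produced by the first script.
generated = '<tool_call>\n{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n</tool_call>'

match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", generated, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    if call["name"] == "get_current_weather":
        result = get_current_weather(**call["arguments"])
        print(f"Tool result: {result}")
else:
    print("No tool call found in the model output.")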

That's it 🥇

3. Closing Remarks

The Granite 4.0 model family represents a significant shift toward efficient, enterprise-grade AI, redefining performance by focusing on accessibility rather than scale. Its key advantage is the innovative hybrid Mamba/Transformer architecture, which sharply reduces memory requirements (often by more than 70%) and enables powerful inference on modest, affordable hardware, including consumer GPUs and edge devices. Just as importantly, as open-source releases under the Apache 2.0 license, the Granite 4.0 models give developers full operational autonomy: deep customization, local deployment for stronger data privacy, and complete transparency. This combination of efficiency and open governance lowers the barrier to entry, democratizing sophisticated workflows such as RAG and function calling while preserving the control and trust that real-world business adoption requires.


Original article: IBM Granite 4.0-Nano

Translated and compiled by 汇智网; please credit the source when reposting.