Authors: 武卓, 胡誉文

What Is an AI PC, and Why Is It a Good Fit for Local Fine-Tuning?

AI PCs are bringing "inference + fine-tuning" back from the cloud to the local machine: they typically combine stronger on-device compute (CPU / iGPU / NPU), letting developers complete fine-tuning tasks such as SFT (Supervised Fine-Tuning) with parameter-efficient methods like LoRA, without relying on an expensive discrete GPU or even touching the cloud.

For developers, the biggest value of an AI PC is high-frequency iteration:

  • Quickly validate whether your data, prompts, and tool-call formats actually work

  • No need to spin up cloud instances for quick-and-dirty experiments, so costs stay under control

  • Data never leaves the machine, which suits enterprise/internal PoCs and sensitive-data scenarios

This article shares a case we actually ran end to end: on an AI PC, we use Unsloth to perform LoRA fine-tuning (via the SFTTrainer from Transformer Reinforcement Learning (TRL)) on meta-llama/Llama-3.2-3B-Instruct, teach the model Function Calling (tool invocation), and then run inference and evaluation.

  • Training dataset: hiyouga/glaive-function-calling-v2-sharegpt

  • Evaluation dataset: Salesforce/xlam-function-calling-60k (access must be requested)

From 0 to 1: Set Up the Environment and Run Fine-Tuning, Step by Step

Step 0: Open a terminal window

  • For example, open Anaconda Prompt

  • conda activate ...

Step 1: Install the Visual Studio C++ toolchain

Download and install Visual Studio Build Tools / Community from https://visualstudio.microsoft.com/zh-hans/downloads/?q=build+tools, and check the following during installation:

  • Desktop development with C++

  • MSVC v143 toolset

  • Windows 10/11 SDK

(Optional) Enable MSVC in the current terminal and run the following to verify (the path below may differ depending on your installed version and location):

call "C:\Program Files\Microsoft Visual Studio\18\Community\VC\Auxiliary\Build\vcvars64.bat"where cl

If where cl prints the path to cl.exe, the compiler is ready.

Step 2: Install Intel oneAPI (Intel GPU/XPU route)

Install oneAPI Base Toolkit 2025.2.1: download the installer from https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/base-toolkit-download.html and run it.


Then initialize the oneAPI environment variables:

call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" 

Step 3: Enable Windows long paths (one-time)

Run the following in an Administrator CMD window:

powershell -Command "Set-ItemProperty -Path ""HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem"" -Name ""LongPathsEnabled"" -Value 1"
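
To confirm the change took effect, you can read the value back from the registry. A minimal Python sketch using the standard-library winreg module:

import winreg

# Read back LongPathsEnabled to confirm the registry change took effect.
key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
)
value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
winreg.CloseKey(key)
print("LongPathsEnabled =", value)  # expect 1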

Step 4: Create a Conda environment and install Unsloth (Intel XPU build)

conda create -n aipc-finetune-example python=3.11 -y
conda activate aipc-finetune-example
git clone https://github.com/unslothai/unsloth.git
cd unsloth
pip install -e .[intel-gpu-torch290]

After installation, you can quickly confirm the XPU is available:

import torch
print(torch.__version__)                 # 2.9.0+xpu
print(hasattr(torch, "xpu") and torch.xpu.is_available())  # True
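
To go one step beyond the availability check, a tiny matrix multiply on the device confirms that kernels actually compile and execute. A minimal sketch:

import torch

# Allocate on the XPU and run a small matmul to exercise the device end to end.
x = torch.randn(256, 256, device="xpu")
y = x @ x
torch.xpu.synchronize()  # wait for the device before reading results
print("XPU matmul OK:", y.shape, y.device)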

Step 5: Configure the Level Zero SDK

The Triton Intel backend's JIT compilation depends on the Level Zero headers. Download level-zero-win-sdk-1.20.2.zip from https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2 and extract it, for example to:

C:\Users\MindPro\On-Device Fine-tuning AIPC Example\level-zero-win-sdk-1.20.2

Set ZE_PATH:

set ZE_PATH=C:\Users\MindPro\On-Device Fine-tuning AIPC Example\level-zero-win-sdk-1.20.2
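
Before training, it's worth confirming that the headers really are reachable under ZE_PATH. A small sketch; the folder layout inside the zip may vary, hence the recursive search:

import os
from pathlib import Path

# Confirm ZE_PATH is set in this shell and the Level Zero headers exist under it.
ze_path = os.environ.get("ZE_PATH")
assert ze_path, "ZE_PATH is not set in this shell"
headers = list(Path(ze_path).rglob("ze_api.h"))
print("Found Level Zero headers:", headers or "none -- check the unzip path")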

Step 6: Hugging Face gated dataset access (required for evaluation)

The evaluation dataset Salesforce/xlam-function-calling-60k is gated: setting HF_TOKEN alone is not enough; you also need to Request access / Agree on the dataset's web page.

Recommended flow:

set HF_TOKEN=hf_xxx
huggingface-cli login

Then go to the dataset page and request/accept access once.
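
To confirm access has actually been granted, try loading a single row: if the token is not picked up or the request has not been approved, datasets raises a gated-repo error. A minimal sketch:

from datasets import load_dataset

# Loads one row of the gated dataset; fails with a gated-repo error if access is missing.
sample = load_dataset("Salesforce/xlam-function-calling-60k", split="train[:1]")
print(sample[0]["query"])  # each row carries query / tools / answers fields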

Step 7: Install the ROUGE evaluation dependencies

pip install -U evaluate nltk rouge_score bert_score absl-py
python -c "import nltk; nltk.download('punkt')"
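
A quick smoke test that the metric stack imports and computes; identical prediction and reference strings should score 1.0 across the board:

import evaluate

# Sanity-check the installed metric stack with a trivial identical pair.
rouge = evaluate.load("rouge")
pred = '{"name": "get_weather", "parameters": {"city": "Paris"}}'
print(rouge.compute(predictions=[pred], references=[pred]))  # all scores ~1.0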

3) Run: Training / Inference / Evaluation

  • Training

call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" --forceset ZE_PATH=path\to\the\unzipped\level-zero-win-sdk-1.20.2 #这里放你自己的解压后的路径set UNSLOTH_DISABLE_STATISTICS=1 python train.py 

The full code is as follows:

import datetime

from datasets import load_dataset
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from unsloth.chat_templates import get_chat_template


def prepare_dataset(tokenizer, dataset_name, dataset_size, dataset_seed):
    system_prompt = "You are a helpful assistant with access to the following functions. Use them if required -\n__TOOL_DESCRIPTION__\nTo use these functions respond with:\n{\"name\": \"function_name\", \"arguments\": {\"arg_1\": \"value_1\", \"arg_2\": \"value_2\", ...}}\n\nEdge cases you must handle:\n - If there are no functions that match the user request, you will respond politely that you cannot help."

    def formatting_prompts_func(example):
        texts = []
        for messages, tools in zip(example["conversations"], example["tools"]):
            # Prepend a system turn that lists the tools available for this sample.
            messages.insert(0, {"from": "system", "value": system_prompt.replace("__TOOL_DESCRIPTION__", tools)})
            text = tokenizer.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=False
            )
            texts.append(text)
        return {"text": texts}

    dataset = load_dataset(dataset_name, split="train").select(range(dataset_size)).shuffle(seed=dataset_seed)
    print(dataset)
    dataset = dataset.map(formatting_prompts_func, batched=True)
    for i in range(3):
        print(f"Sample {i}:\n{dataset[i]['text']}\n{'='*40}")
    return dataset


def get_tokenizer(tokenizer):
    # Map the ShareGPT-style message keys onto the Llama 3.1 chat template.
    tokenizer = get_chat_template(
        tokenizer,
        mapping={
            "role": "from",
            "content": "value",
            "user": "human",
            "assistant": "gpt",
        },
        chat_template="llama-3.1",
    )
    return tokenizer


def main(
    repo_id_or_model_path="meta-llama/Llama-3.2-3B-Instruct",
    epochs=1,
    max_seq_length=2048,
    dataset_size=2000,
    lora_rank=16,
    batch_size=2,
    learning_rate="2e-4",
    max_steps=None,
    logging_steps=50,
    gradient_accumulation_steps=1,
    random_state=3407,
    dataset_seed=None,
    output_dir="outputs",
):
    dataset_name = "hiyouga/glaive-function-calling-v2-sharegpt"
    optim = "adamw_torch"
    use_gradient_checkpointing = "unsloth"
    dtype = torch.bfloat16

    def start_train(trainer):
        return trainer.train()

    lora_kwargs = {
        "r": lora_rank,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        "lora_alpha": 16,
        "lora_dropout": 0,  # Supports any, but = 0 is optimized
        "bias": "none",     # Supports any, but = "none" is optimized
        "use_rslora": False,
        "loftq_config": None,
    }
    training_args_kwargs = {}
    checkpoint_output_dir = f"{output_dir}/checkpoints"
    if max_steps is not None:
        training_args_kwargs["max_steps"] = max_steps

    model, tokenizer_orig = FastLanguageModel.from_pretrained(
        model_name=repo_id_or_model_path,
        max_seq_length=max_seq_length,
        dtype=dtype,
        full_finetuning=False,
        load_in_4bit=False,
        device_map="xpu:0",
        use_gradient_checkpointing=use_gradient_checkpointing,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        random_state=random_state,
        **lora_kwargs
    )
    print(">>> Successfully loaded model!")

    tokenizer = get_tokenizer(tokenizer_orig)
    dataset = prepare_dataset(tokenizer, dataset_name, dataset_size, dataset_seed)
    print(">>> Successfully created dataset!")

    training_arguments = SFTConfig(
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=5,
        num_train_epochs=epochs,
        learning_rate=float(learning_rate),
        fp16=(dtype == torch.float16),
        bf16=(dtype == torch.bfloat16),
        logging_steps=logging_steps,
        optim=optim,
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=random_state,
        output_dir=checkpoint_output_dir,
        report_to="none",
        dataset_text_field="text",
        dataset_num_proc=1,
        packing=False,
        **training_args_kwargs,
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer_orig,
        train_dataset=dataset,
        args=training_arguments,
    )
    print(">>> Successfully defined trainer!")
    print(f"Starting training with:\nrandom_state: {random_state}\nepochs: {epochs}\nbatch_size: {batch_size}\ndataset_size: {dataset_size}\nlora_rank: {lora_rank}\nlearning_rate: {learning_rate}\nmax_steps: {max_steps}\nlogging_steps: {logging_steps}\ngradient_accumulation_steps: {gradient_accumulation_steps}")

    trainer_stats = start_train(trainer)
    print(trainer_stats)
    print("Total Time: ", str(datetime.timedelta(seconds=int(trainer_stats.metrics["train_runtime"]))))


if __name__ == "__main__":
    # Set params accordingly
    main()
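
Before committing to a full epoch, a short smoke run can validate the whole pipeline in minutes. A sketch that reuses main() from the listing above, assuming the script is saved as train.py; the tiny values here are deliberate and not tuned for quality:

from train import main  # train.py as defined above

# Hypothetical smoke run: a small dataset slice plus a hard step cap exercises
# model loading, chat templating, LoRA setup, and the optimizer end to end.
main(
    dataset_size=64,   # small slice of the glaive function-calling dataset
    max_steps=10,      # stop after 10 optimizer steps
    logging_steps=1,
    batch_size=1,
)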

A training run produces output like the following:

[Screenshot: training logs]

(The screenshot does not represent performance data; actual performance will vary with hardware type, system environment, and other factors.)

  • Inference

The full code is as follows:

import os
import json
import string
import random

from unsloth import FastLanguageModel

from sample_tools import (
    weather_info,
    send_email,
    search_web,
    create_calendar_event,
    create_calendar_event_with_attendees,
    translate_text,
    set_reminder,
    generate_password,
    update_calendar,
    get_vector_sum,
)


def generate_alphanumeric():
    # Random 9-character ID for the tool-call record.
    characters = string.ascii_letters + string.digits
    result = ''.join(random.choice(characters) for _ in range(9))
    return result


def parse_tool_call(decoded_output):
    try:
        content = json.loads(decoded_output[0])
    except json.JSONDecodeError as e:
        print("Failed to parse JSON:", e)
        print("----end----\n\n")
        return None, None
    try:
        function_name = content['name']
        arguments = content['parameters']
    except KeyError as e:
        print("Missing key in JSON:", e)
        print("----end----\n\n")
        return None, None
    return function_name, arguments


def call_tool(function_name, arguments):
    try:
        result = globals()[function_name](**arguments)
        print("Tool call result: ", result)
        return result
    except Exception as e:
        print("Error calling function:", e)
        return None


def inference(model, tokenizer, query, tools, generation_args):
    """
    Inference function to process user queries.
    The model generates a JSON tool call based on the user query, which is then parsed and executed.
    """
    print("----start----")
    messages = [
        {"role": "user", "content": query},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        add_special_tokens=False,
        padding=True,
        tools=tools,
        return_tensors="pt",
    ).to("xpu")
    output = model.generate(
        input_ids=input_ids,
        do_sample=False,
        # repetition_penalty=1.1,
        num_return_sequences=1,
        **generation_args
    )
    generated_tokens = output[:, input_ids.shape[1]:]
    decoded_output = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
    print("Assistant response:", decoded_output[0])

    function_name, arguments = parse_tool_call(decoded_output)
    if not function_name or not arguments:
        return
    result = call_tool(function_name, arguments)

    # Feed the tool result back so the model can produce a natural-language answer.
    tool_calls = [{
        "id": generate_alphanumeric(),
        "type": "function",
        "function": {
            "name": function_name,
            "arguments": arguments
        }
    }]
    messages.append({
        "role": "assistant",
        "tool_calls": tool_calls
    })
    messages.append({
        "role": "tool",
        "name": function_name,
        "content": result
    })
    messages.append({
        "role": "assistant",
        "content": "Given the tool response, give the final answer to the user. Answer:\n"
    })
    tool_prompt = tokenizer.apply_chat_template(
        messages,
        system_prompt=None,
        continue_final_message=True,
        add_special_tokens=True,
        return_tensors="pt",
        return_dict=True,
        tools=None,
    ).to("xpu")
    tool_prompt = tool_prompt.to(model.device)
    out = model.generate(
        **tool_prompt,
        **generation_args
    )
    generated_text = out[0, tool_prompt['input_ids'].shape[1]:]
    print("Assistant response:", tokenizer.decode(generated_text, skip_special_tokens=True))
    print("----end----\n\n")


def main(
    repo_id_or_model_path="meta-llama/Llama-3.2-3B-Instruct",
    disable_adapter=False,
    lora_adapter_path="outputs/checkpoints/checkpoint-1000",
    max_seq_length=2048,
    max_new_tokens=150,
    temperature=None,
    top_p=None,
    top_k=None,
    min_p=None,
):
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id_or_model_path,
        max_seq_length=max_seq_length,
        dtype=None,
        load_in_4bit=False,
        fix_tokenizer=False
    )
    if not disable_adapter:
        if os.path.exists(lora_adapter_path):
            print("Loading adapter from:", lora_adapter_path)
            model.load_adapter(lora_adapter_path)
        else:
            raise FileNotFoundError(f"Adapter path {lora_adapter_path} does not exist.")
    generation_args = {k: v for k, v in {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
    }.items() if v is not None}
    queries = [
        ("Find the sum of a = [1, -1, 2] and b = [3, 0, -4].", [update_calendar, get_vector_sum]),
        ("Set a reminder to call John at 3 PM tomorrow.", [update_calendar, set_reminder, send_email]),
        ("Generate a password for me. I want it to be 12 characters long, with numbers and symbols.", [generate_password, weather_info]),
        ("Hey, I need to add an event to my calendar. It's a meeting with my team on 15th March from 10 AM to 12 PM. The title should be \"Team Meeting\".", [update_calendar, create_calendar_event, translate_text]),
        ("Create a calendar event with the title 'Meeting' starting at 10 AM and ending at 12 PM on the 15th of March 2024, with attendees John and Jane.", [create_calendar_event, create_calendar_event_with_attendees]),
    ]
    for query, tools in queries:
        inference(model, tokenizer, query, tools, generation_args)


if __name__ == "__main__":
    # Set params accordingly
    main()

Running it depends on sample_tools.py, whose code is as follows:

def send_email(recipient: str, subject: str, body: str) -> dict:
    """
    Send an email to a recipient.

    Args:
        recipient: Recipient email address.
        subject: Email subject.
        body: Email body.

    Returns:
        dict: Status and message about the sent email.
    """
    return {"status": "success", "message": f"Email sent to {recipient} with subject '{subject}'."}


def weather_info(location: str) -> dict:
    """
    Get weather information for a location.

    Args:
        location: Location to get weather info for.

    Returns:
        dict: Status and weather information for the location.
    """
    return {"status": "success", "weather": f"Weather info for {location}."}


def search_web(query: str) -> dict:
    """
    Search the web for information.

    Args:
        query: Search query.

    Returns:
        dict: Status and search results for the query.
    """
    return {"status": "success", "result": f"Search results for '{query}'."}


def create_calendar_event(title: str, start_time: str, end_time: str) -> dict:
    """
    Create a calendar event.

    Args:
        title: Event title.
        start_time: Start time.
        end_time: End time.

    Returns:
        dict: Status and message about the created event.
    """
    return {"status": "success", "message": f"Event '{title}' created from {start_time} to {end_time}."}


def create_calendar_event_with_attendees(title: str, start_time: str, end_time: str, attendees: list[str] = None) -> dict:
    """
    Create a calendar event with attendees.

    Args:
        title: Event title.
        start_time: Start time (ISO format).
        end_time: End time (ISO format).
        attendees: List of attendee email addresses.

    Returns:
        dict: Status and message about the created event with attendees.
    """
    attendees_list = attendees if attendees else []
    return {
        "status": "success",
        "message": f"Event '{title}' created from {start_time} to {end_time} with attendees: {', '.join(attendees_list) if attendees_list else 'None'}."
    }


def translate_text(text: str, target_language: str) -> dict:
    """
    Translate text to another language.

    Args:
        text: Text to translate.
        target_language: Target language.

    Returns:
        dict: Status and translation result.
    """
    return {"status": "success", "translation": f"Translated '{text}' to {target_language}."}


def set_reminder(reminder_text: str, time: str) -> dict:
    """
    Set a reminder at a specific time.

    Args:
        reminder_text: Reminder text.
        time: Time for the reminder.

    Returns:
        dict: Status and message about the set reminder.
    """
    return {"status": "success", "message": f"Reminder set: '{reminder_text}' at {time}."}


def generate_password(length: int, include_numbers: bool = False, include_symbols: bool = False) -> dict:
    """
    Generate a random password.

    Args:
        length: The length of the password.
        include_numbers: Whether to include numbers in the password.
        include_symbols: Whether to include symbols in the password.

    Returns:
        dict: Status and the generated password.
    """
    import random
    import string
    chars = string.ascii_letters
    if include_numbers:
        chars += string.digits
    if include_symbols:
        chars += string.punctuation
    password = ''.join(random.choice(chars) for _ in range(length))
    return {"status": "success", "password": password}


def update_calendar(event_title: str, start_time: str, end_time: str) -> dict:
    """
    Update a calendar with an event.

    Args:
        event_title: Event Title.
        start_time: Start Time.
        end_time: End Time.

    Returns:
        dict: Status and message about the updated calendar event.
    """
    return {"status": "success", "message": f"Event '{event_title}' has been successfully added to your calendar from {start_time} to {end_time}."}


def get_vector_sum(a: list[float], b: list[float]) -> list[float]:
    """
    Performs element-wise addition of two numerical vectors.

    Both vectors must be of the same length and contain numerical values.

    Args:
        a: First vector containing numerical values
        b: Second vector containing numerical values

    Returns:
        Resulting vector where each element is the sum of corresponding elements in a and b

    Raises:
        ValueError: If vectors have different lengths

    Example:
        >>> get_vector_sum([1, 2], [3, 4])
        [4, 6]
    """
    if len(a) != len(b):
        raise ValueError("Vectors must be of the same length")
    return [x + y for x, y in zip(a, b)]
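
Incidentally, the Google-style docstrings and type hints above are exactly what the chat template uses to build JSON schemas for the tools. You can inspect the rendered prompt without running the model; a sketch, assuming a recent transformers version that accepts callables in the tools argument:

from transformers import AutoTokenizer
from sample_tools import get_vector_sum, set_reminder

# Render the prompt as text to see how tool schemas are serialized into it.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Add [1, 2] and [3, 4]."}],
    tools=[get_vector_sum, set_reminder],  # schemas derived from docstrings/type hints
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)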

Run the script to see the results:

python inference.py

[Screenshot: inference output]

  • Evaluation

python evaluate_model.py

The full code is as follows:

import os
import json

from unsloth import FastLanguageModel
from datasets import load_dataset
from evaluate import load as load_metric
from tqdm import tqdm


def get_dataset(dataset_name: str, num_samples: int, dataset_seed: int):
    dataset = load_dataset(dataset_name, split="train")
    dataset = dataset.shuffle(seed=dataset_seed)
    valid_examples = []
    for example in dataset:
        answer = json.loads(example["answers"])
        # Keep only single-call samples, and align the key name with the model's output format.
        if len(answer) == 1:
            example["answers"] = example["answers"].replace('arguments', 'parameters')
            valid_examples.append(example)
            if len(valid_examples) == num_samples:
                break
    return valid_examples


def generate_response(
    model,
    tokenizer,
    query: str,
    tools: list[dict],
    generation_args: dict
):
    messages = [
        {"role": "user", "content": query},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        add_special_tokens=False,
        padding=True,
        tools=tools,
        return_tensors="pt",
    ).to("xpu")
    output = model.generate(
        input_ids=input_ids,
        do_sample=False,
        # repetition_penalty=1.1,
        num_return_sequences=1,
        **generation_args
    )
    generated_tokens = output[:, input_ids.shape[1]:]
    return tokenizer.decode(generated_tokens[0], skip_special_tokens=True)


def parse_tool_call(prediction):
    """Parses the model's prediction to extract the function name and parameters."""
    try:
        content = json.loads(prediction)
    except json.JSONDecodeError as e:
        print("Failed to parse JSON:", e)
        print(prediction)
        return None, None
    try:
        function_name = content['name']
        arguments = content['parameters']
    except KeyError as e:
        print("Missing key in JSON:", e)
        return None, None
    return function_name, arguments


def tool_call_metric(predictions: list[str], references: list[str]):
    """Computes the accuracy of function calls based on function name and parameters.

    Instead of requiring exact string matches, this metric checks whether the
    function name and parameters match after parsing.
    """
    correct = 0
    total = len(predictions)
    tool_call_results = []
    for pred, ref in zip(predictions, references):
        pred_funcname, pred_args = parse_tool_call(pred)
        ref_funcname, ref_args = parse_tool_call(ref)
        if pred_funcname == ref_funcname and pred_args == ref_args:
            correct += 1
            match = True
        else:
            match = False
        tool_call_results.append(match)
    accuracy = correct / total if total > 0 else 0.0
    return tool_call_results, {"tool_call_accuracy": accuracy}


def evaluate_model(
    model,
    tokenizer,
    dataset: list[dict],
    generation_args: dict
):
    """Evaluates the model and returns predictions and references."""
    predictions = []
    references = []
    for example in tqdm(dataset, desc="Evaluating", unit="sample"):
        query = example["query"]
        reference = json.dumps(json.loads(example["answers"])[0])   # get first item as string
        tools = json.loads(example["tools"])
        prediction = generate_response(model, tokenizer, query, tools, generation_args)
        predictions.append(prediction)
        references.append(reference)
    return predictions, references


def format_metric(metric):
    if isinstance(metric, dict):
        return {k: float(v) if hasattr(v, 'item') else v for k, v in metric.items()}
    elif hasattr(metric, 'item'):
        return float(metric)
    return metric


def log_evaluation_results(
    output_dir: str,
    evaluation_config: dict,
    predictions: list[str],
    references: list[str],
    dataset: list[dict],
    metrics: dict,
    tool_call_results: list[bool],
    output_filename="evaluation_results.json"
):
    """Logs evaluation metrics and predictions to a file."""
    output_filepath = os.path.join(output_dir, output_filename)
    os.makedirs(os.path.dirname(output_filepath), exist_ok=True)
    results = {
        "evaluation_config": evaluation_config,
        "metrics": metrics,
        "predictions": [
            {
                "query": ex["query"],
                "tools": ex["tools"],
                "reference": reference,
                "prediction": pred,
                "match": tool_call_result
            } for ex, pred, reference, tool_call_result in zip(dataset, predictions, references, tool_call_results)
        ]
    }
    with open(output_filepath, "w") as f:
        json.dump(results, f, indent=2)
    print(f"Outputs saved to {output_filepath}")


def compute_metrics(output_dir, predictions: list, references: list, disable_save_results: bool, dataset: list):
    # Load metrics
    exact_match_metric = load_metric("exact_match")
    rouge_metric = load_metric("rouge")
    bleu_metric = load_metric("bleu")
    meteor_metric = load_metric("meteor")
    bertscore_metric = load_metric("bertscore")

    # Exact Match: ignore whitespace after colons. The model sometimes produces
    # no space after a colon, which is still valid JSON.
    tool_call_results, tool_call_accuracy = tool_call_metric(predictions, references)
    exact_match_result = format_metric(exact_match_metric.compute(predictions=predictions, references=references, regexes_to_ignore=[r"(?<=:)\s+"]))
    rouge_result = format_metric(rouge_metric.compute(predictions=predictions, references=references))
    bleu_result = format_metric(bleu_metric.compute(predictions=predictions, references=references))
    meteor_result = format_metric(meteor_metric.compute(predictions=predictions, references=references))
    bertscore_result = bertscore_metric.compute(predictions=predictions, references=references, lang="en")

    print("Tool Call Accuracy:\n", tool_call_accuracy, "\n")
    print("Exact Match:\n", exact_match_result, "\n")
    print("ROUGE:\n", rouge_result, "\n")
    print("BLEU:\n", bleu_result, "\n")
    print("METEOR:\n", meteor_result, "\n")
    print("BERTScore:\n", bertscore_result, "\n")

    metrics = {
        "Tool Call Accuracy": tool_call_accuracy,
        "Exact Match": exact_match_result,
        "ROUGE": rouge_result,
        "BLEU": bleu_result,
        "METEOR": meteor_result,
        "BERTScore": bertscore_result,
    }
    return metrics, tool_call_results


def main(
    repo_id_or_model_path="meta-llama/Llama-3.2-3B-Instruct",
    disable_adapter=False,
    lora_adapter_path="outputs/checkpoints/checkpoint-1000",
    max_seq_length=2048,
    max_new_tokens=150,
    temperature=None,
    top_p=None,
    top_k=None,
    min_p=None,
    num_samples=50,
    disable_save_results=False,
    dataset_seed=None,
    output_dir="outputs",
):
    dataset_name = "Salesforce/xlam-function-calling-60k"

    # Load model and tokenizer
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id_or_model_path,
        max_seq_length=max_seq_length,
        device_map="xpu:0",
        load_in_4bit=False,
    )
    if not disable_adapter:
        if os.path.exists(lora_adapter_path):
            print("Loading adapter from:", lora_adapter_path)
            model.load_adapter(lora_adapter_path)
        else:
            raise FileNotFoundError(f"Adapter path {lora_adapter_path} does not exist.")
    FastLanguageModel.for_inference(model)

    generation_args = {k: v for k, v in {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
    }.items() if v is not None}

    evaluation_config = {
        "repo_id_or_model_path": repo_id_or_model_path,
        "dataset_name": dataset_name,
        "disable_adapter": disable_adapter,
        "lora_adapter_path": lora_adapter_path,
        "max_seq_length": max_seq_length,
        "num_samples": num_samples,
        "dataset_seed": dataset_seed,
        "max_new_tokens": max_new_tokens,
        "generation_args": generation_args,
    }

    # Load dataset
    dataset = get_dataset(dataset_name, num_samples, dataset_seed)

    predictions, references = evaluate_model(
        model=model,
        tokenizer=tokenizer,
        dataset=dataset,
        generation_args=generation_args
    )
    metrics, tool_call_results = compute_metrics(output_dir, predictions, references, disable_save_results, dataset)
    if not disable_save_results:
        log_evaluation_results(output_dir, evaluation_config, predictions, references, dataset, metrics, tool_call_results)


if __name__ == "__main__":
    # Set params accordingly
    main()
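
Note that tool_call_metric compares parsed JSON objects rather than raw strings, so differences in key order or whitespace do not count against the model. A toy check, reusing tool_call_metric from the listing above:

from evaluate_model import tool_call_metric  # the listing above

# The first pair mismatches on the function name; the second matches even though
# the key order differs, because the parsed dicts compare equal.
preds = [
    '{"name": "set_reminder", "parameters": {"time": "3 PM"}}',
    '{"parameters": {"a": [1, 2], "b": [3, 4]}, "name": "get_vector_sum"}',
]
refs = [
    '{"name": "send_email", "parameters": {"time": "3 PM"}}',
    '{"name": "get_vector_sum", "parameters": {"a": [1, 2], "b": [3, 4]}}',
]
results, acc = tool_call_metric(preds, refs)
print(results, acc)  # [False, True] {'tool_call_accuracy': 0.5}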

The evaluation run output looks like this:

[Screenshot: evaluation output]

4) Results: After Fine-Tuning, the Model Calls Tools First, Then Answers Based on the Results

Below is the actual output from running python inference.py locally, using the tools provided in sample_tools.py above (with the LoRA adapter checkpoint-1000 loaded).

You will see a consistent pattern:

the model first emits structured tool-call JSON → the tool executes and returns a result → the model summarizes in natural language

Case 1: Vector sum (structured arguments + correct summarization of the result)

[Assistant → Tool Call JSON]
{"name": "get_vector_sum", "parameters": {"a": [1, -1, 2], "b": [3, 0, -4]}}

[Tool → Result]
[4, -1, -2]

[Assistant → Final Answer]
The sum of vectors a and b is [4, -1, -2].

Case 2: Setting a reminder (intent extracted into executable arguments)

[Assistant → Tool Call JSON]
{"name": "set_reminder", "parameters": {"reminder_text": "Call John", "time": "3 PM tomorrow"}}

[Tool → Result]
{'status': 'success', 'message': "Reminder set: 'Call John' at 3 PM tomorrow."}

[Assistant → Final Answer]
Okay, I have set a reminder to call John at 3 PM tomorrow.

Case 3: Generating a strong password (controllable parameters + restating the tool result)

[Assistant → Tool Call JSON]
{"name": "generate_password", "parameters": {"length": 12, "include_numbers": true, "include_symbols": true}}

[Tool → Result]
{'status': 'success', 'password': ':JlgWf0G8PXa'}

[Assistant → Final Answer]
The generated password for you is :JlgWf0G8PXa.

Case 4: Creating a calendar event (time understanding + stable schema)

[Assistant → Tool Call JSON]
{"name": "create_calendar_event", "parameters": {"title": "Team Meeting", "start_time": "15th March 10 AM", "end_time": "15th March 12 PM"}}

[Tool → Result]
{'status': 'success', 'message': "Event 'Team Meeting' created from 15th March 10 AM to 15th March 12 PM."}

[Assistant → Final Answer]
The event 'Team Meeting' has been successfully added to your calendar from 15th March 10 AM to 15th March 12 PM.

Case 5: Event with attendees (more complex schema + human-readable rewrite)

[Assistant → Tool Call JSON]
{"name": "create_calendar_event_with_attendees", "parameters": {"title": "Meeting", "start_time": "2024-03-15T10:00:00", "end_time": "2024-03-15T12:00:00", "attendees": ["john@example.com", "jane@example.com"]}}

[Tool → Result]
{'status': 'success', 'message': "Event 'Meeting' created from 2024-03-15T10:00:00 to 2024-03-15T12:00:00 with attendees: john@example.com, jane@example.com."}

[Assistant → Final Answer]
Event 'Meeting' has been successfully created for 15th of March 2024, starting from 10 AM and ending at 12 PM. The attendees for this meeting are John and Jane.

5) Evaluation Results: Tool Call Accuracy = 0.78 on 50 Samples

Evaluating 50 samples with python evaluate_model.py, I got:

  • Tool Call Accuracy: 0.78

  • Exact Match: 0.76

  • ROUGE-L: ≈ 0.953

  • BLEU: ≈ 0.96

  • METEOR: ≈ 0.958

This shows that with only 1 epoch / 1000 steps of training, the model already has reasonably usable tool-calling ability. The evaluation logs also show a small number of long-argument samples where the JSON gets truncated and fails to parse; this is the common "long structured output" challenge in function calling, and it can be further improved with more training steps, a higher proportion of long-argument samples, or constrained (structured) decoding.
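
If you want to triage those parse failures, the saved results file makes it easy to count them. A small sketch, assuming the evaluation_results.json layout written by log_evaluation_results() with output_dir set to "outputs":

import json

# Count predictions that fail to parse as JSON, the truncation failure mode above.
with open("outputs/evaluation_results.json") as f:
    results = json.load(f)

failures = []
for item in results["predictions"]:
    try:
        json.loads(item["prediction"])
    except json.JSONDecodeError:
        failures.append(item["query"])

print(f"{len(failures)} / {len(results['predictions'])} predictions failed to parse")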

6) Summary: The Steps Are Simple, and a Local AI PC Can Close the Fine-Tuning Loop

Once you have run through it, you will find the whole flow boils down to: install the toolchain → configure the runtime → train with a single command.

More importantly, you can complete the full loop of LoRA fine-tuning, inference validation, and evaluation entirely on your own AI PC, with no expensive discrete GPU and no cloud required. That makes it genuinely practical for personal project validation, internal enterprise PoCs, and customizing on-device agent capabilities.

OpenVINO assistant on WeChat: OpenVINO-China

For questions or discussion, add the OpenVINO assistant on WeChat and join the dedicated community to interact with technical experts in real time.
