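Preface

Getting a Spring AI application running already involved ChatClient. This article walks through the full usage of ChatClient. This API is the foundation of the whole module, and everything that follows uses it.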

The relationship between ChatClient and ChatModel

Two things in Spring AI are easy to confuse:

ChatClient
ChatModel

ChatModel

ChatModel is the low-level interface responsible for talking to the model API:

public interface ChatModel extends Model<Prompt, ChatResponse>, StreamingChatModel {
    default String call(String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        Generation generation = this.call(prompt).getResult();
        return generation != null ? generation.getOutput().getText() : "";
    }

    default String call(Message... messages) {
        Prompt prompt = new Prompt(Arrays.asList(messages));
        Generation generation = this.call(prompt).getResult();
        return generation != null ? generation.getOutput().getText() : "";
    }

    ChatResponse call(Prompt prompt);

    default ChatOptions getDefaultOptions() {
        return ChatOptions.builder().build();
    }

    default Flux<ChatResponse> stream(Prompt prompt) {
        throw new UnsupportedOperationException("streaming is not supported");
    }
}

Calling ChatModel directly means constructing the Prompt object and unpacking the ChatResponse yourself, which is somewhat more tedious.

ChatClient

ChatClient is a high-level wrapper around ChatModel that provides a fluent, chainable API. Its source is fairly long, so it is not reproduced here.
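To make the layering concrete, here is a minimal plain-Java sketch of a fluent wrapper over a low-level model interface. The names MiniChatModel, MiniChatClient, RequestSpec, and Msg are hypothetical stand-ins invented for illustration; they are not the Spring AI types, and the real ChatClient does considerably more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the Spring AI types.
record Msg(String role, String text) {}

/** Low-level contract: takes the full message list and returns the reply text. */
interface MiniChatModel {
    String call(List<Msg> messages);
}

/** High-level fluent wrapper that hides the message-list plumbing. */
final class MiniChatClient {
    private final MiniChatModel model;
    private final String defaultSystem; // may be null when no default is wanted

    MiniChatClient(MiniChatModel model, String defaultSystem) {
        this.model = model;
        this.defaultSystem = defaultSystem;
    }

    RequestSpec prompt() {
        return new RequestSpec();
    }

    final class RequestSpec {
        private String system = defaultSystem; // starts from the client-wide default
        private String user;

        RequestSpec system(String text) { this.system = text; return this; }
        RequestSpec user(String text) { this.user = text; return this; }

        /** Builds the message list and delegates to the low-level model. */
        String content() {
            List<Msg> messages = new ArrayList<>();
            if (system != null) messages.add(new Msg("system", system));
            if (user != null) messages.add(new Msg("user", user));
            return model.call(messages);
        }
    }

    public static void main(String[] args) {
        // Stub model: reports how many messages it received and echoes the last one.
        MiniChatModel stub = msgs -> msgs.size() + " message(s), last: " + msgs.get(msgs.size() - 1).text();
        MiniChatClient client = new MiniChatClient(stub, "You are a helpful assistant");
        System.out.println(client.prompt().user("What is PWN?").content());
        // prints: 2 message(s), last: What is PWN?
    }
}
```

The caller only chains prompt(), user(), and content(); assembling the messages and calling the model stays inside the wrapper, which is the division of labor described above.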

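Summary

Use ChatClient for everyday business code. When you need fine-grained control (for example, reading token usage or getting at the raw response), you can use ChatModel.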

Creating a ChatClient

Creating it with the Builder (preferred)

com/berial/springai/controller/SimpleChatController.java:

package com.berial.springai.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/simpleChat")
public class SimpleChatController {

    private final ChatClient chatClient;

    public SimpleChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping
    public String chat(@RequestParam String message) {
        return chatClient.prompt().user(message).call().content();
    }

}

A ChatClient with a default system prompt

If every call made through a ChatClient should carry the same system prompt (for example, a role definition), set it once when building the client:

com/berial/springai/config/ChatClientConfig.java:

package com.berial.springai.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient customerServiceChatClient(ChatClient.Builder builder) {
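        // System prompt (Chinese): "You are a professional CTF expert. Answer concisely and in a friendly tone;
        // if you are unsure, say you don't know instead of making up an answer."
        return builder.defaultSystem("你是一个专业的CTF专家，回答简洁友好，遇到不确定的问题要如实说不知道，不要编造答案。").build();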
    }

    @Bean
    public ChatClient codingChatClient(ChatClient.Builder builder) {
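        // System prompt (Chinese): "You are a Java expert. Use Java 21 syntax in code examples
        // and prefer Spring Boot solutions."
        return builder.defaultSystem("你是一个 Java 技术专家，代码示例使用 Java 21 语法，优先推荐 Spring Boot 方案。").build();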
    }
}

For different scenarios you can register different ChatClient beans and tell them apart with @Qualifier:

com/berial/springai/controller/MultiChatController.java:

package com.berial.springai.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/multi-chat")
public class MultiChatController {

    private final ChatClient customServiceChatClient;
    private final ChatClient codingChatClient;

    public MultiChatController(@Qualifier("customerServiceChatClient") ChatClient customServiceChatClient,
                               @Qualifier("codingChatClient") ChatClient codingChatClient) {
        this.customServiceChatClient = customServiceChatClient;
        this.codingChatClient = codingChatClient;
    }

    @GetMapping("/service")
    public String service(@RequestParam String message) {
        return customServiceChatClient.prompt()
                .user(message)
                .call()
                .content();
    }

    @GetMapping("/code")
    public String code(@RequestParam String message) {
        return codingChatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}

Building the prompt (user, system, messages)

Suppose we use three approaches to generate interview questions and compare the results.

The code:

com/berial/springai/controller/PromptDemoController.java:

package com.berial.springai.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/prompt-demo")
public class PromptDemoController {
    private final ChatClient chatClient;

    public PromptDemoController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }
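
    /* Approach 1: user message only
     * No constraints, so the model improvises.
     * GET /prompt-demo/simple?message=PWN
     * The model may explain the concept or may pose a question; its behavior is not controlled.
     */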
    @GetMapping("/simple")
    public String simple(@RequestParam String message) {
        return chatClient.prompt().user(message).call().content();
    }
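
    /* Approach 2: user + system message
     * The system message fixes the model's role, so the output style is stable.
     * GET /prompt-demo/with-system?message=PWN
     * The model is steered to quiz you with questions instead of drifting off topic.
     */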
    @GetMapping("/with-system")
    public String withSystem(@RequestParam String message) {
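        // System prompt (Chinese): "You are an interviewer. Test the candidate's grasp of the topic
        // by asking questions. Only ask questions; do not give answers."
        return chatClient.prompt().system("你是一个面试官，用提问的方式检验候选人对知识的掌握程度。只出题，不给答案。")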
                .user(message).call().content();
    }
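
    /* Approach 3: dynamic template variables
     * One prompt template; request parameters control the direction of the output.
     * GET /prompt-demo/template?topic=PWN&difficulty=初级
     * GET /prompt-demo/template?topic=PWN&difficulty=高级
     * A single endpoint covers every combination of topic and difficulty.
     */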
    @GetMapping("/template")
    public String template(@RequestParam String topic,
                           @RequestParam(defaultValue = "中级") String difficulty) {
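        // Template (Chinese): "Please write one {difficulty}-level PWN interview question about {topic}.
        // Only give the question, not the answer."
        return chatClient.prompt()
                .user(u -> u.text("请出一道关于 {topic} 的 {difficulty} 难度的 PWN 面试题，只出题，不给答案。")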
                        .param("topic", topic)
                        .param("difficulty", difficulty))
                .call().content();
    }
}
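The third approach relies on named placeholders such as {topic} being filled from parameters. The plain-Java sketch below shows that substitution step in isolation. PromptTemplateSketch and render are hypothetical names, and the regex-based replacement is a simplification for illustration rather than the template engine Spring AI actually uses.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified {name} placeholder rendering; an illustrative stand-in, not Spring AI's template engine. */
final class PromptTemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Replaces every {key} with its value and fails fast when a parameter is missing. */
    static String render(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String value = params.get(key);
            if (value == null) {
                throw new IllegalArgumentException("Missing template parameter: " + key);
            }
            // quoteReplacement keeps '$' and '\' in user input from being read as regex references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Write one {difficulty} interview question about {topic}.";
        System.out.println(render(template, Map.of("topic", "PWN", "difficulty", "advanced")));
        // prints: Write one advanced interview question about PWN.
    }
}
```

Because values are bound by name, swapping the two request parameters still renders without error; it just produces a sentence with the words in the wrong slots.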

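We send "PWN" to each endpoint, and the three endpoints respond differently: the first explains the term, the second asks interview-style questions, and the third produces a full exercise.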

user

GET http://localhost:8080/prompt-demo/simple?message=PWN

HTTP/1.1 200 
Content-Type: text/plain;charset=UTF-8
Content-Length: 1071
Date: Fri, 20 Mar 2026 11:05:28 GMT

"PWN" is an internet slang term that originated from the gaming and hacking communities. It is a shortened form of "own," which is used to indicate domination or complete control over an opponent or situation. In the context of hacking or cybersecurity, "pwn" can refer to the act of compromising a system or gaining unauthorized access to a computer network or system. However, it's important to note that such activities are illegal and unethical.

In gaming contexts, it can mean that a player has completely outplayed or outsmarted their opponent, often in a dominant and unchallenged manner. For example, if a player is able to beat another player's character to death without much resistance, they might say "I pwned them."

It's crucial to use terms like "pwn" responsibly and ethically, and to understand that unauthorized access to computer systems or networks is illegal. If you are interested in cybersecurity, it's advisable to pursue legitimate and ethical means to improve your knowledge and skills, such as through cybersecurity courses and certifications.

Response code: 200; Time: 10710ms (10 s 710 ms); Content length: 1071 bytes (1.07 kB)

system

GET http://localhost:8080/prompt-demo/with-system?message=PWN

HTTP/1.1 200 
Content-Type: text/plain;charset=UTF-8
Content-Length: 174
Date: Fri, 20 Mar 2026 11:06:59 GMT

你能解释一下PWN在网络安全领域中的含义吗？你有没有参与过利用PWN技术进行的安全研究或项目？如果有，能分享一下你的经验吗？

Response code: 200; Time: 2050ms (2 s 50 ms); Content length: 62 bytes (62 B)

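template

Note that the request below passes 高级 (advanced) as topic and PWN as difficulty, so the two values are swapped; that is why the reply opens with “关于高级PWN难度”.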

GET http://localhost:8080/prompt-demo/template?topic=%E9%AB%98%E7%BA%A7&difficulty=PWN

HTTP/1.1 200 
Content-Type: text/plain;charset=UTF-8
Content-Length: 905
Date: Fri, 20 Mar 2026 11:08:11 GMT

好的，以下是一道关于高级PWN难度的PWN面试题：

### 题目描述

你被赋予一个疑似存在堆溢出漏洞的C语言程序，该程序在某些情况下会崩溃。该程序的源代码如下：

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void vuln_function(char *input) {
    char buffer[64];
    strcpy(buffer, input);
}

int main(int argc, char **argv) {
    if (argc > 1) {
        vuln_function(argv[1]);
    }
    printf("Program ends normally\n");
    return 0;
}
```

### 问题

1. **如何利用此程序执行任意代码？**
2. **如何绕过常见的栈保护机制（如栈溢出保护（SSP）和地址空间布局随机化（ASLR））？**
3. **假设该程序运行在支持 PIE（Position-Independent Executable）的环境中，如何利用此漏洞进行远程代码执行（RCE）？**

请详细阐述你的攻击思路和具体步骤。

Response code: 200; Time: 7509ms (7 s 509 ms); Content length: 565 bytes (565 B)

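Building a multi-turn message list by hand

Why build the message list by hand?

A large language model is stateless: every call is a brand-new request, and the model remembers nothing of the previous turn. If we ask "How is it different from Web?" without any context, the model has no idea what "it" refers to.

There is only one way to make the model "remember" a conversation: send the history along with the new message. Every request has to carry the complete conversation so far: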

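First turn sends:
	[system: You are a Pwn problem-solving assistant] [user: How do you solve a stack overflow]
Second turn sends:
	[system: You are a Pwn problem-solving assistant] [user: How do you solve a stack overflow] [assistant: You can overwrite the return address...] [user: How is it different from a heap overflow]

assistant -> the model's reply from the previous turn (the conversation history)

Only with this message list can the model work out that "it" means "stack overflow" and give a correct answer.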

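A message list uses three roles:

SystemMessage: behind-the-scenes instructions to the model that the user never sees; used to set the role and the rules.
UserMessage: what the user says.
AssistantMessage: the model's reply from a previous turn (used when reconstructing history).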
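The resend-everything pattern can be sketched in plain Java as follows. ConversationSketch and its nested Msg record are hypothetical names, and the model is replaced by a stub function so the example runs without any model or Spring AI dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Accumulates the conversation and resends all of it on every turn (illustrative stand-in). */
final class ConversationSketch {
    record Msg(String role, String text) {}

    private final List<Msg> history = new ArrayList<>();
    private final Function<List<Msg>, String> model; // stand-in for the real model call

    ConversationSketch(String systemPrompt, Function<List<Msg>, String> model) {
        this.model = model;
        history.add(new Msg("system", systemPrompt));
    }

    /** Sends the whole history plus the new question, then records the reply as an assistant message. */
    String ask(String question) {
        history.add(new Msg("user", question));
        String reply = model.apply(List.copyOf(history));
        history.add(new Msg("assistant", reply));
        return reply;
    }

    List<Msg> history() {
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        // Stub model: only reports how many messages it was given.
        ConversationSketch chat = new ConversationSketch(
                "You are a Pwn problem-solving assistant", msgs -> "received " + msgs.size() + " messages");
        System.out.println(chat.ask("What is a stack overflow?"));               // prints: received 2 messages
        System.out.println(chat.ask("How is it different from a heap overflow?")); // prints: received 4 messages
    }
}
```

The second call sees four messages (system, user, assistant, user), which is what gives a real model enough context to resolve "it".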

com/berial/springai/controller/MessageDemoController.java:

package com.berial.springai.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.web.bind.annotation.*;

import java.util.List;

@RestController
@RequestMapping("/message-demo")
public class MessageDemoController {
    private final ChatClient chatClient;

    public MessageDemoController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

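    @PostMapping
    public String chat(@RequestBody ChatHistoryRequest request) {
        // The first two messages are hard-coded (in Chinese) for the demo:
        // system = "You are a Pwn problem-solving assistant", user = "What is a stack overflow"
        List<Message> messages = List.of(
                new SystemMessage("你是一个 Pwn 解题助手"),
                new UserMessage("什么是栈溢出"),
                new AssistantMessage(request.lastAssistantReply()), // the model's previous reply
                new UserMessage(request.currentQuestion())          // the new question
        );

        return chatClient.prompt().messages(messages).call().content();
    }

    // Request body: the previous assistant reply plus the current user question
    record ChatHistoryRequest(String lastAssistantReply, String currentQuestion) {}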

}

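Note

This code only demonstrates the underlying mechanism; real projects do not manage the message list by hand. Later articles use ChatMemory, which maintains the history automatically: storage, truncation, and injection are all handled for you, so there is nothing to assemble yourself.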
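As a rough idea of what "truncation" can mean, the plain-Java sketch below keeps every system message and only the most recent N other messages. WindowMemorySketch, Msg, and truncate are hypothetical names, and this sliding-window policy is an assumption chosen for illustration; it is not a description of how ChatMemory is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Sliding-window truncation sketch (an assumed policy, not the ChatMemory implementation). */
final class WindowMemorySketch {
    record Msg(String role, String text) {}

    /** Keeps all system messages plus the most recent maxOthers non-system messages, in original order. */
    static List<Msg> truncate(List<Msg> history, int maxOthers) {
        if (maxOthers < 0) {
            throw new IllegalArgumentException("maxOthers must be >= 0");
        }
        long others = history.stream().filter(m -> !m.role().equals("system")).count();
        long toDrop = Math.max(0, others - maxOthers);
        List<Msg> kept = new ArrayList<>();
        for (Msg m : history) {
            if (m.role().equals("system")) {
                kept.add(m);          // system messages always survive
            } else if (toDrop > 0) {
                toDrop--;             // drop the oldest non-system messages first
            } else {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Msg> history = List.of(
                new Msg("system", "You are a Pwn problem-solving assistant"),
                new Msg("user", "q1"), new Msg("assistant", "a1"),
                new Msg("user", "q2"), new Msg("assistant", "a2"),
                new Msg("user", "q3"));
        // Keeps the system message plus q2, a2, q3
        truncate(history, 3).forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

Bounding the window keeps the request size from growing without limit as the conversation gets longer, at the cost of the model forgetting the oldest turns.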

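Invocation styles: call vs. stream

This is the difference you notice when using a chat model: whether the reply arrives all at once or appears piece by piece, the SSE "typewriter" effect.

call() is the blocking style; in the code here, call().content() returns the finished reply as a String. The streaming call, stream().content(), returns a Flux<String>.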
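The contrast can be sketched in plain Java without Reactor. CallVsStreamSketch is a hypothetical class, the chunk list stands in for what a model would emit over time, and a synchronous Consumer callback is used in place of a Flux purely to keep the example dependency-free.

```java
import java.util.List;
import java.util.function.Consumer;

/** Contrasts "return everything at the end" with "hand over each piece as it arrives" (illustrative only). */
final class CallVsStreamSketch {
    // Stand-in for the chunks a model would produce over time.
    private static final List<String> CHUNKS =
            List.of("A stack overflow ", "can overwrite ", "the return address.");

    /** Blocking style: the caller gets nothing until the full reply has been assembled. */
    static String call() {
        return String.join("", CHUNKS);
    }

    /** Streaming style: each chunk is passed to the caller as soon as it is available. */
    static void stream(Consumer<String> onChunk) {
        CHUNKS.forEach(onChunk);
    }

    public static void main(String[] args) {
        System.out.println(call());   // one write, after everything is ready
        stream(System.out::print);    // several small writes, the typewriter effect
        System.out.println();
    }
}
```

Both styles deliver the same text in the end; what changes is when the caller first sees any of it, which matters for perceived latency on long replies.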

The code below covers both styles, plus a synchronous call that returns the full response, including token usage.

com/berial/springai/controller/CallDemoController.java:

package com.berial.springai.controller;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/call-demo")
public class CallDemoController {
    private final ChatClient chatClient;

    public CallDemoController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Synchronous call: text only
    @GetMapping("/sync")
    public String sync(@RequestParam String message) {
        return chatClient.prompt()
                .user(message).call().content();
    }

    // Synchronous call: full response (including token usage)
    @GetMapping("/token-usage")
    public TokenUsageResponse tokenUsage(@RequestParam String message) {
        ChatResponse response = chatClient.prompt()
                .user(message).call().chatResponse();

        Usage usage = response.getMetadata().getUsage();
        return new TokenUsageResponse(
                response.getResult().getOutput().getText(),
                usage.getPromptTokens(),     // input tokens
                usage.getCompletionTokens(), // output tokens
                usage.getTotalTokens()       // total
        );
    }

    // Streaming call: returns Flux<String>, suited to the SSE typewriter effect
    @GetMapping("/stream")
    public Flux<String> stream(@RequestParam String message) {
        return chatClient.prompt()
                .user(message).stream().content();
    }

    record TokenUsageResponse(String content, Integer inputTokens,
                              Integer outputTokens, Integer totalTokens) {}
}

Model parameter configuration

Setting default parameters in the configuration file

Ollama

pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

src/main/resources/application.yml:

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: qwen2.5:7b
          temperature: 0.7

OpenAI