Chapter 8 : AI Agents Advanced
Models - LLMs
Models, specifically Large Language Models (LLMs), power the AI capabilities. A hybrid model workflow, meaning a multi-LLM workflow, is recommended.
How to Choose Models
For Building Agents
Depending on the budget, from high to medium:
- Prefer premium, heavyweight reasoning models like Claude Opus.
- Alternatively, consider Claude Sonnet 4.6 or similar, which is a cost-efficient, fast production workhorse.
- As a last resort, use models like Claude Haiku 4.5 or GPT-5 Mini, which are cost-efficient workhorses.
Lower-end models will not give good responses for tasks like:
- From table schema
- Identify Sensitive Data Columns, create view scripts.
- Ask AI What Insights and Trends Can Get?
- Ask AI What AI Workflows Can Implement
- Get Proactive Insights Workflows
For Using Agents and Chat
Example Use Case: AI Data Insights
For an agentic solution running real-time data insights, your models must have exceptional JSON/structured output capabilities, low latency, and a low cost-per-million-tokens to accommodate the high volume of text processed in chained workflows.
Example Workload
A solution based on the Spring AI agentic platform is used by multiple users, where 2 to 3 users may run workflows simultaneously, each workflow containing 4 to 5 chained AI tasks.
Recommended Models
One recommendation is to use the OpenRouter paid tier. The most cost-efficient, highly reliable paid models on OpenRouter that fit this profile are given below.
Best for Complex Statistical Reasoning: DeepSeek R1
- Model: deepseek/deepseek-r1
- It is more expensive than DeepSeek V3
Best Overall for Data Insights: DeepSeek V3
- Model: deepseek/deepseek-chat
Other
- Low-cost models may not be very useful.
- Models: Google Gemini 2.5 Flash
Recommended Hybrid Model Workflow Approach
| Task | Recommended model | Notes |
|---|---|---|
| Data Cleaning & Structuring | DeepSeek V3 | Requires strong logic and strict JSON generation |
| Insight Analysis & Aggregation | DeepSeek R1 or V3 | Good intelligence at low cost |
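The split between task types and models can be sketched as a simple lookup. This is a minimal illustration, not part of the Agent Server: the class `ModelSelector` and the enum `TaskType` are hypothetical names, and only the model IDs come from this chapter.

```java
import java.util.Map;

// Hypothetical helper: picks an OpenRouter model ID per task type.
final class ModelSelector {

    // Task categories taken from the hybrid workflow table (names are assumptions).
    enum TaskType { DATA_CLEANING, INSIGHT_ANALYSIS }

    // Model IDs as listed in this chapter. The table allows R1 or V3 for
    // insight analysis; R1 is chosen here only for illustration.
    private static final Map<TaskType, String> MODEL_BY_TASK = Map.of(
            TaskType.DATA_CLEANING, "deepseek/deepseek-chat",
            TaskType.INSIGHT_ANALYSIS, "deepseek/deepseek-r1");

    static String modelFor(TaskType task) {
        return MODEL_BY_TASK.get(task);
    }

    public static void main(String[] args) {
        for (TaskType task : TaskType.values()) {
            System.out.println(task + " -> " + modelFor(task));
        }
    }
}
```

Each chained task would then pass the selected ID as its model option, so cheap preparation steps never run on the more expensive reasoning model.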
How to Use OpenRouter
To use OpenRouter models:
- Set its API key as `OPENAI_API_KEY`.
- Add the properties below to `agent\spring-java\agentserver\src\main\resources\application.properties`.
- Use the suffix `:free` for free-tier model usage.
spring.ai.openai.base-url=https://openrouter.ai/api
# Optionally Specify model to use in the format, e.g.
#
#spring.ai.openai.chat.options.model=deepseek/deepseek-chat
#spring.ai.openai.chat.options.model=deepseek/deepseek-r1
#spring.ai.openai.chat.options.model=anthropic/claude-haiku-4.5
#
#spring.ai.openai.chat.options.model=google/gemini-2.5-flash
#spring.ai.openai.chat.options.model=qwen/qwen3-coder:free
#spring.ai.openai.chat.options.model=google/gemma-4-31b-it
#spring.ai.openai.chat.options.model=google/gemma-4-26b-a4b-it
How to Optimize Model Usage Costs
Token Calculation
Total Tokens = Prompt + Reasoning + Completion
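The formula can be turned into a rough cost estimate. The sketch below assumes that prompt tokens are billed at an input price and that reasoning and completion tokens are both billed at an output price; check your provider's pricing page, since this is an assumption. The class name `TokenCost` and the prices used in `main` are placeholders, not real rates.

```java
// Hypothetical cost estimator based on: Total Tokens = Prompt + Reasoning + Completion.
final class TokenCost {

    static long totalTokens(long prompt, long reasoning, long completion) {
        return prompt + reasoning + completion;
    }

    // Prices are in USD per million tokens.
    // Assumption: reasoning tokens are billed at the output rate.
    static double costUsd(long prompt, long reasoning, long completion,
                          double inputPricePerMillion, double outputPricePerMillion) {
        double inputCost = prompt / 1_000_000.0 * inputPricePerMillion;
        double outputCost = (reasoning + completion) / 1_000_000.0 * outputPricePerMillion;
        return inputCost + outputCost;
    }

    public static void main(String[] args) {
        // Placeholder numbers for one chained workflow step.
        long prompt = 12_000, reasoning = 3_000, completion = 1_500;
        System.out.println("Total tokens: " + totalTokens(prompt, reasoning, completion));
        System.out.println("Estimated USD: " + costUsd(prompt, reasoning, completion, 1.0, 2.0));
    }
}
```

Multiplying the per-step estimate by the number of chained tasks and concurrent users gives a first approximation of workload cost.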
Optimize Tokens
Data Context
- Give LLMs optimized data so that they perform best.
- Use Data Modeling and Pushdown features.
- Break down AI tasks so that data preparation is performed by low-cost models.
- Then give the aggregated final context to reasoning models to produce the final output.
In general
- Avoid giving very large context in prompts.
- Limit agents to small MCP Tool groups (e.g., under 30 tools), which substantially reduces input token costs.
- Use RAG with relevance search.
Event-Driven AI Workflows
- Use the pre-built Kafka-based events that are pre-configured for your data.
- Use this configuration for the backend:
- Kafka Events Producer, Consumer : Include ✅
Invoke Workflows
→ From Event Trigger, invoke AI Workflow via webhook or API call.
→ From Backend Event Consumer Service, invoke AI Workflow via WebClient API call.
- See file for details: `EmKafkaConsumerService.java`
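As a rough illustration of the consumer-to-workflow call, the sketch below posts an event payload to a workflow endpoint. It is not the code in `EmKafkaConsumerService.java`, which uses Spring's WebClient; the JDK `HttpClient` is swapped in here so the example has no dependencies. The class name `WorkflowInvoker`, the URL, and the JSON payload are all hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: forwards a consumed event to an AI workflow over HTTP.
final class WorkflowInvoker {

    // Builds a JSON POST request carrying the event payload.
    static HttpRequest buildRequest(String workflowUrl, String eventJson) {
        return HttpRequest.newBuilder(URI.create(workflowUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(eventJson))
                .build();
    }

    // Sends the request and returns the HTTP status code.
    static int invoke(String workflowUrl, String eventJson)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(workflowUrl, eventJson),
                HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only builds the request; call invoke(...) when a real endpoint is available.
        HttpRequest request = buildRequest(
                "http://localhost:8080/workflows/run",   // assumed endpoint
                "{\"event\":\"product_created\"}");      // assumed payload
        System.out.println(request.method() + " " + request.uri());
    }
}
```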
Create Events From Workflows
→ In workflows, AI models connect to Kafka via Tools to create events.
- Use the pre-built Kafka event APIs and event templates pre-configured for your data.
- For example, use a tool like:
- `Tool Name: event_publish_product`
RAG and Semantic Cache
RAG - Retrieval Augmented Generation
RAG provides the LLM with knowledge context from matching documents in the vectorStore.
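The retrieve-then-augment flow can be sketched in plain Java. This is a toy illustration only: a word-overlap score stands in for the vector similarity search that a real vectorStore performs, and the class `RagSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy RAG flow: retrieve matching documents, then prepend them to the prompt.
final class RagSketch {

    // Stand-in for vector similarity: counts distinct query words found in the document.
    static long score(String query, String doc) {
        Set<String> docWords = new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\W+")));
        return Arrays.stream(query.toLowerCase().split("\\W+"))
                .distinct()
                .filter(docWords::contains)
                .count();
    }

    // Returns up to topK documents with a non-zero score, best match first.
    static List<String> retrieve(String query, List<String> docs, int topK) {
        return docs.stream()
                .filter(d -> score(query, d) > 0)
                .sorted(Comparator.comparingLong((String d) -> score(query, d)).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    // Builds the augmented prompt that would be sent to the LLM.
    static String augment(String query, List<String> context) {
        return "Context:\n" + String.join("\n", context) + "\n\nQuestion: " + query;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Product prices rose in March",
                "Kafka publishes order events");
        String query = "How did product prices change?";
        System.out.println(augment(query, retrieve(query, docs, 2)));
    }
}
```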
Semantic Cache
Semantic caching remembers previous answers to similar questions and avoids expensive LLM calls by reusing responses for similar queries.
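A minimal in-memory sketch of the idea follows. It assumes the caller already has an embedding vector for each question (in practice produced by an embedding model) and treats two questions as similar when their cosine similarity meets a threshold. The class `SemanticCache` and the threshold value are hypothetical; the Agent Server's Redis-backed setup is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy semantic cache: reuses an answer when a new query embedding is close enough.
final class SemanticCache {

    private static final class Entry {
        final double[] embedding;
        final String answer;
        Entry(double[] embedding, String answer) {
            this.embedding = embedding;
            this.answer = answer;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    SemanticCache(double threshold) {
        this.threshold = threshold;
    }

    // Cosine similarity of two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    void put(double[] embedding, String answer) {
        entries.add(new Entry(embedding, answer));
    }

    // Returns the most similar cached answer at or above the threshold, if any.
    Optional<String> lookup(double[] queryEmbedding) {
        Entry best = null;
        double bestSim = threshold;
        for (Entry e : entries) {
            double sim = cosine(queryEmbedding, e.embedding);
            if (sim >= bestSim) {
                best = e;
                bestSim = sim;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.answer);
    }

    public static void main(String[] args) {
        SemanticCache cache = new SemanticCache(0.9);
        cache.put(new double[] {1.0, 0.0}, "cached answer");
        System.out.println(cache.lookup(new double[] {0.99, 0.1})); // similar query
        System.out.println(cache.lookup(new double[] {0.0, 1.0}));  // unrelated query
    }
}
```

On a cache miss, the caller would invoke the LLM and store the new answer with `put`, so that later similar queries skip the call.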
Agent Server Redis Configuration
Agent Server is configured to use Redis for RAG and Semantic Cache if the backend Redis Enable option is selected.
- Dependencies and application.properties are set up for Redis; the Docker image is redis-stack-server.
Redis Vector Store
- RedisVectorStore lets developers use Redis as a high-performance vector database.
Setup and Use Redis Vector Store
To initialize RedisVectorStore, load documents into it, and use it for RAG and Semantic Cache, see the provided sample code file:
agentserver\src\main\java\com\example\agentserver\RedisVectorStore.java
Customize Agent Server to implement them further.
Customize to Implement RAG and Semantic Cache
Create source files from the sample code and customize existing code to use VectorStore.
Please refer to the Spring AI doc pages below:
- Retrieval Augmented Generation (RAG) : ETL Pipeline
- The ETL pipeline creates, transforms, and stores Document instances in the vectorStore.
- You can store various document types (PDF, Word, etc.) in the vectorStore.
You can also initialize the vectorStore by uncommenting the following property in application.properties (instead of using a Configuration class):
spring.ai.vectorstore.redis.initialize-schema=true
Advanced Agentic patterns
Try out advanced Agentic patterns after completing tutorial "Chapter 6 : AI Agents".
Getting Started Steps
In src\main\java\com\example\emagent\app\EmAgentSpringApp.java
- Enable Run EmAgentExtra
The additional agentic patterns are called via:
src\main\java\com\example\emagent\app\EmAgentsExtra.java
In EmAgentsExtra.java:
Choose and enable the run flags for the patterns below as required.
boolean runEmAgentParallel = false;
boolean runEmAgentEvaluatorOptimizerFixed = false;
boolean runEmAgentEvaluatorOptimizer = false;
boolean runEmAgentOrchestratorWorkers = false;
boolean runEmAgentRouting = false;
NOTE: All Agents come with access to MCP Tools
Agentic Pattern Parallelization
- AI Agent `EmAgentParallel`: agentic system workflows with the Parallelization agentic pattern.
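The Parallelization pattern in general can be sketched as follows. This is not the code of `EmAgentParallel`; it is a generic illustration in which a `Function<String, String>` stands in for the LLM call, and the class `ParallelSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Generic Parallelization sketch: same instruction, many inputs, run concurrently.
final class ParallelSketch {

    // llm is a stand-in for a model call; results keep the order of the inputs.
    static List<String> runParallel(Function<String, String> llm,
                                    String instruction, List<String> inputs) {
        List<CompletableFuture<String>> futures = inputs.stream()
                .map(in -> CompletableFuture.supplyAsync(() -> llm.apply(instruction + "\n" + in)))
                .collect(Collectors.toList());
        // join() waits for each task to finish.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<String, String> fakeLlm = prompt -> "summary of: " + prompt.replace("\n", " ");
        System.out.println(runParallel(fakeLlm, "Summarize", List.of("sales data", "stock data")));
    }
}
```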
Agentic Pattern Evaluator-Optimizer Fixed
- AI Agent `EmAgentEvaluatorOptimizerFixed`: agentic system workflows with the Evaluator-Optimizer agentic pattern.
- Fixed means a fixed number of evaluator loops.
Agentic Pattern Evaluator-Optimizer
- AI Agent `EmAgentEvaluatorOptimizer`: agentic system workflows with the Evaluator-Optimizer agentic pattern.
- There is no limit on the number of evaluator loops. The loop runs until the evaluation criteria are satisfied.
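The Evaluator-Optimizer loop in general can be sketched as below. This is not the code of either agent; the generator and evaluator are stand-ins for LLM calls, and the class `EvaluatorOptimizerSketch` is a hypothetical name. A `maxLoops` cap models a bounded variant in this sketch, while a very large cap such as `Integer.MAX_VALUE` approximates running until the criteria are met.

```java
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Generic Evaluator-Optimizer sketch: generate, evaluate, refine until accepted.
final class EvaluatorOptimizerSketch {

    // generator: (task, previous draft) -> new draft; the first call gets an empty draft.
    // evaluator: returns true when the draft satisfies the criteria.
    static String refine(BiFunction<String, String, String> generator,
                         Predicate<String> evaluator, String task, int maxLoops) {
        String draft = generator.apply(task, "");
        for (int i = 0; i < maxLoops && !evaluator.test(draft); i++) {
            draft = generator.apply(task, draft);
        }
        return draft;
    }

    public static void main(String[] args) {
        // Toy stand-ins: each round appends detail; accept once the draft is long enough.
        BiFunction<String, String, String> generator = (task, prev) -> prev + "[detail]";
        Predicate<String> evaluator = draft -> draft.length() >= 24;
        System.out.println(refine(generator, evaluator, "report", 10));
    }
}
```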
Agentic Pattern Orchestrator-Workers
- AI Agent `EmAgentOrchestratorWorkers`: agentic system workflows with the Orchestrator-Workers agentic pattern.
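The Orchestrator-Workers pattern in general can be sketched as follows. This is not the code of `EmAgentOrchestratorWorkers`; the orchestrator, worker, and synthesizer functions are stand-ins for LLM calls, and the class `OrchestratorWorkersSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Generic Orchestrator-Workers sketch: split a task, delegate, then combine.
final class OrchestratorWorkersSketch {

    static String run(Function<String, List<String>> orchestrator,
                      Function<String, String> worker,
                      Function<List<String>, String> synthesizer,
                      String task) {
        List<String> results = new ArrayList<>();
        // The orchestrator decides the subtasks; each one goes to a worker.
        for (String subtask : orchestrator.apply(task)) {
            results.add(worker.apply(subtask));
        }
        return synthesizer.apply(results);
    }

    public static void main(String[] args) {
        Function<String, List<String>> orchestrator = t -> List.of(t + ": trends", t + ": anomalies");
        Function<String, String> worker = sub -> "analysis of " + sub;
        Function<List<String>, String> synthesizer = parts -> String.join("; ", parts);
        System.out.println(run(orchestrator, worker, synthesizer, "sales"));
    }
}
```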
Agentic Pattern Routing
- AI Agent `EmAgentRouting`: agentic system workflows with the Routing agentic pattern.
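The Routing pattern in general can be sketched as follows. This is not the code of `EmAgentRouting`; the classifier and handlers are stand-ins for LLM calls or specialized agents, and the class `RoutingSketch` and the route names are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

// Generic Routing sketch: classify the input, then dispatch to a specialized handler.
final class RoutingSketch {

    static String route(Function<String, String> classifier,
                        Map<String, Function<String, String>> handlers,
                        String fallbackRoute, String input) {
        Function<String, String> fallback = handlers.get(fallbackRoute);
        if (fallback == null) {
            throw new IllegalArgumentException("Unknown fallback route: " + fallbackRoute);
        }
        // Unrecognized labels go to the fallback handler.
        String label = classifier.apply(input);
        return handlers.getOrDefault(label, fallback).apply(input);
    }

    public static void main(String[] args) {
        Function<String, String> classifier = in -> in.contains("invoice") ? "billing" : "other";
        Function<String, String> billing = in -> "billing agent handles: " + in;
        Function<String, String> general = in -> "general agent handles: " + in;
        Map<String, Function<String, String>> handlers = Map.of("billing", billing, "general", general);
        System.out.println(route(classifier, handlers, "general", "invoice 42 is overdue"));
        System.out.println(route(classifier, handlers, "general", "hello"));
    }
}
```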
Schedule AI Agent Runs
- Schedule AI Agent Runs using Kestra
- See section Agents & Dashboard : Schedule Agent Runs
- When creating an Agent Run schedule in Kestra:
- Use a flow based on executing a Java jar process.
- You can run the compiled jar for the main program below:
agent\spring-java\emagent\src\main\java\com\example\emagent\app\EmAgentSpringApp.java
Optionally Enable advanced agentic patterns in it via:
EmAgentsExtra.java
- Email Agent Run reports:
- To send an output report via email in Kestra, use the MailSend task