
Chapter 8 : AI Agents Advanced

Models - LLMs

Models, specifically Large Language Models (LLMs), power the AI capabilities. A hybrid model workflow is recommended, that is, a workflow that uses multiple LLMs.

How to Choose Models

For Building Agents

For a high to medium budget:

  • Prefer premium, heavyweight reasoning models such as Claude Opus.
  • Alternatively, consider Claude Sonnet 4.6 or similar, which is a cost-efficient, fast production workhorse.
  • As a last resort, use lightweight models such as Claude Haiku 4.5 or GPT-5 Mini, which are low-cost.

Caution

Lower-end models do not give good responses for tasks such as:

  • Working from a table schema:
    • Identifying sensitive data columns and creating view scripts.
    • Asking the AI what insights and trends can be obtained.
    • Asking the AI what AI workflows can be implemented.
    • Getting proactive insights workflows.

For Using Agents and Chat

Example Use Case: AI Data Insights

For an agentic solution running real-time data insights, your models must have exceptional JSON/structured output capabilities, low latency, and a low cost-per-million-tokens to accommodate the high volume of text processed in chained workflows.

Example Workload

A solution based on the Spring AI agentic platform is used by multiple users; 2 to 3 users may run workflows simultaneously, each workflow containing 4 to 5 chained AI tasks.

One recommendation is to use the OpenRouter paid tier. The most cost-efficient, highly reliable paid models on OpenRouter that fit this profile are listed below.

Best for Complex Statistical Reasoning: DeepSeek R1
  • Model: deepseek/deepseek-r1
  • It is more expensive than DeepSeek V3
Best Overall for Data Insights: DeepSeek V3
  • Model: deepseek/deepseek-chat
Other
  • Other low-cost models may be less useful for this workload.
  • Models: Google Gemini 2.5 Flash

Recommended model per task:
  • Data Cleaning & Structuring: use DeepSeek V3.
  • Insight Analysis & Aggregation: use DeepSeek R1 or V3.
(Requires strong logic and strict JSON generation)(Good intelligence at low cost)

How to Use OpenRouter

To use OpenRouter models:

  • Set the OpenRouter API key as OPENAI_API_KEY.
  • Add the properties below to agent\spring-java\agentserver\src\main\resources\application.properties.
  • Append the suffix :free to a model name to use its free tier.
spring.ai.openai.base-url=https://openrouter.ai/api

# Optionally specify the model to use, e.g.
#
#spring.ai.openai.chat.options.model=deepseek/deepseek-chat
#spring.ai.openai.chat.options.model=deepseek/deepseek-r1
#spring.ai.openai.chat.options.model=anthropic/claude-haiku-4.5
#
#spring.ai.openai.chat.options.model=google/gemini-2.5-flash
#spring.ai.openai.chat.options.model=qwen/qwen3-coder:free
#spring.ai.openai.chat.options.model=google/gemma-4-31b-it
#spring.ai.openai.chat.options.model=google/gemma-4-26b-a4b-it
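
To illustrate what these settings amount to on the wire, here is a minimal sketch that builds an equivalent request with the JDK's java.net.http API instead of Spring AI. It assumes the OpenAI-compatible path /v1/chat/completions is appended to the base URL; the class name OpenRouterRequestSketch and its helper methods are hypothetical and are not part of the agent server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class OpenRouterRequestSketch {

    // Same value as spring.ai.openai.base-url above.
    static final String BASE_URL = "https://openrouter.ai/api";

    // Assumption: the OpenAI-compatible chat path is appended to the base URL.
    static final String CHAT_PATH = "/v1/chat/completions";

    // Minimal JSON string escaping; enough for a sketch, not a full JSON encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Builds a single-message chat request body for the given model id.
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + escape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + escape(userMessage) + "\"}]}";
    }

    // Builds (but does not send) the HTTP request; apiKey would come from OPENAI_API_KEY.
    static HttpRequest chatRequest(String apiKey, String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + CHAT_PATH))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }
}
```

A request built with chatRequest(System.getenv("OPENAI_API_KEY"), "deepseek/deepseek-chat", "...") could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Switching models only changes the model string, which is what the commented property lines above do for Spring AI.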

How to Optimize Model Usage Costs

Token Calculation

Total Tokens = Prompt + Reasoning + Completion
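
The formula can be turned into a rough cost estimate. The sketch below is illustrative only: it assumes reasoning tokens are billed at the output rate (check your provider's pricing), and the class name and the per-million prices passed in are hypothetical placeholders.

```java
final class TokenCostSketch {

    // Total Tokens = Prompt + Reasoning + Completion
    static long totalTokens(long prompt, long reasoning, long completion) {
        return prompt + reasoning + completion;
    }

    // Estimated cost in USD. Prices are per million tokens.
    // Assumption: reasoning tokens are charged at the output (completion) rate.
    static double costUsd(long prompt, long reasoning, long completion,
                          double inputPricePerMillion, double outputPricePerMillion) {
        double inputCost = prompt / 1_000_000.0 * inputPricePerMillion;
        double outputCost = (reasoning + completion) / 1_000_000.0 * outputPricePerMillion;
        return inputCost + outputCost;
    }

    // Cost of one workflow made of several chained tasks with similar token usage.
    static double workflowCostUsd(int chainedTasks, double costPerTaskUsd) {
        return chainedTasks * costPerTaskUsd;
    }
}
```

For the example workload above, multiplying workflowCostUsd by the number of concurrent users and runs per day gives a first-order budget figure.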

Optimize Tokens

Data Context

  • Give LLMs optimized data so that they perform at their best.
  • Use the Data Modeling and Pushdown features.
  • Break down AI tasks so that data preparation is done by low-cost models.
  • Then give the aggregated final context to reasoning models to produce the final output.
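
The task breakdown described in this list can be sketched as a two-stage pipeline. This is a sketch under stated assumptions: cheapModel and reasoningModel are hypothetical stand-ins for real LLM calls (prompt in, text out), and the prompt wording is illustrative.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TwoStagePipelineSketch {

    // Stage 1: a low-cost model prepares each raw chunk.
    // Stage 2: a reasoning model sees only the aggregated, much smaller context.
    static String run(List<String> rawChunks,
                      Function<String, String> cheapModel,
                      Function<String, String> reasoningModel) {
        String aggregated = rawChunks.stream()
                .map(chunk -> cheapModel.apply("Clean and summarize as compact JSON:\n" + chunk))
                .collect(Collectors.joining("\n"));
        return reasoningModel.apply("Derive insights from these summaries:\n" + aggregated);
    }
}
```

The design choice is that the expensive model never receives the raw data, so its prompt token count depends on the size of the summaries rather than the size of the source tables.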

In general

  • Avoid very large contexts in prompts.
  • Limit agents to small MCP Tool groups (e.g., under 30 tools), which substantially reduces input token costs.
  • Use RAG with relevance search.

Event-Driven AI Workflows

  • Use the pre-built Kafka-based events that are pre-configured for your data.
  • In the backend configuration, select:
    • Kafka Events Producer, Consumer : Include ✅

Invoke Workflows

→ From Event Trigger, invoke AI Workflow via webhook or API call.

→ From Backend Event Consumer Service, invoke AI Workflow via WebClient API call.

- See file for details: `EmKafkaConsumerService.java`

Create Events From Workflows

→ In workflows, AI models connect to Kafka via Tools to create events.

- Use the pre-built Kafka event APIs and event templates that are pre-configured for your data.
- For example, use a tool such as:
- Tool Name: `event_publish_product`

RAG and Semantic Cache

RAG - Retrieval Augmented Generation

RAG provides the LLM with knowledge context taken from matching documents in the vectorStore.
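
As a minimal, in-memory sketch of this idea (not the Spring AI or Redis implementation), the code below ranks documents by cosine similarity to a query embedding and prepends the best matches to the prompt. The class RagSketch, the Doc record, and the prompt wording are hypothetical; embeddings are assumed to be supplied by an embedding model that is not shown.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RagSketch {

    // A stored document with its precomputed embedding vector.
    record Doc(String text, double[] embedding) {}

    // Cosine similarity of two equal-length vectors; 0 if either is a zero vector.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the k documents most similar to the query embedding, best first.
    static List<Doc> topK(List<Doc> store, double[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(d.embedding(), query)).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    // Builds the augmented prompt: retrieved context first, then the question.
    static String augment(String question, List<Doc> context) {
        String ctx = context.stream().map(Doc::text).collect(Collectors.joining("\n- ", "- ", ""));
        return "Answer using only this context:\n" + ctx + "\n\nQuestion: " + question;
    }
}
```

A vector database such as Redis replaces the linear scan in topK with an indexed similarity search; the prompt assembly step stays the same.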

Semantic Cache

Semantic caching remembers previous answers and avoids expensive LLM calls by reusing those answers for similar queries.
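
The mechanism can be sketched in memory as follows. This is an illustration only, not the Redis-backed implementation: the class SemanticCacheSketch is hypothetical, query embeddings are assumed to come from an embedding model that is not shown, and the similarity threshold is a tuning parameter you would choose yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class SemanticCacheSketch {

    // A cached answer keyed by the embedding of the question that produced it.
    private record Entry(double[] embedding, String answer) {}

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    // threshold: minimum cosine similarity for a cached answer to be reused.
    SemanticCacheSketch(double threshold) {
        this.threshold = threshold;
    }

    void put(double[] queryEmbedding, String answer) {
        entries.add(new Entry(queryEmbedding, answer));
    }

    // Returns the answer of the most similar cached query, if it meets the threshold.
    Optional<String> lookup(double[] queryEmbedding) {
        Entry best = null;
        double bestScore = threshold;
        for (Entry e : entries) {
            double score = cosine(e.embedding(), queryEmbedding);
            if (score >= bestScore) {
                best = e;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best).map(Entry::answer);
    }

    // Cosine similarity of two equal-length vectors; 0 if either is a zero vector.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A caller checks lookup first and only calls the LLM (then put) on a miss. A higher threshold reduces the risk of returning an answer to a question that is merely related, at the cost of fewer cache hits.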

Agent Server Redis Configuration

If Redis is enabled in the backend configuration, Agent Server is configured to use Redis for RAG and Semantic Cache.

  • The dependencies and application.properties are set up for Redis; the Docker image is redis-stack-server.

Redis Vector Store

  • RedisVectorStore lets developers use Redis as a high-performance vector database.

Setup and Use Redis Vector Store

  • To initialize RedisVectorStore, load documents into it, and use it for RAG and Semantic Cache, see the provided sample code file:

    • agentserver\src\main\java\com\example\agentserver\RedisVectorStore.java
  • Customize Agent Server to implement these features further.

Customize to Implement RAG and Semantic Cache

  • Create source files from the sample code and customize the existing code to use VectorStore.

  • Refer to the Spring AI documentation pages for details.

  • You can also initialize the vectorStore by uncommenting the following property in application.properties (instead of using a Configuration class):

    • spring.ai.vectorstore.redis.initialize-schema=true

Advanced Agentic patterns

Try out the advanced agentic patterns after completing the tutorial "Chapter 6 : AI Agents".

Getting Started Steps

In src\main\java\com\example\emagent\app\EmAgentSpringApp.java

  • Enable Run EmAgentExtra

The additional agentic patterns are called via:

  • src\main\java\com\example\emagent\app\EmAgentsExtra.java

In EmAgentsExtra.java:

Choose and enable the run flags for the patterns below as required.

boolean runEmAgentParallel = false;
boolean runEmAgentEvaluatorOptimizerFixed = false;
boolean runEmAgentEvaluatorOptimizer = false;
boolean runEmAgentOrchestratorWorkers = false;
boolean runEmAgentRouting = false;

NOTE: All Agents come with access to MCP Tools

Agentic Pattern Parallelization

  • AI Agent EmAgentParallel
  • Agentic system workflow using the Parallelization pattern
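
As a generic illustration of the Parallelization pattern (not the actual EmAgentParallel code), the sketch below sends independent prompts concurrently and collects the answers in input order. The model parameter is a hypothetical stand-in for an LLM call.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelSketch {

    // Runs one model call per prompt concurrently and returns results in prompt order.
    static List<String> runAll(List<String> prompts, Function<String, String> model) {
        // Start all calls first so they overlap in time.
        List<CompletableFuture<String>> futures = prompts.stream()
                .map(p -> CompletableFuture.supplyAsync(() -> model.apply(p)))
                .collect(Collectors.toList());
        // Then wait for each result; join() rethrows failures as CompletionException.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

This version uses the common fork-join pool; for blocking network calls a dedicated executor passed to supplyAsync is usually a better fit.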

Agentic Pattern Evaluator-Optimizer Fixed

  • AI Agent EmAgentEvaluatorOptimizerFixed
  • Agentic system workflow using the Evaluator-Optimizer pattern
    • Fixed means that the number of evaluator loops is fixed.

Agentic Pattern Evaluator-Optimizer

  • AI Agent EmAgentEvaluatorOptimizer
  • Agentic system workflow using the Evaluator-Optimizer pattern
    • There is no limit on the number of evaluator loops. The loop runs until the criteria are satisfied.
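
The difference between the two Evaluator-Optimizer variants can be sketched with a single loop. This is a generic illustration, not the EmAgentEvaluatorOptimizer code: generator, evaluator, and optimizer are hypothetical stand-ins for LLM calls, and treating "fixed" as an upper bound that may stop early on acceptance is an assumption.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

final class EvaluatorOptimizerSketch {

    // maxLoops > 0: at most maxLoops optimizer passes (the "Fixed" variant, as assumed here).
    // maxLoops <= 0: keep optimizing until the evaluator accepts the draft.
    static String refine(String task,
                         Function<String, String> generator,
                         Predicate<String> evaluator,
                         BiFunction<String, String, String> optimizer,
                         int maxLoops) {
        String draft = generator.apply(task);
        int loops = 0;
        while (!evaluator.test(draft) && (maxLoops <= 0 || loops < maxLoops)) {
            draft = optimizer.apply(task, draft); // revise using the task and the rejected draft
            loops++;
        }
        return draft;
    }
}
```

The unbounded variant can loop indefinitely, and accumulate token costs, if the evaluator never accepts, so in practice a generous cap or a timeout is a sensible safeguard.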

Agentic Pattern Orchestrator-Workers

  • AI Agent EmAgentOrchestratorWorkers
  • Agentic system workflow using the Orchestrator-Workers pattern
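
As a generic illustration of the Orchestrator-Workers pattern (not the EmAgentOrchestratorWorkers code), the sketch below has an orchestrator split a task into subtasks, a worker handle each one, and a synthesizer merge the results. All three functions are hypothetical stand-ins for LLM calls.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class OrchestratorWorkersSketch {

    static String run(String task,
                      Function<String, List<String>> orchestrator,
                      Function<String, String> worker,
                      Function<List<String>, String> synthesizer) {
        // The orchestrator decides at run time which subtasks are needed.
        List<String> subtasks = orchestrator.apply(task);
        // Each subtask is handled independently by a worker call.
        List<String> results = subtasks.stream()
                .map(worker)
                .collect(Collectors.toList());
        // The synthesizer combines the worker outputs into one answer.
        return synthesizer.apply(results);
    }
}
```

Unlike plain parallelization, the list of subtasks is not known in advance; it is produced by the orchestrator for each input.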

Agentic Pattern Routing

  • AI Agent EmAgentRouting
  • Agentic system workflow using the Routing pattern
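
As a generic illustration of the Routing pattern (not the EmAgentRouting code), the sketch below classifies the input first and then dispatches it to a specialized handler. The classifier and handlers are hypothetical stand-ins for LLM calls, and the route labels are illustrative.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class RoutingSketch {

    // classifier: returns a route label for the input (typically a cheap LLM call).
    // handlers: one specialized prompt or agent per label.
    // fallback: used when the classifier returns an unknown label.
    static String route(String input,
                        Function<String, String> classifier,
                        Map<String, Function<String, String>> handlers,
                        Function<String, String> fallback) {
        String label = classifier.apply(input).trim().toLowerCase(Locale.ROOT);
        return handlers.getOrDefault(label, fallback).apply(input);
    }
}
```

Normalizing the label and providing a fallback guards against a classifier that returns extra whitespace, different casing, or a label that was never registered.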

Schedule AI Agent Runs

  • Schedule AI Agent Runs using Kestra
  • See section Agents & Dashboard : Schedule Agent Runs
  • When creating an agent run schedule in Kestra:
    • Use a flow that executes the Java jar as a process.
  • You can run the compiled jar for the main program below:
agent\spring-java\emagent\src\main\java\com\example\emagent\app\EmAgentSpringApp.java

Optionally, enable the advanced agentic patterns in it via:

EmAgentsExtra.java
  • Email agent run reports:
    • To send an output report via email in Kestra, use the MailSend task.