AI Infrastructure

Declarative RAG and AI infrastructure for Spring Boot

Spring Middleware AI provides a declarative infrastructure for integrating LLMs and Retrieval-Augmented Generation (RAG) into distributed systems.

The platform focuses on knowledge modeling, structured retrieval, semantic chunking, metadata-aware filtering, and production-grade reactive execution instead of prompt-only AI orchestration.

Core idea

Model knowledge, not pipelines

Spring Middleware AI separates infrastructure concerns, semantic modeling, retrieval strategy, and LLM interaction into distinct layers. The goal is to make RAG deterministic, explainable, and production-ready.

Raw Data (JSON, Markdown, APIs)
→ Declarative ETL
→ Semantic Chunks
→ Embeddings
→ Vector Store
→ Query Planning
→ Retrieval
→ LLM Response

The platform is designed around a core principle: separate infrastructure, knowledge modeling, and query understanding.

Module structure

Composable AI architecture

The AI platform is divided into focused modules covering contracts, providers, infrastructure implementations, and Spring Boot orchestration.

ai
 ├─ ai-core
 ├─ ai-ollama
 ├─ ai-infrastructure
 └─ ai-boot

ai-core

Provider-independent abstractions such as ChatClient, EmbeddingClient, VectorStore, DocumentChunker, and AIProvider.
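As a rough sketch of what such provider-independent contracts can look like (method names and signatures here are illustrative assumptions, not the actual ai-core API):

```java
import java.util.List;
import java.util.Map;

// Illustrative sketches of provider-independent contracts.
// The names come from the module description; signatures are assumptions.
interface ChatClient {
    String complete(String prompt);
}

interface EmbeddingClient {
    float[] embed(String text);
}

interface DocumentChunker {
    List<String> chunk(String document);
}

interface VectorStore {
    void upsert(String id, float[] vector, Map<String, Object> metadata);
    List<String> search(float[] queryVector, int topK);
}

interface AIProvider {
    String name();
}

// A toy fixed-size chunker showing how an implementation plugs into the contract.
class FixedSizeChunker implements DocumentChunker {
    private final int size;

    FixedSizeChunker(int size) { this.size = size; }

    @Override
    public List<String> chunk(String document) {
        List<String> out = new java.util.ArrayList<>();
        for (int i = 0; i < document.length(); i += size) {
            out.add(document.substring(i, Math.min(i + size, document.length())));
        }
        return out;
    }
}
```

Because providers only see these interfaces, a chunker, store, or model backend can be swapped without touching the rest of the pipeline.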

ai-infrastructure

Vector stores, chunkers, document sources, and retrieval implementations.

ai-boot

Spring Boot auto-configuration, indexing lifecycle, orchestration, conversations, and chat APIs.

ai-ollama

Local LLM and embedding integration through Ollama for local-first AI execution.

Reactive execution

Fully reactive pipelines

Spring Middleware AI is designed around non-blocking execution and reactive orchestration from indexing to retrieval.

// Offload the blocking embedding call to a bounded worker pool
Mono.fromCallable(() -> embeddingClient.generate(...))
    .subscribeOn(Schedulers.boundedElastic());

No blocking pipelines

Chunking, embedding generation, retrieval, and orchestration are designed to work inside reactive flows.

Parallel indexing

Chunks can be processed concurrently to improve indexing throughput without blocking the application runtime.
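Outside a Reactor pipeline, the same idea can be sketched with plain JDK concurrency; CompletableFuture stands in for the reactive scheduler, and embed is a hypothetical placeholder for the real embedding call:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelIndexer {
    // Hypothetical stand-in for a real embedding call.
    static float[] embed(String chunk) {
        return new float[] { chunk.length() };
    }

    // Embed all chunks concurrently on a bounded pool, mirroring the
    // effect of subscribeOn(Schedulers.boundedElastic()) in Reactor.
    static List<float[]> indexAll(List<String> chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<float[]>> futures = chunks.stream()
                .map(c -> CompletableFuture.supplyAsync(() -> embed(c), pool))
                .toList();
            // Joining after all tasks are submitted keeps the work parallel.
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }
}
```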

Distributed friendly

Reactive execution allows AI pipelines to scale naturally inside distributed systems and microservice platforms.

Declarative ETL

Extract → Transform → Load

Spring Middleware AI uses a declarative ETL model to transform raw system data into semantic knowledge.

01

Extract

Retrieve documents from APIs, databases, files, GraphQL endpoints, or custom providers.

02

Transform

Convert raw data into semantic chunks with structured metadata.

03

Load

Generate embeddings and store chunks inside vector databases.

04

Retrieve

Query planners and filters retrieve contextual knowledge for LLMs.
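Collapsed into plain Java, the four stages could be wired together roughly like this; every name below is a hypothetical placeholder, not part of the framework's API:

```java
import java.util.List;
import java.util.Map;

// One pass through the declarative ETL stages, with hypothetical
// placeholder implementations for each step.
class EtlPipeline {
    record Chunk(String text, Map<String, Object> metadata) {}

    // 01 Extract: pull raw documents from a source.
    static List<String> extract() {
        return List.of("{\"name\":\"Laptop\",\"type\":\"PHYSICAL\"}");
    }

    // 02 Transform: convert raw data into semantic chunks with metadata.
    static List<Chunk> transform(List<String> rawDocs) {
        return rawDocs.stream()
            .map(doc -> new Chunk("Semantic text for: " + doc,
                                  Map.of("source", "catalog")))
            .toList();
    }

    // 03 Load: embed each chunk and store it (toy in-memory "vector store").
    static Map<Chunk, float[]> load(List<Chunk> chunks) {
        Map<Chunk, float[]> store = new java.util.HashMap<>();
        chunks.forEach(c -> store.put(c, new float[] { c.text().length() }));
        return store;
    }
}
```

Stage 04 (Retrieve) is covered in more depth by the query planner described later in this document.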

JSON chunking

Semantic modeling instead of stringify-and-embed

One of the key features of Spring Middleware AI is declarative JSON chunking. Raw JSON is transformed into semantic text and structured metadata before embeddings are generated.

Incorrect approach

JSON → stringify → embeddings ❌

Raw JSON serialization usually produces weak semantic retrieval and poor filtering capabilities.

Spring Middleware approach

JSON → semantic modeling → embeddings ✔

The framework transforms structured data into natural semantic context while preserving metadata for deterministic retrieval.
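The difference between the two approaches can be illustrated with a minimal sketch, using hand-rolled methods in place of the framework's chunking DSL:

```java
import java.util.Map;

// Contrast between stringify-and-embed and semantic modeling.
class SemanticModeling {
    // Incorrect approach: embed the raw serialized JSON as-is.
    static String stringify(Map<String, Object> product) {
        return product.toString(); // e.g. {name=Laptop, type=PHYSICAL, ...}
    }

    // Spring Middleware approach, step 1: natural semantic text for embedding.
    static String semanticText(Map<String, Object> product) {
        return "Product " + product.get("name")
             + " is a " + product.get("type")
             + " product with " + product.get("reviewCount") + " reviews.";
    }

    // Spring Middleware approach, step 2: structured metadata preserved
    // alongside the text, with numeric fields kept as real numbers so
    // vector stores can range-filter on them.
    static Map<String, Object> metadata(Map<String, Object> product) {
        return Map.of(
            "productType", product.get("type"),
            "reviewCount", product.get("reviewCount"));
    }
}
```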

Transformation rules

Rule-based semantic modeling

JSON chunking uses a declarative DSL to define extraction, metadata generation, templates, hierarchy, and semantic context.

rules:
  - name: product
    extractor-path: "$.data.catalogs.content[*].products[*]"

Extractor rules

- json-data-types: [FIELD, META_DATA]
  name: productType
  extractor-path: "$.type"

Semantic templates

- template: "Product {productName}
  belongs to catalog {catalogName}"

Hierarchical modeling

Relationships such as catalog → product → review can be modeled declaratively.

Structured metadata

Numeric fields, identifiers, and categories are preserved for executable filtering.

Natural semantic text

Generated chunks contain contextual text optimized for semantic retrieval.
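At its core, a semantic template rule boils down to placeholder substitution over extracted fields. The renderer below is a minimal sketch under that assumption, not the framework's actual DSL engine:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateRenderer {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)}");

    // Replace {field} placeholders with values extracted from the document.
    static String render(String template, Map<String, String> fields) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out,
                Matcher.quoteReplacement(fields.getOrDefault(m.group(1), "")));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```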

Retrieval model

Structured RAG execution

Retrieval is not based only on embedding similarity. Queries are transformed into structured execution plans.

User Query → Query Planner → Vector Search → Context → LLM

Query planning

Natural language becomes executable retrieval instructions.

Metadata filtering

Numeric and structured filters are applied directly in vector stores.

Semantic retrieval

Embeddings retrieve semantically relevant chunks.

Controlled context

Retrieved chunks become deterministic context for the LLM.

Query planner

Natural language becomes executable retrieval

The planner transforms user questions into optimized retrieval plans containing semantic queries, filters, and retrieval strategy.

{
  "optimizedQuery": "digital products reviewCount > 5",
  "filters": [
    {
      "field": "productType",
      "values": ["DIGITAL"],
      "operator": "EQUAL"
    }
  ],
  "useSemanticSearch": true
}
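A plan like the JSON above can be modeled as a small value type whose filters are directly executable; the field names mirror the JSON shown, while the matching logic is an illustrative assumption:

```java
import java.util.List;
import java.util.Map;

class QueryPlanning {
    record Filter(String field, List<String> values, String operator) {
        // Evaluate the filter against a chunk's metadata — an executable
        // retrieval instruction, not a prompt suggestion.
        boolean matches(Map<String, Object> metadata) {
            Object actual = metadata.get(field);
            return switch (operator) {
                case "EQUAL" -> actual != null && values.contains(actual.toString());
                default -> throw new IllegalArgumentException("Unsupported: " + operator);
            };
        }
    }

    record QueryPlan(String optimizedQuery,
                     List<Filter> filters,
                     boolean useSemanticSearch) {
        boolean matches(Map<String, Object> metadata) {
            return filters.stream().allMatch(f -> f.matches(metadata));
        }
    }
}
```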

Numeric filtering

Support for ranges and comparisons such as greater-than or less-than queries.

Structured execution

Filters are executable retrieval instructions, not prompt suggestions.

Deterministic retrieval

Retrieval behavior becomes predictable and explainable.

Vector stores

Structured filtering at execution time

Metadata filters are executed directly in vector stores such as Qdrant or MongoDB.

Qdrant

{
  "must": [
    {
      "key": "metadata.productType",
      "match": {
        "value": "PHYSICAL"
      }
    }
  ]
}

MongoDB

Criteria.where("metadata.reviewCount")
    .gte(3);

Numeric metadata is stored as real numeric types, not strings, allowing native range filtering inside vector databases.
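Storing numbers as real numeric types matters because string-typed values compare lexicographically, which breaks range filters; a quick illustration:

```java
// Why numeric metadata must be stored as numbers, not strings:
// lexicographic comparison gives the wrong answer for range queries.
class NumericMetadata {
    static boolean stringGte(String value, String threshold) {
        return value.compareTo(threshold) >= 0;   // lexicographic order
    }

    static boolean numericGte(int value, int threshold) {
        return value >= threshold;                // native numeric order
    }
}
```

With string storage, a reviewCount of "12" would fail a "≥ 3" filter because "12" sorts before "3"; native numeric storage gives the correct result.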

Configuration

Spring Boot integration

AI infrastructure integrates directly into Spring Boot applications through auto-configuration and declarative setup.

middleware:
  ai:
    document:
      sources:
        catalogs-full:
          type: CUSTOM
          provider-name: catalogs-graphql-provider
          chunker: catalogs-chunker

      indexing:
        enabled: true
        index-on-startup: true

    vector-store:
      qdrant:
        enabled: true

    provider:
      openai:
        enabled: true
        models:
          - gpt-5

Final insight

RAG quality depends more on transformation than embeddings

Most RAG systems rely almost entirely on embeddings and similarity search. Spring Middleware AI focuses on semantic modeling, structured metadata, retrieval planning, and deterministic filtering.

Typical RAG

query → embedding → similarity → hope

Spring Middleware AI

query → plan → filters + semantic
→ controlled retrieval → answer

The real value of RAG comes from how data is extracted, transformed, modeled, and retrieved — not only from embeddings.