Spring Middleware AI provides a declarative infrastructure for integrating LLMs and Retrieval-Augmented Generation (RAG) into distributed systems.
The platform focuses on knowledge modeling, structured retrieval, semantic chunking, metadata-aware filtering, and production-grade reactive execution instead of prompt-only AI orchestration.
Spring Middleware AI separates infrastructure concerns, semantic modeling, retrieval strategy, and LLM interaction. The goal is to make RAG deterministic, explainable, and production-ready.
Raw Data (JSON, Markdown, APIs)
→ Declarative ETL
→ Semantic Chunks
→ Embeddings
→ Vector Store
→ Query Planning
→ Retrieval
→ LLM Response
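The indexing half of this flow can be sketched against the core abstractions. This is a minimal plain-Java sketch: the interfaces below are simplified stand-ins for the real ai-core contracts (DocumentChunker, EmbeddingClient, VectorStore), whose actual signatures may differ.

```java
import java.util.List;

public class IndexingPipelineSketch {

    // Simplified stand-ins for the ai-core contracts; real signatures may differ.
    interface DocumentChunker {
        List<String> chunk(String rawDocument);
    }

    interface EmbeddingClient {
        float[] embed(String chunk);
    }

    interface VectorStore {
        void store(String chunk, float[] embedding);
    }

    // Raw data -> semantic chunks -> embeddings -> vector store.
    static int index(String rawDocument,
                     DocumentChunker chunker,
                     EmbeddingClient embeddings,
                     VectorStore store) {
        List<String> chunks = chunker.chunk(rawDocument);
        for (String chunk : chunks) {
            store.store(chunk, embeddings.embed(chunk));
        }
        return chunks.size();
    }
}
```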
The platform is designed around a core principle: separate infrastructure, knowledge modeling, and query understanding.
The AI platform is divided into focused modules covering contracts, providers, infrastructure implementations, and Spring Boot orchestration.
ai
├─ ai-core
├─ ai-ollama
├─ ai-infrastructure
└─ ai-boot
ai-core: Provider-independent abstractions such as ChatClient, EmbeddingClient, VectorStore, DocumentChunker, and AIProvider.
ai-ollama: Local LLM and embedding integration through Ollama for local-first AI execution.
ai-infrastructure: Vector stores, chunkers, document sources, and retrieval implementations.
ai-boot: Spring Boot auto-configuration, indexing lifecycle, orchestration, conversations, and chat APIs.
Spring Middleware AI is designed around non-blocking execution and reactive orchestration from indexing to retrieval.
Mono.fromCallable(() -> embeddingClient.generate(...))
.subscribeOn(Schedulers.boundedElastic());
Chunking, embedding generation, retrieval, and orchestration are designed to work inside reactive flows.
Chunks can be processed concurrently to improve indexing throughput without blocking the application runtime.
Reactive execution allows AI pipelines to scale naturally inside distributed systems and microservice platforms.
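Concurrent chunk embedding can be expressed with the same Mono.fromCallable pattern shown above, fanned out through Flux.flatMap. A sketch, assuming reactor-core is on the classpath; embedChunk is a hypothetical blocking call standing in for embeddingClient.generate(...):

```java
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class ConcurrentIndexingSketch {

    // Hypothetical blocking embedding call.
    static float[] embedChunk(String chunk) {
        return new float[] { chunk.length() };
    }

    // Embed up to 4 chunks concurrently on boundedElastic so the
    // blocking call never runs on the event loop.
    static List<float[]> embedAll(List<String> chunks) {
        return Flux.fromIterable(chunks)
                .flatMap(chunk -> Mono.fromCallable(() -> embedChunk(chunk))
                        .subscribeOn(Schedulers.boundedElastic()), 4)
                .collectList()
                .block(); // block() only to make the demo synchronous
    }
}
```

The concurrency argument to flatMap bounds how many embedding calls are in flight at once, which is what keeps indexing throughput high without saturating the embedding backend.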
Spring Middleware AI uses a declarative ETL model to transform raw system data into semantic knowledge.
Retrieve documents from APIs, databases, files, GraphQL endpoints, or custom providers.
Convert raw data into semantic chunks with structured metadata.
Generate embeddings and store chunks inside vector databases.
Query planners and filters retrieve contextual knowledge for LLMs.
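The first stage, document retrieval, is pluggable. A sketch of a custom provider, using a hypothetical stand-in for the document-source contract (the real ai-infrastructure interface may differ; the GraphQL payload here is stubbed):

```java
import java.util.List;

public class DocumentSourceSketch {

    // Hypothetical stand-in for a custom document-source contract.
    interface DocumentSource {
        List<String> fetchRawDocuments();
    }

    // A custom provider, e.g. one backed by a GraphQL catalog endpoint
    // (stubbed here with a fixed payload instead of a real HTTP call).
    static DocumentSource catalogsGraphqlProvider() {
        return () -> List.of("{\"data\":{\"catalogs\":{\"content\":[]}}}");
    }
}
```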
One of the key features of Spring Middleware AI is declarative JSON chunking. Raw JSON is transformed into semantic text and structured metadata before embeddings are generated.
JSON → stringify → embeddings ❌
Raw JSON serialization usually produces weak semantic retrieval and poor filtering capabilities.
JSON → semantic modeling → embeddings ✔
The framework transforms structured data into natural semantic context while preserving metadata for deterministic retrieval.
rules:
- name: product
extractor-path: "$.data.catalogs.content[*].products[*]"
JSON chunking uses a declarative DSL to define extraction, metadata generation, templates, hierarchy, and semantic context.
- json-data-types: [FIELD, META_DATA]
  name: productType
  extractor-path: "$.type"
- template: "Product {productName} belongs to catalog {catalogName}"
Relationships such as catalog → product → review can be modeled declaratively.
Numeric fields, identifiers, and categories are preserved for executable filtering.
Generated chunks contain contextual text optimized for semantic retrieval.
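Under a rule like the one above, a single product entry yields both embeddable text and typed metadata. An illustrative sketch; the exact chunk model in ai-core may carry more fields, and the values are invented:

```java
import java.util.Map;

public class ChunkOutputSketch {

    // A simplified chunk: semantic text for embedding plus typed metadata
    // for filtering.
    record Chunk(String text, Map<String, Object> metadata) {}

    static Chunk productChunk(String productName, String catalogName,
                              String productType, int reviewCount) {
        // Template from the DSL: "Product {productName} belongs to catalog {catalogName}"
        String text = "Product " + productName + " belongs to catalog " + catalogName;
        return new Chunk(text, Map.of(
                "productType", productType,   // categorical, kept for EQUAL filters
                "reviewCount", reviewCount)); // numeric, kept as a real number
    }
}
```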
Retrieval is not based only on embedding similarity. Queries are transformed into structured execution plans.
Natural language becomes executable retrieval instructions.
Numeric and structured filters are applied directly in vector stores.
Embeddings retrieve semantically relevant chunks.
Retrieved chunks become deterministic context for the LLM.
The planner transforms user questions into optimized retrieval plans containing semantic queries, filters, and retrieval strategy.
{
  "optimizedQuery": "digital products reviewCount > 5",
  "filters": [
    {
      "field": "productType",
      "values": ["DIGITAL"],
      "operator": "EQUAL"
    }
  ],
  "useSemanticSearch": true
}
Ranges and comparisons such as greater-than or less-than queries are supported.
Filters are executable retrieval instructions, not prompt suggestions.
Retrieval behavior becomes predictable and explainable.
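Executing such a plan amounts to applying each filter as a predicate before (or alongside) similarity search. A plain-Java sketch of that evaluation; the operator names mirror the plan JSON ("EQUAL"), while "GREATER_THAN" is an assumed name for the range case:

```java
import java.util.List;
import java.util.Map;

public class FilterExecutionSketch {

    // A metadata filter as produced by the query planner.
    record Filter(String field, Object value, String operator) {}

    static boolean matches(Map<String, Object> metadata, Filter f) {
        Object actual = metadata.get(f.field());
        return switch (f.operator()) {
            case "EQUAL" -> f.value().equals(actual);
            case "GREATER_THAN" -> actual instanceof Number n
                    && n.doubleValue() > ((Number) f.value()).doubleValue();
            default -> false; // unknown operators never match
        };
    }

    // Keep only the chunks that satisfy every filter in the plan.
    static List<Map<String, Object>> apply(List<Map<String, Object>> chunks,
                                           List<Filter> filters) {
        return chunks.stream()
                .filter(c -> filters.stream().allMatch(f -> matches(c, f)))
                .toList();
    }
}
```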
Metadata filters are executed directly in vector stores such as Qdrant or MongoDB.
{
  "must": [
    {
      "key": "metadata.productType",
      "match": {
        "value": "PHYSICAL"
      }
    }
  ]
}
Criteria.where("metadata.reviewCount")
.gte(3);
Numeric metadata is stored as real numeric types, not strings, allowing native range filtering inside vector databases.
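The reason typing matters is that string-typed numbers compare lexicographically, which silently breaks range filters. A minimal illustration:

```java
public class NumericMetadataSketch {

    // Range check on a real numeric type: behaves as expected.
    static boolean numericGte(int value, int bound) {
        return value >= bound;
    }

    // The same check on string-typed metadata compares lexicographically:
    // "12" sorts before "3", so a gte(3) filter would wrongly exclude it.
    static boolean lexicographicGte(String value, String bound) {
        return value.compareTo(bound) >= 0;
    }
}
```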
AI infrastructure integrates directly into Spring Boot applications through auto-configuration and declarative setup.
middleware:
  ai:
    document:
      sources:
        catalogs-full:
          type: CUSTOM
          provider-name: catalogs-graphql-provider
          chunker: catalogs-chunker
    indexing:
      enabled: true
      index-on-startup: true
    vector-store:
      qdrant:
        enabled: true
    provider:
      openai:
        enabled: true
        models:
          - gpt-5
Most RAG systems rely almost entirely on embeddings and similarity search. Spring Middleware AI focuses on semantic modeling, structured metadata, retrieval planning, and deterministic filtering.
query → embedding → similarity → hope
query → plan → filters + semantic
→ controlled retrieval → answer
The real value of RAG comes from how data is extracted, transformed, modeled, and retrieved — not only from embeddings.