
Feat: Add Elasticsearch Document Reader #390

Merged on Feb 7, 2025 (4 commits).
community/document-readers/es-document-reader/README.md (new file, 127 additions, 0 deletions)
# Spring AI Alibaba Elasticsearch Document Reader

This module provides a document reader implementation for Elasticsearch, allowing you to retrieve documents from Elasticsearch indices for use with Spring AI.

## Features

- Read documents from Elasticsearch indices
- Support for both single-node and cluster modes
- Support for HTTPS and basic authentication
- Customizable query field
- Configurable maximum number of results
- Support for both simple retrieval and query-based search

## Usage

### Maven Dependency

```xml
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>es-document-reader</artifactId>
    <version>${version}</version>
</dependency>
```

### Single-Node Configuration

```java
ElasticsearchConfig config = new ElasticsearchConfig();
config.setHost("localhost");      // Default: localhost
config.setPort(9200);             // Default: 9200
config.setIndex("your-index");    // Required
config.setQueryField("content");  // Default: content
config.setMaxResults(10);         // Default: 10
config.setScheme("https");        // Default: http

// Optional authentication
config.setUsername("your-username");
config.setPassword("your-password");

ElasticsearchDocumentReader reader = new ElasticsearchDocumentReader(config);
```

### Cluster Configuration

```java
import java.util.Arrays;

ElasticsearchConfig config = new ElasticsearchConfig();
// Configure cluster nodes in host:port format
config.setNodes(Arrays.asList(
        "node1:9200",
        "node2:9201",
        "node3:9202"
));
config.setIndex("your-index");
config.setQueryField("content");
config.setScheme("https");

// Optional authentication (applied to all nodes)
config.setUsername("your-username");
config.setPassword("your-password");

ElasticsearchDocumentReader reader = new ElasticsearchDocumentReader(config);
```

### Reading Documents

```java
// Get all documents
List<Document> documents = reader.get();

// Get a document by ID
Document document = reader.getById("document-id");

// Search documents by query
List<Document> queryResults = reader.readWithQuery("your search query");
```
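For orientation, a query-based search presumably runs a full-text search on `queryField` and returns at most `maxResults` hits. The sketch below builds the JSON body of an Elasticsearch `match` query of that shape. It is an assumption that the reader uses a `match` query (rather than, say, `multi_match` or `query_string`), and `MatchQueryBody` is a hypothetical helper, not part of this module.

```java
public final class MatchQueryBody {

    private MatchQueryBody() {
    }

    // Quotes a value as a JSON string literal, escaping quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Hypothetical: builds {"size":N,"query":{"match":{"<field>":"<text>"}}}.
    static String build(String field, String text, int maxResults) {
        return "{\"size\":" + maxResults
                + ",\"query\":{\"match\":{" + jsonString(field) + ":" + jsonString(text) + "}}}";
    }

    public static void main(String[] args) {
        // Prints {"size":10,"query":{"match":{"content":"your search query"}}}
        System.out.println(build("content", "your search query", 10));
    }
}
```

Escaping the user's query text matters here: a query containing a double quote would otherwise produce invalid JSON. A real client such as `elasticsearch-java` builds this body for you through its typed query builders.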

## Configuration Properties

| Property | Description | Default |
|----------|-------------|---------|
| host | Elasticsearch host name | localhost |
| port | Elasticsearch port | 9200 |
| nodes | List of cluster nodes (host:port) | empty list |
| index | Name of the index to query | none (required) |
| queryField | Field to search in | content |
| username | Username for authentication | none |
| password | Password for authentication | none |
| maxResults | Maximum number of results to return | 10 |
| scheme | Connection scheme (http/https) | http |
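The table implies a few constraints: `index` has no default, `scheme` is either `http` or `https`, and `port` and `maxResults` must be sensible positive numbers. A hypothetical pre-flight check that reports every violated constraint before any connection is attempted might look like this. Whether and how the module itself validates its configuration is not shown here, so treat `ConfigCheck` as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public final class ConfigCheck {

    private ConfigCheck() {
    }

    // Hypothetical: returns one message per violated constraint; an empty list means the values look valid.
    static List<String> problems(String index, String scheme, int port, int maxResults) {
        List<String> problems = new ArrayList<>();
        if (index == null || index.isBlank()) {
            problems.add("index is required");
        }
        if (!"http".equals(scheme) && !"https".equals(scheme)) {
            problems.add("scheme must be \"http\" or \"https\"");
        }
        if (port < 1 || port > 65535) {
            problems.add("port must be between 1 and 65535");
        }
        if (maxResults < 1) {
            problems.add("maxResults must be at least 1");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(problems("your-index", "https", 9200, 10)); // []
        System.out.println(problems(null, "ftp", 9200, 0));
    }
}
```

Collecting all problems instead of throwing on the first one lets a misconfigured application report everything it needs to fix in a single run.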

## Cluster Support

The reader supports both single-node and cluster configurations:

- If `nodes` is non-empty, the reader uses cluster mode.
- If `nodes` is empty, it uses single-node mode with `host` and `port`.
- All nodes in the cluster share the same authentication and scheme settings.
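The selection rule above can be sketched as follows, assuming each `nodes` entry is a plain `host:port` string and the shared `scheme` is applied to every node. `NodeResolver` is hypothetical and uses `java.net.URI` to stay dependency-free; the module's actual mapping (for example, to Apache `HttpHost` objects for the Elasticsearch low-level `RestClient`) is not shown here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public final class NodeResolver {

    private NodeResolver() {
    }

    // Hypothetical: cluster mode when nodes is non-empty, otherwise single-node mode with host and port.
    // The same scheme is applied to every endpoint.
    static List<URI> resolve(List<String> nodes, String host, int port, String scheme) {
        List<URI> endpoints = new ArrayList<>();
        if (nodes == null || nodes.isEmpty()) {
            endpoints.add(URI.create(scheme + "://" + host + ":" + port));
            return endpoints;
        }
        for (String node : nodes) {
            // Split on the last colon so the port is always the final segment.
            int colon = node.lastIndexOf(':');
            if (colon <= 0 || colon == node.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + node);
            }
            String nodeHost = node.substring(0, colon);
            int nodePort = Integer.parseInt(node.substring(colon + 1));
            endpoints.add(URI.create(scheme + "://" + nodeHost + ":" + nodePort));
        }
        return endpoints;
    }

    public static void main(String[] args) {
        // Prints [https://node1:9200, https://node2:9201, https://node3:9202]
        System.out.println(resolve(List.of("node1:9200", "node2:9201", "node3:9202"), "localhost", 9200, "https"));
        // Prints [http://localhost:9200]
        System.out.println(resolve(List.of(), "localhost", 9200, "http"));
    }
}
```

Failing fast on a malformed entry such as `"node1"` surfaces configuration mistakes at startup rather than as connection errors later.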

## HTTPS Support

For secure connections:

1. Set `scheme` to `"https"`.
2. The reader will then automatically:
   - Create an SSL context that trusts all certificates (intended for development only: certificate validation is skipped, so this is not suitable for production)
   - Handle hostname verification
   - Apply authentication if credentials are provided
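The "trust all certificates" behavior described above usually corresponds to the standard JDK pattern below: an `X509TrustManager` that performs no checks, installed into an `SSLContext`, plus a `HostnameVerifier` that accepts every host. This is a sketch under the assumption that the reader does something equivalent; `DevTls` is a hypothetical name, and the module's actual code is not shown here.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public final class DevTls {

    private DevTls() {
    }

    // WARNING: accepts every certificate chain without validating it. Development only.
    static SSLContext trustAllContext() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // No checks: every client certificate is accepted.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // No checks: every server certificate is accepted.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        } };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context;
    }

    // WARNING: disables hostname verification entirely. Development only.
    static HostnameVerifier acceptAnyHost() {
        return (hostname, session) -> true;
    }
}
```

With the Elasticsearch low-level `RestClient`, objects like these are typically passed in through `RestClientBuilder.setHttpClientConfigCallback`. For production, load the cluster's CA certificate into a trust store and keep hostname verification enabled instead.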

## Testing

The module's tests rely on Testcontainers (see the `org.testcontainers:elasticsearch` test dependency), so Docker must be available. To run them:

```bash
mvn test
```

## Requirements

- Java 17 or later
- Elasticsearch 8.x
- Docker (for running tests)
community/document-readers/es-document-reader/pom.xml (new file, 72 additions, 0 deletions)
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Copyright 2024-2025 the original author or authors.
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ https://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.cloud.ai</groupId>
        <artifactId>spring-ai-alibaba</artifactId>
        <version>${revision}</version>
        <relativePath>../../../pom.xml</relativePath>
    </parent>

    <artifactId>es-document-reader</artifactId>
    <name>Spring AI Alibaba Elasticsearch Document Reader</name>
    <description>Spring AI Alibaba Elasticsearch Document Reader</description>

    <dependencies>
        <!-- Spring AI -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-core</artifactId>
        </dependency>

        <!-- Elasticsearch -->
        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>

        <!-- Test dependencies -->
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-core</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.assertj</groupId>
            <artifactId>assertj-core</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>elasticsearch</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

</project>
New Java source file (148 additions, 0 deletions)
/*
* Copyright 2024-2025 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.alibaba.cloud.ai.document.reader.es;

import java.util.ArrayList;
import java.util.List;

/**
 * Configuration class for Elasticsearch document reader. Contains all necessary settings
 * to connect to and query Elasticsearch.
 *
 * @author brianxiadong
 * @since 0.0.1
 */
public class ElasticsearchConfig {

	/**
	 * Elasticsearch host name (used in single-node mode)
	 */
	private String host = "localhost";

	/**
	 * Elasticsearch port (used in single-node mode)
	 */
	private int port = 9200;

	/**
	 * List of cluster nodes in host:port format. When non-empty, the reader uses
	 * cluster mode instead of host and port.
	 */
	private List<String> nodes = new ArrayList<>();

	/**
	 * Index name to query (required)
	 */
	private String index;

	/**
	 * Field to search in
	 */
	private String queryField = "content";

	/**
	 * Username for authentication (optional)
	 */
	private String username;

	/**
	 * Password for authentication (optional)
	 */
	private String password;

	/**
	 * Maximum number of documents to retrieve
	 */
	private int maxResults = 10;

	/**
	 * Connection scheme (http/https)
	 */
	private String scheme = "http";

	// Getters and setters

	public String getHost() {
		return host;
	}

	public void setHost(String host) {
		this.host = host;
	}

	public int getPort() {
		return port;
	}

	public void setPort(int port) {
		this.port = port;
	}

	public List<String> getNodes() {
		return nodes;
	}

	public void setNodes(List<String> nodes) {
		this.nodes = nodes;
	}

	public String getIndex() {
		return index;
	}

	public void setIndex(String index) {
		this.index = index;
	}

	public String getQueryField() {
		return queryField;
	}

	public void setQueryField(String queryField) {
		this.queryField = queryField;
	}

	public String getUsername() {
		return username;
	}

	public void setUsername(String username) {
		this.username = username;
	}

	public String getPassword() {
		return password;
	}

	public void setPassword(String password) {
		this.password = password;
	}

	public int getMaxResults() {
		return maxResults;
	}

	public void setMaxResults(int maxResults) {
		this.maxResults = maxResults;
	}

	public String getScheme() {
		return scheme;
	}

	public void setScheme(String scheme) {
		this.scheme = scheme;
	}

}