
[doc] (fix) fix GEO head hide problem #2684

Merged
merged 1 commit into from
Aug 1, 2025
50 changes: 25 additions & 25 deletions docs/sql-manual/basic-element/sql-data-types/semi-structured/GEO.md
Expand Up @@ -4,7 +4,7 @@
"language": "en"
}
---
# GEO Type Documentation
## GEO Type Documentation

Geospatial types are special data types in databases used to store and manipulate geospatial data, which can represent geometric objects such as points, lines, and polygons.
- Core purposes:
Expand All @@ -14,31 +14,31 @@ Geospatial types are special data types in databases used to store and manipulat
Geographic Information Systems are widely used in map services, logistics scheduling, location-based social networking, meteorological monitoring, etc. The core requirement is to efficiently store massive spatial data and support low-latency spatial computing.


# Core Encoding Technologies
## S2 Geometry Library
## Core Encoding Technologies
### S2 Geometry Library
S2 Geometry is a spherical geometry encoding system developed by Google. Its core idea is to achieve efficient indexing of global geospatial data through projection from a sphere to a plane.

### Core Principles
#### Core Principles
- Spherical projection: Project the Earth's sphere onto the 6 faces of a cube (regular hexahedron), converting 3D spherical data into 2D planar data.
- Hierarchical grid division: Each face is recursively divided into quadrilateral grids (cells), and each cell can be further subdivided into 4 smaller sub-cells, forming a hierarchy that runs from level 0 to level 30 (the higher the level, the smaller the cell area and the higher the precision).
- 64-bit encoding: Each cell is assigned a unique 64-bit ID, through which spatial positions can be quickly located and spatial relationships can be judged.
- Hilbert curve ordering: Hilbert space-filling curves are used to encode cells, making spatially adjacent cells have continuous IDs and optimizing range query performance.
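
The arithmetic behind these principles can be sketched in a few lines of Python. This is an illustration only, not code from S2 or Doris: the function names are hypothetical, the Earth surface area is an assumed approximate figure, and the bit layout (3 face bits, 2 bits per level, 1 terminating bit) is the commonly documented S2 cell ID layout.

```python
# Assumed approximate surface area of the Earth in square meters.
EARTH_SURFACE_M2 = 5.101e14

MAX_LEVEL = 30

def cells_at_level(level: int) -> int:
    """Number of cells at a level: 6 cube faces, each split into 4 per level."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level must be between 0 and 30")
    return 6 * 4 ** level

def mean_cell_area_m2(level: int) -> float:
    """Average cell area at a level, assuming cells share the surface evenly."""
    return EARTH_SURFACE_M2 / cells_at_level(level)

# 3 bits select the face, 2 bits per level select the child cell,
# and 1 terminating bit marks where the position bits end.
ID_BITS = 3 + 2 * MAX_LEVEL + 1

for level in (0, 10, 20, 30):
    print(level, cells_at_level(level), mean_cell_area_m2(level))
print("bits needed for a level-30 cell ID:", ID_BITS)
```

Under these assumptions a level-30 cell averages less than one square centimeter, which is where the "centimeter-level" precision figure comes from.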

### Advantages
#### Advantages
- High precision and smooth transition: 30 levels of hierarchy, with precision ranging from global (level 0) to centimeter-level (level 30), ensuring smooth transition to meet the needs of different scenarios.
- Efficiency in global range queries: Suitable for large-scale spatial queries (e.g., cross-continental, cross-country regional analysis) with no significant performance degradation.
- Efficient spatial relationship calculation: Inclusion, intersection, and other relationships can be quickly judged through cell IDs, avoiding complex geometric operations.


## GeoHash Encoding
### GeoHash Encoding
GeoHash is a geocoding method based on equirectangular projection, which realizes spatial indexing by converting longitude and latitude into strings.

### Core Principles
#### Core Principles
- Planar projection: Approximate the Earth's sphere as a plane, and recursively divide the area through binary division of longitude and latitude.
- Rectangular grid division: Divide the Earth's surface into rectangular cells with different precisions. The length of the string determines the precision (up to 12 characters); each additional character adds 5 bits and so splits a cell into 32 sub-cells.
- Z-order curve encoding: Form a Z-order curve by interleaving the binary bits of longitude and latitude, converting 2D coordinates into a 1D string.
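
The steps above can be sketched as a small Python encoder. This is a minimal illustration of the standard GeoHash scheme (bits alternate between longitude and latitude, starting with longitude, and every 5 bits become one base-32 character); the function name `geohash_encode` is hypothetical and not part of Doris.

```python
# Standard GeoHash base-32 alphabet (digits plus letters without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, length: int = 12) -> str:
    """Encode a latitude/longitude pair as a GeoHash string of `length` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    for i in range(length * 5):
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        use_lon = not use_lon
        if i % 5 == 4:  # every 5 bits form one base-32 character
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)

short = geohash_encode(57.64911, 10.40744, 5)
longer = geohash_encode(57.64911, 10.40744, 8)
print(short, longer)
```

Because each extra character only refines the same cell, a shorter code is always a prefix of the longer code for the same point, which is what makes prefix matching usable as an index.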

### Features
#### Features
- Indexing convenience: Adjacent areas can be quickly queried through string prefix matching (e.g., GeoHash codes with the same prefix correspond to spatially adjacent areas).
- Limitations:
- Limited precision levels: Up to 12 levels, with steep transitions between levels, making it difficult to meet the needs of high-precision smooth division.
Expand All @@ -53,14 +53,14 @@ Comprehensively comparing the characteristics of S2 Geometry Library and GeoHash
- Spatial continuity: Hilbert curves have better spatial continuity than Z-order curves, which can reduce redundant calculations in range queries.
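
The continuity difference can be checked with a short Python sketch. It walks an 8 x 8 grid in Hilbert order and in Z-order and reports the largest jump between consecutive cells. The Hilbert conversion follows the widely published iterative index-to-coordinate algorithm; all function names are hypothetical, and this is not how S2 or Doris computes cell IDs.

```python
def hilbert_d2xy(n: int, d: int) -> tuple:
    """Map index d on a Hilbert curve over an n x n grid (n a power of 2) to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so sub-curves join end to end
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def zorder_d2xy(d: int) -> tuple:
    """Map a Z-order (Morton) index to (x, y) by de-interleaving its bits."""
    x = y = 0
    bit = 0
    while d:
        x |= (d & 1) << bit
        y |= ((d >> 1) & 1) << bit
        d >>= 2
        bit += 1
    return x, y

def max_step(points: list) -> int:
    """Largest Manhattan distance between consecutive points on a path."""
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

N = 8
hilbert_path = [hilbert_d2xy(N, d) for d in range(N * N)]
zorder_path = [zorder_d2xy(d) for d in range(N * N)]
print("Hilbert max step:", max_step(hilbert_path))
print("Z-order max step:", max_step(zorder_path))
```

Consecutive Hilbert indices always land on neighboring cells, while consecutive Z-order indices can be far apart, so a contiguous ID range on the Hilbert curve covers a more compact region.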


# Introduction to WKT
## Introduction to WKT
WKT (Well-Known Text) is a standard text format for representing geospatial data.

## Definition
### Definition
- Text format: Describe the structure and coordinates of geometric objects with text strings.
- Features: Human-readable, easy to edit, suitable for manual input or simple data exchange.

## Syntax Structure
### Syntax Structure
- Basic format: GeometryType(CoordinateValues)
- Common geometric types:
- Point: POINT(longitude latitude)
Expand All @@ -70,14 +70,14 @@ WKT (Well-Known Text) is a standard text format for representing geospatial data
- Polygon: POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))


# Introduction to WKB
## Introduction to WKB
WKB (Well-Known Binary) is a standard binary data format for representing geospatial data.

## Definition
### Definition
- Binary format: Represent geometric objects with binary encoding, which is more compact and efficient than WKT.
- Features: Optimized for internal storage and transmission by computers, saving space and enabling fast parsing.

## Encoding Structure
### Encoding Structure
WKB consists of the following parts:
- Byte order (1 byte):
- 0x00: Big Endian (network byte order)
Expand All @@ -92,15 +92,15 @@ WKB consists of the following parts:
- LineString: coordinates of point1, coordinates of point2
- Polygon: coordinates of point1, coordinates of point2...

### Example
#### Example
```sql
01 01 00 00 00 00 00 00 00 00 00 F0 3F 00 00 00 00 00 00 00 40
└┘ └─────────┘ └─────────────────────┘ └─────────────────────┘
 │      │                 │                       │
 │  Point type         x = 1.0                 y = 2.0
Little Endian
```
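
To make the byte layout concrete, here is a minimal Python sketch that decodes a WKB point with the standard `struct` module. The helper name `decode_wkb_point` is hypothetical, and the sketch handles only the plain 2D Point type described above (1 byte order byte, a 4-byte type code, then two 8-byte doubles).

```python
import struct

def decode_wkb_point(data: bytes) -> tuple:
    """Decode a 2D WKB Point and return (x, y)."""
    byte_order = data[0]
    if byte_order not in (0, 1):
        raise ValueError("unknown byte order flag")
    prefix = "<" if byte_order == 1 else ">"  # 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack_from(prefix + "I", data, 1)
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    x, y = struct.unpack_from(prefix + "dd", data, 5)
    return x, y

# byte order | type = 1 (Point) | x = 1.0 | y = 2.0
wkb = bytes.fromhex("01 01000000 000000000000F03F 0000000000000040")
print(len(wkb), decode_wkb_point(wkb))
```

A 2D point therefore always occupies 21 bytes: 1 for the byte order, 4 for the type, and 8 for each coordinate.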

# GeoPoint Type
## GeoPoint Type
1. Storing WKT Format Using String or Varchar

```sql
Expand Down Expand Up @@ -165,7 +165,7 @@ select st_astext(st_point(x,y)) from simple_point_double;
```


# GeoLine type
## GeoLine type

1. Storing WKT Format Using String or Varchar

Expand Down Expand Up @@ -210,7 +210,7 @@ select st_astext(st_geometryfromwkb(wkb)) from simple_line;
+-------------------------------------------------+
```

# GeoPolygon type
## GeoPolygon type

1. Storing WKT Format Using String or Varchar

Expand Down Expand Up @@ -253,7 +253,7 @@ select st_astext(st_geometryfromwkb(wkb)) from simple_polygon_wkb;
+------------------------------------------+
```

# GeoMultiPolygon type
## GeoMultiPolygon type


1. Storing WKT Format Using String or Varchar
Expand All @@ -280,7 +280,7 @@ select st_astext(st_geometryfromtext(wkt)) from simple_multipolygon;
```
Note: WKB format conversion for GeoMultiPolygon is not yet supported

# GeoCircle type
## GeoCircle type

Storage Method (Storing Center Coordinates and Radius Using Floating-Point Numbers)
Since a circle cannot be represented in WKT or WKB, three floating-point numbers are needed to store the center coordinates (x, y) and the radius (R):
Expand All @@ -300,8 +300,8 @@ select st_astext(st_circle(X,Y,R)) from simple_circle;
+-----------------------------+
```

# Constraints
## Index
## Constraints
### Index
Since Doris does not implement a native Geo type and instead stores and converts geospatial data as WKT and WKB, queries on GEO data cannot be accelerated with indexes.

Only 13-digit precision can be guaranteed when converting WKT to GEO output:
Expand Down Expand Up @@ -329,8 +329,8 @@ mysql> select ST_AsText(ST_GeomFromWKB(ST_AsBinary(ST_Point(24.7,3.1415926535897



# Common Uses and Methods of Geo Types in Doris
## Calculating Distance Between Two Points on Earth
## Common Uses and Methods of Geo Types in Doris
### Calculating Distance Between Two Points on Earth

Distance from Beijing to Shanghai
Beijing is at (116.4074, 39.9042) and Shanghai is at (121.4737, 31.2304), given as (longitude, latitude):
Expand Down Expand Up @@ -363,7 +363,7 @@ select ST_DISTANCE_SPHERE(116.4074, 39.9042, -74.0060, 40.7128);
![alt text](/images/BeijingToNewyork.png)
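
As a rough cross-check on the `ST_DISTANCE_SPHERE` results, the great-circle distance can be computed with the haversine formula. The Python sketch below is independent of Doris: the function name `distance_sphere` is hypothetical, and the mean Earth radius of 6,371,000 m is an assumption, so the values may differ slightly from what Doris returns.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters

def distance_sphere(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """Great-circle distance in meters, arguments in the same order as ST_DISTANCE_SPHERE."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

beijing_shanghai = distance_sphere(116.4074, 39.9042, 121.4737, 31.2304)
beijing_new_york = distance_sphere(116.4074, 39.9042, -74.0060, 40.7128)
print(round(beijing_shanghai / 1000), "km", round(beijing_new_york / 1000), "km")
```

Note that the arguments are passed as longitude first, then latitude, matching the SQL calls above.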


## Calculating Area of a Region on the Earth's Sphere
### Calculating Area of a Region on the Earth's Sphere

Estimating the Area of New York
Roughly outline New York with a polygon and calculate its area:
Expand Down
Expand Up @@ -8,7 +8,7 @@



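# GEO Type Documentation
## GEO Type Documentation

Geospatial types are special data types in databases used to store and manipulate geospatial data; they can represent geometric objects such as points, lines, and polygons.
- Core purposes: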
Expand All @@ -18,44 +18,45 @@
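Geographic Information Systems are widely used in map services, logistics scheduling, location-based social networking, meteorological monitoring, and similar fields. The core requirement is to store massive spatial data efficiently and support low-latency spatial computing.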


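# Core Encoding Technologies
## S2 Geometry Library
## Core Encoding Technologies
### S2 Geometry Library
S2 Geometry is a spherical geometry encoding system developed by Google. Its core idea is to index global geospatial data efficiently by mapping the sphere onto planes.
Core Principles:
#### Core Principles:
- Spherical mapping: Project the Earth's sphere onto the 6 faces of a cube (regular hexahedron), converting 3D spherical data into 2D planar data.
- Hierarchical grid division: Each face is recursively divided into quadrilateral grids (cells), and each cell can be further subdivided into 4 smaller sub-cells, forming a hierarchy that runs from level 0 to level 30 (the higher the level, the smaller the cell area and the higher the precision).
- 64-bit encoding: Each cell is assigned a unique 64-bit ID, through which spatial positions can be quickly located and spatial relationships can be judged.
- Hilbert curve ordering: Hilbert space-filling curves are used to encode cells, so that spatially adjacent cells have continuous IDs, which optimizes range query performance.
Advantages:
#### Advantages:
- High precision and smooth transition: 30 levels of subdivision, with precision ranging from global (level 0) to centimeter-level (level 30) and smooth transitions that meet the needs of different scenarios.
- Efficiency in global range queries: Suitable for large-scale spatial queries (e.g., cross-continental or cross-country regional analysis) with no significant performance degradation.
- Efficient spatial relationship calculation: Containment, intersection, and other relationships can be quickly judged through cell IDs, avoiding complex geometric operations.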

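## GeoHash Encoding
### GeoHash Encoding
GeoHash is a geocoding method based on equirectangular projection, which realizes spatial indexing by converting longitude and latitude into strings.
Core Principles:
#### Core Principles:
- Planar projection: Approximate the Earth's sphere as a plane, and recursively divide the area through binary division of longitude and latitude.
- Rectangular grid division: Divide the Earth's surface into rectangular cells with different precisions. The length of the string determines the precision (up to 12 characters); each additional character adds 5 bits and so splits a cell into 32 sub-cells.
- Z-order curve encoding: Form a Z-order curve by interleaving the binary bits of longitude and latitude, converting 2D coordinates into a 1D string.
Features:
#### Features:
- Indexing convenience: Adjacent areas can be quickly queried through string prefix matching (e.g., GeoHash codes with the same prefix correspond to spatially adjacent areas).
- Limitations:
- Limited precision levels: Up to 12 levels, with steep transitions between levels, making it difficult to meet the needs of high-precision smooth division.
- Abrupt jumps in the Z-order curve: Spatially adjacent areas may receive discontinuous codes because of jumps in the curve, which affects the accuracy of range queries.
- Low efficiency in large-scale queries: Global range queries must scan a large number of scattered cells, so performance is poor.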

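### Comparison and Selection
Comprehensively comparing the characteristics of S2 Geometry and GeoHash, we chose the S2 Geometry library as the third-party dependency for geospatial processing, mainly for the following reasons:
- Suitability for global range queries: S2's hierarchical grid design is better suited to large-scale spatial analysis, while GeoHash has performance bottlenecks in cross-regional queries.
- Precision and smoothness: S2's 30 levels of subdivision provide a smooth transition from global to centimeter-level precision, meeting the needs of many scenarios and outperforming GeoHash's 12 levels.
- Spatial continuity: Hilbert curves have better spatial continuity than Z-order curves, which can reduce redundant calculations in range queries.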


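## Introduction to WKT
### Introduction to WKT
WKT (Well-Known Text) is a standard text format for representing geospatial data.
### Definition
#### Definition
- Text format: Describe the structure and coordinates of geometric objects with text strings.
- Features: Human-readable, easy to edit, suitable for manual input or simple data exchange.
### Syntax Structure
#### Syntax Structure
- Basic format: GeometryType(CoordinateValues)
- Common geometric types:
- Point: POINT(longitude latitude)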
Expand All @@ -64,12 +65,12 @@ WKT (Well-Known Text) is a standard text format for representing geospatial data
Example: LINESTRING(0 0,1 1) represents a line segment connecting two points.
- Polygon: POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))

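## Introduction to WKB
### Introduction to WKB
WKB (Well-Known Binary) is a standard binary data format for representing geospatial data.
### Definition
#### Definition
- Binary format: Represent geometric objects with binary encoding, which is more compact and efficient than WKT.
- Features: Better suited to internal storage and transmission by computers, saving space and enabling fast parsing.
### Encoding Structure
#### Encoding Structure
WKB consists of the following parts:
- Byte order (1 byte):
- 0x00: Big Endian (network byte order)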
Expand All @@ -91,7 +92,7 @@ WKB consists of the following parts:
Little Endian Point type x=1.0 y=2.0
```

# GeoPoint Type
## GeoPoint Type
1. Storing WKT Format Using String or Varchar

```sql
Expand Down Expand Up @@ -156,7 +157,7 @@ select st_astext(st_point(x,y)) from simple_point_double;
```


# GeoLine Type
## GeoLine Type

Storing WKT Format Using String or Varchar

Expand Down Expand Up @@ -201,7 +202,7 @@ select st_astext(st_geometryfromwkb(wkb)) from simple_line;
+-------------------------------------------------+
```

# GeoPolygon Type
## GeoPolygon Type

1. Storing WKT Format Using String or Varchar

Expand Down Expand Up @@ -242,7 +243,7 @@ select st_astext(st_geometryfromwkb(wkb)) from simple_polygon_wkb;
+------------------------------------------+
```

# GeoMultiPolygon Type
## GeoMultiPolygon Type


1. Storing WKT Format Using String or Varchar
Expand All @@ -269,7 +270,7 @@ select st_astext(st_geometryfromtext(wkt)) from simple_multipolygon;
```
Note: WKB format conversion for GeoMultiPolygon is not yet supported.

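# GeoCircle Type
## GeoCircle Type

Three floating-point numbers store the circle's center coordinates (x, y) and its radius. A circle cannot be represented in WKT or WKB, so this is the only way to store it.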

Expand All @@ -288,8 +289,8 @@ select st_astext(st_circle(X,Y,R)) from simple_circle;
+-----------------------------+
```

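# Constraints on Geo Types in Doris
## Index
## Constraints on Geo Types in Doris
### Index
Since Doris does not implement a native Geo type and instead stores and converts geospatial data as WKT and WKB, queries on GEO data cannot be accelerated with indexes.
Precision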

Expand Down Expand Up @@ -317,8 +318,8 @@ mysql> select ST_AsText(ST_GeomFromWKB(ST_AsBinary(ST_Point(24.7,3.1415926535897
```


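# Common Uses and Methods of Geo Types in Doris
## Calculating Distance Between Two Points on Earth
## Common Uses and Methods of Geo Types in Doris
### Calculating Distance Between Two Points on Earth
To calculate the distance from Beijing to Shanghai, where Beijing is at longitude and latitude (116.4074, 39.9042) and Shanghai is at (121.4737, 31.2304), use the following function: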

```sql
Expand Down Expand Up @@ -348,7 +349,7 @@ select ST_DISTANCE_SPHERE(116.4074, 39.9042, -74.0060, 40.7128);
![alt text](/images/BeijingToNewyork.png)


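## Calculating Area of a Region on the Earth's Sphere
### Calculating Area of a Region on the Earth's Sphere

Roughly estimate the area of New York; a polygon can approximately outline the whole of New York.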

Expand Down