* API application development: [API Application Development](docs/docs-en/source/5.application-development/1.api/1.overview.md)

-## Real-time Capabilities
+## Performance

-Compared with traditional stream processing engines such as Flink and Storm, which use tables as their data model for real-time processing, GeaFlow's graph-based data model has significant performance advantages when handling join relationship operations, especially complex multi-hops relationship operations like those involving 3 or more hops of join and complex loop searches.
GeaFlow supports incremental graph computation, enabling continuous, streaming, incremental iterative computation or traversal on dynamic graphs (graphs that are constantly changing). When GeaFlow consumes messages from real-time middleware, the vertices associated with the real-time data in the current window are activated, triggering iterative graph computation. In each iteration, only the updated vertices need to notify their neighbors, while unchanged vertices are not triggered, which significantly improves the timeliness of the computation.
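The activation rule described above can be illustrated with a minimal Python sketch. It is not GeaFlow's API or implementation: the class `IncrementalLabels` and its method `apply_window` are hypothetical names, and min-label propagation is assumed here as a stand-in for the iterative computation. Only the endpoints of edges arriving in the current window start active, and a vertex notifies its neighbors only after its own value changes.

```python
from collections import defaultdict, deque


class IncrementalLabels:
    """Hypothetical helper: keeps the smallest reachable vertex id per vertex."""

    def __init__(self):
        self.adj = defaultdict(set)  # undirected adjacency lists
        self.label = {}              # vertex -> smallest vertex id in its component

    def apply_window(self, edges):
        """Add one window of edges and return how many vertices were activated."""
        queue = deque()
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)
            for x in (u, v):
                self.label.setdefault(x, x)
                queue.append(x)      # only endpoints of new edges start active
        activated = set(queue)
        while queue:
            x = queue.popleft()
            for n in self.adj[x]:
                if self.label[x] < self.label[n]:
                    self.label[n] = self.label[x]
                    queue.append(n)  # n changed, so n must notify its own neighbors
                    activated.add(n)
        return len(activated)


g = IncrementalLabels()
g.apply_window([(1, 2), (2, 3), (3, 4)])  # first window: all four vertices are new
g.apply_window([(10, 11)])                # touches only vertices 10 and 11
g.apply_window([(4, 10)])                 # activates 4, 10 and 11; vertices 1-3 stay idle
```

In the last window the two components merge, yet vertices 1, 2 and 3 are never visited because their labels do not change.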

-[Why using graphs for relational operations is more appealing than table joins?](docs/docs-en/source/reference/vs_join.md)
+Earlier industry systems used Spark GraphX for distributed offline graph computation. To support similar engine capabilities, Spark relied on the Spark Streaming framework. Although this combined approach can consume vertex and edge data as a stream, it still recomputes the full graph every time a calculation is triggered, which makes it hard to meet business performance expectations. This approach is also referred to as snapshot-based graph computation.

-Association Analysis Demo Based on GQL:
+Using the WCC (Weakly Connected Components) algorithm as an example, we compared the execution time of the GeaFlow and Spark solutions. The results are as follows:
Because GeaFlow activates only the vertices and edges involved in the current window for incremental computation, each window completes within seconds and the per-window computation time remains fairly stable. As the data volume increases, Spark must backtrack through more historical data during computation, so as long as machine capacity has not reached its limit, its computation delay grows with the data volume. Under similar conditions, GeaFlow's computation time may increase slightly but generally stays at the level of seconds.
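The difference in per-window work can be sketched in Python under stated assumptions. This is neither engine's implementation: a union-find structure stands in for WCC, the function names `snapshot_wcc` and `incremental_wcc` are hypothetical, and "edges touched" is only a rough proxy for computation time. The snapshot variant replays all history in every window, while the incremental variant touches only the edges of the current window.

```python
class UnionFind:
    """Minimal disjoint-set structure used as a stand-in for WCC."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)

    def component_count(self):
        return len({self.find(x) for x in list(self.parent)})


def snapshot_wcc(windows):
    """Rebuild from all history in every window (hypothetical snapshot style)."""
    history, report = [], []
    for window in windows:
        history.extend(window)
        uf = UnionFind()
        for u, v in history:
            uf.union(u, v)
        report.append((len(history), uf.component_count()))
    return report


def incremental_wcc(windows):
    """Apply only the current window's edges to the existing state."""
    uf, report = UnionFind(), []
    for window in windows:
        for u, v in window:
            uf.union(u, v)
        report.append((len(window), uf.component_count()))
    return report


windows = [[(1, 2), (2, 3)], [(4, 5)], [(3, 4)]]
print(snapshot_wcc(windows))     # (edges touched, components) per window
print(incremental_wcc(windows))
```

Both variants report the same component counts, but the edges touched by the snapshot variant grow with the accumulated history, while the incremental variant's work follows the window size.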

-Association Analysis Demo Based on SQL:

-```roomsql
---SQL Style
-SELECT c.name
-FROM course c JOIN selectCourse sc
-ON c.id = sc.targetId
-JOIN student s ON sc.srcId = s.id
-;
-```
+### Stream Computation Acceleration
+
+Compared to traditional stream processing engines (such as Flink and Storm, which are based on table models), GeaFlow uses a graph as its data model (with a vertex-edge storage format), which offers significant performance advantages for Join operations, especially for complex multi-hop relationships (such as joins of more than 3 hops and complex cycle searches).
+
+For comparison, we measured the performance of Flink and GeaFlow using the K-Hop algorithm. A K-Hop relationship is a chain of relationships in which two individuals are connected through K intermediaries. For example, in a social network, K-Hop describes users connected through K intermediaries; in transaction analysis, it describes a path along which funds are transferred K consecutive times.
+
+In comparing the time consumption of the K-Hop algorithm in Flink and GeaFlow:
As shown in the figure above, Flink performs slightly better than GeaFlow in one-hop and two-hop scenarios. This is because, in these cases, the data volume involved in the Join calculations is relatively small, and both the left and right tables are compact, resulting in shorter traversal times. Additionally, Flink's computation framework can cache the historical results of Join operations.
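For reference, the K-Hop expansion being compared can be sketched in Python on an adjacency list. This is an illustrative sketch, not the benchmark code: the function name `k_hop_neighbors` is hypothetical, and a hop is assumed to mean one edge traversal. On a graph model each hop expands a frontier of vertices, whereas on a table model each additional hop is typically expressed as another join against the edge table.

```python
def k_hop_neighbors(adj, start, k):
    """Return the vertices reachable from `start` in at most k edge traversals.

    `adj` maps a vertex to an iterable of its out-neighbors (assumed layout).
    The start vertex itself is excluded from the result.
    """
    frontier = {start}
    seen = {start}
    for _ in range(k):
        # Expand one hop: follow edges out of the current frontier only.
        frontier = {n for v in frontier for n in adj.get(v, ()) if n not in seen}
        if not frontier:
            break  # nothing new to explore
        seen |= frontier
    return seen - {start}


transfers = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
print(k_hop_neighbors(transfers, "a", 2))  # accounts reached within two transfers
```

Each iteration reads only the adjacency lists of the current frontier, so the cost of an extra hop depends on the size of that frontier.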
+

## Contribution
-Thank you very much for contributing to GeaFlow, whether bug reporting, documentation improvement, or major feature development, we warmly welcome all contributions.
+Thank you very much for contributing to GeaFlow, whether bug reporting, documentation improvement, or major feature development, we warmly welcome all contributions.

For more information: [Contribution](docs/docs-en/source/9.contribution.md).
docs/docs-en/source/1.guide.md (+1 -1)
@@ -2,7 +2,7 @@
Here is the documentation map to help users quickly learn and use geaflow.

## Introduction
-**TuGraph Analytics** (alias: GeaFlow) is the [**fastest**](https://ldbcouncil.org/benchmarks/snb-bi/) open-source OLAP graph database developed by Ant Group. It supports core capabilities such as trillion-level graph storage, hybrid graph and table processing, real-time graph computation, and interactive graph analysis. Currently, it is widely used in scenarios such as data warehousing acceleration, financial risk control, knowledge graph, and social networks.
+**GeaFlow** is the [**fastest**](https://ldbcouncil.org/benchmarks/snb-bi/) open-source OLAP graph database developed by Ant Group. It supports core capabilities such as trillion-level graph storage, hybrid graph and table processing, real-time graph computation, and interactive graph analysis. Currently, it is widely used in scenarios such as data warehousing acceleration, financial risk control, knowledge graph, and social networks.

For more information about GeaFlow: [GeaFlow Introduction](2.introduction.md)
@@ -51,19 +51,19 @@ Loop detection Demo provides two ways to interact:
* Method 1 Enter the dot information in the input box
* Method 2 Demonstrate using built-in data

-> Both methods essentially call Tugraph Analytics for real-time calculations, but Method 2 omits the manual input process.
+> Both methods essentially call GeaFlow for real-time calculations, but Method 2 omits the manual input process.

Here we use the built-in data for a quick demonstration, click [Options], select 'Add Points', 7 points of information appear in the canvas; Then select 'Add Edges'. We can see the add record in the above dialog.