Skip to content

Commit 95d100b

Browse files
authored
Optimization and productizing docs (#196)
New optimization and productizing docs From @karynzv : This new branch is a clear start from the mess in the other PR. Let's ship this. Renaming the productizing file as suggested by @amotl Typo Fixing some line wraps Point to client side cursors page instead of server side cursors page which is still in construction Add note about improvement in 6.0
1 parent 72a8265 commit 95d100b

File tree

3 files changed

+749
-0
lines changed

3 files changed

+749
-0
lines changed
Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,126 @@
1+
.. _topics-to-watch-out-when-productizing-cratedb:
2+
3+
#######################################
4+
Key design considerations for scaling
5+
#######################################
6+
7+
This article explores critical design considerations to successfully scale
8+
CrateDB in large production environments to ensure performance and reliability
9+
as workloads grow.
10+
11+
.. _mindful-of-memory:
12+
13+
*******************************
14+
Be mindful of memory capacity
15+
*******************************
16+
17+
In CrateDB, operations requiring a working set like groupings, aggregations, and
18+
sorting are performed fully in memory without spilling over to disk.
19+
20+
Sometimes you may have a query that leads to a sub-optimal execution plan
21+
requiring lots of memory. If you are coming to CrateDB from other database
22+
systems, your experience may be that these queries will proceed to run taking
23+
longer than required and impacting other workloads in the meanwhile. Sometimes
24+
this effect may be obvious if a query takes a lot of resources and runs for a
25+
long time, other times it may go unnoticed if a query that could complete in say
26+
100 milliseconds takes one hundred times longer, 10 seconds, but the users put
27+
up with it without reporting to you.
28+
29+
If a query would require more heap memory than the interested nodes
30+
have available the query will fail with a particular type of error message that
31+
we call a ``CircuitBreakerException``. This is a fail-fast approach as we
32+
quickly see there is an issue and can optimize the query to get the best
33+
performance, without impacting other workloads.
34+
35+
Please take a look at :ref:`Query Optimization 101 <performance-optimization>`
36+
for strategies to optimize your queries when you encounter this situation.
37+
38+
.. _reading-lots-of-records:
39+
40+
*************************
41+
Reading lots of records
42+
*************************
43+
44+
When the HTTP endpoint is used CrateDB will prepare the entire response in
45+
memory before sending it to the client.
46+
47+
When the PostgreSQL protocol is used CrateDB attempts to stream the results but
48+
in many cases it still needs to bring all rows to the query handler node first.
49+
50+
So we should always limit how many rows we request at a time, see `Fetching
51+
large result sets from CrateDB`_.
52+
53+
.. _number-of=shards:
54+
55+
******************
56+
Number of shards
57+
******************
58+
59+
In CrateDB data in tables and partitions is distributed in storage units that we
60+
call shards.
61+
62+
If we do not specify how many shards we want for a table/partition CrateDB will
63+
derive a default from the number of nodes.
64+
65+
CrateDB also has replicas of data and this results in additional shards in the
66+
cluster.
67+
68+
Having too many or too few shards has performance implications, so it is very
69+
important to get familiar with the :ref:`Sharding Performance Guide
70+
<sharding_guide>`.
71+
72+
But in particular, there is a soft limit of 1000 shards per node; so table
73+
schemas, partitioning strategy, and number of nodes need to be planned to stay
74+
well below this limit, one strategy can be to aim for a configuration where even
75+
if one node in the cluster is lost the remaining nodes would still have less
76+
than 1000 shards.
77+
78+
If this was not considered when initially defining the tables we have the
79+
following considerations:
80+
81+
- changing the partitioning strategy requires creating a new table and copying
82+
over the data
83+
- the easiest way to change the number of shards on a partitioned table is to
84+
do it for new shards only with the ``ALTER TABLE ONLY`` command
85+
- see also `Changing the number of shards`_
86+
87+
.. _amount-of-indexed-columns:
88+
89+
*************************************
90+
Number of indexed fields in OBJECTs
91+
*************************************
92+
93+
``OBJECT`` columns are ``DYNAMIC`` by default and CrateDB indexes all their
94+
fields, providing excellent query performance without requiring manual indexing.
95+
However, excessive indexing can impact storage, write speed, and resource
96+
utilization.
97+
98+
- All fields in OBJECTs are automatically indexed when inserted.
99+
- CrateDB optimizes indexing using Lucene-based columnar storage.
100+
- A soft limit of 1,000 total indexed columns and OBJECT fields per table
101+
exists.
102+
- Going beyond this limit may impact performance.
103+
104+
In cases with many fields and columns, it is advised to determine if some
105+
OBJECTs or nested parts of them need to be indexed, and use the `ignored column
106+
policy`_ where applicable.
107+
108+
.. _section-joins:
109+
110+
*******
111+
JOINs
112+
*******
113+
114+
CrateDB is a lot better at JOINs than many of our competitors and is getting
115+
better at every release, but JOINs in distributed databases are tricky to
116+
optimize, so in many cases queries involving JOINs may need a bit of tweaking.
117+
118+
See `Using common table expressions to speed up queries`_
119+
120+
.. _changing the number of shards: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/alter-table.html#alter-shard-number
121+
122+
.. _fetching large result sets from cratedb: https://community.cratedb.com/t/fetching-large-result-sets-from-cratedb/1270
123+
124+
.. _ignored column policy: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#ignored
125+
126+
.. _using common table expressions to speed up queries: https://community.cratedb.com/t/using-common-table-expressions-to-speed-up-queries/1719

docs/performance/index.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,4 +15,6 @@ performance tuning and sharding.
1515
Sharding <sharding>
1616
inserts/index
1717
selects
18+
optimization
19+
designing_for_scale
1820
storage

0 commit comments

Comments
 (0)