|
| 1 | +.. _topics-to-watch-out-when-productizing-cratedb: |
| 2 | + |
| 3 | +####################################### |
| 4 | + Key design considerations for scaling |
| 5 | +####################################### |
| 6 | + |
| 7 | +This article explores critical design considerations to successfully scale |
| 8 | +CrateDB in large production environments to ensure performance and reliability |
| 9 | +as workloads grow. |
| 10 | + |
| 11 | +.. _mindful-of-memory: |
| 12 | + |
| 13 | +******************************* |
| 14 | + Be mindful of memory capacity |
| 15 | +******************************* |
| 16 | + |
| 17 | +In CrateDB, operations requiring a working set like groupings, aggregations, and |
| 18 | +sorting are performed fully in memory without spilling over to disk. |
| 19 | + |
| 20 | +Sometimes you may have a query that leads to a sub-optimal execution plan |
| 21 | +requiring lots of memory. If you are coming to CrateDB from other database |
| 22 | +systems, your experience may be that these queries will proceed to run taking |
| 23 | +longer than required and impacting other workloads in the meanwhile. Sometimes |
| 24 | +this effect may be obvious if a query takes a lot of resources and runs for a |
| 25 | +long time, other times it may go unnoticed if a query that could complete in say |
| 26 | +100 milliseconds takes one hundred times longer, 10 seconds, but the users put |
| 27 | +up with it without reporting to you. |
| 28 | + |
| 29 | +If a query would require more heap memory than the interested nodes |
| 30 | +have available the query will fail with a particular type of error message that |
| 31 | +we call a ``CircuitBreakerException``. This is a fail-fast approach as we |
| 32 | +quickly see there is an issue and can optimize the query to get the best |
| 33 | +performance, without impacting other workloads. |
| 34 | + |
| 35 | +Please take a look at :ref:`Query Optimization 101 <performance-optimization>` |
| 36 | +for strategies to optimize your queries when you encounter this situation. |
| 37 | + |
| 38 | +.. _reading-lots-of-records: |
| 39 | + |
| 40 | +************************* |
| 41 | + Reading lots of records |
| 42 | +************************* |
| 43 | + |
| 44 | +When the HTTP endpoint is used CrateDB will prepare the entire response in |
| 45 | +memory before sending it to the client. |
| 46 | + |
| 47 | +When the PostgreSQL protocol is used CrateDB attempts to stream the results but |
| 48 | +in many cases it still needs to bring all rows to the query handler node first. |
| 49 | + |
| 50 | +So we should always limit how many rows we request at a time, see `Fetching |
| 51 | +large result sets from CrateDB`_. |
| 52 | + |
| 53 | +.. _number-of=shards: |
| 54 | + |
| 55 | +****************** |
| 56 | + Number of shards |
| 57 | +****************** |
| 58 | + |
| 59 | +In CrateDB data in tables and partitions is distributed in storage units that we |
| 60 | +call shards. |
| 61 | + |
| 62 | +If we do not specify how many shards we want for a table/partition CrateDB will |
| 63 | +derive a default from the number of nodes. |
| 64 | + |
| 65 | +CrateDB also has replicas of data and this results in additional shards in the |
| 66 | +cluster. |
| 67 | + |
| 68 | +Having too many or too few shards has performance implications, so it is very |
| 69 | +important to get familiar with the :ref:`Sharding Performance Guide |
| 70 | +<sharding_guide>`. |
| 71 | + |
| 72 | +But in particular, there is a soft limit of 1000 shards per node; so table |
| 73 | +schemas, partitioning strategy, and number of nodes need to be planned to stay |
| 74 | +well below this limit, one strategy can be to aim for a configuration where even |
| 75 | +if one node in the cluster is lost the remaining nodes would still have less |
| 76 | +than 1000 shards. |
| 77 | + |
| 78 | +If this was not considered when initially defining the tables we have the |
| 79 | +following considerations: |
| 80 | + |
| 81 | +- changing the partitioning strategy requires creating a new table and copying |
| 82 | + over the data |
| 83 | +- the easiest way to change the number of shards on a partitioned table is to |
| 84 | + do it for new shards only with the ``ALTER TABLE ONLY`` command |
| 85 | +- see also `Changing the number of shards`_ |
| 86 | + |
| 87 | +.. _amount-of-indexed-columns: |
| 88 | + |
| 89 | +************************************* |
| 90 | + Number of indexed fields in OBJECTs |
| 91 | +************************************* |
| 92 | + |
| 93 | +``OBJECT`` columns are ``DYNAMIC`` by default and CrateDB indexes all their |
| 94 | +fields, providing excellent query performance without requiring manual indexing. |
| 95 | +However, excessive indexing can impact storage, write speed, and resource |
| 96 | +utilization. |
| 97 | + |
| 98 | +- All fields in OBJECTs are automatically indexed when inserted. |
| 99 | +- CrateDB optimizes indexing using Lucene-based columnar storage. |
| 100 | +- A soft limit of 1,000 total indexed columns and OBJECT fields per table |
| 101 | + exists. |
| 102 | +- Going beyond this limit may impact performance. |
| 103 | + |
| 104 | +In cases with many fields and columns, it is advised to determine if some |
| 105 | +OBJECTs or nested parts of them need to be indexed, and use the `ignored column |
| 106 | +policy`_ where applicable. |
| 107 | + |
| 108 | +.. _section-joins: |
| 109 | + |
| 110 | +******* |
| 111 | + JOINs |
| 112 | +******* |
| 113 | + |
| 114 | +CrateDB is a lot better at JOINs than many of our competitors and is getting |
| 115 | +better at every release, but JOINs in distributed databases are tricky to |
| 116 | +optimize, so in many cases queries involving JOINs may need a bit of tweaking. |
| 117 | + |
| 118 | +See `Using common table expressions to speed up queries`_ |
| 119 | + |
| 120 | +.. _changing the number of shards: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/alter-table.html#alter-shard-number |
| 121 | + |
| 122 | +.. _fetching large result sets from cratedb: https://community.cratedb.com/t/fetching-large-result-sets-from-cratedb/1270 |
| 123 | + |
| 124 | +.. _ignored column policy: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/data-types.html#ignored |
| 125 | + |
| 126 | +.. _using common table expressions to speed up queries: https://community.cratedb.com/t/using-common-table-expressions-to-speed-up-queries/1719 |
0 commit comments