NextGenRepl docs #286

Merged

merged 8 commits on Dec 13, 2024

150 changes: 81 additions & 69 deletions content/riak/kv/3.2.0/configuring/next-gen-replication.md

Large diffs are not rendered by default.

240 changes: 240 additions & 0 deletions content/riak/kv/3.2.0/configuring/next-gen-replication/fullsync.md
@@ -0,0 +1,240 @@
---
title_supertext: "Configuring: Next Gen Replication"
title: "FullSync"
description: ""
project: "riak_kv"
project_version: "3.2.0"
lastmod: 2024-09-16T00:00:00-00:00
sitemap:
priority: 0.9
menu:
riak_kv-3.2.0:
name: "FullSync"
identifier: "nextgen_rep_fullsync"
weight: 103
parent: "nextgen_rep"
version_history:
in: "3.2.0+"
toc: true
commercial_offering: false
aliases:
---

[configure tictacaae]: ../../active-anti-entropy/tictac-aae/
[configure nextgenrepl fullsync]: ../fullsync/
[configure nextgenrepl realtime]: ../realtime/
[configure nextgenrepl queuing]: ../queuing/

NextGenRepl's FullSync feature provides a considerable improvement over the legacy fullsync engines. It is faster, more efficient, and more reliable. NextGenRepl is the recommended replication engine to use.

FullSync will ensure that the data in the source cluster is also in the sink cluster.

{{% note %}}
NextGenRepl relies on [TicTac AAE](../../active-anti-entropy/tictac-aae/), so this must be enabled.
{{% /note %}}

## Overview

NextGenRepl's FullSync works on an automated schedule whereby a source cluster node checks for differences with a predefined sink cluster node (or load balancer), and then pushes any changes found to a specific preconfigured queue in the queuing system.

A source node can connect to one sink node using an IP address or FQDN to check for differences. This can be the IP or FQDN of a load balancer for the sink cluster. Each source node can have the same FullSync settings as the other source cluster nodes, or entirely different FullSync settings per node if needed.

A source node will sync data from all nodes in the source cluster.

A source node will run FullSync according to the schedule on that specific source node. The source nodes coordinate to ensure that only one FullSync task runs at a time.

If a source node or sink peer is offline for any reason, Riak will wait until the node is repaired before continuing. You should put sufficient redundancy in place to maintain uptime. This can be done by having multiple source nodes connect to the same sink cluster, and by using a load balancer in front of the sink cluster.

The number of different clusters you can FullSync to is defined by the number of Riak KV nodes in the source cluster.

{{% note %}}
Currently, all changes to NextGenRepl described in this documentation must be made by changing the values in the `riak.conf` file.
{{% /note %}}

## Enable

To turn on FullSync replication, a scope of operation (`ttaaefs_scope`) is needed. The default scope is `disabled`, which means that FullSync replication is turned off. The scope can be set to:

- `disabled` - FullSync is disabled
- `all` - all buckets are replicated
- `bucket` - only the specified bucket is replicated
- `type` - only buckets of the specified bucket type are replicated

To FullSync replicate all buckets, use the `ttaaefs_scope` of `all`. For example, set this value:

```
ttaaefs_scope = all
```

To FullSync replicate using a bucket name filter, use the `ttaaefs_scope` of `bucket` and the `ttaaefs_bucketfilter_name` setting. For example, to only FullSync replicate the bucket "my-bucket-name", set these values:

```
ttaaefs_scope = bucket
ttaaefs_bucketfilter_name = my-bucket-name
```

To FullSync replicate only buckets of a given bucket type, use the `ttaaefs_scope` of `type` and the `ttaaefs_bucketfilter_type` setting. For example, to only FullSync replicate buckets of bucket type "my-bucket-type", set these values:

```
ttaaefs_scope = type
ttaaefs_bucketfilter_type = my-bucket-type
```

## Queues

FullSync will send all changes that need to be replicated to the sink cluster to the queue configured using the `ttaaefs_queuename` setting. The default for this is `q1_ttaaefs`. This can be any queue name, including the same queue name as used by RealTime replication.

For example, to set the FullSync queue name to the default of `q1_ttaaefs`, set `ttaaefs_queuename` like this:

```
ttaaefs_queuename = q1_ttaaefs
```

### Bi-directional FullSync

From Riak KV 3.0.10 onwards, it is possible to have the sink cluster also detect changes from the source cluster (bi-directional FullSync) and queue them on the sink cluster side. This is configured using the `ttaaefs_queuename_peer` setting. The default for this setting is `disabled`.

For example, to set the sink cluster FullSync queue name to the standard name of `q1_ttaaefs`, set `ttaaefs_queuename_peer` like this:

```
ttaaefs_queuename_peer = q1_ttaaefs
```

## Read and write n_vals

When performing a GET on a Riak object in the source cluster, the FullSync client will read with an `r` value of `ttaaefs_localnval`. When performing a PUT of a Riak object in the sink cluster, the FullSync client will write with a `w` value of `ttaaefs_remotenval`. Both of these default to the standard Riak `n_val` of `3`.

To customise these values, use these settings:

```
ttaaefs_localnval = 3
ttaaefs_remotenval = 3
```

## Connections

Each source cluster node can connect to a single sink cluster node (or a load balancer). This is specified using the `ttaaefs_peerip`, `ttaaefs_peerport`, and `ttaaefs_peerprotocol` settings.

- `ttaaefs_peerip` - the IP address or FQDN of the sink cluster node.
- `ttaaefs_peerport` - the port to connect to on the sink cluster node.
- `ttaaefs_peerprotocol` - the protocol to use to talk to the sink cluster node. Use `pb` for the Protocol Buffer API, and use `http` for the HTTP API.

For example, to connect to IP `10.2.34.56` on port `8087` using the Protocol Buffer API, these would be the settings to use:

```
ttaaefs_peerip = 10.2.34.56
ttaaefs_peerport = 8087
ttaaefs_peerprotocol = pb
```

For example, to connect to the FQDN `node01.source-cluster-a.mynetwork.com` on port `8098` using the HTTP API, these would be the settings to use:

```
ttaaefs_peerip = node01.source-cluster-a.mynetwork.com
ttaaefs_peerport = 8098
ttaaefs_peerprotocol = http
```

If you need TLS encryption and certificate-based authentication, then you must exclusively use the Protocol Buffer API (`pb`) for replication.

### TLS encryption

TLS security is configured for replication using the `repl_cacert_filename`, `repl_cert_filename`, and `repl_key_filename` settings, which operate in a similar manner to the protocol listener settings.

For example, you could use settings similar to these:

```
repl_cacert_filename = /etc/riak/cacert.pem
repl_cert_filename = /etc/riak/cert.pem
repl_key_filename = /etc/riak/key.pem
```

### Riak Security

If Riak Security is enabled on the sink cluster, then the username for replication can be set with the `repl_username` setting:

```
repl_username = source-cluster-replication-user
```

## Schedule

FullSync uses an automated scheduling tool based on a configurable number of slots in a 24-hour period.

FullSync can check different time ranges, so recent changes can be checked more often than very old changes.

These time ranges are:

- `ttaaefs_autocheck` - uses logic to decide the best form of FullSync time range to check; this is the default.
- `ttaaefs_allcheck` - checks all keys.
- `ttaaefs_daycheck` - checks keys changed in the last 24 hours.
- `ttaaefs_hourcheck` - checks keys changed in the last hour.
- `ttaaefs_rangecheck` - checks keys changed since the last successful check.
- `ttaaefs_nocheck` - skips the check; this is useful for padding the schedule.

Each check is set to an integer, and the FullSync scheduler will distribute the checks evenly over a 24-hour period in proportion to the number of each type of check.

For example, this schedule will run autocheck every hour:

```
ttaaefs_autocheck = 24
ttaaefs_allcheck = 0
ttaaefs_daycheck = 0
ttaaefs_hourcheck = 0
ttaaefs_rangecheck = 0
ttaaefs_nocheck = 0
```

For example, this schedule will run allcheck once per day, daycheck 3 times per day, hourcheck 8 times per day, and rangecheck 12 times per day (for a total of 24 checks):

```
ttaaefs_autocheck = 0
ttaaefs_allcheck = 1
ttaaefs_daycheck = 3
ttaaefs_hourcheck = 8
ttaaefs_rangecheck = 12
ttaaefs_nocheck = 0
```

### Tuning for autocheck

Autocheck can limit the use of allcheck by setting a window of time in which allcheck can be safely called. This is ideal for scenarios where there is a dip in activity in the source and sink clusters. By default `ttaaefs_allcheck.policy` is set to `always`. It can be set to `never` to not allow autocheck to use allcheck at all, or `window` to restrict the hours in which allcheck can be used.

For example, to stop autocheck from ever using allcheck, use this setting:

```
ttaaefs_allcheck.policy = never
```

To limit the hours autocheck can use allcheck to between 10pm and 6am, use these settings:

```
ttaaefs_allcheck.policy = window
ttaaefs_allcheck.window.start = 22
ttaaefs_allcheck.window.end = 6
```

## Tuning

### Results size

When performing a comparison between clusters, the keys are compared in chunks called segments. The number of segments checked at one time can be set via the `ttaaefs_maxresults` setting, which defaults to 32. Reducing this value makes each comparison faster, but at the cost of requiring more comparisons to cover the same data. If you intend to use autocheck or rangecheck in the scheduler, then this value can be reduced to as low as 16; the reduced value will also apply to daycheck and hourcheck.

As a performance boost for rangecheck, `ttaaefs_rangeboost` will increase the number of segments checked, but only for rangecheck. This is a multiplier, so the number of segments checked will be `ttaaefs_maxresults` * `ttaaefs_rangeboost`.

For example, this will limit daycheck and hourcheck to 32 segments, but allow rangecheck to check (32 * 16 =) 512 segments:

```
ttaaefs_maxresults = 32
ttaaefs_rangeboost = 16
```

### Cluster slice

`ttaaefs_cluster_slice` helps space out queries between clusters if you have more than 2 clusters performing FullSync to the same sink cluster. This stops two clusters with identical schedules from running mutual FullSyncs at the same time. Each cluster may be configured with a `ttaaefs_cluster_slice` number between 1 and 4.

For example, this will set the `ttaaefs_cluster_slice` to `1`:

```
ttaaefs_cluster_slice = 1
```
101 changes: 101 additions & 0 deletions content/riak/kv/3.2.0/configuring/next-gen-replication/queuing.md
@@ -0,0 +1,101 @@
---
title_supertext: "Configuring: Next Gen Replication"
title: "Queuing System"
description: ""
project: "riak_kv"
project_version: "3.2.0"
lastmod: 2024-09-16T00:00:00-00:00
sitemap:
priority: 0.9
menu:
riak_kv-3.2.0:
name: "Queues"
identifier: "nextgen_rep_queuing"
weight: 101
parent: "nextgen_rep"
version_history:
in: "3.2.0+"
toc: true
commercial_offering: false
aliases:
---

[configure tictacaae]: ../../active-anti-entropy/tictac-aae/
[configure nextgenrepl fullsync]: ../fullsync/
[configure nextgenrepl realtime]: ../realtime/
[configure nextgenrepl queuing]: ../queuing/
[configure nextgenrepl queue filters]: ../queuing/#queue-filters
[configure nextgenrepl reference]: ../reference/
[tictacaae folds]: ../../../using/cluster-operations/tictac-aae-fold/

NextGenRepl's Queuing feature is the heart of the replication engine. It stores all changes to be replicated using multiple configurable queues. Its limits and sizes can be configured for your specific NextGenRepl use case.

## Overview

You can have as many queues as you like, with different filters on each queue.

Each queue has 3 levels of priority:

1. RealTime changes - these are normally copies of the Riak object, but can be references to the Riak object if the queue gets too large. These are populated automatically when a PUT (which includes inserts, updates, and deletes) occurs.
2. FullSync changes - these are references to Riak objects and are populated on the source cluster when the FullSync manager finds differences between the source cluster and the sink cluster.
3. Admin changes - these are references to Riak objects and are populated when the administrator performs actions via the [TicTac AAE Fold][tictacaae folds] commands.

The sink-side replication manager will connect to its list of replication sources and replicate objects using these priorities - so RealTime changes first, FullSync differences second, and finally the admin changes.

The queuing system is always active.

{{% note %}}
Currently, all changes to NextGenRepl described in this documentation must be made by changing the values in the `riak.conf` file.
{{% /note %}}

## Maximum queue length

The default limit on the number of objects stored in a specific queue is `300000` Riak objects. Once this limit is reached, new items will not be added to the queue.

This value can be configured on a per-node basis by setting `replrtq_srcqueuelimit` to a positive integer.
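
For example, to state the default limit of `300000` objects explicitly in `riak.conf`:

```
replrtq_srcqueuelimit = 300000
```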

## Maximum number and size of objects

When items are added to a queue, the normal mode is to add them as a reference to the Riak object and not the Riak object itself. However, as a performance boost for RealTime updates, a queue will contain a copy of the actual Riak object instead of a reference.

Having too many of these will increase memory load considerably, so there are two configuration values to limit the number of Riak objects and the maximum allowed size of each Riak object in the queue.

The maximum number of these objects can be set through `replrtq_srcobjectlimit` as a positive integer. The default is `1000` Riak objects.

The maximum size of these objects can be set through `replrtq_srcobjectsize`. The default is `200KB`.
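
For example, to state both defaults explicitly:

```
replrtq_srcobjectlimit = 1000
replrtq_srcobjectsize = 200KB
```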

## Queue filters

A queue can be filtered in various ways using `replrtq_srcqueue`. As each sink node can only pull from one queue, this needs to be carefully planned. To replicate multiple queues to a specific sink cluster, different sink cluster nodes will need to be configured to use different queue names.

The filters are:

- `any`: every RealTime, FullSync and AAE Fold update will be replicated.
- `block_rtq`: RealTime updates are blocked - only FullSync and AAE Fold updates will be replicated.
- `bucketname`: every RealTime, FullSync and AAE Fold update in any bucket matching this name (regardless of bucket type) will be replicated.
- `bucketprefix`: every RealTime, FullSync and AAE Fold update in any bucket whose name starts with the configured ASCII string (regardless of bucket type) will be replicated.
- `buckettype`: every RealTime, FullSync and AAE Fold update in any bucket of the given bucket type will be replicated.

For example, you could set `replrtq_srcqueue` to any of the following (the first is shown as a complete `riak.conf` line after this list):

- `q1_ttaaefs:block_rtq` to have a queue called `q1_ttaaefs` that only has FullSync and AAE Fold updates. This is the default.
- `for_tokyo_cluster:bucketprefix:users_` to have a queue called `for_tokyo_cluster` that will replicate all changes to any bucket with the prefix `users_`.
- `backup_cluster:buckettype:financialtransactions` to have a queue called `backup_cluster` that will replicate all changes to any bucket with the bucket type of `financialtransactions`.
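
Expressed as a complete `riak.conf` line, the default from the first option above would be:

```
replrtq_srcqueue = q1_ttaaefs:block_rtq
```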

## Multiple queues

You can specify multiple queues by delimiting them with the `|` character.

For example, this will create two queues - one with RealTime updates called `allupdates` and one without RealTime updates called `offsite_backup`:

```
replrtq_srcqueue = allupdates:any|offsite_backup:block_rtq
```

## Transmission compression

If bandwidth is a concern, objects can be compressed when they are requested by a sink peer. This can be turned on (`enabled`) and off (`disabled`) using `replrtq_compressonwire`.
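
For example, to enable compression of objects sent over the wire:

```
replrtq_compressonwire = enabled
```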

## Configuration reference

Please check the [NextGenRepl configuration reference][configure nextgenrepl reference] for a complete reference.