feat: add support s3 protocol #315

Merged 2 commits on Jun 6, 2024
31 changes: 31 additions & 0 deletions README.md
@@ -220,6 +220,30 @@ nfs_setup_protocol = false
nfs_setup_protocol = true
```

## S3 Protocol Gateways
We support creating S3 protocol gateways that will be mounted automatically to the cluster.
<br>To create them, set the number of S3 protocol gateway instances you want (the default is 0).

*The number of S3 protocol gateways should be at least 3.*
<br>For example:
```hcl
s3_protocol_gateways_number = 3
```
This will automatically create 3 instances.
<br>In addition, you can supply these optional variables:
```hcl
s3_protocol_gateway_instance_type = "Standard_D8_v5"
s3_protocol_gateway_nics_num = 2
s3_protocol_gateway_disk_size = 48
s3_protocol_gateway_fe_cores_num = 1
```

<br>To have these gateways create the S3 cluster, set:
```hcl
s3_setup_protocol = true
```
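Putting it together, a minimal configuration that enables S3 protocol gateways might look like the following (illustrative values only, using the variables described above):
```hcl
# Illustrative example only; adjust the values to your deployment.
s3_setup_protocol                 = true   # have the gateways create the S3 cluster
s3_protocol_gateways_number       = 3      # at least 3 is recommended
s3_protocol_gateway_instance_type = "Standard_D8_v5"
s3_protocol_gateway_nics_num      = 2
s3_protocol_gateway_disk_size     = 48
s3_protocol_gateway_fe_cores_num  = 1
```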

## SMB Protocol Gateways
We support creating SMB protocol gateways that will be mounted automatically to the cluster.
<br>To create them, set the number of SMB protocol gateway instances you want (the default is 0),
@@ -294,6 +318,7 @@ proxy_url = VALUE
| <a name="module_network"></a> [network](#module\_network) | ./modules/network | n/a |
| <a name="module_nfs_protocol_gateways"></a> [nfs\_protocol\_gateways](#module\_nfs\_protocol\_gateways) | ./modules/protocol_gateways | n/a |
| <a name="module_peering"></a> [peering](#module\_peering) | ./modules/peering_vnets | n/a |
| <a name="module_s3_protocol_gateways"></a> [s3\_protocol\_gateways](#module\_s3\_protocol\_gateways) | ./modules/protocol_gateways | n/a |
| <a name="module_smb_protocol_gateways"></a> [smb\_protocol\_gateways](#module\_smb\_protocol\_gateways) | ./modules/protocol_gateways | n/a |

## Resources
@@ -423,6 +448,11 @@ proxy_url = VALUE
| <a name="input_protocol_gateways_identity_name"></a> [protocol\_gateways\_identity\_name](#input\_protocol\_gateways\_identity\_name) | The user assigned identity name for the protocol gateways instances (if empty - new one is created). | `string` | `""` | no |
| <a name="input_proxy_url"></a> [proxy\_url](#input\_proxy\_url) | Weka home proxy url | `string` | `""` | no |
| <a name="input_rg_name"></a> [rg\_name](#input\_rg\_name) | A predefined resource group in the Azure subscription. | `string` | n/a | yes |
| <a name="input_s3_protocol_gateway_disk_size"></a> [s3\_protocol\_gateway\_disk\_size](#input\_s3\_protocol\_gateway\_disk\_size) | The protocol gateways' default disk size. | `number` | `48` | no |
| <a name="input_s3_protocol_gateway_fe_cores_num"></a> [s3\_protocol\_gateway\_fe\_cores\_num](#input\_s3\_protocol\_gateway\_fe\_cores\_num) | The number of frontend cores on single protocol gateway machine. | `number` | `1` | no |
| <a name="input_s3_protocol_gateway_instance_type"></a> [s3\_protocol\_gateway\_instance\_type](#input\_s3\_protocol\_gateway\_instance\_type) | The protocol gateways' virtual machine type (sku) to deploy. | `string` | `"Standard_D8_v5"` | no |
| <a name="input_s3_protocol_gateways_number"></a> [s3\_protocol\_gateways\_number](#input\_s3\_protocol\_gateways\_number) | The number of protocol gateway virtual machines to deploy. | `number` | `0` | no |
| <a name="input_s3_setup_protocol"></a> [s3\_setup\_protocol](#input\_s3\_setup\_protocol) | Config protocol, default if false | `bool` | `false` | no |
| <a name="input_script_post_cluster_creation"></a> [script\_post\_cluster\_creation](#input\_script\_post\_cluster\_creation) | Script to run after cluster creation | `string` | `""` | no |
| <a name="input_script_pre_start_io"></a> [script\_pre\_start\_io](#input\_script\_pre\_start\_io) | Script to run before starting IO | `string` | `""` | no |
| <a name="input_set_dedicated_fe_container"></a> [set\_dedicated\_fe\_container](#input\_set\_dedicated\_fe\_container) | Create cluster with FE containers | `bool` | `true` | no |
@@ -480,6 +510,7 @@ proxy_url = VALUE
| <a name="output_nfs_protocol_gateway_ips"></a> [nfs\_protocol\_gateway\_ips](#output\_nfs\_protocol\_gateway\_ips) | If 'private\_network' is set to false, it will output nfs protocol gateway public ips, otherwise private ips. |
| <a name="output_ppg_id"></a> [ppg\_id](#output\_ppg\_id) | Placement proximity group id |
| <a name="output_private_ssh_key"></a> [private\_ssh\_key](#output\_private\_ssh\_key) | If 'ssh\_public\_key' is set to null and no file provided, it will output the private ssh key location. |
| <a name="output_s3_protocol_gateway_ips"></a> [s3\_protocol\_gateway\_ips](#output\_s3\_protocol\_gateway\_ips) | If 'private\_network' is set to false, it will output smb protocol gateway public ips, otherwise private ips. |
| <a name="output_sg_id"></a> [sg\_id](#output\_sg\_id) | Security group id |
| <a name="output_smb_protocol_gateway_ips"></a> [smb\_protocol\_gateway\_ips](#output\_smb\_protocol\_gateway\_ips) | If 'private\_network' is set to false, it will output smb protocol gateway public ips, otherwise private ips. |
| <a name="output_subnet_name"></a> [subnet\_name](#output\_subnet\_name) | Subnet name |
26 changes: 20 additions & 6 deletions modules/protocol_gateways/main.tf
@@ -147,13 +147,27 @@ locals {
key_vault_url = var.key_vault_url
})

protocol_script = var.protocol == "NFS" ? local.setup_nfs_protocol_script : local.setup_smb_protocol_script
setup_s3_protocol_script = file("${path.module}/setup_s3.sh")

setup_validation_script = templatefile("${path.module}/setup_validation.sh", {
gateways_number = var.gateways_number
gateways_name = var.gateways_name
report_function_url = format("https://%s.azurewebsites.net/api/report", var.function_app_name)
vault_function_app_key_name = var.vault_function_app_key_name
key_vault_url = var.key_vault_url
protocol = var.protocol
smbw_enabled = var.smbw_enabled
})

setup_protocol_script = var.setup_protocol ? local.protocol_script : ""
smb_protocol_script = var.protocol == "SMB" ? local.setup_smb_protocol_script : ""
s3_protocol_script = var.protocol == "S3" ? local.setup_s3_protocol_script : ""
nfs_protocol_script = var.protocol == "NFS" ? local.setup_nfs_protocol_script : ""
validation_script = var.setup_protocol && (var.protocol == "SMB" || var.protocol == "S3") ? local.setup_validation_script : ""

setup_protocol_script = var.setup_protocol ? compact([local.nfs_protocol_script, local.smb_protocol_script, local.s3_protocol_script]) : []

custom_data_parts = concat([local.init_script, local.deploy_script, local.validation_script], local.setup_protocol_script)

custom_data_parts = [
local.init_script, local.deploy_script, local.setup_protocol_script
]
custom_data = join("\n", local.custom_data_parts)

gw_identity_id = var.vm_identity_name == "" ? azurerm_user_assigned_identity.this[0].id : data.azurerm_user_assigned_identity.this[0].id
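To make the refactored locals easier to follow: each per-protocol script collapses to an empty string unless it matches `var.protocol`, `compact()` drops the empty entries, and whatever remains is appended after the init, deploy and validation parts. A minimal standalone sketch of the same pattern, with toy strings standing in for the rendered scripts:
```hcl
# Toy illustration of the selection pattern in the locals above;
# the short strings stand in for the real rendered setup scripts.
locals {
  protocol       = "S3"
  setup_protocol = true

  nfs_protocol_script = local.protocol == "NFS" ? "nfs-script" : ""
  smb_protocol_script = local.protocol == "SMB" ? "smb-script" : ""
  s3_protocol_script  = local.protocol == "S3" ? "s3-script" : ""

  # compact() removes the empty strings, leaving only the selected script (if any).
  setup_protocol_script = local.setup_protocol ? compact([
    local.nfs_protocol_script,
    local.smb_protocol_script,
    local.s3_protocol_script,
  ]) : []

  # The selected script is appended after the common parts.
  custom_data_parts = concat(["init", "deploy", "validation"], local.setup_protocol_script)
  custom_data       = join("\n", local.custom_data_parts) # "init\ndeploy\nvalidation\ns3-script"
}
```
In the module itself the validation part is likewise blanked out for NFS, so only the relevant pieces end up in `custom_data`.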
@@ -199,7 +213,7 @@ resource "azurerm_linux_virtual_machine" "this" {
lifecycle {
ignore_changes = [tags, custom_data]
precondition {
condition = var.protocol == "NFS" ? var.gateways_number >= 1 : var.gateways_number >= 3 && var.gateways_number <= 8
condition = var.protocol == "NFS" || var.protocol == "S3" ? var.gateways_number >= 1 : var.gateways_number >= 3 && var.gateways_number <= 8
error_message = "The amount of protocol gateways should be at least 1 for NFS and at least 3 and at most 8 for SMB."
}
precondition {
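For reference, the gateway-count rule enforced by the precondition above (at least 1 gateway for NFS and S3, 3 to 8 for SMB) could equivalently be expressed as an input-variable validation. A hypothetical sketch with assumed variable names, not the module's actual variables.tf; cross-variable references in validation blocks require Terraform 1.9 or newer:
```hcl
# Hypothetical sketch; the module enforces this via a lifecycle precondition instead.
variable "protocol" {
  type    = string
  default = "NFS"
}

variable "gateways_number" {
  type    = number
  default = 1

  validation {
    # Referencing var.protocol here needs Terraform >= 1.9.
    condition = (
      contains(["NFS", "S3"], var.protocol)
      ? var.gateways_number >= 1
      : var.gateways_number >= 3 && var.gateways_number <= 8
    )
    error_message = "Use at least 1 gateway for NFS or S3, and between 3 and 8 for SMB."
  }
}
```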
62 changes: 62 additions & 0 deletions modules/protocol_gateways/setup_s3.sh
@@ -0,0 +1,62 @@
echo "$(date -u): running s3 script"

# wait for weka s3 cluster to be ready in case it was created by another host
function check_cluster_status() {
# refresh the status on every call so retries can detect a cluster created by another host
not_ready_hosts=$(weka s3 cluster status | grep 'Not Ready' | wc -l)
all_hosts=$(weka s3 cluster status | grep 'Host' | wc -l)
if (( all_hosts > 0 && not_ready_hosts == 0 && all_hosts == cluster_size )); then
echo "$(date -u): s3 cluster is already created"
weka s3 cluster status
exit 0
fi
}

echo "$(date -u): weka S3 cluster does not exist, creating it"
# get all protocol gateways frontend container ids separated by comma
all_container_ids_str=$(echo "$all_container_ids" | tr '\n' ',' | sed 's/,$//')

function retry_create_s3_cluster {
retry_max=60
retry_sleep=30
count=$retry_max
msg="S3 cluster is created"
check_cluster_status
while [ $count -gt 0 ]; do
weka s3 cluster create $filesystem_name .config_fs --container $all_container_ids_str --port 9000 && break
count=$(($count - 1))
echo "Retrying create S3 cluster in $retry_sleep seconds..."
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"progress\", \"message\": \"$msg\"}"
sleep $retry_sleep
check_cluster_status && break
done
[ $count -eq 0 ] && {
echo "create S3 cluster command failed after $retry_max attempts"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"create S3 cluster command failed after $retry_max attempts\"}"
echo "$(date -u): create S3 cluster failed"
return 1
}
return 0
}

if [[ $(weka s3 cluster status) ]]; then
check_cluster_status
if (( all_hosts > 0 && not_ready_hosts == 0 && all_hosts < cluster_size )); then
echo "$(date -u): S3 cluster already exists, adding current container to it"
weka s3 cluster containers add $container_id
sleep 10s
weka s3 cluster status
exit 0
fi
else
echo "$(date -u): weka S3 cluster does not exist, creating it"
retry_create_s3_cluster || exit 1
echo "$(date -u): Successfully create S3 cluster..."
weka s3 cluster status
weka s3 cluster containers list
echo "$(date -u): S3 cluster is created successfully"
fi

weka s3 cluster status

echo "$(date -u): done running S3 script successfully"
132 changes: 1 addition & 131 deletions modules/protocol_gateways/setup_smb.sh
@@ -1,136 +1,6 @@
echo "$(date -u): running smb script"
weka local ps

# get token for key vault access
access_token=$(curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net' -H Metadata:true | jq -r '.access_token')
# get key vault secret
function_app_key=$(curl "${key_vault_url}secrets/${vault_function_app_key_name}?api-version=2016-10-01" -H "Authorization: Bearer $access_token" | jq -r '.value')

function report {
local json_data=$1
curl ${report_function_url}?code="$function_app_key" -H 'Content-Type:application/json' -d "$json_data"
}

function wait_for_weka_fs(){
filesystem_name="default"
max_retries=30 # 30 * 10 = 5 minutes
for (( i=0; i < max_retries; i++ )); do
if [ "$(weka fs | grep -c $filesystem_name)" -ge 1 ]; then
echo "$(date -u): weka filesystem $filesystem_name is up"
break
fi
echo "$(date -u): waiting for weka filesystem $filesystem_name to be up"
sleep 10
done
if (( i >= max_retries )); then
err_msg="timeout: weka filesystem $filesystem_name is not up after $max_retries attempts."
echo "$(date -u): $err_msg"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"$err_msg\"}"
return 1
fi
}

function create_config_fs(){
filesystem_name=".config_fs"
size="10GB"

if [ "$(weka fs | grep -c $filesystem_name)" -ge 1 ]; then
echo "$(date -u): weka filesystem $filesystem_name exists"
return 0
fi

echo "$(date -u): trying to create filesystem $filesystem_name"
output=$(weka fs create $filesystem_name default $size 2>&1)
# possible outputs:
# FSId: 1 (means success)
# error: The given filesystem ".config_fs" already exists.
# error: Not enough available drive capacity for filesystem. requested "10.00 GB", but only "0 B" are free
if [ $? -eq 0 ]; then
echo "$(date -u): weka filesystem $filesystem_name is created"
return 0
fi

if [[ $output == *"already exists"* ]]; then
echo "$(date -u): weka filesystem $filesystem_name already exists"
return 0
elif [[ $output == *"Not enough available drive capacity for filesystem"* ]]; then
err_msg="Not enough available drive capacity for filesystem $filesystem_name for size $size"
echo "$(date -u): $err_msg"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"$err_msg\"}"
return 1
else
echo "$(date -u): output: $output"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"cannot create weka filesystem $filesystem_name\"}"
return 1
fi
}

if [[ ${smbw_enabled} == true ]]; then
wait_for_weka_fs || exit 1
create_config_fs || exit 1
fi

# make sure weka cluster is already up
max_retries=60
for (( i=0; i < max_retries; i++ )); do
if [ $(weka status | grep 'status: OK' | wc -l) -ge 1 ]; then
echo "$(date -u): weka cluster is up"
break
fi
echo "$(date -u): waiting for weka cluster to be up"
sleep 30
done
if (( i >= max_retries )); then
err_msg="timeout: weka cluster is not up after $max_retries attempts."
echo "$(date -u): $err_msg"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"$err_msg\"}"
exit 1
fi

cluster_size="${gateways_number}"

current_mngmnt_ip=$(weka local resources | grep 'Management IPs' | awk '{print $NF}')
# get container id
for ((i=0; i<20; i++)); do
container_id=$(weka cluster container | grep frontend0 | grep ${gateways_name} | grep $current_mngmnt_ip | grep UP | awk '{print $1}')
if [ -n "$container_id" ]; then
echo "$(date -u): frontend0 container id: $container_id"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"progress\", \"message\": \"frontend0 container $container_id is up\"}"
break
fi
echo "$(date -u): waiting for frontend0 container to be up"
sleep 5
done

if [ -z "$container_id" ]; then
err_msg="Failed to get the frontend0 container ID."
echo "$(date -u): $err_msg"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"$err_msg\"}"
exit 1
fi

# wait for all containers to be ready
max_retries=60
for (( retry=1; retry<=max_retries; retry++ )); do
# get all UP gateway container ids
all_container_ids=$(weka cluster container | grep frontend0 | grep ${gateways_name} | grep UP | awk '{print $1}')
# if number of all_container_ids < cluster_size, do nothing
all_container_ids_number=$(echo "$all_container_ids" | wc -l)
if (( all_container_ids_number < cluster_size )); then
echo "$(date -u): not all containers are ready - do retry $retry of $max_retries"
sleep 20
else
echo "$(date -u): all containers are ready"
break
fi
done

if (( retry > max_retries )); then
err_msg="timeout: not all containers are ready after $max_retries attempts."
echo "$(date -u): $err_msg"
report "{\"hostname\": \"$HOSTNAME\", \"type\": \"error\", \"message\": \"$err_msg\"}"
exit 1
fi
weka local ps

# wait for weka smb cluster to be ready in case it was created by another host
weka smb cluster wait