RGW Multisite: A Practical Geo-Distributed Active-Active Deployment
I. Multisite structure
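zone: corresponds to one independent cluster, served to clients by a group of RGW instances.
zonegroup: a zonegroup can contain multiple zones; the zones in a zonegroup synchronize both data and metadata.
realm: each realm is an independent namespace that can contain multiple zonegroups; the zonegroups in a realm synchronize metadata.
The overall structure is illustrated by a diagram in Red Hat's documentation.
- The master zone and the secondary zones can run in one of two modes, active-active or active-passive. In active-active mode both the master and the secondary zones accept reads and writes, and data is synchronized automatically. In active-passive mode, writes are accepted only by the master zone.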
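The following is a minimal sketch of switching a zone into the passive role. It assumes the `--read-only` option of `radosgw-admin zone modify` described in the upstream multisite documentation. The function marks a zone read-only, or makes it writable again, and then commits the period. `set_zone_read_only` is a hypothetical helper name. `RGW_ADMIN` is a hypothetical override that exists only so the function can be exercised without a cluster.

```shell
#!/bin/sh
# Toggle a zone between writable (active-active) and read-only (active-passive).
# RGW_ADMIN is a hypothetical override for testing; it defaults to radosgw-admin.
set_zone_read_only() {
    zone=$1        # e.g. gz-zone2
    read_only=$2   # "true" for active-passive, "false" for active-active
    admin=${RGW_ADMIN:-radosgw-admin}
    "$admin" zone modify --rgw-zone="$zone" --read-only="$read_only" || return 1
    # Zone changes only take effect once the period is committed
    "$admin" period update --commit
}
```

For example, `set_zone_read_only gz-zone2 true` would turn the secondary zone built later in this walkthrough into a passive replica. `set_zone_read_only gz-zone2 false` would make it writable again.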
II. Building the multisite cluster
Here we build an active-active architecture with 1 realm, 1 zonegroup and 2 zones (master and secondary).
The cluster layout is as follows:
realm | zonegroup | zone | RGW endpoint | Hostname |
---|---|---|---|---|
movies | gz | master: gz-zone1 | http://192.168.106.12:7480 | df-vm-02 |
movies | gz | secondary: gz-zone2 | http://192.168.106.15:7480 | df-vm-05 |
Test client
Host | Client program |
---|---|
df-vm-01 | s3cmd |
1. Create the realm, the zonegroup and the master zone on the master zone's cluster
a) Create the realm
    [root@df-vm-02 ~]# radosgw-admin realm create --rgw-realm=movies --default
    {
        "id": "2d40b603-356c-4730-a5f8-336260e3ed4c",
        "name": "movies",
        "current_period": "aecbda84-ee4f-4ffe-b1aa-efabb9d342c3",
        "epoch": 2
    }
The realm must be set as the default (`--default`).
b) Create the master zonegroup
    [root@df-vm-02 ~]# radosgw-admin zonegroup create --rgw-zonegroup=gz --endpoints=http://192.168.106.12:7480 --rgw-realm=movies --master --default
    {
        "id": "3f4b9246-be0d-4e4e-913a-9edf77d96f8d",
        "name": "gz",
        "api_name": "gz",
        "is_master": "true",
        "endpoints": [
            "http://192.168.106.12:7480"
        ],
        "hostnames": [],
        "hostnames_s3website": [],
        "master_zone": "71e02920-7412-44ea-adba-5426c9058397",
        "zones": [],
        "placement_targets": [
            {
                "name": "default-placement",
                "tags": []
            }
        ],
        "default_placement": "default-placement",
        "realm_id": "2d40b603-356c-4730-a5f8-336260e3ed4c"
    }
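The zonegroup must be marked `--default` and `--master` and associated with the realm (here by name, via `--rgw-realm=movies`).
c) Create the master zone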
    [root@df-vm-02 ~]# radosgw-admin zone create --rgw-zonegroup=gz --rgw-zone=gz-zone1 --master --default --endpoints=http://192.168.106.12:7480
Check the zonegroup configuration:
    [root@df-vm-02 ~]# radosgw-admin zonegroup get --rgw-zonegroup=gz
    {
        "id": "3f4b9246-be0d-4e4e-913a-9edf77d96f8d",
        "name": "gz",
        "api_name": "gz",
        "is_master": "true",
        "endpoints": [
            "http://192.168.106.12:7480"
        ],
        "hostnames": [],
        "hostnames_s3website": [],
        "master_zone": "71e02920-7412-44ea-adba-5426c9058397",
        "zones": [
            {
                "id": "71e02920-7412-44ea-adba-5426c9058397",
                "name": "gz-zone1",
                "endpoints": [
                    "http://192.168.106.12:7480"
                ],
                "log_meta": "false",
                "log_data": "true",
                "bucket_index_max_shards": 0,
                "read_only": "false",
                "tier_type": "",
                "sync_from_all": "true",
                "sync_from": []
            }
        ],
        "placement_targets": [
            {
                "name": "default-placement",
                "tags": []
            }
        ],
        "default_placement": "default-placement",
        "realm_id": "2d40b603-356c-4730-a5f8-336260e3ed4c"
    }
The output shows that gz is the master zonegroup and gz-zone1 is its master zone.
d) Remove the default zonegroup and default zone, along with their pools
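This check can also be scripted. The rough sketch below deliberately avoids `jq`. It assumes the JSON layout shown above, where each zone's `"id"` is immediately followed by its `"name"`. `zonegroup_master_zone_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name of the zonegroup's master zone, reading the JSON printed by
# `radosgw-admin zonegroup get` on stdin. Plain sed/grep, no jq: relies on the
# key order radosgw-admin emits ("id" directly followed by "name").
zonegroup_master_zone_name() {
    json=$(tr -d '\n')   # flatten pretty-printed JSON onto one line
    master_id=$(printf '%s\n' "$json" | sed -n 's/.*"master_zone": *"\([^"]*\)".*/\1/p')
    [ -n "$master_id" ] || return 1
    # Find the zone entry whose id equals master_zone and print its name
    printf '%s\n' "$json" \
        | grep -o "\"id\": *\"$master_id\", *\"name\": *\"[^\"]*\"" \
        | sed 's/.*"name": *"\([^"]*\)"/\1/'
}
```

Piping `radosgw-admin zonegroup get --rgw-zonegroup=gz` into it should print `gz-zone1` for the output above. A real JSON tool such as `jq` is more robust if it is installed.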
    [root@df-vm-02 ~]# radosgw-admin zonegroup remove --rgw-zonegroup=default --rgw-zone=default
    [root@df-vm-02 ~]# radosgw-admin period update --commit
    [root@df-vm-02 ~]# radosgw-admin zone delete --rgw-zone=default
    [root@df-vm-02 ~]# radosgw-admin period update --commit
    [root@df-vm-02 ~]# radosgw-admin zonegroup delete --rgw-zonegroup=default
    [root@df-vm-02 ~]# radosgw-admin period update --commit
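`radosgw-admin period update --commit` commits the period, which publishes the updated realm configuration. Changes to zones and zonegroups take effect only after this commit.
e) Delete the default pools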
    [root@df-vm-02 ~]# for pool in $(rados lspools | grep '^default\.rgw\.'); do rados rmpool "$pool" "$pool" --yes-i-really-really-mean-it; done
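This deletes the `default.rgw.*` pools that RGW created when it first started. Check the list yourself first and make sure none of these pools is still needed. On Luminous and later, the monitors also refuse to delete pools unless `mon_allow_pool_delete` is set to `true`.
f) Create the system user used for synchronization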
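Because the deletion cannot be undone, a dry run helps. The sketch below uses a hypothetical helper, `print_default_pool_removals`. It only prints the `rados rmpool` commands for pools named `default.rgw.*`, so you can review the list before anything is removed. It deliberately leaves `.rgw.root` alone, because that pool holds the realm, zonegroup and zone configuration created in the earlier steps.

```shell
#!/bin/sh
# Read pool names (one per line, e.g. from `rados lspools`) on stdin and print,
# without running it, the rmpool command for every pool named default.rgw.*.
# .rgw.root is intentionally not matched: it stores the multisite configuration.
print_default_pool_removals() {
    grep '^default\.rgw\.' | while IFS= read -r pool; do
        printf 'rados rmpool %s %s --yes-i-really-really-mean-it\n' "$pool" "$pool"
    done
}
```

`rados lspools | print_default_pool_removals` shows what would be deleted. Once the list has been checked, pipe the same output to `sh` to run it.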
    [root@df-vm-02 ~]# radosgw-admin user create --uid="sync-admin" --display-name="sync-admin" --system
    {
        "user_id": "sync-admin",
        "display_name": "sync-admin",
        "email": "",
        "suspended": 0,
        "max_buckets": 1000,
        "auid": 0,
        "subusers": [],
        "keys": [
            {
                "user": "sync-admin",
                "access_key": "DEMZT6Y26CCAIU9GMVWQ",
                "secret_key": "EkillNdrDiCWcbff7qUKMm0LH1TWJ9NrNCZcAzOB"
            }
        ],
        "swift_keys": [],
        "caps": [],
        "op_mask": "read, write, delete",
        "system": "true",
        "default_placement": "",
        "placement_tags": [],
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "temp_url_keys": [],
        "type": "rgw"
    }
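This user has very broad privileges. Use it only for synchronization and create separate users for everything else.
g) Update the RGW section of ceph.conf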
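The upstream Ceph multisite guide also attaches this system user's keys to the master zone with `radosgw-admin zone modify --access-key --secret` and then commits the period. This walkthrough does not show that step. Below is a sketch of it, assuming the JSON format printed above. `json_key_field` and `set_zone_system_key` are hypothetical helper names, and `RGW_ADMIN` is a hypothetical test override.

```shell
#!/bin/sh
# Extract a key field ("access_key" or "secret_key") from the JSON printed by
# `radosgw-admin user create` / `user info` (stdin); first match only.
json_key_field() {
    tr -d '\n' | grep -o "\"$1\": *\"[^\"]*\"" | head -n 1 | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# Attach the system user's keys to a zone and commit the period.
# RGW_ADMIN is a hypothetical override for testing; it defaults to radosgw-admin.
set_zone_system_key() {
    zone=$1 access=$2 secret=$3
    admin=${RGW_ADMIN:-radosgw-admin}
    "$admin" zone modify --rgw-zone="$zone" \
        --access-key="$access" --secret="$secret" || return 1
    "$admin" period update --commit
}
```

For example, save the output of `radosgw-admin user info --uid=sync-admin` to `sync-admin.json`, then run `set_zone_system_key gz-zone1 "$(json_key_field access_key < sync-admin.json)" "$(json_key_field secret_key < sync-admin.json)"`.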
    [client.rgw.df-vm-02]
    rgw_frontends = "civetweb port=7480 num_threads=20"
    rgw_zone = gz-zone1
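The section can be generated instead of typed by hand. The small sketch below uses a hypothetical `rgw_conf_section` helper that reproduces the three lines above for any host and zone. The port defaults to 7480 and the thread count is the 20 used here.

```shell
#!/bin/sh
# Print a ceph.conf section for the RGW instance on HOST serving ZONE,
# following the [client.rgw.<host>] naming used in this walkthrough.
rgw_conf_section() {
    host=$1 zone=$2 port=${3:-7480}
    printf '[client.rgw.%s]\n' "$host"
    printf 'rgw_frontends = "civetweb port=%s num_threads=20"\n' "$port"
    printf 'rgw_zone = %s\n' "$zone"
}
```

If a `[client.rgw.df-vm-02]` section already exists, edit it instead of appending a second one. RGW then has to be restarted to pick up `rgw_zone`. On a systemd host deployed with ceph-deploy, the unit is typically `ceph-radosgw@rgw.df-vm-02`, so the command would be `systemctl restart ceph-radosgw@rgw.df-vm-02`. That unit name is an assumption and depends on how RGW was deployed.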
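2. Create the secondary zone
One important point: metadata operations, such as creating buckets or users, must be executed in the master zone. If they are issued against the secondary zone they are redirected to the master zone, so they fail whenever the master zone is down.
a) Pull the realm configuration from the master zone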
    [root@df-vm-05 ~]# radosgw-admin realm pull --url=http://192.168.106.12:7480 --access-key=DEMZT6Y26CCAIU9GMVWQ --secret=EkillNdrDiCWcbff7qUKMm0LH1TWJ9NrNCZcAzOB --rgw-realm=movies
The access key and secret here are the keys of the sync user (`sync-admin`).
Set it as the default realm:
    [root@df-vm-05 ~]# radosgw-admin realm default --rgw-realm=movies
b) Pull the period configuration from the master zone
    [root@df-vm-05 ~]# radosgw-admin period pull --url=http://192.168.106.12:7480 --access-key=DEMZT6Y26CCAIU9GMVWQ --secret=EkillNdrDiCWcbff7qUKMm0LH1TWJ9NrNCZcAzOB --rgw-realm=movies
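Steps a) and b) use the same URL and the same sync-user keys, so they can be wrapped in one function. The sketch below assumes the keys are held in hypothetical shell variables `SYNC_ACCESS_KEY` and `SYNC_SECRET_KEY`. `pull_realm_and_period` and the `RGW_ADMIN` override are also hypothetical.

```shell
#!/bin/sh
# Run the realm pull, realm default and period pull steps against the master
# zone endpoint. Keys come from SYNC_ACCESS_KEY / SYNC_SECRET_KEY (hypothetical names).
# RGW_ADMIN is a hypothetical override for testing; it defaults to radosgw-admin.
pull_realm_and_period() {
    url=$1 realm=$2
    if [ -z "${SYNC_ACCESS_KEY:-}" ] || [ -z "${SYNC_SECRET_KEY:-}" ]; then
        echo "pull_realm_and_period: set SYNC_ACCESS_KEY and SYNC_SECRET_KEY" >&2
        return 1
    fi
    admin=${RGW_ADMIN:-radosgw-admin}
    # a) pull the realm and make it the default on this host
    "$admin" realm pull --url="$url" --access-key="$SYNC_ACCESS_KEY" \
        --secret="$SYNC_SECRET_KEY" --rgw-realm="$realm" || return 1
    "$admin" realm default --rgw-realm="$realm" || return 1
    # b) pull the current period from the master zone
    "$admin" period pull --url="$url" --access-key="$SYNC_ACCESS_KEY" \
        --secret="$SYNC_SECRET_KEY" --rgw-realm="$realm"
}
```

On df-vm-05 the call would be `pull_realm_and_period http://192.168.106.12:7480 movies`. The keys still reach `radosgw-admin` as command-line arguments; the variables only avoid typing them repeatedly.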
c) Create the secondary zone
    [root@df-vm-05 ~]# radosgw-admin zone create --rgw-zonegroup=gz --rgw-zone=gz-zone2 --access-key=DEMZT6Y26CCAIU9GMVWQ --secret=EkillNdrDiCWcbff7qUKMm0LH1TWJ9NrNCZcAzOB --endpoints=http://192.168.106.15:7480
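Note: `--default` and `--master` are not needed here.
d) Delete the default zone and the default pools created on the secondary zone, following the same steps as on the master zone.
e) Update the period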
    [root@df-vm-05 ~]# radosgw-admin period update --commit
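After the commit, the master and secondary zones should report the same current period. The sketch below extracts the period id. It assumes `radosgw-admin period get-current` prints a JSON object with a `current_period` field. `current_period_id` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the period id from `radosgw-admin period get-current` output (stdin),
# assuming it contains a "current_period": "<id>" field.
current_period_id() {
    tr -d '\n' | sed -n 's/.*"current_period": *"\([^"]*\)".*/\1/p'
}
```

As a quick consistency check, run `radosgw-admin period get-current | current_period_id` on both df-vm-02 and df-vm-05 and compare the two ids. A mismatch suggests that one side missed a `period pull` or a commit.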
f) Update the RGW configuration and start RGW
    [client.rgw.df-vm-05]
    rgw_frontends = "civetweb port=7480 num_threads=20"
    rgw_zone = gz-zone2
g) Now check the zonegroup configuration again
    [root@df-vm-02 ~]# radosgw-admin zonegroup get --rgw-zonegroup=gz
    {
        "id": "3f4b9246-be0d-4e4e-913a-9edf77d96f8d",
        "name": "gz",
        "api_name": "gz",
        "is_master": "true",
        "endpoints": [
            "http://192.168.106.12:7480"
        ],
        "hostnames": [],
        "hostnames_s3website": [],
        "master_zone": "71e02920-7412-44ea-adba-5426c9058397",
        "zones": [
            {
                "id": "1d8df413-c398-4e2c-9646-a92f603d1d8e",
                "name": "gz-zone2",
                "endpoints": [
                    "http://192.168.106.15:7480"
                ],
                "log_meta": "false",
                "log_data": "true",
                "bucket_index_max_shards": 0,
                "read_only": "false",
                "tier_type": "",
                "sync_from_all": "true",
                "sync_from": []
            },
            {
                "id": "71e02920-7412-44ea-adba-5426c9058397",
                "name": "gz-zone1",
                "endpoints": [
                    "http://192.168.106.12:7480"
                ],
                "log_meta": "false",
                "log_data": "true",
                "bucket_index_max_shards": 0,
                "read_only": "false",
                "tier_type": "",
                "sync_from_all": "true",
                "sync_from": []
            }
        ],
        "placement_targets": [
            {
                "name": "default-placement",
                "tags": []
            }
        ],
        "default_placement": "default-placement",
        "realm_id": "2d40b603-356c-4730-a5f8-336260e3ed4c"
    }
The entry for gz-zone2 has been synchronized to the master zone gz-zone1 and is now visible there.
3. Check the sync status
Sync status of the master zone:
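To list the zones that the zonegroup now contains, the sketch below cuts the `"zones"` array out of the `zonegroup get` output. It assumes `"zones"` is immediately followed by `"placement_targets"`, as in the output above, which may not hold for every Ceph release. `zonegroup_zone_names` is a hypothetical helper name.

```shell
#!/bin/sh
# List the zone names inside the "zones" array of `radosgw-admin zonegroup get`
# output (stdin). Only the text between "zones": [ and ], "placement_targets"
# is searched, so the zonegroup name and placement names are not picked up.
zonegroup_zone_names() {
    tr -d '\n' \
        | sed -n 's/.*"zones": *\[\(.*\)], *"placement_targets".*/\1/p' \
        | grep -o '"name": *"[^"]*"' \
        | sed 's/.*"name": *"\([^"]*\)"/\1/'
}
```

For the output above it should print `gz-zone2` and `gz-zone1`, one per line.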
    [root@df-vm-02 ~]# radosgw-admin sync status
              realm 2d40b603-356c-4730-a5f8-336260e3ed4c (movies)
          zonegroup 3f4b9246-be0d-4e4e-913a-9edf77d96f8d (gz)
               zone 71e02920-7412-44ea-adba-5426c9058397 (gz-zone1)
      metadata sync no sync (zone is master)
    2018-12-10 11:22:57.146563 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146589 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146595 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146599 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146604 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146609 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146613 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146617 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:22:57.146620 7f56867fc700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
          data sync source: 1d8df413-c398-4e2c-9646-a92f603d1d8e (gz-zone2)
                            syncing
                            full sync: 0/128 shards
                            incremental sync: 128/128 shards
Sync status of the secondary zone:
    [root@df-vm-05 ~]# radosgw-admin sync status
              realm 2d40b603-356c-4730-a5f8-336260e3ed4c (movies)
          zonegroup 3f4b9246-be0d-4e4e-913a-9edf77d96f8d (gz)
               zone 1d8df413-c398-4e2c-9646-a92f603d1d8e (gz-zone2)
    2018-12-10 11:57:15.367713 7f996d95edc0 0 meta sync: ERROR: failed to fetch mdlog info
      metadata sync syncing
                    full sync: 0/64 shards
                    failed to fetch local sync status: (5) Input/output error
    2018-12-10 11:57:58.402286 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:57:58.402376 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:57:58.402444 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:57:58.402509 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:57:58.404057 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:57:58.404077 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
    2018-12-10 11:57:58.404227 7f9921ffb700 0 WARNING: curl operation timed out, network average transfer speed less than 1024 Bytes per second during 30 seconds.
          data sync source: 71e02920-7412-44ea-adba-5426c9058397 (gz-zone1)
                            syncing
                            full sync: 0/128 shards
                            incremental sync: 128/128 shards
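You will notice many errors here, and the RGW logs also report mdlog and datalog objects as not found. There is no need to panic. Before any data has been uploaded, many of the mdlog and datalog shards are still empty. Once some data has been uploaded, the errors go away.
- Upload a 100 MB test file through the secondary zone. The upload goes to the secondary zone rather than the master so that it tests whether active-active actually works.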
    # Create a new user in the master zone
    [root@df-vm-02 ~]# radosgw-admin user create --uid=tupu-user1 --display-name="tupu-user1"
    # Create a new bucket in the master zone with s3cmd
    [root@df-vm-01 ~]# s3cmd mb s3://test1
    # Generate a 100 MB file
    [root@df-vm-01 ~]# dd if=/dev/zero of=test.100M.gz bs=4M count=25
    25+0 records in
    25+0 records out
    104857600 bytes (105 MB) copied, 0.0628717 s, 1.7 GB/s
    # Upload the file
    [root@df-vm-01 ~]# s3cmd put test.100M.gz s3://test1
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 1 of 7, 15MB] [1 of 1]
     15728640 of 15728640 100% in 0s 25.63 MB/s done
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 2 of 7, 15MB] [1 of 1]
     15728640 of 15728640 100% in 0s 23.67 MB/s done
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 3 of 7, 15MB] [1 of 1]
     15728640 of 15728640 100% in 0s 24.48 MB/s done
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 4 of 7, 15MB] [1 of 1]
     15728640 of 15728640 100% in 0s 28.77 MB/s done
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 5 of 7, 15MB] [1 of 1]
     15728640 of 15728640 100% in 0s 27.22 MB/s done
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 6 of 7, 15MB] [1 of 1]
     15728640 of 15728640 100% in 0s 30.53 MB/s done
    upload: 'test.100M.gz' -> 's3://test1/test.100M.gz' [part 7 of 7, 10MB] [1 of 1]
     10485760 of 10485760 100% in 0s 25.02 MB/s done
Check the sync status of the master and secondary zones again:
    # master zone
    [root@df-vm-02 ~]# radosgw-admin sync status
              realm 2d40b603-356c-4730-a5f8-336260e3ed4c (movies)
          zonegroup 3f4b9246-be0d-4e4e-913a-9edf77d96f8d (gz)
               zone 71e02920-7412-44ea-adba-5426c9058397 (gz-zone1)
      metadata sync no sync (zone is master)
          data sync source: 1d8df413-c398-4e2c-9646-a92f603d1d8e (gz-zone2)
                            syncing
                            full sync: 0/128 shards
                            incremental sync: 128/128 shards
                            data is caught up with source
    # secondary zone
    [root@df-vm-05 ~]# radosgw-admin sync status
              realm 2d40b603-356c-4730-a5f8-336260e3ed4c (movies)
          zonegroup 3f4b9246-be0d-4e4e-913a-9edf77d96f8d (gz)
               zone 1d8df413-c398-4e2c-9646-a92f603d1d8e (gz-zone2)
      metadata sync syncing
                    full sync: 0/64 shards
                    incremental sync: 64/64 shards
                    metadata is caught up with master
          data sync source: 71e02920-7412-44ea-adba-5426c9058397 (gz-zone1)
                            syncing
                            full sync: 0/128 shards
                            incremental sync: 128/128 shards
                            data is caught up with source
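`data is caught up with source` means the zone's data is consistent with that source zone, and `metadata is caught up with master` means the same for metadata.
Finally, list the objects in each zone's data pool to confirm that the data was really synchronized: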
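Reading these reports by eye does not scale, so the check can be scripted. The sketch below uses hypothetical helpers `sync_caught_up` and `wait_for_sync`, with hypothetical `RGW_ADMIN` and `MAX_TRIES` overrides. It treats a zone as synchronized when its metadata is either master or caught up, and every `data sync source` has a matching `data is caught up with source` line. It matches the wording printed by the release used here; other releases may phrase the status differently.

```shell
#!/bin/sh
# Succeed only if `radosgw-admin sync status` output (stdin) reports that
# metadata and every data sync source are caught up.
sync_caught_up() {
    status=$(cat)
    # Metadata: either this zone is the master, or it has caught up with it
    printf '%s\n' "$status" | grep -Eq 'zone is master|metadata is caught up with master' || return 1
    sources=$(printf '%s\n' "$status" | grep -c 'data sync source:')
    caught=$(printf '%s\n' "$status" | grep -c 'data is caught up with source')
    [ "$sources" -gt 0 ] && [ "$sources" -eq "$caught" ]
}

# Poll every 10 seconds until caught up, giving up after MAX_TRIES attempts (default 30).
# RGW_ADMIN is a hypothetical override for testing; it defaults to radosgw-admin.
wait_for_sync() {
    tries=0
    until "${RGW_ADMIN:-radosgw-admin}" sync status 2>/dev/null | sync_caught_up; do
        tries=$((tries + 1))
        [ "$tries" -ge "${MAX_TRIES:-30}" ] && return 1
        sleep 10
    done
    return 0
}
```

For example, run `wait_for_sync && echo synced` after an upload to block until both metadata and data report caught up.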
    # master zone
    [root@df-vm-02 ~]# rados ls -p gz-zone1.rgw.buckets.data
    71e02920-7412-44ea-adba-5426c9058397.104180.1__multipart_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.5
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.3_2
    71e02920-7412-44ea-adba-5426c9058397.104180.1__multipart_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.1
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.3_3
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.1_3
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.7_1
    ...
    # secondary zone
    [root@df-vm-05 ~]# rados ls -p gz-zone2.rgw.buckets.data
    71e02920-7412-44ea-adba-5426c9058397.104180.1__multipart_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.5
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.3_2
    71e02920-7412-44ea-adba-5426c9058397.104180.1__multipart_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.1
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.3_3
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.1_3
    71e02920-7412-44ea-adba-5426c9058397.104180.1__shadow_test.100M.gz.2~QLDtus3YAVprDzlXCPe5e6zHLGZU8B1.7_1
    ...
The data really has been synchronized.
III. Summary
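This comparison can also be scripted. `rados ls` returns objects in no particular order, so the minimal sketch below (hypothetical helper `same_object_listing`) sorts both listings before comparing them. It assumes each listing has been saved to a file, for example with `rados ls -p gz-zone1.rgw.buckets.data > zone1.txt` on df-vm-02 and the `gz-zone2` equivalent on df-vm-05. It also assumes both pools hold only synced buckets, as in this test.

```shell
#!/bin/sh
# Succeed if two object listings (one object name per line) contain the same
# names, ignoring order. LC_ALL=C keeps the sort order identical on both sides.
same_object_listing() {
    [ "$(LC_ALL=C sort "$1")" = "$(LC_ALL=C sort "$2")" ]
}
```

`same_object_listing zone1.txt zone2.txt && echo identical` then reports whether the two data pools hold the same object names.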
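Multisite gives our clusters geo-distributed active-active capability. The endpoints configured in each zone are used for data synchronization, so in production two points matter for keeping synchronization efficient:
- 1. Configure multiple endpoints for data synchronization, so that no single RGW instance is overloaded.
- 2. Keep the endpoints that clients use for reads and writes separate from the synchronization endpoints.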
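Both points come down to the `endpoints` list of each zone. The sketch below assumes a second RGW instance at the hypothetical address `http://192.168.106.13:7480`. `join_endpoints` and `set_zone_endpoints` are hypothetical helper names, and `RGW_ADMIN` is a hypothetical test override.

```shell
#!/bin/sh
# Join endpoint URLs into the comma-separated form accepted by --endpoints.
join_endpoints() {
    old_ifs=$IFS
    IFS=,
    printf '%s\n' "$*"   # "$*" joins the arguments with the first IFS character
    IFS=$old_ifs
}

# Replace a zone's sync endpoints and commit the period.
# RGW_ADMIN is a hypothetical override for testing; it defaults to radosgw-admin.
set_zone_endpoints() {
    zone=$1; shift
    admin=${RGW_ADMIN:-radosgw-admin}
    "$admin" zone modify --rgw-zone="$zone" --endpoints="$(join_endpoints "$@")" || return 1
    "$admin" period update --commit
}
```

For example, `set_zone_endpoints gz-zone1 http://192.168.106.12:7480 http://192.168.106.13:7480` would register two sync endpoints for the master zone. Client traffic would then go to separate RGW instances, for instance behind a load balancer, that are not listed in the zone's endpoints.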