Error:
$ ceph health detail
HEALTH_ERR Module 'dashboard' has failed: key type unsupported
[ERR] MGR_MODULE_ERROR: Module 'dashboard' has failed: key type unsupported
Module 'dashboard' has failed: key type unsupported
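The certificate and key the dashboard loads are kept in the mgr config-key store; listing them helps confirm what is actually stored (a quick check; the exact key names can vary slightly between Ceph releases):
ceph config-key ls | grep mgr/dashboard   # usually mgr/dashboard/crt and mgr/dashboard/key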
The fix is to disable SSL, put a valid key and certificate back in place, then re-enable SSL:
ceph config set mgr mgr/dashboard/ssl false; ceph mgr module disable dashboard; sleep 5; ceph mgr module enable dashboard
ceph dashboard set-ssl-certificate-key -i /etc/pki/realms/pkgdata.net/private/key.pem
ceph dashboard set-ssl-certificate -i /etc/pki/realms/pkgdata.net/acme/cert.pem
ceph config set mgr mgr/dashboard/ssl true; ceph mgr module disable dashboard; sleep 5; ceph mgr module enable dashboard
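Once the module is re-enabled, a quick sanity check (the dashboard URL depends on which mgr is active):
ceph mgr module ls | grep dashboard   # dashboard should appear among the enabled modules
ceph mgr services                     # should expose the https:// dashboard endpoint
ceph health detail                    # the MGR_MODULE_ERROR should be gone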
On one backup, the restic repository was corrupted:
Load(<index/d4b1e7f03d>, 0, 0) failed: The specified key does not exist.
The 'restic repair index' command did not fix it either:
# restic repair index
repository dfbf3cf6 opened (version 2, compression level auto)
created new cache in /data/restic-databases/cache
loading indexes...
Load(<index/d4b1e7f03d>, 0, 0) failed: The specified key does not exist.
Load(<index/d4b1e7f03d>, 0, 0) failed: The specified key does not exist.
removing invalid index d4b1e7f03df484f65e458d37d4089fe7e0a2312dee34a8f5350cfb5f09d7925d: The specified key does not exist.
getting pack files to read...
rebuilding index
[0:00] 100.00% 1 / 1 indexes processed
[0:00] 100.00% 2 / 2 old indexes deleted
done
The problem can also be seen after mounting the Ceph bucket with s3fs:
s3fs os402.pkgdata.net-databases /mnt/ceph-bucket -o passwd_file=/etc/passwd-s3fs -o url=https://pkgdata.backup -o use_path_request_style
root@on001:/mnt/ceph-bucket/index # ls
ls: cannot access 'd4b1e7f03df484f65e458d37d4089fe7e0a2312dee34a8f5350cfb5f09d7925d': No such file or directory
d4b1e7f03df484f65e458d37d4089fe7e0a2312dee34a8f5350cfb5f09d7925d f5f669ed7e8bf6afc46a102b790483e2e3720385d566207da99b32dbc076e855
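The same object can also be inspected from the RGW side, since the bucket index still lists an entry whose backing data is unreadable (a hedged check; whether the stat succeeds, and its output format, depend on the Ceph release):
radosgw-admin object stat --bucket=os402.pkgdata.net-databases \
    --object=index/d4b1e7f03df484f65e458d37d4089fe7e0a2312dee34a8f5350cfb5f09d7925d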
To fix it, log in to ov004.pkgdata.net and run:
radosgw-admin bucket check --bucket=os402.pkgdata.net-databases --check-objects --fix
See https://tracker.ceph.com/issues/44509#note-8 and https://github.com/nh2/ceph-fix-spilled-metadata-script/blob/main/ceph-fix-spilled-metadata.sh
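Once the bucket check has run, re-check from both sides (assuming the restic repository location and credentials are still set as for the earlier run):
ls /mnt/ceph-bucket/index   # the dangling index entry should no longer be listed
restic check                # the repository should load all indexes without errors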
The DB is using space on the spinning disks:
ceph health detail
HEALTH_WARN 10 OSD(s) experiencing BlueFS spillover
[WRN] BLUEFS_SPILLOVER: 10 OSD(s) experiencing BlueFS spillover
osd.1 spilled over 24 GiB metadata from 'db' device (2.7 GiB used of 50 GiB) to slow device
osd.2 spilled over 25 GiB metadata from 'db' device (2.4 GiB used of 50 GiB) to slow device
osd.3 spilled over 23 GiB metadata from 'db' device (2.7 GiB used of 50 GiB) to slow device
osd.4 spilled over 23 GiB metadata from 'db' device (2.5 GiB used of 50 GiB) to slow device
osd.5 spilled over 25 GiB metadata from 'db' device (2.1 GiB used of 50 GiB) to slow device
osd.6 spilled over 23 GiB metadata from 'db' device (2.1 GiB used of 50 GiB) to slow device
osd.7 spilled over 25 GiB metadata from 'db' device (2.6 GiB used of 50 GiB) to slow device
osd.8 spilled over 25 GiB metadata from 'db' device (2.8 GiB used of 50 GiB) to slow device
osd.9 spilled over 24 GiB metadata from 'db' device (1.7 GiB used of 50 GiB) to slow device
osd.10 spilled over 25 GiB metadata from 'db' device (2.7 GiB used of 50 GiB) to slow device
This comes from the fact that:
ceph daemon /var/run/ceph/9e2d3cee-4d0c-11ef-ba6d-047c16f1285e/ceph-osd.1.asok config show|grep -i bluestore_volume_selection_policy
"bluestore_volume_selection_policy": "use_some_extra",
but bluefs_used on BDEV_SLOW should be 0, since there is still plenty of free space on BDEV_DB:
ceph daemon /var/run/ceph/9e2d3cee-4d0c-11ef-ba6d-047c16f1285e/ceph-osd.1.asok bluestore bluefs device info
{
    "dev": {
        "device": "BDEV_DB",
        "total": 53687083008,
        "free": 50804547584,
        "bluefs_used": 2882535424
    },
    "dev": {
        "device": "BDEV_SLOW",
        "total": 22000965779456,
        "free": 6919077777408,
        "bluefs_used": 25998458880,
        "bluefs max available": 6834005999616
    }
}
and:
ceph tell osd.1 bluefs stats
1 : device size 0xc7fffe000 : using 0x660700000(26 GiB)
2 : device size 0x14027fc00000 : using 0xdb184548000(14 TiB)
RocksDBBlueFSVolumeSelector
>>Settings<< extra=28 GiB, l0_size=1 GiB, l_base=1 GiB, l_multi=8 B
DEV/LEV     WAL      DB        SLOW     *        *        REAL      FILES
LOG         0 B      16 MiB    0 B      0 B      0 B      14 MiB    1
WAL         0 B      126 MiB   0 B      0 B      0 B      108 MiB   5
DB          0 B      1.1 GiB   0 B      0 B      0 B      717 MiB   52
SLOW        0 B      24 GiB    0 B      0 B      0 B      22 GiB    352
TOTAL       0 B      26 GiB    0 B      0 B      0 B      0 B       410
MAXIMUMS:
LOG         0 B      16 MiB    0 B      0 B      0 B      14 MiB
WAL         0 B      1.7 GiB   0 B      0 B      0 B      1001 MiB
DB          0 B      1.2 GiB   0 B      0 B      0 B      801 MiB
SLOW        0 B      24 GiB    0 B      0 B      0 B      22 GiB
TOTAL       0 B      27 GiB    0 B      0 B      0 B      0 B
>> SIZE <<  0 B      48 GiB    19 TiB
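The nine other OSDs show the same pattern; a quick loop to confirm it (a sketch, iterating over osd.1 through osd.10 as reported by ceph health detail):
for i in $(seq 1 10); do
    echo "== osd.$i =="
    ceph tell osd.$i bluefs stats | grep -E 'SLOW|SIZE'
done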
To fix it:
ceph osd add-noout osd.1
systemctl stop ceph-9e2d3cee-4d0c-11ef-ba6d-047c16f1285e@osd.1.service
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/9e2d3cee-4d0c-11ef-ba6d-047c16f1285e/osd.1 \
    --devs-source /var/lib/ceph/9e2d3cee-4d0c-11ef-ba6d-047c16f1285e/osd.1/block \
    --dev-target /var/lib/ceph/9e2d3cee-4d0c-11ef-ba6d-047c16f1285e/osd.1/block.db
systemctl start ceph-9e2d3cee-4d0c-11ef-ba6d-047c16f1285e@osd.1.service
ceph osd rm-noout osd.1
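The same migration has to be repeated on every affected OSD; a sketch of the loop, assuming all ten OSDs sit on this host with the same fsid and a block.db symlink laid out as above (do them one at a time and let the cluster settle in between):
fsid=9e2d3cee-4d0c-11ef-ba6d-047c16f1285e
for i in $(seq 1 10); do
    ceph osd add-noout osd.$i
    systemctl stop ceph-$fsid@osd.$i.service
    ceph-bluestore-tool bluefs-bdev-migrate \
        --path /var/lib/ceph/$fsid/osd.$i \
        --devs-source /var/lib/ceph/$fsid/osd.$i/block \
        --dev-target /var/lib/ceph/$fsid/osd.$i/block.db
    systemctl start ceph-$fsid@osd.$i.service
    ceph osd rm-noout osd.$i
done
ceph health detail   # the BLUEFS_SPILLOVER warning should clear once every OSD has been migrated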