On every reboot or power loss, my Ceph managers crash, and the CephFS snap_schedule module has not been working since 2023-02-05 18:00.
The ceph-mgr starts anyway, but it generates a crash report, putting the Ceph cluster into HEALTH_WARN status.
I have the issue on every node (3-node cluster), probably since the Quincy update.
Does anyone observe the same problem?
Do you have any recommendations or fixes?
snap_schedule unavailable
root@pve3:~# ceph fs snap-schedule status / | jq
Error ENOENT: Module 'snap_schedule' is not available
The crash info:
root@pve3:~# ceph crash info '2023-04-11T06:23:22.105089Z_356de37b-2e16-4f44-b050-326ddad84773'
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/snap_schedule/module.py\", line 38, in __init__\n    self.client = SnapSchedClient(self)",
        "  File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 169, in __init__\n    with self.get_schedule_db(fs_name) as conn_mgr:",
        "  File \"/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py\", line 203, in get_schedule_db\n    db.executescript(dump)",
        "sqlite3.OperationalError: unable to open database file"
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2023-04-11T06:23:22.105089Z_356de37b-2e16-4f44-b050-326ddad84773",
    "entity_name": "mgr.pve3",
    "mgr_module": "snap_schedule",
    "mgr_module_caller": "ActivePyModule::load",
    "mgr_python_exception": "OperationalError",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-mgr",
    "stack_sig": "2fb4f03ffef7798ee981190306cedadb7d698a3a4cd6dbb59c0400ec3f76b6ba",
    "timestamp": "2023-04-11T06:23:22.105089Z",
    "utsname_hostname": "pve3",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.102-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.15.102-1 (2023-03-14T13:48Z)"
}
Additional information on my Ceph setup:
balancer on (always on)
crash on (always on)
devicehealth on (always on)
orchestrator on (always on)
pg_autoscaler on (always on)
progress on (always on)
rbd_support on (always on)
status on (always on)
telemetry on (always on)
volumes on (always on)
dashboard on
iostat on
nfs on
prometheus on
restful on
snap_schedule on
stats on
alerts -
influx -
insights -
localpool -
mirroring -
osd_perf_query -
osd_support -
selftest -
telegraf -
test_orchestrator -
zabbix -
root@pve3:~# ceph crash ls
ID ENTITY NEW
2023-02-10T15:55:33.246668Z_d7bfe3b0-2647-4583-b257-60cc8bebb820 mgr.pve3
2023-02-10T21:10:48.333710Z_fab6d271-2708-4bf4-a70b-36918d268a14 mgr.pve1
2023-02-10T21:11:32.956340Z_d5fb4e86-1dbc-4247-845f-b9777af168c4 osd.2
2023-02-10T21:11:34.464160Z_0b4effa2-0538-4090-9337-89259d3d78bd osd.4
2023-02-16T11:07:38.833449Z_6065ff7b-ef02-4d6f-bbd2-a3beb7ca7f9a mgr.pve2
2023-02-19T23:12:31.685803Z_9a53dab2-3817-4ebd-8a0b-f479d277d751 mgr.pve3
2023-02-19T23:30:00.594498Z_5da1ca3c-8d1b-470f-818f-e8bde0ed01f7 mgr.pve1
2023-03-05T16:00:29.692042Z_0f6ae1ae-8d74-4a0e-9e41-3172086f8460 mgr.pve2
2023-03-12T11:46:48.532363Z_2ec74f76-9cbe-438c-8948-82d3732f0aea mgr.pve3
2023-03-12T21:32:17.037999Z_6df8796c-ec4c-42f6-b352-b273e345c22e mgr.pve1
2023-03-13T09:19:09.578815Z_849121da-6ad9-4036-b3b6-c556263a0f05 mgr.pve2
2023-03-13T12:59:09.792996Z_b1de9b4f-c8a4-48ae-9dff-d3413e527e43 mgr.pve3
2023-03-13T13:34:43.233360Z_4ed2be9e-2f07-4853-b3bf-1e0efe69afea osd.5
2023-03-18T09:22:35.338683Z_ab963d23-e823-40aa-b0ad-0330b5265ff7 mgr.pve1
2023-03-24T17:39:43.801037Z_40bdd143-b099-4f15-bfd6-39a0d544ee38 mgr.pve1
2023-04-07T14:02:36.377029Z_c1510456-3f13-4a5e-ba86-9ea7779817f2 mgr.pve3
2023-04-07T14:03:37.643954Z_4c946518-c119-480b-87b5-2b0d48f583df mgr.pve2
2023-04-07T19:19:58.561573Z_e5057586-899c-4bd3-bf4e-f08d469f2013 mgr.pve1 *
2023-04-08T17:43:39.891129Z_1770ab87-0303-4d0e-bbbb-6e62143876c0 mgr.pve3 *
2023-04-08T18:18:12.318248Z_b8563587-a64f-4b97-a4ab-2d55049d1261 mgr.pve2 *
2023-04-11T06:23:22.105089Z_356de37b-2e16-4f44-b050-326ddad84773 mgr.pve3 *
Marking this thread as "Fixed", since probably related problems have been discussed here:
- https://www.spinics.net/lists/ceph-users/msg74696.html
- https://tracker.ceph.com/issues/57851
And the proposed fix also resolves the problem above:
https://github.com/ceph/ceph/pull/48449/commits/8d853cc4990dc4dbccdc916115b0b30e0ac9dc19
This fix will probably be included in the next Ceph update.
The problem seems to be caused by the migration from 16 (Pacific) to 17 (Quincy) when snap_schedule was enabled before the migration.
The sqlite DB storage has been moved to the cephfs_metadata, and the mgr can't migrate the old database without this minor patch.
I added the two lines from the patch manually to the file (the path can be guessed from the crash report), rebooted the node (restarting the mgr might have been enough), and then made this mgr the active one. I made the change on one node only, since it's only needed once, for the DB migration to the new storage. The patch can then be reverted if desired.
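For context, the backtrace shows the migration uses the classic sqlite dump-and-restore pattern: the old schedule DB is dumped to SQL text and replayed into the new DB with executescript() (the call that failed). A sketch of that pattern, using in-memory DBs and a made-up table, not the real snap_schedule schema:

```python
import sqlite3

# Old DB with some schedule-like data (placeholder schema).
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE schedules (path TEXT, schedule TEXT)")
old.execute("INSERT INTO schedules VALUES ('/', '1h')")
old.commit()

# Dump the whole DB to SQL text ...
dump = "\n".join(old.iterdump())

# ... and replay it into the new DB (in Ceph's case, the DB stored
# in the cephfs metadata pool). This is the db.executescript(dump)
# call from the crash backtrace.
new = sqlite3.connect(":memory:")
new.executescript(dump)

print(new.execute("SELECT * FROM schedules").fetchall())
# [('/', '1h')]
```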
This immediately fixed the fs snap-schedule status command:
"start": "2022-07-17T00:00:00",
"created": "2022-07-17T22:44:20",
"first": "2023-04-11T21:00:00",
"last": "2023-04-11T21:00:00",
"last_pruned": "2023-04-11T21:00:00",
"created_count": 1,
"pruned_count": 1,
"active": true
And the CephFS automatic snapshots work again (now with an appended _UTC suffix):
root@pve3:~# ls -tr -1 /mnt/pve/cephfs/.snap
weekly_2022-07-10_231701
daily_2022-07-10_231701
scheduled-2023-04-11-21_00_00_UTC
scheduled-2023-02-05-18_00_00
scheduled-2023-02-05-17_00_00
scheduled-2023-02-05-16_00_00
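In case it helps anyone scripting around these snapshots: the scheduled names appear to follow scheduled-YYYY-MM-DD-HH_MM_SS, optionally with the trailing _UTC since the fix. A small parsing sketch (function name is mine, the format is just what the listing above suggests):

```python
from datetime import datetime

def parse_snap(name: str) -> datetime:
    # Strip the "scheduled-" prefix and optional "_UTC" suffix,
    # then parse the remaining timestamp.
    stamp = name.removeprefix("scheduled-").removesuffix("_UTC")
    return datetime.strptime(stamp, "%Y-%m-%d-%H_%M_%S")

print(parse_snap("scheduled-2023-04-11-21_00_00_UTC"))
# 2023-04-11 21:00:00
```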
As you may have noticed, I lost the retention settings, so I had to re-apply and verify them:
root@pve3:~# ceph fs snap-schedule retention add / m 12
ceph fs snap-schedule retention add / w 4
ceph fs snap-schedule retention add / d 7
ceph fs snap-schedule retention add / h 24
Retention added to path /
Retention added to path /
Retention added to path /
Retention added to path /
root@pve3:~# ceph fs snap-schedule status | jq
{
  "fs": "cephfs",
  "subvol": null,
  "path": "/",
  "rel_path": "/",
  "schedule": "1h",
  "retention": {
    "m": 12,
    "w": 4,
    "d": 7,
    "h": 24
  },
  "start": "2022-07-17T00:00:00",
  "created": "2022-07-17T22:44:20",
  "first": "2023-04-11T21:00:00",
  "last": "2023-04-11T21:00:00",
  "last_pruned": "2023-04-11T21:00:00",
  "created_count": 1,
  "pruned_count": 1,
  "active": true
}
The details of my CephFS snapshot setup are on my blog.
I hope this post will save some hours for people experiencing the same issue.