MDS Logs

Click to download the MDS logs: mds001, mds002, mds003, mds004, mds005.

MDS Metrics

recovery time breakdown

The time taken by each recovery step is measured and calculated from the output of the ceph mds stat command. Some states are not captured, either because they are too short to observe or because they do not occur in this recovery scenario. In the figure, the x-axis is the MDS restart count. In this experiment, the MDS is restarted four times with systemctl restart ceph-mds.target. Unexpectedly, the total time of the second recovery is 695 seconds more than in the other cases. The next figure shows that the recovery process is retried, passing through up:replay, up:resolve, up:rejoin, and so on several times. This means that the MDS cannot transition from up:rejoin to up:active. In the MDS log, "MDS internal heartbeat is not healthy" messages appear during the up:rejoin stage, and finally the MDS is respawned.
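One way to turn periodic ceph mds stat samples into a per-state time breakdown is sketched below. This is a minimal illustration, not the script used for these measurements: it assumes the samples have already been parsed into (timestamp_seconds, state) tuples for a single rank, and the function name state_durations is hypothetical. Each polling interval is attributed to the state seen at its start, which is also why states shorter than the polling interval can go unrecorded.

```python
from collections import defaultdict


def state_durations(samples):
    """Sum the time spent in each MDS state.

    samples: list of (timestamp_seconds, state) tuples sorted by time,
    e.g. obtained by polling `ceph mds stat` (hypothetical parsed format).
    The interval between two consecutive samples is charged to the state
    observed at the start of that interval, so a state that begins and
    ends between two polls is never counted.
    """
    durations = defaultdict(float)
    for (t0, state), (t1, _) in zip(samples, samples[1:]):
        durations[state] += t1 - t0
    return dict(durations)


samples = [
    (0, "up:replay"),
    (5, "up:replay"),
    (10, "up:resolve"),
    (12, "up:rejoin"),
    (20, "up:active"),
]
print(state_durations(samples))
```

The last sample only closes the previous interval, so the final state (here up:active) gets no duration; the sum of the values equals the total recovery time from the first to the last sample.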

recovery state count

The y-axis represents how many times each state is entered during each MDS restart.
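A count like this can be derived from the same polled state sequence by collapsing consecutive duplicate samples, so that a state persisting across several polls counts once per entry. The sketch below assumes a list of state strings for one rank during one restart; count_state_entries is a hypothetical helper name.

```python
from itertools import groupby


def count_state_entries(states):
    """Count how many times each state is entered.

    states: polled state strings for one rank during one restart
    (hypothetical input). groupby() merges runs of identical consecutive
    samples, so each run corresponds to one entry into that state.
    """
    counts = {}
    for state, _run in groupby(states):
        counts[state] = counts.get(state, 0) + 1
    return counts


# A retried recovery: rejoin falls back to replay before finally reaching active.
retried = [
    "up:replay", "up:replay", "up:resolve", "up:rejoin", "up:rejoin",
    "up:replay", "up:resolve", "up:rejoin", "up:active",
]
print(count_state_entries(retried))
```

A count above 1 for up:replay, up:resolve, or up:rejoin in a single restart is the signature of the retried recovery described above.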

mds stat

reply rate

There are two ranks: rank 0 is shown in blue and rank 1 in orange.

caps inodes

num_caps for sessions

num_caps is measured via session ls. Due to some technical issues, our metric DB only collects information for the top 10 sessions.
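The top-10 restriction can be reproduced by ranking sessions on num_caps before storing them. The sketch below assumes the JSON printed by ceph tell mds.<name> session ls has already been captured as text and that each session object carries "id" and "num_caps" fields; the function name top_sessions_by_caps is hypothetical. Note that any caps held by sessions outside the top N are simply not visible in the stored metrics.

```python
import heapq
import json


def top_sessions_by_caps(session_ls_json, n=10):
    """Return [(session_id, num_caps), ...] for the n sessions holding the most caps.

    session_ls_json: JSON text from `ceph tell mds.<name> session ls`,
    assumed to be a list of objects with "id" and "num_caps" fields.
    heapq.nlargest returns the results in descending order of num_caps.
    """
    sessions = json.loads(session_ls_json)
    top = heapq.nlargest(n, sessions, key=lambda s: s["num_caps"])
    return [(s["id"], s["num_caps"]) for s in top]


example = '[{"id": 4101, "num_caps": 12}, {"id": 4102, "num_caps": 300}, {"id": 4103, "num_caps": 45}]'
print(top_sessions_by_caps(example, n=2))
```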
release_caps for sessions