Analysis of a Pod Volume Mount Failure
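In our Kubernetes environment we occasionally saw the following problem: after a Pod in a StatefulSet was deleted, the newly started Pod (still scheduled to the original node) failed to mount its volume, as shown in the figure below. Tracking it down also gave us a new appreciation of how complex Kubernetes is.
Before analyzing the problem, here is a brief introduction to our understanding of the Kubernetes storage system, as background.
Storage is an important and complex subsystem of Kubernetes. It is large in scope and involves cooperation among different controllers in different components, as shown in the figure below.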
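provision: the volume has been allocated successfully
attach: the volume is attached to the corresponding worker node
mount: the volume is mounted as a file system and mapped to the corresponding Pod
umount: the volume has been unmapped on the corresponding worker node and unmounted from the file system
detach: the volume has been detached from the worker node
recycle: the volume is reclaimed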
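The lifecycle phases listed above can be sketched as a small transition table. This is only an illustration: the `Phase` type, the `allowed` table and `CanAdvance` are hypothetical names, and the permitted transitions (for example, mounting again after an umount without a detach in between) are an assumption drawn from the scenarios discussed later, not something taken from the Kubernetes source.

```go
package main

import "fmt"

// Phase names one of the volume lifecycle phases listed in the text.
type Phase string

const (
	Provision Phase = "provision"
	Attach    Phase = "attach"
	Mount     Phase = "mount"
	Umount    Phase = "umount"
	Detach    Phase = "detach"
	Recycle   Phase = "recycle"
)

// allowed is an assumed transition table for this sketch only.
var allowed = map[Phase][]Phase{
	Provision: {Attach},
	Attach:    {Mount, Detach},
	Mount:     {Umount},
	Umount:    {Mount, Detach}, // a volume may be mounted again on the same node
	Detach:    {Attach, Recycle},
}

// CanAdvance reports whether moving from one phase to another respects the table.
func CanAdvance(from, to Phase) bool {
	for _, p := range allowed[from] {
		if p == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanAdvance(Provision, Mount)) // attach was skipped
	fmt.Println(CanAdvance(Umount, Mount))    // remount on the same node
}
```

The point of the table is that no single controller owns all of these steps, so the ordering has to be enforced through shared state rather than through one call sequence.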
pv controller: creates and reclaims volumes
attach detach controller: attaches and detaches volumes
volume manager: mounts and unmounts volumes
for {
    desired := getDesiredState()
    current := getCurrentState()
    makeChanges(desired, current)
}
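To make the loop above concrete, here is a minimal runnable sketch. It assumes the state is just a set of mounted volume names; `reconcileOnce` and `keys` are hypothetical helpers, and a real controller would run the loop periodically rather than stop once the states match.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcileOnce compares the desired and current sets of mounted volumes and
// applies the difference to current. It reports whether anything changed.
func reconcileOnce(desired, current map[string]bool) bool {
	changed := false
	for v := range desired {
		if !current[v] {
			current[v] = true // stands in for a mount operation
			changed = true
		}
	}
	for v := range current {
		if !desired[v] {
			delete(current, v) // stands in for an unmount operation
			changed = true
		}
	}
	return changed
}

// keys returns the set members in sorted order for stable printing.
func keys(m map[string]bool) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	desired := map[string]bool{"vol-a": true, "vol-b": true}
	current := map[string]bool{"vol-b": true, "vol-c": true}
	// Keep reconciling until a pass makes no change.
	for reconcileOnce(desired, current) {
	}
	fmt.Println(keys(current))
}
```

Nothing in the loop remembers what it did on the previous pass; each pass looks only at the two states, which is why the content of those states matters so much in the analysis that follows.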
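Taken together, these three dimensions mean that Kubernetes must guarantee the correct ordering of the volume lifecycle even though volume management is spread across different controllers. Taking a Pod that uses a volume as an example, how does Kubernetes achieve this?
At this point the PVC is bound to the PV.
2. attach detach controller: if the Pod has been assigned to a worker node and the corresponding volume has been created, it attaches the volume to that worker node and adds status information to the worker node resource indicating that the volume is attached; see related code 2 [2] and code 3 [3].
At this point the volume information is added to the status of the corresponding node resource.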
[root@10-10-88-152 ~]# kubectl get nodes 10-10-88-113 -o yaml
apiVersion: v1
kind: Node
....
  volumesAttached:
  - devicePath: csi-add9fc778d9593d01818d65ccde7013e87327d9f675b47df42a34b860c581711
    name: kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4faa18f5bbbd11e8-1365
  - devicePath: csi-5dd249387138238e8e2209eb471450a072dd6543adde7a6769c8461943c789ca
    name: kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4fa9b764bbbd11e8-1366
  - devicePath: csi-bc9b81e32d84e8890d17568964c1e01af97b0c175e0b73d4bf30bba54e3f1a1e
    name: kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4fa94533bbbd11e8-1364
  volumesInUse:
  - kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4fa94533bbbd11e8-1364
  - kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4fa9b764bbbd11e8-1366
  - kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4faa18f5bbbd11e8-1365
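A component that consumes this node status essentially has to look up a volume by its unique name and read back the recorded device path. The sketch below shows that lookup using simplified stand-in structs; `AttachedVolume`, `NodeStatus` and `devicePathFor` are illustrative names here and are not the real Kubernetes API types or kubelet functions.

```go
package main

import "fmt"

// AttachedVolume and NodeStatus are simplified stand-ins for the fields shown
// in the kubectl output; they are not the real Kubernetes API types.
type AttachedVolume struct {
	Name       string
	DevicePath string
}

type NodeStatus struct {
	VolumesAttached []AttachedVolume
	VolumesInUse    []string
}

// devicePathFor returns the device path recorded for the named volume and
// whether the node status lists the volume as attached at all.
func devicePathFor(status NodeStatus, name string) (string, bool) {
	for _, v := range status.VolumesAttached {
		if v.Name == name {
			return v.DevicePath, true
		}
	}
	return "", false
}

func main() {
	const vol = "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-4faa18f5bbbd11e8-1365"
	status := NodeStatus{
		VolumesAttached: []AttachedVolume{{
			Name:       vol,
			DevicePath: "csi-add9fc778d9593d01818d65ccde7013e87327d9f675b47df42a34b860c581711",
		}},
		VolumesInUse: []string{vol},
	}
	path, ok := devicePathFor(status, vol)
	fmt.Println(ok, path)
}
```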
First, mount the volume to a global path on the node, for example /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3ecd68c7b7d211e8/globalmount.
Then map it to the Pod's path, for example /var/lib/kubelet/pods/49a5fede-b811-11e8-844f-fa7378845e00/volumes/kubernetes.io~csi/pvc-3ecd68c7b7d211e8/mount.
Finally, mark the volume as successfully mounted in actualStateOfWorld.
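The two example paths above follow a regular layout, which the sketch below reproduces. The helper names `globalMountPath` and `podMountPath` are hypothetical, and the layout is inferred only from the two example paths in the text, so it may not hold for other Kubernetes versions or volume plugins.

```go
package main

import (
	"fmt"
	"path"
)

// kubeletRoot is the kubelet data directory seen in the example paths.
const kubeletRoot = "/var/lib/kubelet"

// globalMountPath builds the node-wide mount point of a CSI volume.
func globalMountPath(pvName string) string {
	return path.Join(kubeletRoot, "plugins", "kubernetes.io", "csi", "pv", pvName, "globalmount")
}

// podMountPath builds the per-Pod directory that the volume is mapped to.
func podMountPath(podUID, pvName string) string {
	return path.Join(kubeletRoot, "pods", podUID, "volumes", "kubernetes.io~csi", pvName, "mount")
}

func main() {
	fmt.Println(globalMountPath("pvc-3ecd68c7b7d211e8"))
	fmt.Println(podMountPath("49a5fede-b811-11e8-844f-fa7378845e00", "pvc-3ecd68c7b7d211e8"))
}
```

Note that the global path depends only on the volume, while the Pod path also depends on the Pod UID. A recreated Pod gets a new UID and therefore a new Pod path, but reuses the same global path.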
4. The pod controller confirms that the volume has been mapped successfully and starts the Pod; this step is not covered in detail here.
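Remove the Pod from the cached information in desiredStateOfWorld.
The reconciler finds a volume that is mounted according to actualStateOfWorld but that, according to desiredStateOfWorld, the Pod should no longer mount. It runs UnmountVolume to remove the mapping between the Pod and the volume, and removes the Pod from the volume's entry in actualStateOfWorld.
If the volume in the actual state is no longer associated with any Pod, it can be fully separated from the node: UnmountDevice first unmounts the volume's global path, and on the next reconcile MarkVolumeAsDetached removes the volume from the actual state entirely.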
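The teardown steps above take several reconcile passes, which the following sketch tries to show. It is a deliberately simplified model: `volumeState`, `actualState` and `step` are hypothetical names, each pass performs at most one action, and the real kubelet reconciler runs these operations asynchronously with far more bookkeeping.

```go
package main

import "fmt"

// volumeState is a toy record of one volume in the actual state.
type volumeState struct {
	pods            map[string]bool // Pods the volume is currently mapped to
	globallyMounted bool            // whether the global path is mounted
}

// actualState maps a volume name to its record.
type actualState map[string]*volumeState

// step performs at most one teardown action for the volume and returns the
// name of that action, or "" if there is nothing left to do.
func step(asw actualState, desiredPods map[string]bool, vol string) string {
	v, ok := asw[vol]
	if !ok {
		return ""
	}
	for pod := range v.pods {
		if !desiredPods[pod] {
			delete(v.pods, pod)
			return "UnmountVolume"
		}
	}
	if len(v.pods) > 0 {
		return "" // still in use by a desired Pod
	}
	if v.globallyMounted {
		v.globallyMounted = false
		return "UnmountDevice"
	}
	delete(asw, vol)
	return "MarkVolumeAsDetached"
}

func main() {
	asw := actualState{
		"pvc-3ecd68c7b7d211e8": &volumeState{
			pods:            map[string]bool{"49a5fede-b811-11e8-844f-fa7378845e00": true},
			globallyMounted: true,
		},
	}
	desired := map[string]bool{} // the Pod has been deleted
	for pass := 1; ; pass++ {
		action := step(asw, desired, "pvc-3ecd68c7b7d211e8")
		if action == "" {
			break
		}
		fmt.Printf("reconcile %d: %s\n", pass, action)
	}
}
```

Because the teardown is spread over several passes, there are windows between passes in which a new Pod for the same volume can show up.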
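Different components cooperate through resource state: the attach detach controller needs the PVC-bound-to-PV state, and the volume manager needs the volume attached state in the node status.
Components reach the desired state by reconciling, and a state change may take several reconciles to complete; for example, after a Pod is removed, its volume is only eventually separated from the node.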
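volume manager notices that the Pod was deleted and runs umount
StatefulSet notices that the Pod was deleted and immediately creates a new Pod
scheduler notices the new Pod and schedules it
volume manager finds that the existing volume needs to be bound to the Pod and runs mount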
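volume manager notices that the Pod was deleted and runs umount/UnmountDevice/MarkVolumeAsDetached (over several reconciles)
attach detach controller finds that the volume is no longer in use on the node and runs detach
scheduler notices the new Pod and schedules it
attach detach controller finds that the volume needs to be attached and runs attach
volume manager mounts the volume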
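StatefulSet notices that the Pod was deleted and immediately creates a new Pod
volume manager notices that the Pod was deleted and runs umount/UnmountDevice (over several reconciles); note that devicePath and deviceMountPath are both empty at this point
scheduler notices the new Pod and schedules it
volume manager finds that the existing volume needs to be bound to the Pod and runs mount, but devicePath and deviceMountPath are both empty at this point, and the problem occurs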
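The three orderings above differ only in what has happened to the recorded device path by the time the final mount runs. The sketch below replays each ordering as a list of event names. The event names, the `devicePathAtMount` function and the placeholder path "csi-device" are all illustrative assumptions; the model only encodes two rules taken from the text, namely that UnmountDevice clears the recorded path and that a fresh attach supplies it again.

```go
package main

import "fmt"

// devicePathAtMount replays a sequence of events for a volume that starts out
// attached and mounted, and returns the device path seen by the "mount" event.
func devicePathAtMount(events []string) string {
	devicePath := "csi-device" // placeholder for the real csi-... path
	for _, e := range events {
		switch e {
		case "unmountDevice":
			devicePath = "" // the volume manager's record is cleared
		case "attach":
			devicePath = "csi-device" // a new attach provides the path again
		case "mount":
			return devicePath
		}
	}
	return devicePath
}

func main() {
	scenarios := [][]string{
		// Pod recreated before the device is unmounted.
		{"umount", "createPod", "schedule", "mount"},
		// Full teardown, then detach and a fresh attach.
		{"umount", "unmountDevice", "markDetached", "detach", "schedule", "attach", "mount"},
		// Pod recreated after the device is unmounted but before detach.
		{"createPod", "umount", "unmountDevice", "schedule", "mount"},
	}
	for i, s := range scenarios {
		fmt.Printf("scenario %d: devicePath at mount = %q\n", i+1, devicePathAtMount(s))
	}
}
```

Only the third ordering reaches mount with an empty path, which is why the failure shows up only occasionally.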
Sep 14 19:28:33 10-10-40-16 kubelet: I0914 19:28:33.174310 1953 operation_generator.go:1168] Controller attach succeeded for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "49a5fede-b811-11e8-844f-fa7378845e00") device path: "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2"
Sep 14 19:28:33 10-10-40-16 kubelet: I0914 19:28:33.273344 1953 operation_generator.go:486] MountVolume.WaitForAttach entering for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "49a5fede-b811-11e8-844f-fa7378845e00") DevicePath "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2"
Sep 14 19:28:33 10-10-40-16 kubelet: I0914 19:28:33.318275 1953 operation_generator.go:495] MountVolume.WaitForAttach succeeded for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "49a5fede-b811-11e8-844f-fa7378845e00") DevicePath "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2"
Sep 14 19:28:33 10-10-40-16 kubelet: I0914 19:28:33.319345 1953 operation_generator.go:514] MountVolume.MountDevice succeeded for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "49a5fede-b811-11e8-844f-fa7378845e00") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3ecd68c7b7d211e8/globalmount"
Sep 14 19:29:12 10-10-40-16 kubelet: I0914 19:29:12.826916 1953 operation_generator.go:486] MountVolume.WaitForAttach entering for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "67f223dc-b811-11e8-844f-fa7378845e00") DevicePath "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2"
Sep 14 19:29:14 10-10-40-16 kubelet: I0914 19:29:14.465225 1953 operation_generator.go:495] MountVolume.WaitForAttach succeeded for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "67f223dc-b811-11e8-844f-fa7378845e00") DevicePath "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2"
Sep 14 19:29:14 10-10-40-16 kubelet: I0914 19:29:14.466483 1953 operation_generator.go:514] MountVolume.MountDevice succeeded for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "67f223dc-b811-11e8-844f-fa7378845e00") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3ecd68c7b7d211e8/globalmount"
Sep 14 19:29:15 10-10-40-16 kubelet: W0914 19:29:15.491424 1953 csi_mounter.go:354] kubernetes.io/csi: skipping mount dir removal, path does not exist [/var/lib/kubelet/pods/49a5fede-b811-11e8-844f-fa7378845e00/volumes/kubernetes.io~csi/pvc-3ecd68c7b7d211e8/mount]
Sep 14 19:29:15 10-10-40-16 kubelet: I0914 19:29:15.491450 1953 operation_generator.go:686] UnmountVolume.TearDown succeeded for volume "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338" (OuterVolumeSpecName: "data") pod "49a5fede-b811-11e8-844f-fa7378845e00" (UID: "49a5fede-b811-11e8-844f-fa7378845e00"). InnerVolumeSpecName "pvc-3ecd68c7b7d211e8". PluginName "kubernetes.io/csi", VolumeGidValue ""
Sep 14 19:29:44 10-10-40-16 kubelet: W0914 19:29:44.896387 1953 csi_mounter.go:354] kubernetes.io/csi: skipping mount dir removal, path does not exist [/var/lib/kubelet/pods/67f223dc-b811-11e8-844f-fa7378845e00/volumes/kubernetes.io~csi/pvc-3ecd68c7b7d211e8/mount]
Sep 14 19:29:44 10-10-40-16 kubelet: I0914 19:29:44.896403 1953 operation_generator.go:686] UnmountVolume.TearDown succeeded for volume "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338" (OuterVolumeSpecName: "data") pod "67f223dc-b811-11e8-844f-fa7378845e00" (UID: "67f223dc-b811-11e8-844f-fa7378845e00"). InnerVolumeSpecName "pvc-3ecd68c7b7d211e8". PluginName "kubernetes.io/csi", VolumeGidValue ""
Sep 14 19:29:44 10-10-40-16 kubelet: I0914 19:29:44.917540 1953 reconciler.go:278] operationExecutor.UnmountDevice started for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") on node "10-10-40-16"
Sep 14 19:29:44 10-10-40-16 kubelet: W0914 19:29:44.919231 1953 mount_linux.go:179] could not determine device for path: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3ecd68c7b7d211e8/globalmount"
Sep 14 19:29:45 10-10-40-16 kubelet: I0914 19:29:45.609605 1953 operation_generator.go:760] UnmountDevice succeeded for volume "pvc-3ecd68c7b7d211e8" %!(EXTRA string=UnmountDevice succeeded for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") on node "10-10-40-16" )
Sep 14 19:29:45 10-10-40-16 kubelet: I0914 19:29:45.624963 1953 operation_generator.go:486] MountVolume.WaitForAttach entering for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "77b8caf7-b811-11e8-844f-fa7378845e00") DevicePath ""
Sep 14 19:29:46 10-10-40-16 kubelet: E0914 19:29:46.006612 1953 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338\"" failed. No retries permitted until 2018-09-14 19:29:46.506583596 +0800 CST m=+105572.978439381 (durationBeforeRetry 500ms). Error: "MountVolume.WaitForAttach failed for volume \"pvc-3ecd68c7b7d211e8\" (UniqueName: \"kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338\") pod \"yoooo-416ea0-0\" (UID: \"77b8caf7-b811-11e8-844f-fa7378845e00\") : resource name may not be empty"
Sep 14 19:29:46 10-10-40-16 kubelet: I0914 19:29:46.533962 1953 operation_generator.go:486] MountVolume.WaitForAttach entering for volume "pvc-3ecd68c7b7d211e8" (UniqueName: "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338") pod "yoooo-416ea0-0" (UID: "77b8caf7-b811-11e8-844f-fa7378845e00") DevicePath ""
Up to and including Sep 14 19:29:14, DevicePath is non-empty
From Sep 14 19:29:45 onward, DevicePath is empty
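Finding the point where DevicePath turns empty can be done mechanically. The sketch below scans log lines for the `DevicePath "..."` field that appears in the kubelet messages quoted in this article and returns the first line where the value is empty. The function name `firstEmptyDevicePath` is hypothetical, the sample lines are abbreviated excerpts of the log (the `...` marks text left out), and the pattern is matched only against the message format seen here.

```go
package main

import (
	"fmt"
	"regexp"
)

// devicePathRe captures the quoted value that follows the DevicePath field.
var devicePathRe = regexp.MustCompile(`DevicePath "([^"]*)"`)

// firstEmptyDevicePath returns the index of the first line that logs an empty
// DevicePath, or -1 if no such line exists.
func firstEmptyDevicePath(lines []string) int {
	for i, line := range lines {
		m := devicePathRe.FindStringSubmatch(line)
		if m != nil && m[1] == "" {
			return i
		}
	}
	return -1
}

func main() {
	// Abbreviated excerpts of the kubelet log; "..." marks omitted text.
	lines := []string{
		`Sep 14 19:29:14 ... MountVolume.WaitForAttach succeeded ... DevicePath "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2"`,
		`Sep 14 19:29:44 ... operationExecutor.UnmountDevice started ...`,
		`Sep 14 19:29:45 ... MountVolume.WaitForAttach entering ... DevicePath ""`,
	}
	if i := firstEmptyDevicePath(lines); i >= 0 {
		fmt.Println("first empty DevicePath:", lines[i])
	}
}
```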
Sep 14 19:29:14 …… MountVolume.MountDevice ……
Sep 14 19:29:15 …… UnmountVolume.TearDown ……
Sep 14 19:29:44 …… UnmountVolume.TearDown ……
Sep 14 19:29:44 …… operationExecutor.UnmountDevice ……
Sep 14 19:29:44 …… could not determine device for path ……
UnmountDevice->GenerateUnmountDeviceFunc->actualStateOfWorld.MarkDeviceAsUnmounted->asw.SetVolumeGloballyMounted
asw.SetVolumeGloballyMounted(volumeName, false /* globallyMounted */, /* devicePath */ "", /* deviceMountPath */ "")
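To see what this call does to the volume manager's record, here is a stripped-down model of the actual state. The struct layout and field names are assumptions modelled on the identifiers in the call above; the real kubelet type has more fields and locking. What the sketch shows is that the volume entry itself survives the call, while both paths are overwritten with empty strings.

```go
package main

import "fmt"

// attachedVolume is a toy version of one entry in the actual state.
type attachedVolume struct {
	globallyMounted bool
	devicePath      string
	deviceMountPath string
}

// actualStateOfWorld is a simplified model, not the kubelet's real type.
type actualStateOfWorld struct {
	volumes map[string]*attachedVolume
}

// SetVolumeGloballyMounted overwrites the mount flag and both paths.
func (asw *actualStateOfWorld) SetVolumeGloballyMounted(
	name string, globallyMounted bool, devicePath, deviceMountPath string) error {
	v, ok := asw.volumes[name]
	if !ok {
		return fmt.Errorf("no volume %q in the actual state", name)
	}
	v.globallyMounted = globallyMounted
	v.devicePath = devicePath
	v.deviceMountPath = deviceMountPath
	return nil
}

// MarkDeviceAsUnmounted mirrors the call chain quoted in the text.
func (asw *actualStateOfWorld) MarkDeviceAsUnmounted(name string) error {
	return asw.SetVolumeGloballyMounted(name, false, "", "")
}

func main() {
	const vol = "kubernetes.io/csi/csi-qcfsplugin^csi-qcfs-volume-3ecd68c7b7d211e8-338"
	asw := &actualStateOfWorld{volumes: map[string]*attachedVolume{
		vol: {
			globallyMounted: true,
			devicePath:      "csi-eb93736e654600786d95eaffa7cd5d616f11a90bdc109e0df575e8646c250eb2",
			deviceMountPath: "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-3ecd68c7b7d211e8/globalmount",
		},
	}}
	if err := asw.MarkDeviceAsUnmounted(vol); err != nil {
		fmt.Println("error:", err)
		return
	}
	v := asw.volumes[vol]
	fmt.Printf("still tracked, devicePath=%q deviceMountPath=%q\n", v.devicePath, v.deviceMountPath)
}
```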
Source: ITPUB Blog, http://blog.itpub.net/28218939/viewspace-2217186/. Please credit the source when reposting; otherwise legal action may be taken.