CKA Troubleshooting - Worker Node Failure: Problem Analysis and Resolution



ygtoken 2025. 3. 4. 20:54


In the CKA (Certified Kubernetes Administrator) exam, diagnosing and fixing a failed worker node in a Kubernetes cluster is a key skill being assessed. This post analyzes the 'Worker Node Failure' scenario in detail and explains how to resolve it.

1. Problem Description: Worker Node Failure

📌 What the question tests

This question evaluates your ability to diagnose and fix a worker node that is not working correctly in a Kubernetes cluster. The main areas to check are:

Node status check and recovery (analyzing kubectl get nodes output)
kubelet and other node components (analyzing systemctl status kubelet output)
Networking and the CNI plugin (analyzing kubectl get pods -n kube-system output)
Pod and container runtime issues (analyzing systemctl status containerd output)

❌ Problem scenario

Running kubectl get nodes shows one worker node in the NotReady state.
NAME            STATUS     ROLES    AGE   VERSION
worker-node-1   Ready      <none>   10d   v1.20.2
worker-node-2   NotReady   <none>   10d   v1.20.2
Running kubectl describe node worker-node-2 shows the Ready condition as False, with the reason KubeletNotReady.
Conditions:
  Type             Status  Reason                       Message
  ----             ------  ------                       -------
  MemoryPressure   False   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Pods on that node stay in the Unknown or ContainerCreating state.
Running systemctl status kubelet shows that kubelet is not running properly: it has stopped or reports errors.
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2025-03-04 11:30:00 KST; 5min ago

2. Troubleshooting Approach

✅ Step 1: Check the worker node status

First, check whether each worker node is currently in the Ready state.

kubectl get nodes
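
To pick out the problem nodes without scanning the table by eye, a small filter like the one below can help. It is a sketch, not part of the original procedure: notready_nodes is a hypothetical helper name, and it assumes the default kubectl get nodes column layout (NAME first, STATUS second).

```shell
# Hypothetical helper: print the names of nodes whose STATUS is not Ready.
# Reads `kubectl get nodes` output on stdin, with or without the header row.
# A cordoned node shows "Ready,SchedulingDisabled", so only the part before
# the first comma is compared.
notready_nodes() {
  awk '$1 != "NAME" && NF >= 2 {
    split($2, s, ",")
    if (s[1] != "Ready") print $1
  }'
}
```

Usage: kubectl get nodes | notready_nodes, which for the sample output in section 1 prints worker-node-2.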

If a node is NotReady, SSH into it and follow the kubelet logs.

ssh <worker-node-ip>
sudo journalctl -u kubelet -f

Sample output:

Mar 04 11:25:00 worker-node-2 kubelet[1234]: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message: cni config uninitialized
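
This log line is what links a NotReady node to the CNI checks in Step 4. As a hedged sketch, the hypothetical helper cni_not_ready below only greps for the two markers that appear in that sample line; it does not recognize any other failure message.

```shell
# Hypothetical helper: exit 0 if kubelet log lines on stdin contain the
# CNI-not-ready markers from the sample log, non-zero otherwise.
cni_not_ready() {
  grep -qE 'NetworkPluginNotReady|cni config uninitialized'
}
```

Usage on the node: sudo journalctl -u kubelet --no-pager -n 200 | cni_not_ready && echo "kubelet reports the CNI plugin is not ready"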

✅ Step 2: Check and restart the kubelet service

Check whether kubelet is running on the node; if it has stopped, start it again.

sudo systemctl status kubelet

Sample output:

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2025-03-04 11:30:00 KST; 5min ago

If kubelet has stopped, restart it.

sudo systemctl restart kubelet

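Check its status again after the restart. If kubelet fails again right away, the journalctl output from Step 1 shows the reason (for example, a configuration error), which a restart alone will not fix.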

sudo systemctl status kubelet
kubectl get nodes
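
The check-then-restart logic of this step can be wrapped in a small function. This is a sketch under assumptions: restart_if_down is a hypothetical name, it expects a systemd-managed unit and sudo rights, and it only reports a persistent failure rather than trying to fix it.

```shell
# Hypothetical helper: restart a systemd unit only if it is not active,
# then report whether the restart brought it back.
restart_if_down() {
  local unit="$1"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active; no restart needed"
    return 0
  fi
  echo "$unit is not active; restarting"
  sudo systemctl restart "$unit"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active after restart"
  else
    # A unit that fails again immediately usually has a config or dependency
    # problem; its journal shows the reason.
    echo "$unit is still not active; check: sudo journalctl -u $unit -n 50 --no-pager"
    return 1
  fi
}
```

Usage on the node: restart_if_down kubelet, then run kubectl get nodes from a machine with cluster access.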

✅ Step 3: Check and recover the container runtime

Kubernetes runs Pods through a container runtime. If the runtime is not running properly, kubelet cannot create Pods on that node.

sudo systemctl status containerd

Sample output:

● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2025-03-04 11:20:00 KST; 10min ago

Restart the container runtime. Because kubelet depends on it, check kubelet again afterwards (Step 2).

sudo systemctl restart containerd
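
Because kubelet reaches the runtime through its CRI socket, it is reasonable to bring the runtime up first and restart kubelet afterwards. The sketch below assumes containerd is the runtime, as in the sample output, and uses the hypothetical name restart_runtime_then_kubelet.

```shell
# Hypothetical helper: restart containerd first, then kubelet, and succeed
# only if both units report "active" afterwards.
restart_runtime_then_kubelet() {
  local unit
  for unit in containerd kubelet; do
    echo "restarting $unit"
    sudo systemctl restart "$unit" || return 1
  done
  systemctl is-active --quiet containerd && systemctl is-active --quiet kubelet
}
```

If crictl is installed and configured on the node, sudo crictl ps is a quick way to confirm that the runtime answers CRI requests again.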

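✅ Step 4: Check networking and the CNI plugin

Check whether the CNI plugin Pods are running normally. Filter by the plugin's name: grep cni would not match Pods such as calico-node.

kubectl get pods -n kube-system | grep -E 'calico|flannel|weave|cilium'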

Sample output:

calico-node-xyz123   0/1     CrashLoopBackOff   10   10m
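
Rather than reading every line, the unhealthy Pods can be filtered out. As a sketch, the hypothetical helper unhealthy_pods assumes the default kubectl get pods -n <namespace> columns (NAME, READY, STATUS, ...) and treats a Pod as unhealthy when it is not Running or when its READY count is incomplete; Completed Pods are skipped.

```shell
# Hypothetical helper: print "NAME STATUS" for Pods that are not fully up.
# Reads `kubectl get pods -n <namespace>` output on stdin (header optional).
# Not suitable for `-A` output, where NAMESPACE becomes the first column.
unhealthy_pods() {
  awk '$1 != "NAME" && NF >= 3 {
    if ($3 == "Completed") next
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3
  }'
}
```

Usage: kubectl get pods -n kube-system | unhealthy_pods. For the sample line above it prints calico-node-xyz123 CrashLoopBackOff.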

Delete the failing CNI Pod; its DaemonSet recreates it automatically.

kubectl delete pod calico-node-xyz123 -n kube-system
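
The "cni config uninitialized" message from Step 1 generally means the runtime found no network configuration on the node, so it can also help to look at the config directory on the worker node itself. The sketch below assumes the common default path /etc/cni/net.d (a runtime can be configured to use another one) and the config file extensions .conf, .conflist and .json; cni_config_present is a hypothetical name.

```shell
# Hypothetical helper: exit 0 if the CNI config directory contains at least
# one network config file, non-zero otherwise. Defaults to /etc/cni/net.d.
cni_config_present() {
  local dir="${1:-/etc/cni/net.d}"
  local f
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # An unmatched glob stays literal, so the -f test filters it out.
    [ -f "$f" ] && return 0
  done
  return 1
}
```

Usage on the node: cni_config_present || echo "no CNI config found in /etc/cni/net.d". With DaemonSet-based plugins such as Calico, the plugin's own Pod on that node normally writes this file, so an empty directory points back to that Pod's logs.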

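✅ Step 5: Drain and restore the node

Before disruptive maintenance, drain the node so its workloads are evicted (DaemonSet-managed Pods are skipped with --ignore-daemonsets). Once the node is repaired, uncordon it so the scheduler can place Pods on it again. The --delete-local-data flag seen in older guides is deprecated; current kubectl uses --delete-emptydir-data instead.

kubectl drain worker-node-2 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon worker-node-2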

Confirm that the node has returned to the Ready state.

kubectl get nodes
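
Put together, Step 5 is drain → repair → wait for Ready → uncordon. The sketch below strings those together; node_maintenance is a hypothetical name, the repair step is whatever command you pass in, and the timeout values are arbitrary assumptions. A drain timeout is included because Pods on a node whose kubelet is down can remain in Terminating, and drain could otherwise wait indefinitely.

```shell
# Hypothetical helper: drain a node, run a repair command, wait until the
# node is Ready again, then uncordon it.
# Usage: node_maintenance <node> <repair command...>
node_maintenance() {
  local node="$1"
  shift
  # drain cordons the node first; if drain fails, the node stays cordoned.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=120s || return 1
  if ! "$@"; then
    echo "repair command failed; leaving $node cordoned"
    return 1
  fi
  kubectl wait --for=condition=Ready "node/$node" --timeout=180s || return 1
  kubectl uncordon "$node"
}
```

Example: node_maintenance worker-node-2 ssh <worker-node-ip> 'sudo systemctl restart containerd kubelet'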

3. Conclusion

Working through the steps above should resolve the worker node failure. When troubleshooting a Kubernetes worker node, remember this approach:

✅ Troubleshooting approach summary

Check worker node status → kubectl get nodes
Check and restart kubelet → sudo systemctl restart kubelet
Check the container runtime → sudo systemctl status containerd
Check networking and the CNI plugin → kubectl get pods -n kube-system | grep -E 'calico|flannel|weave|cilium'
Drain, repair, then restore the node → kubectl drain <worker-node>, then kubectl uncordon <worker-node>


License: Attribution-NonCommercial-NoDerivs

YG Tech Blog
