Kubernetes (k8s) is a container orchestration platform. It automates deploying, scaling, and recovering large numbers of containers across a cluster. On a single server, Docker alone is enough. Once container counts grow to dozens or hundreds across multiple machines, new questions arise that call for a declarative model: who restarts a terminated container, how to scale on traffic spikes, and how containers find and talk to each other.
Cluster Architecture
A k8s cluster consists of the Control Plane and Worker Nodes.
flowchart TB
    subgraph cp["Control Plane"]
        API["API Server"]
        ETCD["etcd"]
        SCHED["Scheduler"]
        CM["Controller Manager"]
    end
    subgraph wn1["Worker Node"]
        KL1["kubelet"]
        KP1["kube-proxy"]
        CR1["Container Runtime"]
        P1["Pod"]
        P2["Pod"]
    end
    subgraph wn2["Worker Node"]
        KL2["kubelet"]
        KP2["kube-proxy"]
        CR2["Container Runtime"]
        P3["Pod"]
    end
    API --> SCHED
    API --> CM
    API --> ETCD
    API --> KL1
    API --> KL2
Control Plane
The set of components that manage the entire cluster.
API Server is the entry point for all requests. Whether from kubectl or internal components, every k8s operation goes through it. etcd is a distributed key-value store holding the cluster state — which Pods run where, which Deployments exist, and so on.
Scheduler decides which Node should run a newly created Pod, considering resource availability and affinity rules. Controller Manager watches whether the current state matches the declared state. If a Deployment says replicas: 3 but only 2 Pods exist, the Controller creates one more.
Worker Node
The servers where containers actually run.
kubelet manages Pod lifecycles on each Node. It watches the API Server for Pods assigned to its Node and starts their containers through the Container Runtime. kube-proxy manages Node-level networking rules, routing traffic that arrives at a Service to the appropriate Pod.
End-to-End Flow
Running kubectl apply -f deployment.yaml sends the request to the API Server, which stores the desired state in etcd. The Controller Manager sees the new Deployment and creates the Pods it calls for. The Scheduler picks a Node for each Pod, and that Node’s kubelet starts the containers. The Controller Manager then keeps monitoring and corrects any drift between declared and actual state.
Backend developers interact directly with kubectl and YAML manifests. The rest is handled internally by k8s.
Core Objects
Pod
A Pod is the smallest deployable unit in k8s. Containers are wrapped in Pods because containers in the same Pod share a network namespace (they can reach each other over localhost) and can mount the same volumes. This enables sidecar patterns like placing a log collector or proxy alongside the main container.
In most cases, one Pod contains one container. Pods are rarely created directly; Deployments manage them.
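As an illustration of the sidecar pattern, here is a minimal sketch of a Pod with two containers that share a volume. The Pod name, the log-collector image, and the mount path are hypothetical and chosen only for this example. In practice the same containers list would sit inside a Deployment's Pod template rather than a bare Pod.

```yaml
# Hypothetical sidecar example: the main container writes logs to a shared
# emptyDir volume and the sidecar reads them from the same path.
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar          # hypothetical name
spec:
  volumes:
    - name: logs
      emptyDir: {}                # scratch volume that lives as long as the Pod
  containers:
    - name: api
      image: api-server:1.2.0
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: logs
          mountPath: /var/log/app # hypothetical log directory
    - name: log-collector
      image: log-collector:1.0.0  # hypothetical sidecar image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```

Because both containers share the Pod's network namespace, the sidecar could also reach the main container at localhost:8080.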
Deployment
A Deployment declaratively manages Pods. Declare “maintain 3 Pods with this image” and k8s automatically creates the 3 Pods and restarts any that terminate. It is the most frequently used object when deploying backend services.
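The default deployment strategy is a rolling update. New Pods are brought up and old Pods are shut down in small batches rather than all at once. By default, at most 25% of the desired Pods may be unavailable and at most 25% extra Pods may be created at any moment, so the service stays available throughout. If something goes wrong, kubectl rollout undo reverts to the previous version.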
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: api-server:1.2.0
          ports:
            - containerPort: 8080
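The pace of a rolling update can be tuned on the Deployment itself. The fragment below is a sketch of the relevant part of the spec, and the values are assumptions picked for illustration. When the strategy block is omitted, both maxSurge and maxUnavailable default to 25%.

```yaml
# Fragment of a Deployment spec (not a complete manifest).
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # create at most 1 Pod above the desired count
      maxUnavailable: 0  # never drop below the desired count during the rollout
```

With these values k8s adds one new Pod, waits for it to become ready, removes one old Pod, and repeats. This trades rollout speed for capacity that never dips below 3 Pods.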
Service
Pods are created and destroyed frequently, and their IPs change. Using Pod IPs directly is unreliable. A Service provides a stable access point to a set of Pods.
ClusterIP assigns a virtual IP accessible only within the cluster. It is the primary choice for inter-service communication. NodePort opens the same port on every Node for external access. LoadBalancer provisions an external load balancer when the cluster runs on a cloud provider that supports it.
ClusterIP is the most common choice in backend development. Call another service in the same Namespace at http://service-name:port and k8s DNS resolves it to the Service’s ClusterIP. For a Service in a different Namespace, use service-name.namespace as the host. No need to implement service discovery separately.
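A minimal sketch of a ClusterIP Service for the api-server Deployment shown earlier. The Service name and port 80 are assumptions made for this example. The selector matches the app: api-server label from the Deployment's Pod template.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-server        # becomes the DNS name other Pods use
spec:
  type: ClusterIP         # the default type, written out for clarity
  selector:
    app: api-server       # routes to Pods carrying this label
  ports:
    - port: 80            # port exposed on the Service's ClusterIP
      targetPort: 8080    # containerPort on the selected Pods
```

With this in place, another Pod in the same Namespace could call http://api-server:80 and the request would be forwarded to port 8080 on one of the matching Pods.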
Namespace
A Namespace logically partitions a cluster. It separates environments like dev, staging, and production within the same cluster. Names of namespaced resources such as Pods, Deployments, and Services only need to be unique within a Namespace.
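As a small sketch, a Namespace is itself an object, and other objects are placed in it through metadata.namespace. The staging name and the Service below are assumptions for illustration only.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
---
# The same Service name can exist in another Namespace without conflict.
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: staging
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
```

kubectl commands take -n staging to operate on objects in that Namespace.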
Network
Ingress
If a Service provides access within the cluster, Ingress routes external traffic to internal Services. It distributes traffic based on domain names or URL paths. Since it enables path-based routing without a separate API Gateway, it is frequently used in backend service architectures.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 80
api.example.com/users routes to user-service, /orders to order-service. Ingress defines the rules; an Ingress Controller (nginx, traefik, etc.) handles the actual traffic.
Configuration and Storage
ConfigMap and Secret
Embedding configuration in a container image forces a rebuild for every change. A ConfigMap separates configuration data into its own object. It injects values as environment variables or mounts them as volumes.
A Secret has the same structure as a ConfigMap but stores sensitive information like passwords and API keys. Values are only base64-encoded, not encrypted, so a Secret’s protection comes mainly from using RBAC to restrict who can read it.
In backend development, ConfigMap and Secret handle the separation of DB connection strings, external API keys, and similar configuration from code.
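A minimal sketch of how this separation might look. All names, keys, and values below are hypothetical. The Secret uses stringData, which accepts plain text and is stored base64-encoded by the API Server.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config                 # hypothetical name
data:
  DB_HOST: db.internal.example.com # hypothetical value
  LOG_LEVEL: info
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets                # hypothetical name
type: Opaque
stringData:
  DB_PASSWORD: change-me           # placeholder, never commit real secrets
```

The container then pulls these values in as environment variables. This is a fragment of a Pod template, not a standalone manifest:

```yaml
containers:
  - name: api
    image: api-server:1.2.0
    envFrom:
      - configMapRef:
          name: api-config         # every key becomes an environment variable
    env:
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: api-secrets
            key: DB_PASSWORD
```

Changing a value then means updating the ConfigMap or Secret and restarting the Pods, with no image rebuild.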
PersistentVolume
Data written to a container’s own filesystem is lost when the Pod is deleted. Workloads like databases need persistent storage. A PersistentVolume (PV) is a piece of cluster storage, either pre-provisioned by a cluster administrator or created on demand through a StorageClass. A PersistentVolumeClaim (PVC) is how a Pod requests a PV. The Pod only knows about the PVC; it does not need to know where the actual storage resides.
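The relationship between a Pod and a PVC can be sketched as follows. The claim name, size, image, and mount path are assumptions for illustration, and the example relies on the cluster having a matching PV or a default StorageClass.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data                  # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce              # mountable read-write by a single Node
  resources:
    requests:
      storage: 10Gi              # hypothetical size
---
apiVersion: v1
kind: Pod
metadata:
  name: db                       # hypothetical name
spec:
  containers:
    - name: db
      image: db-image:1.0.0      # hypothetical database image
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data       # the Pod refers only to the claim
```

If the Pod is deleted and recreated with the same claimName, it mounts the same data again.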
Health Checks
For k8s to automatically judge Pod health, the application must expose its status. Backend developers implement this directly.
readinessProbe checks if a Pod is ready to receive traffic. Unready Pods are excluded from Service routing. Useful when the server needs cache warm-up before accepting requests.
livenessProbe checks if a Pod is functioning normally. Failure triggers a restart. It detects deadlocks or unresponsive states.
startupProbe is for slow-starting applications. It defers liveness/readiness checks until startup completes.
containers:
  - name: api
    image: api-server:1.2.0
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
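Implement /health/ready and /health/live endpoints in the backend server. Readiness typically checks DB connections and external dependencies. Liveness checks only whether the server process itself is alive. Keeping dependency checks out of liveness matters: if liveness fails whenever the DB is down, k8s restarts every Pod during a DB outage even though a restart cannot fix it.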
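The probe example above does not include a startupProbe. A sketch of how one might be added for a slow-starting server is shown below. Reusing /health/live and the threshold values are assumptions for this example.

```yaml
# Fragment of a Pod template (not a complete manifest).
containers:
  - name: api
    image: api-server:1.2.0
    startupProbe:
      httpGet:
        path: /health/live     # assumption: reuse the liveness endpoint
        port: 8080
      periodSeconds: 5
      failureThreshold: 30     # allows up to 30 x 5 = 150 seconds to start
```

Until the startupProbe succeeds once, liveness and readiness probes do not run. If it fails 30 times in a row, the container is restarted.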
Scaling
Manual Scaling
kubectl scale deployment api-server --replicas=5
This directly changes the Deployment’s replica count. Suitable when traffic patterns are predictable or when pre-scaling for a known event.
HPA
Unpredictable traffic makes manual scaling impractical. HPA (Horizontal Pod Autoscaler) adjusts Pod count based on metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
This says “scale up when average CPU exceeds 70%, scale down when it drops. Minimum 2 Pods, maximum 10.”
metrics-server periodically collects resource usage from each Pod. The HPA Controller compares current average utilization against the target and calculates the needed Pod count, rounding up. If 3 Pods average 90% CPU with a 70% target, it computes 3 × 90/70 ≈ 3.86 and scales to 4 Pods.
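Written out, the calculation follows the replica formula given in the Kubernetes HPA documentation, applied to the numbers from the example above:

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 3 \times \frac{90}{70} \right\rceil
  = \lceil 3.857\ldots \rceil
  = 4
\]
```

The ceiling is why the result is 4 rather than 3: any fractional Pod rounds up so that average utilization lands at or below the target.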
A Utilization target requires resource requests on the Deployment’s containers. Without requests, there is no denominator for “70% of what.”
containers:
  - name: api
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
Requests guarantee minimum resources for a Pod. The Scheduler uses this value to judge Node capacity. Limits cap maximum resources. Exceeding CPU limits causes throttling; exceeding memory limits triggers an OOMKill. Monitoring backend service memory usage and setting appropriate values is important.
Beyond CPU, HPA supports memory and custom metrics (request count, queue length, etc.). VPA (Vertical Pod Autoscaler) adjusts individual Pod resources instead of Pod count, but using it on the same metrics as HPA simultaneously can cause conflicts.
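As a sketch of scaling on more than one metric, the metrics list of an autoscaling/v2 HPA can hold several entries. The 80% memory target below is an assumption chosen for illustration.

```yaml
# Fragment of a HorizontalPodAutoscaler spec (not a complete manifest).
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # hypothetical target, relative to memory requests
```

When several metrics are listed, the HPA computes a desired replica count for each one and uses the largest.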
Operations
kubectl Basics
kubectl get pods # List Pods
kubectl get pods -o wide # Include Node placement
kubectl describe pod <name> # Pod details + events
kubectl logs <pod-name> # View logs
kubectl logs <pod-name> -f # Follow logs
kubectl exec -it <pod-name> -- sh # Shell into container
Debugging Flow
kubectl get pods shows Pod status. States like CrashLoopBackOff or ImagePullBackOff indicate the cause. kubectl describe pod <name> reveals events — whether the Scheduler failed to find a Node, resources were insufficient, or the image pull failed. kubectl logs shows application logs. If logs are not enough, kubectl exec provides direct access inside the container.
A Pod failing to start after deployment is the most common k8s issue backend developers encounter. Learning this flow covers most debugging scenarios.
Wrap-up
k8s operates on a declarative model: declare the desired state and the system maintains it. Set replicas: 3 on a Deployment and k8s keeps 3 Pods running. Set a target CPU on HPA and k8s adjusts Pod count automatically. For backend developers, the key responsibilities are implementing health check endpoints, configuring resource requests, and knowing kubectl for debugging.