Kubernetes (k8s) is a container orchestration platform. It automates deploying, scaling, and recovering large numbers of containers across a cluster. On a single server, Docker alone is enough. Once container counts grow to dozens or hundreds across multiple machines, new questions arise: who restarts a terminated container, how to scale out during traffic spikes, and how containers find and talk to each other. Answering them consistently calls for a declarative model.

Cluster Architecture

A k8s cluster consists of a Control Plane and one or more Worker Nodes.

flowchart TB
    subgraph cp["Control Plane"]
        API["API Server"]
        ETCD["etcd"]
        SCHED["Scheduler"]
        CM["Controller Manager"]
    end

    subgraph wn1["Worker Node"]
        KL1["kubelet"]
        KP1["kube-proxy"]
        CR1["Container Runtime"]
        P1["Pod"]
        P2["Pod"]
    end

    subgraph wn2["Worker Node"]
        KL2["kubelet"]
        KP2["kube-proxy"]
        CR2["Container Runtime"]
        P3["Pod"]
    end

    API --> SCHED
    API --> CM
    API --> ETCD
    API --> KL1
    API --> KL2

Control Plane

The set of components that manage the entire cluster.

The API Server is the entry point for all requests. Whether from kubectl or internal components, every k8s operation goes through it. etcd is a distributed key-value store holding the cluster state: which Pods run where, which Deployments exist, and so on. Only the API Server reads and writes etcd directly; every other component goes through the API Server.

Scheduler decides which Node should run a newly created Pod, considering resource availability and affinity rules. Controller Manager watches whether the current state matches the declared state. If a Deployment says replicas: 3 but only 2 Pods exist, the Controller creates one more.

Worker Node

The servers where containers actually run.

kubelet manages Pod lifecycles on each Node. It watches the API Server for Pods assigned to its Node and starts their containers through the Container Runtime. kube-proxy maintains Node-level networking rules so that traffic sent to a Service is forwarded to one of its backing Pods.

End-to-End Flow

Running kubectl apply -f deployment.yaml sends the request to the API Server, which stores the desired state in etcd. The Controller Manager notices the new Deployment and creates the Pod objects it calls for. The Scheduler picks a Node for each Pod, and that Node’s kubelet creates the containers. The Controller Manager then keeps monitoring and correcting any drift between declared and actual state.

What backend developers interact with directly is kubectl and YAML manifests. The rest is handled internally by k8s.

Core Objects

Pod

A Pod is the smallest deployable unit in k8s. Containers are wrapped in Pods because containers in the same Pod share a network namespace, so they can reach each other over localhost, and can mount the same storage volumes. This enables sidecar patterns like placing a log collector or proxy alongside the main container.

In most cases, one Pod contains one container. Pods are rarely created directly; Deployments manage them.

Deployment

A Deployment declaratively manages Pods. Declare “maintain 3 Pods with this image” and k8s creates the 3 Pods and replaces any that terminate. It is the most frequently used object when deploying backend services.

The default deployment strategy is a rolling update. New Pods start and old Pods shut down gradually, in small batches, so the service stays available throughout as long as the new Pods become ready. If something goes wrong, kubectl rollout undo reverts to the previous version.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: api-server:1.2.0
          ports:
            - containerPort: 8080
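
How aggressively the rolling update replaces Pods can be tuned on the Deployment itself. The fragment below is a minimal sketch meant to be merged into the api-server Deployment above; the values are illustrative assumptions, not recommendations. If the strategy block is omitted, both settings default to 25%.

```yaml
# Fragment of a Deployment spec; merge into the api-server Deployment above.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 extra Pod above the desired replica count during the update
      maxUnavailable: 0  # never drop below the desired number of available Pods
```

With these values, an old Pod is removed only after a new Pod has become ready, which trades a slower rollout for full capacity throughout.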

Service

Pods are created and destroyed frequently, and their IPs change. Using Pod IPs directly is unreliable. A Service provides a stable access point to a set of Pods.

ClusterIP assigns a virtual IP accessible only within the cluster. It is the primary choice for inter-service communication. NodePort opens a specific port on each Node for external access. LoadBalancer automatically provisions a cloud load balancer.

ClusterIP is the most common choice in backend development. Call another service in the same Namespace at http://service-name:port and k8s DNS resolves the name to the Service’s ClusterIP. There is no need to implement service discovery separately.
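
As a sketch of what such a Service might look like for the api-server Deployment shown earlier: the Service name and port 80 are assumptions chosen for illustration, while the selector and targetPort match the labels and containerPort from that Deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-server         # becomes the DNS name other Pods use
spec:
  type: ClusterIP          # the default type; shown here for clarity
  selector:
    app: api-server        # matches the Pod labels set by the Deployment
  ports:
    - port: 80             # port exposed on the Service's ClusterIP
      targetPort: 8080     # containerPort on the backing Pods
```

Pods in the same Namespace could then call http://api-server:80. From a different Namespace, the name needs the Namespace appended, as in api-server.<namespace>.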

Namespace

A Namespace logically partitions a cluster. It separates environments like dev, staging, and production within the same cluster. Names of namespaced resources such as Pods, Deployments, and Services only need to be unique within their Namespace.
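
A Namespace is itself an ordinary object that can be declared in YAML. The manifest below is a minimal sketch; the name staging is only an example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging   # hypothetical environment name
```

Other objects are placed in it either by setting metadata.namespace: staging in their manifest or by passing -n staging to kubectl, for example kubectl apply -n staging -f deployment.yaml.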

Network

Ingress

Whereas a Service provides access within the cluster, Ingress routes external HTTP and HTTPS traffic to internal Services. It distributes traffic based on domain names or URL paths. Since it enables path-based routing without a separate API Gateway, it is frequently used in backend service architectures.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 80

api.example.com/users routes to user-service, /orders to order-service. Ingress defines the rules; an Ingress Controller (nginx, traefik, etc.) handles the actual traffic.

Configuration and Storage

ConfigMap and Secret

Embedding configuration in a container image forces a rebuild for every change. A ConfigMap separates configuration data into its own object. It injects values as environment variables or mounts them as volumes.

A Secret has the same structure as a ConfigMap but stores sensitive information like passwords and API keys. Values are only base64-encoded, not encrypted, so a Secret by itself does not hide them. Protection comes from restricting who can read Secrets through RBAC.

In backend development, ConfigMap and Secret handle the separation of DB connection strings, external API keys, and similar configuration from code.
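
The following sketch shows one way the pieces could fit together: a ConfigMap, a Secret, and a container that receives both as environment variables. All object names, keys, and values here (api-config, api-secrets, DB_HOST, and so on) are made up for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  DB_HOST: db.internal.example   # hypothetical hostname
  LOG_LEVEL: info
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
stringData:                      # written as plain text; stored base64-encoded under data
  DB_PASSWORD: change-me         # placeholder value
---
# Fragment of a Pod template: every key above becomes an environment variable.
containers:
  - name: api
    image: api-server:1.2.0
    envFrom:
      - configMapRef:
          name: api-config
      - secretRef:
          name: api-secrets
```

Using envFrom pulls in every key at once. To inject only selected keys, env with valueFrom.configMapKeyRef or valueFrom.secretKeyRef is the more explicit alternative.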

PersistentVolume

Deleting a Pod erases the data stored in its containers’ filesystems. Workloads like databases need persistent storage. A PersistentVolume (PV) is a piece of storage in the cluster, either pre-provisioned by a cluster administrator or provisioned dynamically through a StorageClass. A PersistentVolumeClaim (PVC) is how a Pod requests a PV. The Pod only knows about the PVC; it does not need to know where the actual storage resides.
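
A minimal sketch of a claim and a Pod that mounts it is shown below. The claim name, size, image, and mount path are assumptions for illustration, and the claim relies on the cluster having a default StorageClass or a matching pre-provisioned PV.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce              # mountable read-write by a single Node
  resources:
    requests:
      storage: 10Gi
---
# Fragment of a Pod spec: the Pod refers only to the claim, not to the PV behind it.
spec:
  containers:
    - name: db
      image: postgres:16         # example image
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data
```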

Health Checks

For k8s to automatically judge Pod health, the application must expose its status. Backend developers implement this directly.

readinessProbe checks if a Pod is ready to receive traffic. Unready Pods are excluded from Service routing. Useful when the server needs cache warm-up before accepting requests.

livenessProbe checks if a Pod is functioning normally. Repeated failure makes the kubelet restart the container. It detects deadlocks or unresponsive states.

startupProbe is for slow-starting applications. It defers liveness/readiness checks until startup completes.

containers:
  - name: api
    image: api-server:1.2.0
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10

Implement /health/ready and /health/live endpoints in the backend server. Readiness typically checks DB connections and external dependencies. Liveness checks only whether the server process itself is alive.
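
For the slow-starting case mentioned earlier, a startupProbe can be added alongside the other probes. The fragment below is a sketch that reuses the /health/live endpoint; the thresholds are illustrative assumptions.

```yaml
# Fragment of a container spec, extending the probes shown above.
containers:
  - name: api
    image: api-server:1.2.0
    startupProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # allows up to 30 x 10s = 300s for startup
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
```

Until the startupProbe succeeds once, the liveness and readiness probes do not run, so a long startup does not get the container killed prematurely.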

Scaling

Manual Scaling

kubectl scale deployment api-server --replicas=5

This directly changes the Deployment’s replica count. Suitable when traffic patterns are predictable or when pre-scaling for a known event.

HPA

Unpredictable traffic makes manual scaling impractical. HPA (Horizontal Pod Autoscaler) adjusts Pod count based on metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This says “scale up when average CPU utilization exceeds 70% of the requested CPU, scale down when it drops below. Minimum 2 Pods, maximum 10.”

metrics-server periodically collects resource usage from each Pod. The HPA Controller compares current average utilization against the target and calculates the needed Pod count. If 3 Pods average 90% CPU with a 70% target, the result is 90/70 × 3 ≈ 3.86, which is rounded up to 4 Pods.
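
Written out, the calculation follows the replica formula given in the Kubernetes HPA documentation, applied here to the example numbers from the text:

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 3 \times \frac{90}{70} \right\rceil
  = \lceil 3.857\ldots \rceil
  = 4
\]
```

The ceiling matters: the result is always rounded up, so even a ratio slightly above a whole number adds a Pod. The result is then clamped to the minReplicas and maxReplicas bounds.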

HPA with a Utilization target requires resource requests on the containers in the Deployment’s Pod template. Without requests, there is no denominator for “70% of what.”

containers:
  - name: api
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi

Requests guarantee minimum resources for a Pod. The Scheduler uses this value to judge Node capacity. Limits cap maximum resources. Exceeding CPU limits causes throttling; exceeding memory limits triggers an OOMKill. Because an OOMKill terminates the container, monitor the backend service’s actual memory usage and set the memory limit above its normal peak.

Beyond CPU, HPA supports memory and custom metrics (request count, queue length, etc.). VPA (Vertical Pod Autoscaler) adjusts individual Pod resources instead of Pod count, but using it on the same metrics as HPA simultaneously can cause conflicts.
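
As a sketch of scaling on more than one signal, the metrics list of the HPA shown earlier could be extended with a memory target. The 80% figure is an assumption for illustration, and like the CPU target it requires a memory request on the containers.

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec.
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # illustrative value
```

When several metrics are listed, the HPA computes a desired replica count for each and uses the largest one.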

Operations

kubectl Basics

kubectl get pods                    # List Pods
kubectl get pods -o wide            # Include Node placement
kubectl describe pod <name>         # Pod details + events
kubectl logs <pod-name>             # View logs
kubectl logs <pod-name> -f          # Follow logs
kubectl exec -it <pod-name> -- sh  # Shell into container

Debugging Flow

kubectl get pods shows Pod status. States like CrashLoopBackOff or ImagePullBackOff indicate the cause. kubectl describe pod <name> reveals events — whether the Scheduler failed to find a Node, resources were insufficient, or the image pull failed. kubectl logs shows application logs. If logs are not enough, kubectl exec provides direct access inside the container.

A Pod failing to start after deployment is the most common k8s issue backend developers encounter. Learning this flow covers most debugging scenarios.

Wrap-up

k8s operates on a declarative model: declare the desired state and the system maintains it. Set replicas: 3 on a Deployment and k8s keeps 3 Pods running. Set a target CPU on HPA and k8s adjusts Pod count automatically. For backend developers, the key responsibilities are implementing health check endpoints, configuring resource requests, and knowing kubectl for debugging.