
Blog Building From Zero (4) Distributed Deploy: Docker and Kubernetes

2024-06-25

Preface

In the last part, Blog Building From Zero (3) Blog Server: Nodejs, React and Golang, we built a full-stack blog server and deployed it on a VM. We had already achieved our goal, to build a running blog from scratch, which is great. But what if we dive deeper into deployment and try to build a distributed system, running the blog server on multiple VMs to enhance its reliability? That will be our goal in this part: to deploy the blog server on multiple VMs and use Kubernetes to orchestrate them.

Docker

Before turning to Kubernetes, we must introduce Docker. Docker is a container technology that establishes a stable environment for applications to run in. When it comes to our blog server, how will Docker help us?

Let’s go back to how we deployed the blog server in the last part: we set up a service to run Node and Golang. In order to do that, we had to install Node and Golang on the VM, and fetch the necessary dependencies for our code, such as npm packages and Go modules. We had already done all of that on the one VM, so the blog server was able to run on it. If we switch to another VM, those requirements won’t be met. Even worse, we may have conflicting versions of dependencies; it could become a disaster to deploy our blog server on another VM.

Here is where Docker can be helpful. It provides a container in which all the necessary dependencies have been prepared, so we can deploy our blog server without troublesome environment issues.

With Docker, we can write a file named Dockerfile, which describes how to build a container image. It contains commands, similar to shell commands, that prepare the environment and start our application. After the image has been built, we can push it to Docker Hub, fetch it from another VM, and use it to start a Docker container. And bang, the server is up already.

Below is an example of the blog’s Dockerfile.

# Stage 1: Build the Go binary
FROM golang:1.22 AS go-builder

WORKDIR /go/src/app
COPY blogbackend/ ./
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o blogbackend .

# Stage 2: Build the Node.js application
FROM node:22 AS node-builder

WORKDIR /node
COPY blogfrontend/ ./
RUN npm install

# Stage 3: Final stage where both Node.js and Go applications exist
FROM node:22

# Copy Go binary from Stage 1
WORKDIR /go-app
COPY --from=go-builder /go/src/app/blogbackend .

# Copy Node.js application from Stage 2
WORKDIR /node-app
COPY --from=node-builder /node .

# Expose ports for both applications
EXPOSE 5000

# Run both applications: the Go backend in the background and the Node.js server (server.js is the entry point)
CMD /go-app/blogbackend & node server.js

It is quite self-explanatory: copy the Go source code and build it, copy the Node.js and React source code and install dependencies, and finally put them together and run our blog. It would be better to pull the source code from git instead of depending on the local source code; I will improve that later.
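
As a rough sketch of that improvement, the Go build stage could clone the repository instead of copying the local context. The repository URL below is a placeholder, not the real one:

# Hypothetical variant of Stage 1: build from git instead of the local context
FROM golang:1.22 AS go-builder

WORKDIR /go/src
# Replace the URL with the actual repository
RUN git clone https://github.com/yourname/blog.git app
WORKDIR /go/src/app/blogbackend
RUN CGO_ENABLED=0 GOOS=linux go build -o blogbackend .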

Then let’s discuss how to install and use Docker step by step. For installation, we can follow the Docker Docs Guide.

for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update


sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

After installing, we can test the hello-world image, and we should see the hello-world greeting.

sudo docker run hello-world

With Docker installed and the Dockerfile prepared, we can build and run our blog image.

sudo docker build -t blog .

# -p <host-port>:<container-port> -v <host-directory>:<container-directory>
sudo docker run -p 127.0.0.1:5000:5000 -v /home/ubuntu/workspace/blogarticle:/blogarticle blog

Note that we publish the container’s port 5000 to port 5000 on localhost. If the blog service we set up in the last part is still using port 5000, we should stop it with sudo systemctl stop blog. We also mount the host directory /home/ubuntu/workspace/blogarticle to /blogarticle in the container.

After docker run, we should be able to reach the blog server with curl localhost:5000, and we can forward the port from the VM to our local machine to test it in a local browser.

ssh -N -f -L 5000:127.0.0.1:5000 vm_ssh_account

This uses SSH local port forwarding: when we open localhost:5000 in the browser, the request will be forwarded to 127.0.0.1:5000 on the VM. Alternatively, we can use VSCode to SSH into the VM and forward ports. We will leave that part to Google and ChatGPT.

We can also attach to the Docker container and test it from inside.

# List all running containers
sudo docker ps

sudo docker exec -it container_id /bin/sh

With the image built successfully, we can push it to Docker Hub and reuse it on another VM. To use Docker Hub, we need to register an account and then create a repository.

# Login docker hub
sudo docker login

# Tag our image with the repository name (replace chasemao/private with yours)
sudo docker tag blog chasemao/private

# Push image by tag
sudo docker push chasemao/private

After the image is pushed, we can find it on Docker Hub and use it on another VM. Let’s provision a new VM. If the repository is private, we need to log in with sudo docker login first.

# Run from the pushed image (replace chasemao/private with your repository)
sudo docker run -p 127.0.0.1:5000:5000 -v /home/ubuntu/workspace/blogarticle:/blogarticle chasemao/private

Great, we can deploy our blog server on a different VM very easily by using Docker.

Wait a moment: if we have hundreds of VMs to deploy to, it will still be a disaster. That is where Kubernetes comes in.

Kubernetes

How does Kubernetes work in general?

The typical usage is when we have lots of VMs. Each VM runs as a control plane node, a worker, or both. We can send commands to the control plane, for example to deploy 100 instances of the blog server image. The control plane will execute that command and schedule how to deploy them. If there are enough resources, 100 instances will be deployed across the workers. A running instance on a worker is called a pod.

That is an overview of how Kubernetes works. To build a Kubernetes cluster, we need to install components on each VM.

  • As for workers, we need to install kubelet, which monitors resources, reports to the API server in the control plane, and manages the pods on the VM. Besides kubelet, kube-proxy should be set up for network communication.
  • As for the control plane, there are four main components: the API server, which exposes the control plane’s APIs and is used by the other components to communicate with each other; etcd, the database of the control plane; the scheduler, which places pods on suitable workers; and the controller manager, which watches the state of the cluster and responds when something is wrong (a quick sanity check for these components is shown after this list).
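
Once the cluster is initialized (we will do that below), these control plane components show up as pods in the kube-system namespace, so a quick sanity check looks like this (pod names vary with the node name):

# List control plane components running as pods
kubectl -n kube-system get pods
# Expect pods like kube-apiserver-<node>, etcd-<node>,
# kube-scheduler-<node>, kube-controller-manager-<node> and kube-proxy-<id>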

There are many ways to set up a Kubernetes cluster. I chose kubeadm, as recommended in the official documentation. There are lots of details, and I highly suggest reading the documentation thoroughly. The important steps are listed below.

Set up default cgroup driver for docker and containerd

Kubernetes uses systemd as the default cgroup driver, so we need to set it up for Docker and containerd too. By the way, containerd is installed automatically if we followed the steps above and installed Docker.

sudo vim /etc/docker/daemon.json

# Add following in daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

# Restore default config for containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Choose to use systemd for containerd
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml

# Restart
sudo systemctl restart containerd
sudo systemctl restart docker

Disable swap

There will be issues if this is not done, because kubelet is not configured to work with swap by default.

# Disable swapping temporarily
sudo swapoff -a 

# Remove or comment out any swap entries
sudo vim /etc/fstab
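
Instead of editing /etc/fstab by hand, a one-liner can comment out the swap entry. This is just a sketch; back up the file and check the result before rebooting:

# Back up fstab, then comment out every line that mentions swap
sudo cp /etc/fstab /etc/fstab.bak
sudo sed -i '/\bswap\b/ s/^/#/' /etc/fstab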

Install necessary components in each vm

sudo apt-get update

sudo apt-get install -y apt-transport-https ca-certificates curl gpg

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update

sudo apt-get install -y kubelet kubeadm kubectl

sudo apt-mark hold kubelet kubeadm kubectl

sudo systemctl enable --now kubelet

Set up public ip

I encountered a huge problem with the network configuration in this step. Let me explain a little.

Usually a Kubernetes cluster is deployed in a private network, where each node can communicate with the others via private IPs.

As for me, I have three VMs which are not in the same private network, so I have to use public IPs. That is fine for Kubernetes too, but here comes the tricky part. The VMs I bought from the cloud provider are in a NAT environment, which means that with ip addr show or ifconfig there is only a private IP on the network interface, no public IP at all. The public IP is assigned in the cloud provider’s console, and traffic to that public IP is routed to the private IP. If I want to init Kubernetes with that public IP, I really cannot, because kubelet double-checks whether the IP is on a network interface; if it is not, it won’t run.

To summarize the problem: I want to use public IPs to set up the Kubernetes cluster, but those IPs are not on the VMs’ network interfaces, so the init will fail.

I found the same issue on Stack Overflow. Finally I made it run by manually adding the public IP to the network interface, as below.

# Add public ip into network interface
sudo ip addr add xx.xx.xx.xx/32 dev eth0

# Double check it
ip addr show eth0

# Add it to a boot script so the IP is still set after the VM restarts
sudo vim /etc/rc.local
sudo systemctl enable rc.local
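
For reference, a minimal /etc/rc.local could look like the sketch below; the IP and interface name are placeholders, and the file must be executable (sudo chmod +x /etc/rc.local) for the rc-local service to run it:

#!/bin/bash
# Re-add the public ip on every boot (placeholders: xx.xx.xx.xx, eth0)
ip addr add xx.xx.xx.xx/32 dev eth0
exit 0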

Init Kubernetes cluster

Now we can init the Kubernetes cluster with sudo kubeadm init. But because of the public IP problem, I need to specify the IP to use with sudo kubeadm init --config init.yaml, where init.yaml looks like below.

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "xx.xx.xx.xx"
  bindPort: 6443
nodeRegistration:
  kubeletExtraArgs:
    "node-ip": "xx.xx.xx.xx"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "xx.xx.xx.xx:6443"

controlPlaneEndpoint is the public IP, and it can be set to a domain too. If we want a highly available control plane with several nodes, the domain can serve as a single endpoint for the whole control plane through a load balancer.
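
For example, a hypothetical domain-based endpoint would look like this in ClusterConfiguration:

# Hypothetical: a domain pointing at a load balancer in front of the control plane nodes
controlPlaneEndpoint: "k8s.example.com:6443"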

After the init succeeds, it will recommend running the following commands so we can use kubectl without root permission.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Besides that, it will print a command for other workers or control plane nodes to join the cluster.

sudo kubeadm join xx.xx.xx.xx:6443 --token xxxxxxxxx \
        --discovery-token-ca-cert-hash xxxxxxxx

We can run that command on other VMs, but we will do that later. Now let’s check the status of the control plane.

kubectl get nodes

We will find one node, but it is not ready. That is because we haven’t installed a CNI (Container Network Interface) plugin, which sets up the pod network.
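
To confirm the reason, we can inspect the node’s conditions; the node name below is a placeholder:

kubectl describe node your-node-name
# In the Conditions section, Ready will be False with a message
# like "container runtime network not ready: ... cni plugin not initialized"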

Install CNI

There are many CNI plugins available for Kubernetes. We can find them here.

I used Cilium, and will show how to install it according to the official documentation.

# Get helm to install cilium
wget https://get.helm.sh/helm-v3.15.1-linux-amd64.tar.gz
tar -zxvf helm-v3.15.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm

# Install cilium
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.15.6 --namespace kube-system --set k8sServiceHost=xx.xx.xx.xx --set k8sServicePort=6443

# Enable relay and ui
helm upgrade cilium cilium/cilium --version 1.15.6 \
   --namespace kube-system \
   --reuse-values \
   --set hubble.relay.enabled=true \
   --set hubble.ui.enabled=true

Note that we need to set k8sServiceHost, which is the controlPlaneEndpoint if we set one before, or otherwise the IP address of the control plane. We can check whether Cilium is running.

kubectl -n kube-system get pods

Workers join the cluster

Now we can make the workers join the cluster. Before that, make sure the other VMs have the necessary components installed. Usually we could just run the kubeadm join command given before, but I have the public IP issue, so I am using a join.yaml like below.

apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    "node-ip": "xx.xx.xx.xx"
discovery:
  bootstrapToken:
    apiServerEndpoint: "xx.xx.xx.xx:6443"
    token: xxxx
    caCertHashes: ["xxxxx"]

Note that node-ip is the worker’s public IP, apiServerEndpoint is the controlPlaneEndpoint we set for the control plane before, and token and caCertHashes come from the kubeadm join command printed after we ran kubeadm init.

We can run sudo kubeadm join --config join.yaml to join, and verify it by running kubectl get nodes on the control plane. We should see the newly added node.
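
A small note: bootstrap tokens are short-lived, so if the token from the original kubeadm init output has expired, we can print a fresh join command on the control plane:

# Generate a new token and print the matching kubeadm join command
sudo kubeadm token create --print-join-command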

Set up a secret for Docker Hub

If we are using a private Docker Hub repository to store the image, we need to set up a secret so the Kubernetes cluster can pull the image.

kubectl create secret docker-registry blog \
  --docker-server="https://index.docker.io/v1/" \
  --docker-username=xxx \
  --docker-password=xxx \
  --docker-email="xxx@xx.com"

Exciting deployments

After joining the other two VMs, I got a Kubernetes cluster of three VMs. Now we can deploy our blog server by writing a deploy.yaml file.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: blog
  labels:
    app: blog
spec:
  selector:
    matchLabels:
      app: blog
  template:
    metadata:
      labels:
        app: blog
    spec:
      containers:
        - name: blog
          image: "your repository (like chasemao/private)"
          imagePullPolicy: "Always"
          ports:
            - containerPort: 5000
          volumeMounts:
            - name: article-volume
              mountPath: /blogarticle
      imagePullSecrets:
        - name: blog
      volumes:
        - name: article-volume
          hostPath:
            path: /home/ubuntu/workspace/blogarticle

The kind is DaemonSet, which means one pod will be deployed on every VM. We can also set it to Deployment and set spec.replicas to the desired number of pods, as sketched below. containerPort indicates the port we want the container to expose; because our Node server is running on port 5000, we expose port 5000 here. volumeMounts indicates which directory we want to map into the container; because our Golang application reads article info from the blogarticle directory, we need to mount it. Don’t forget that /home/ubuntu/workspace/blogarticle is the directory where we set up our crontab task to pull articles from git; we can review that in the last part, Blog Building From Zero (3) Blog Server: Nodejs, React and Golang. If using a private Docker Hub repository, we need to set imagePullSecrets, which we set up before.
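
For reference, a minimal Deployment variant is sketched below; the replica count is a made-up example, and the volumes and imagePullSecrets would stay the same as in deploy.yaml above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
  labels:
    app: blog
spec:
  replicas: 3   # hypothetical pod count instead of one pod per vm
  selector:
    matchLabels:
      app: blog
  template:
    metadata:
      labels:
        app: blog
    spec:
      containers:
        - name: blog
          image: "your repository (like chasemao/private)"
          ports:
            - containerPort: 5000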

Then we can apply the deploy.yaml file.

kubectl apply -f deploy.yaml

After applying, we can check the pod status.

kubectl get pods

If everything works fine, we should be able to see some pods running here.

We may find that no pod is running on the control plane, because the control plane has a NoSchedule taint. We can remove it.

kubectl taint node node_name node-role.kubernetes.io/control-plane:NoSchedule-

Expose service

Let’s introduce a bit of background on Services. A Service has two functions:

  • Being the gateway of pods. When we deploy many pods, we could access each pod by its own IP, but a Service provides a unified entry point to reach all of them.

  • Exposing pods. We can use different types of Service; the official documentation is here. ClusterIP exposes pods inside the cluster. NodePort exposes pods on each node’s IP (a NodePort sketch follows this list). LoadBalancer uses a load balancer, which Kubernetes does not offer directly and which is usually a component from a cloud provider.
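
As a hedged sketch, a NodePort variant of the service below would differ only in its type and an optional fixed port:

apiVersion: v1
kind: Service
metadata:
  name: blog-nodeport   # hypothetical name
spec:
  selector:
    app: blog
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
      nodePort: 30500   # optional; must be in the default 30000-32767 range
  type: NodePort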

With some pods running, we can use a ClusterIP-type Service to access them, using service.yaml.

apiVersion: v1
kind: Service
metadata:
  name: blog
spec:
  selector:
    app: blog
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
  type: ClusterIP

Then run it.

kubectl apply -f service.yaml

And check its status.

kubectl get svc

We can find the cluster IP and port in the output, and test it with curl clusterip:port.
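
For example, with hypothetical values from kubectl get svc, the test would look like this:

# NAME   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
# blog   ClusterIP   10.96.120.15   <none>        5000/TCP   1m
curl 10.96.120.15:5000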

Install dashboard

We can install the dashboard to check the cluster’s resources in a web UI.

# Add kubernetes-dashboard repository
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/

# Deploy a Helm Release named "kubernetes-dashboard" using the kubernetes-dashboard chart
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard

Then apply a service account with kubectl apply -f k8s/dashboard-adminuser.yaml, where the file contains the following.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

And then create a token.

kubectl -n kubernetes-dashboard create token admin-user

Then we can access it.

# Get the svc of the dashboard
kubectl -n kubernetes-dashboard get svc

# Forward the kubernetes-dashboard-kong-proxy service port to a local port on the vm
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy local_port:svc_port

# Forward local_port on the vm to the local pc
ssh -N -f -L 8000:127.0.0.1:local_port vm_ssh_account

Now we can access the dashboard from the local PC at https://localhost:8000.

Debug

While setting up the Kubernetes cluster, we may run into bugs. Here are some useful debugging tools.

# Check running containers in containerd; make sure all Kubernetes components are running
sudo crictl ps -a

# Get logs of container
sudo crictl logs container_id

# Get Kubernetes events
kubectl get events --all-namespaces

# Get kubelet journal
journalctl -u kubelet | tail

# Get containerd journal
journalctl -u containerd | tail 

# Run a curl pod to test
kubectl run curl --image=radial/busyboxplus:curl -i --tty --rm

# Attach to a pod
kubectl exec -it <pod-name> -- /bin/bash

# Restart components; this worked several times when I had no clue what happened
sudo systemctl stop kubelet
sudo systemctl restart containerd
sudo systemctl start kubelet

Access

The best practice for accessing a Kubernetes svc is via a load balancer. Unfortunately, I don’t have one, so I considered several ways to do it.

  • I found an interesting project, MetalLB. It can build a custom load balancer at layer 2 or layer 3 of the network. Layer 2 requires the cluster to be in a LAN, which doesn’t suit me. Layer 3 needs a router supporting the BGP protocol, which doesn’t suit me either.

  • Using DNS as a load balancer. I can add all three nodes’ IP addresses as DNS records; when clients try to connect, they will get one of the three. There are two main problems. All three VMs would need the HTTPS certificate, so I would have to find a way to distribute certificates before they expire. And if a VM goes down, I would have to remove it from the DNS records, then add it back when it comes up again.

  • Using one VM as the entry point. It solves the HTTPS certificate issue and routes the traffic into the Kubernetes svc. The Service handles the load balancing and spreads traffic across all VMs. The biggest problem is that the entry is a single point of failure: once the entry is down, everything is down.

Finally I chose the last way. It has the same single-point-of-failure problem as serving from a single node, though it still mitigates part of the problem: when the blog server on one VM is down, traffic can still reach the blog servers on the other VMs. Later, when I have time, I may switch to plan B.

The setup is quite easy; we just need to change the nginx configuration.

# Add this in the http{} block
server {
   listen 443 ssl;
   listen [::]:443 ssl;
   server_name your_domain.com; # your domain here
   ssl_certificate       /usr/local/etc/v2ray/server.crt; 
   ssl_certificate_key   /usr/local/etc/v2ray/server.key;
   ssl_session_timeout 1d;
   ssl_session_cache shared:MozSSL:10m;
   ssl_session_tickets off;
   ssl_protocols         TLSv1.2 TLSv1.3;
   ssl_ciphers           ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
   ssl_prefer_server_ciphers off;
    
    location / {
       proxy_redirect off;
       proxy_pass http://svc_cluster_ip:svc_port; # Use the svc cluster ip and port
       proxy_http_version 1.1;
       proxy_set_header Upgrade $http_upgrade;
       proxy_set_header Connection "upgrade";
       proxy_set_header Host $host;
       proxy_set_header X-Real-IP $remote_addr;
       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   }
}

It will route the traffic from the domain to the svc.

Out of curiosity, I tested its performance with and without Kubernetes.

With the help of hey, I ran a load test like below.

# -c number of workers, -q each worker's max qps, -n total requests
./hey -c 50 -q 2 -n 2000 https://chasemao.com

Below are the results without Kubernetes.

qps    p99(ms)  avg(ms)  cpu peak
200    66       26       37%
400    59       26       46%
600    64       26       64%
800    70       27       80%
1000   80       29       95%
1200   77       33       97%
1400   N/A      N/A      N/A

Below are the results with Kubernetes.

qps    p99(ms)  avg(ms)  cpu peak
200    486      117      26%
400    185      105      42%
600    243      115      97%
800    N/A      N/A      N/A

We can see that with Kubernetes, the peak QPS is about half of that without. It seems to be because I am testing on the control plane node, where the API server and etcd consume a lot of CPU.

Summary

Blog Building From Zero (1) Resources: VM and Domain

Blog Building From Zero (2) Basic Server: HTTP, HTTPS, and Nginx

Blog Building From Zero (3) Blog Server: Nodejs, React and Golang

Blog Building From Zero (4) Distributed Deploy: Docker and Kubernetes

In these four articles, we built a blog server from scratch successfully and deployed it on three VMs with Kubernetes. It was really rewarding to build it and solve each challenge.

  • Apply for a VM and a domain (a challenge for the wallet).
  • Run an HTTPS nginx server.
  • Design the blog system architecture.
  • Format markdown and reference images properly.
  • Build and use Docker images.
  • Deploy with Kubernetes, working around the NAT network interface.

I hope you enjoyed it, and thank you for reading.