The new hosts run Rocky Linux 9.5; the old hosts are virtual machines running CentOS 7.
1 Target servers
1.1 Disable swap
swapoff -a
vi /etc/fstab
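# Comment out the swap entry so that swap stays off after a reboot, for example:
#/dev/mapper/rl-swap none swap defaults 0 0
# Verify: every value in the Swap row should be 0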
free -h
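The manual fstab edit above can also be scripted. The following is a minimal sketch, assuming that swap entries are uncommented lines whose third field is swap. The function name disable_swap_in_fstab is hypothetical, and it takes the file path as an argument so that it can be tried on a copy first.

```shell
# Comment out every active swap entry in an fstab-style file.
# A backup of the original is kept next to it with a .bak suffix.
disable_swap_in_fstab() {
    local fstab="$1"
    # Match uncommented lines whose filesystem-type (third) field is "swap"
    # and prefix them with '#'.
    sed -i.bak -E '/^[[:space:]]*[^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]]/ s/^/#/' "$fstab"
}

# On a real host (as root): swapoff -a && disable_swap_in_fstab /etc/fstab
```

Lines that are already commented out are left alone, so running it a second time changes nothing.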
1.2 Disable the firewall
This is done only to reduce maintenance overhead.
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
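1.3 Disable SELinux
# Temporary: switches to permissive mode; enforcing mode returns after a reboot
setenforce 0
# Permanent
vi /etc/selinux/config
# Change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot)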
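The same config change can be made without opening an editor. This is a small sketch, assuming the file contains a line starting with SELINUX=. The function name set_selinux_mode is hypothetical.

```shell
# Set the SELINUX= mode in an SELinux config file and print the resulting line.
# Only the SELINUX= line is touched; SELINUXTYPE= is left as it is.
set_selinux_mode() {
    local config="$1" mode="$2"
    sed -i.bak -E "s/^SELINUX=.*/SELINUX=${mode}/" "$config"
    grep -E '^SELINUX=' "$config"
}

# On a real host (as root): set_selinux_mode /etc/selinux/config disabled
```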
1.4 Set the hostname
hostnamectl set-hostname master7
1.5 Add hosts entries
vi /etc/hosts
10.101.10.6 master6
10.101.10.7 master7
10.101.10.8 master8
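When the same entries have to go onto several nodes, an idempotent helper avoids duplicate lines. This is a minimal sketch. The function name add_host_entry is hypothetical, and the hosts file path is passed in so that the helper can be tested on a scratch file.

```shell
# Append "IP NAME" to a hosts file unless NAME is already mapped there.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# On a real host (as root):
#   add_host_entry /etc/hosts 10.101.10.6 master6
#   add_host_entry /etc/hosts 10.101.10.7 master7
#   add_host_entry /etc/hosts 10.101.10.8 master8
```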
1.6 Enable IP forwarding
# Apply
modprobe br_netfilter
# If net.ipv4.ip_forward is 0, pod traffic cannot be forwarded
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sysctl -p
# Verify
sysctl -a | grep net.ipv4.ip_forward
sysctl -a | grep net.bridge.bridge-nf-call-iptables
sysctl -a | grep net.bridge.bridge-nf-call-ip6tables
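Note that sysctl -w changes only the running kernel, and sysctl -p only reloads /etc/sysctl.conf, so the values set above are lost on reboot unless they are also written to a configuration file. The sketch below writes them under /etc/sysctl.d and /etc/modules-load.d. The file name k8s.conf and the function name write_k8s_sysctl are assumptions, and the root prefix argument exists only so the function can be tried in a scratch directory.

```shell
# Write persistent module and sysctl settings below the given root prefix.
# Pass an empty string to target the real /etc (requires root).
write_k8s_sysctl() {
    local root="${1:-}"
    mkdir -p "${root}/etc/modules-load.d" "${root}/etc/sysctl.d"
    # Load br_netfilter at boot so that the net.bridge.* keys exist.
    echo br_netfilter > "${root}/etc/modules-load.d/k8s.conf"
    cat > "${root}/etc/sysctl.d/k8s.conf" <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
}

# On a real host (as root): write_k8s_sysctl "" && modprobe br_netfilter && sysctl --system
```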
1.7 Time synchronization
sudo dnf install chrony
sudo systemctl start chronyd
sudo systemctl enable chronyd
# Edit the configuration
vi /etc/chrony.conf
# Add the following
pool ntp1.aliyun.com iburst
pool ntp2.aliyun.com iburst
server ntp1.aliyun.com iburst
server ntp2.aliyun.com iburst
server ntp3.aliyun.com iburst
server ntp4.aliyun.com iburst
server ntp5.aliyun.com iburst
server ntp7.aliyun.com iburst
# Restart chronyd so that it loads the new configuration
sudo systemctl restart chronyd
# Step the clock immediately
sudo chronyc -a makestep
# Check the time status
timedatectl status
1.8 Add the rancher user
useradd rancher
usermod -aG docker rancher
echo 123456 | passwd --stdin rancher
cat /etc/group | grep docker
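2 Source server
New nodes are added from the existing master node, so the steps in this section run on the source server.
2.1 Passwordless SSH login
# Run on the existing master node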
su - rancher
ssh-copy-id rancher@master7
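2.2 Install the new rke
# Note: get.rke2.io installs RKE2, which is a different distribution from RKE. The rke v1.6.5 CLI used below (rke up with cluster.yml) is distributed as a standalone binary on the rancher/rke GitHub releases page.
curl -sfL https://get.rke2.io | sh -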
2.3 Add the node
rke manages adding and removing Kubernetes nodes: edit cluster.yml, then run rke up --update-only --config cluster.yml. Because this step adds etcd members, schedule it for an off-peak window.
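To make the cluster.yml edit concrete, the sketch below prints one entry for the nodes: list. The keys address, user, and role are standard RKE node fields. The role set shown in the usage line is an assumption, not taken from the author's cluster.yml, and the function name print_node_entry is hypothetical.

```shell
# Print one entry for the "nodes:" list of an RKE cluster.yml.
# Usage: print_node_entry ADDRESS SSH_USER ROLE...
print_node_entry() {
    local address="$1" user="$2" role
    shift 2
    printf '  - address: %s\n' "$address"
    printf '    user: %s\n' "$user"
    printf '    role:\n'
    for role in "$@"; do
        printf '      - %s\n' "$role"
    done
}

# Example (roles are an assumption): print_node_entry master7 rancher controlplane etcd worker
```

Paste the printed stanza under the existing nodes: key, and keep a backup of cluster.yml and cluster.rkestate before running rke up.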
2.4 Install kubectl
Install the kubectl version that matches the cluster:
curl -LO https://dl.k8s.io/release/v1.30.7/bin/linux/amd64/kubectl
chmod +x kubectl
cp -a kubectl /usr/bin
cd /root
mkdir .kube
cp /home/rancher/kube_config_cluster.yml /root/.kube/config
3 Problems encountered
3.1 Docker version incompatibility
su - rancher
rke up --update-only --config cluster.yml
After running the command, the error below appears. The Rancher documentation also covers this error: Failed to set up SSH tunneling for host [xxx.xxx.xxx.xxx]: Can't retrieve Docker Info
WARN[0000] Failed to set up SSH tunneling for host [master6]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [master6:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
WARN[0000] Removing host [master6] from node lists
INFO[0000] [network] No hosts added existing cluster, skipping port check
However, the following command succeeds when run on the source server:
ssh -i ~/.ssh/id_rsa rancher@master6
Check the Docker versions; the version difference is the likely cause.
# Target server
[root@master6 ~]# docker --version
Docker version 27.4.0, build bde2b89
# Source server
[root@master1 ~]# docker --version
Docker version 19.03.8, build afacb8b
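The newest Docker is not necessarily the right one. The current rke release is v1.6.5, and during installation it reports that Docker 27.4.1 is not supported, so Docker has to be downgraded.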
[rancher@master8 ~]$ rke up --config cluster.yml
INFO[0000] Running RKE version: v1.6.5
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating Kubernetes API server certificates
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] [certificates] Generating kube-etcd-master6 certificate and key
INFO[0000] [certificates] Generating kube-etcd-master7 certificate and key
INFO[0000] [certificates] Generating kube-etcd-master8 certificate and key
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [master7]
INFO[0000] [dialer] Setup tunnel for host [master8]
INFO[0000] [dialer] Setup tunnel for host [master6]
FATA[0001] Unsupported Docker version found [27.4.1] on host [master8], supported versions are [1.13.x 17.03.x 17.06.x 17.09.x 18.06.x 18.09.x 19.03.x 20.10.x 23.0.x 24.0.x 25.0.x 26.0.x 26.1.x 27.0.x 27.1.x 27.2.x]
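The supported list in the error message can be turned into a quick pre-flight check for each node before running rke up. This is a minimal sketch: the list is copied from the rke v1.6.5 message above and will differ for other rke releases, and the function name docker_version_supported is hypothetical.

```shell
# Supported Docker release series, copied from the rke v1.6.5 error message.
SUPPORTED_DOCKER="1.13 17.03 17.06 17.09 18.06 18.09 19.03 20.10 23.0 24.0 25.0 26.0 26.1 27.0 27.1 27.2"

# Return 0 if the given Docker version (for example 27.2.1) belongs to a supported series.
docker_version_supported() {
    local version="$1" series s
    series="${version%.*}"   # strip the patch component: 27.4.1 -> 27.4
    for s in $SUPPORTED_DOCKER; do
        [ "$s" = "$series" ] && return 0
    done
    return 1
}

# On a node: docker_version_supported "$(docker version --format '{{.Server.Version}}')" || echo "unsupported Docker version"
```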
Reset the Docker environment
systemctl disable docker
sudo systemctl stop docker.socket
systemctl stop docker
dnf remove docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
# Delete Docker data
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
rm -rf /home/docker
# Remove leftover configuration files; when reinstalling, the next two steps can be skipped
sudo rm -rf /etc/docker
sudo rm -rf /etc/systemd/system/docker.service.d
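# List the available Docker versions
sudo yum list docker-ce --showduplicates | sort -r
# Install a specific Docker version
yum install docker-ce-27.2.1-1.el9 docker-ce-cli-27.2.1-1.el9 containerd.io -y
# Change the Docker data path
vi /lib/systemd/system/docker.service
# Reload the edited unit file, then start Docker
systemctl daemon-reload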
systemctl start docker
systemctl enable docker
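3.2 rke cannot download images
Even after /etc/docker/daemon.json has been changed, rke up --config cluster.yml still fails to download images. Pull the required images manually on each node, as in the example below, and then rke up --config cluster.yml can proceed.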
docker pull rancher/rke-tools:v0.1.105
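Pulling images one at a time gets tedious, so a small loop helps. The sketch below reads image names from standard input and pulls each one. The function name pull_images and the PULL_CMD variable are hypothetical, and feeding it from rke config --system-images is an assumption about where the image list comes from, so check that command's output on your rke version first.

```shell
# Pull every image named on stdin, one per line.
# Blank lines and lines containing spaces (for example log output) are skipped.
# PULL_CMD defaults to "docker pull"; set PULL_CMD=echo for a dry run.
pull_images() {
    local image
    while read -r image; do
        case "$image" in
            ""|*" "*) continue ;;
        esac
        ${PULL_CMD:-docker pull} "$image" || echo "failed: $image" >&2
    done
}

# On each node, for example: rke config --system-images | pull_images
```

A failed pull is reported on standard error and the loop continues, so a single slow image does not block the rest.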
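A screenshot taken during the run (not reproduced here) showed that some of the Rancher images are large (shown as 16.GB), while other images were still downloading.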
3.3 canal installation fails
calico-kube-controllers also fails to install, but fixing the problem below resolves both.
# Run this to see the detailed error log
kubectl describe pod canal-5vznx -n kube-system
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  32m                   default-scheduler  Successfully assigned kube-system/canal-5vznx to master7
  Normal   Pulling    27m (x4 over 32m)     kubelet            Pulling image "rancher/calico-cni:v3.28.1-rancher1"
  Warning  Failed     25m (x4 over 31m)     kubelet            Error: ErrImagePull
  Warning  Failed     24m (x7 over 31m)     kubelet            Error: ImagePullBackOff
  Warning  Failed     11m (x7 over 31m)     kubelet            Failed to pull image "rancher/calico-cni:v3.28.1-rancher1": rpc error: code = Canceled desc = context canceled
  Normal   BackOff    2m44s (x77 over 31m)  kubelet            Back-off pulling image "rancher/calico-cni:v3.28.1-rancher1"
# So pull the images manually
docker pull rancher/calico-cni:v3.28.1-rancher1
docker pull rancher/mirrored-calico-node:v3.28.1
3.5 kuboard installation fails
The events below show the same symptom: the image cannot be pulled. Here the cause is that kuboard needs an image pull secret in order to download its image from the local Harbor registry.
Events:
  Type     Reason                           Age                From               Message
  ----     ------                           ----               ----               -------
  Normal   Scheduled                        46s                default-scheduler  Successfully assigned kube-system/kuboard-559bccdc6-zf67z to master6
  Normal   BackOff                          18s (x2 over 44s)  kubelet            Back-off pulling image "10.101.10.2:8081/mid/eipwork/kuboard:latest"
  Warning  Failed                           18s (x2 over 44s)  kubelet            Error: ImagePullBackOff
  Warning  FailedToRetrieveImagePullSecret  3s (x5 over 46s)   kubelet            Unable to retrieve some image pull secrets (regcred); attempting to pull the image may not succeed.
  Normal   Pulling                          3s (x3 over 45s)   kubelet            Pulling image "10.101.10.2:8081/mid/eipwork/kuboard:latest"
  Warning  Failed                           3s (x3 over 45s)   kubelet            Failed to pull image "10.101.10.2:8081/mid/eipwork/kuboard:latest": Error response from daemon: unauthorized: unauthorized to access repository: mid/eipwork/kuboard, action: pull: unauthorized to access repository: mid/eipwork/kuboard, action: pull
  Warning  Failed                           3s (x3 over 45s)   kubelet            Error: ErrImagePull
# Create the image pull secret (replace the placeholders in angle brackets)
kubectl create secret docker-registry regcred \
  --docker-server=http://<harbor-ip>:<port> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email> \
  -n kube-system
Next, retrieve the kuboard token:
echo $(kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep kuboard-user | awk '{print $1}') -o go-template='{{.data.token}}' | base64 -d)
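3.6 Cannot get the kuboard token
Normally the command above is all that is needed, but this time kuboard had no corresponding Secret. A likely reason is that Kubernetes 1.24 and later no longer create a token Secret automatically for each ServiceAccount. Inspecting the account confirms that it has no secret: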
kubectl get serviceaccount kuboard-user -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"kuboard-user","namespace":"kube-system"}}
  creationTimestamp: "2024-12-21T07:24:12Z"
  name: kuboard-user
  namespace: kube-system
  resourceVersion: "3491"
  uid: 7d46c0a1-07e9-4cb2-ad99-00b7e6091151
The fix is as follows: create the Secret, then use the earlier command to read the token from it and log in to the kuboard web page.
# This command creates a new token Secret
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: kuboard-user-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: kuboard-user
type: kubernetes.io/service-account-token
EOF
# Associate the newly created Secret with the ServiceAccount
kubectl patch serviceaccount kuboard-user -n kube-system --patch '{"secrets":[{"name":"kuboard-user-token"}]}'