Liarlee's Notebook

Posted 2023-12-22 | Updated 2025-09-23 | Linux

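Command overview

vmstat reports information from the perspective of hardware and physical parameters. The first line of output shows averages since the last reboot, so it is only useful as a reference. The column groups need little explanation: procs, memory, swap, io, system, and cpu. The individual fields are:

r: number of runnable processes, i.e. processes in the R state that are running or waiting for CPU time.
b: number of processes in uninterruptible sleep (the D state), which usually means processes that are currently waiting on I/O.
swpd: amount of swap in use (KB). It shows 0 if no swap is configured. Read it together with si and so: if swpd is 0 because there is no swap, si/so will never show data; if swpd is non-zero, si/so may show activity.
free: idle memory (KB). This is memory that has not been allocated at all, i.e. what remains after total - buffer - cache - used. It has the same meaning as the free field of the free command.
buff: memory used as buffers (KB).
cache: memory used as cache (KB). Like buffers, this is temporary data. Buffers are released once the data has been written to disk; cache can be released directly.
si: swap in (KB/s), the amount of data read from the swap area on disk back into memory.
so: swap out (KB/s), the amount of data written from memory to the swap area on disk.
bi: blocks in (blocks/s), blocks received from a block device, i.e. what is usually called disk reads.
bo: blocks out (blocks/s), blocks sent to a block device, i.e. what is usually called disk writes.
in: interrupts per second, including the clock.
cs: context switches per second.
us: percentage of CPU time spent in non-kernel code, including user time and nice time.
sy: percentage of CPU time spent in kernel code.
id: percentage of CPU time spent idle.
wa: percentage of time spent waiting for I/O. This one is easy to misread: during wa the CPU is actually idle. It counts the time in which the CPU had nothing to run while I/O requests were outstanding. The CPU is doing no work during that time, so from a CPU point of view it is available and can be used by other processes that need to compute.
st: percentage of time taken by the virtualization layer (steal time). A non-zero value usually means the hypervisor is limiting CPU usage (for example EC2 T-series instances) or the underlying resources cannot fully satisfy demand and guests are contending for them.
gu: Time spent running KVM guest code (guest time, including guest nice). This is a newly added field that I had not seen before. It is quite useful on a KVM host OS.

The output looks like this:

    ~ vmstat -w 1
    --procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- ----------cpu----------
     r  b     swpd   free  buff    cache  si  so   bi   bo    in    cs  us  sy  id  wa  st  gu
     2  0  1005492  42600  1328  1973356   4   7   77  312  1767     6   2   2  95   1   0   0
     1  0  1005492  42652  1328  1973360   0   0    0    0  2519  3373   1   2  97   0   0   0
     1  0  1005492  42652  1328  1973360   0   0    0    8  1001  1844   0   1  99   0   0   0
     1  0  1005492  42652  1328  1973360   0   0    0   12  1056  1822   1   0  99   0   0   0
     1  0  1005492  42652  1328  1973368   0   0    0    0  1308  2150   1   1  98   0   0   0
     1  0  1005492  43156  1328  1973376   0   0    0    0  1394  2245   3   1  97   0   0   0
     1  0  1005492  43156  1328  1973376   0   0    0    0  1125  1898   1   1  99   0   0   0
     1  0  1005492  43408  1328  1973376   0   0    0   55  1229  2218   1   1  99   0   0   0
     1  0  1005492  44948  1328  1973392   0   0    0    0  3544  5490   7   4  89   0   0   0
     1  0  1005492  45228  1328  1973428   0   0    0    0  1410  2082   3   4  93   0   0   0

How to read the numbers

After running the command, first look at r and b over the first several samples. Does r exceed the number of cores? Is b non-zero, does it stay non-zero for the whole sampling window, and does it exceed the number of CPU cores? How often and how much do the values change? Example: you observe for 10 s with one sample per second, and every sample shows more uninterruptible processes than CPU cores. That suggests the system has many I/O processes and a lot of them are blocked on I/O.

Next look at the CPU columns at the end. Check id first; if it is low, look at us/sy/wa to see where the CPU is spending its time. Normally the CPU should spend as much of its time as possible in us. If one of the other fields is large, that is the one to investigate.

If there is swap activity, look at swap. If not, go straight to memory: how much cache + free is there in total, and is memory usage on the edge of OOM?

Then look at in/cs. One is the number of hardware interrupts, the other the number of process context switches. The first may mean the CPU is being interrupted by hardware events; the second may mean the CPU is busy bouncing between processes. Either can point to a system that is not running efficiently, or to a latent problem.

Command examples

vmstat -m: show how the kernel's memory is allocated (slab information).
vmstat -a: split memory usage into active and inactive memory.
vmstat -n 2 10: print vmstat results 10 times, once every two seconds.

Further reading

https://docs.oracle.com/cd/E19455-01/805-7229/6j6q8svh5/index.html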
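The reading order described above (r and b first, then the CPU split, then swap) can be sketched as a small triage helper. This is only an illustration: the function names and the `idle_threshold` default are hypothetical choices, not something vmstat defines, and the field list assumes the 18-column layout shown in the sample output.

```python
from typing import Dict, List

# Column order taken from the sample `vmstat -w 1` header above.
FIELDS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
          "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st", "gu"]


def parse_vmstat_line(line: str) -> Dict[str, int]:
    """Turn one vmstat data row into a {field: value} mapping."""
    values = [int(v) for v in line.split()]
    return dict(zip(FIELDS, values))


def triage(sample: Dict[str, int], cores: int, idle_threshold: int = 20) -> List[str]:
    """Apply the checks from the text; idle_threshold is an assumed cut-off."""
    notes = []
    if sample["r"] > cores:
        notes.append("run queue exceeds core count")
    if sample["b"] > 0:
        notes.append("processes blocked in uninterruptible sleep")
    if sample["id"] < idle_threshold:
        busiest = max(("us", "sy", "wa"), key=lambda k: sample[k])
        notes.append(f"low idle; largest share is {busiest}")
    if sample["si"] or sample["so"]:
        notes.append("swap activity")
    return notes


if __name__ == "__main__":
    row = "1 0 1005492 44948 1328 1973392 0 0 0 0 3544 5490 7 4 89 0 0 0"
    print(triage(parse_vmstat_line(row), cores=2))
```

Feeding it several consecutive rows, rather than a single one, matches the advice to watch whether a condition persists across the whole sampling window.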

kubectl apply Fails with Annotation "Too long"

Posted 2023-12-22 | Updated 2025-09-23 | Kubernetes

While reinstalling the Prometheus operator, kubectl apply failed with an error saying the annotation was too long:

    > kubectl apply -f ./setup
    customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
    Warning: Detected changes to resource monitoring which is currently being deleted.
    namespace/monitoring unchanged
    Error from server (Invalid): error when creating "setup/0prometheusCustomResourceDefinition.yaml": CustomResourceDefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
    Error from server (Invalid): error when creating "setup/0prometheusagentCustomResourceDefinition.yaml": CustomResourceDefinition.apiextensions.k8s.io "prometheusagents.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

Solutions

Using kubectl create

    > kubectl create -f ./setup
    customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com created
    Error from server (AlreadyExists): error when creating "setup/0alertmanagerConfigCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "alertmanagerconfigs.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0alertmanagerCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0podmonitorCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "podmonitors.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0probeCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "probes.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0prometheusruleCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0scrapeconfigCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "scrapeconfigs.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0servicemonitorCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/0thanosrulerCustomResourceDefinition.yaml": customresourcedefinitions.apiextensions.k8s.io "thanosrulers.monitoring.coreos.com" already exists
    Error from server (AlreadyExists): error when creating "setup/namespace.yaml": object is being deleted: namespaces "monitoring" already exists

Using kubectl replace

Client-side kubectl apply stores the whole object in the last-applied-configuration annotation, which is what pushes these large CRDs over the limit. Creating the CRDs with create and updating them with replace both avoid adding that field.

    > kubectl replace -f ./setup
    customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com replaced
    customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com replaced
    namespace/monitoring replaced

Creating with SSA

Another option is server-side apply:

    > kubectl apply --server-side -f ./setup

https://kubernetes.io/docs/reference/using-api/server-side-apply/
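To see ahead of time which manifests are likely to hit this limit, one can estimate how large the last-applied-configuration annotation would be. The sketch below is an approximation under stated assumptions: it treats the compact JSON form of the manifest as the annotation value, while the API server actually counts all annotation keys and values together. The function names are hypothetical; only the 262144-byte figure comes from the error message.

```python
import json

# Limit quoted in the error message above.
MAX_ANNOTATION_BYTES = 262144


def last_applied_size(manifest: dict) -> int:
    """Approximate size of the annotation that client-side apply would add."""
    compact = json.dumps(manifest, separators=(",", ":"))
    return len(compact.encode("utf-8"))


def fits_client_side_apply(manifest: dict) -> bool:
    """True if the estimated annotation stays within the limit."""
    return last_applied_size(manifest) <= MAX_ANNOTATION_BYTES


if __name__ == "__main__":
    small_crd = {"kind": "CustomResourceDefinition", "metadata": {"name": "example"}}
    print(last_applied_size(small_crd), fits_client_side_apply(small_crd))
```

A manifest that fails this check is a candidate for create/replace or server-side apply, as described above.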

Configuring Xorg on Arch Linux to Use an NVIDIA T4

Posted 2023-12-21 | Updated 2025-09-23 | Linux

A recent idea: on the Arch Linux image that I dd'ed into the China region, try modifying Xorg directly and put the NVIDIA GPU to use. It took about a day of tinkering; these are the steps and the process. I had been running a simple Xorg + DWM setup with very little software installed. On top of that, the work comes down to two parts: enabling the GPU, and configuring the Xorg server to use it.

Installing the GPU driver

Check the GPU:

    > lspci
    00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

Install the driver by following the ArchWiki. It takes one command and goes through DKMS. This is the proprietary driver, at the version shown here.

    > pacman -Ss nvidia-dkms
    extra/nvidia-dkms 545.29.06-1 [installed]
        NVIDIA drivers - module sources
    > pacman -S nvidia-utils
    # Enable the driver's persistence daemon.
    # Without it, the Xorg server fails to start when xrdp launches it and reports that the nvidia module cannot be found. The module is not actually missing; it is simply not loaded because there is no graphical session.
    sudo systemctl enable nvidia-persistenced.service
    # Set the display resolution. For now this is still done by hand; I am still working on the config file.
    > xrandr --output DVI-D-0 --mode 1920x1080 --rate 60

Once pacman has finished building the DKMS module, reboot the OS and the nvidia modules show up as loaded.

    > lsmod | grep nvidia
    nvidia_uvm           3481600  0
    nvidia_drm            118784  7
    nvidia_modeset       1585152  6 nvidia_drm
    nvidia              62402560  141 nvidia_uvm,nvidia_modeset
    video                  77824  1 nvidia_modeset
    > modinfo nvidia
    filename:       /lib/modules/6.6.7-zen1-1-zen/updates/dkms/nvidia.ko.zst
    alias:          char-major-195-*
    version:        545.29.06
    supported:      external
    license:        NVIDIA
    firmware:       nvidia/545.29.06/gsp_tu10x.bin
    firmware:       nvidia/545.29.06/gsp_ga10x.bin
    srcversion:     8302209549E8FEAC029EDC0
    alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
    alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
    alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
    depends:
    retpoline:      Y
    name:           nvidia
    vermagic:       6.6.7-zen1-1-zen SMP preempt mod_unload

Configuring xrdp

At this point the driver works, but Xorg does not use the GPU when starting the graphical session. According to the ArchWiki, when using xrdp you also need to reinstall the xrdp backend: use the backend from the AUR, which supports NVIDIA GPUs, instead of the default package in the main Arch Linux repositories. I used paru for the steps below. The AUR xorgxrdp package automatically pulls and builds xrdp-git, so the versions shown here do not exactly match the AUR.

    > paru -Ss xrdp
    aur/xorgxrdp-nvidia 0.2.18.r55.g3a4d465-1 [+4 ~0.00] [Installed]
        Xorg drivers for xrdp, with NVIDIA GPU support.
    aur/xrdp-git 0.9.18.r565.geb1c3cd4-1 [+30 ~0.01] [Installed: 0.9.18.r599.g9fbe0ad1-1]
        An open source remote desktop protocol (RDP) server. Git version, devel branch.

Reboot the system after the installation. At the end of the AUR build there is a notice in red text: you need to change which Xorg config file xrdp reads. The relevant excerpt from the [Xorg] section of /etc/xrdp/sesman.ini:

    > cat /etc/xrdp/sesman.ini
    [Xorg]
    param=Xorg
    ; Leave the rest parameters as-is unless you understand what will happen.
    param=-config
    ; For the next two lines: enable the nvidia config file and comment out the default one.
    ; This sets the config file the Xorg server reads at startup, together with the Xorg server parameters.
    ; param=xrdp/xorg.conf
    param=xrdp/xorg_nvidia.conf
    param=-noreset
    param=-nolisten

Next, use the utility that NVIDIA provides to generate a config for the GPU and merge it into this config file.

    > nvidia-xconfig -c /etc/X11/xrdp/xorg_nvidia.conf
    # No manual backup is needed here: the command backs up the previous config file automatically and merges any differing settings.

After all this I restarted xrdp. Xorg did start using the GPU, but none of the programs launched under it did; by default the CPU was still doing the work. The cause was a missing "Files" section in the Xorg config file. The paths may differ on other systems; on mine the files were at these locations, so I used them as they were.

    Section "Files"
        ModulePath "/usr/lib64/nvidia/xorg"
        ModulePath "/usr/lib64/xorg/modules"
    EndSection

I followed the configuration on this page: https://forums.developer.nvidia.com/t/glxinfo-command-returning-badwindow-invalid-window-parameter-error/36172

The final content of /etc/X11/xrdp/xorg_nvidia.conf:

    > cat /etc/X11/xrdp/xorg_nvidia.conf
    Section "ServerLayout"
        Identifier "XRDP GPU Server"
        Screen 0 "dGPU"
        InputDevice "xrdpMouse" "CorePointer"
        InputDevice "xrdpKeyboard" "CoreKeyboard"
    EndSection

    Section "ServerFlags"
        # This line prevents "ServerLayout" sections in xorg.conf.d files
        # overriding the "XRDP GPU Server" layout (xrdp #1784)
        Option "DefaultServerLayout" "XRDP GPU Server"
        Option "DontVTSwitch" "on"
        Option "AutoAddDevices" "off"
    EndSection

    Section "Files"
        ModulePath "/usr/lib64/nvidia/xorg"
        ModulePath "/usr/lib64/xorg/modules"
    EndSection

    Section "Module"
        Load "xorgxrdp"
    EndSection

    Section "InputDevice"
        Identifier "xrdpKeyboard"
        Driver "xrdpkeyb"
    EndSection

    Section "InputDevice"
        Identifier "xrdpMouse"
        Driver "xrdpmouse"
    EndSection

    Section "Screen"
        Identifier "dGPU"
        Device "dGPU"
        Option "DPI" "96 x 96"
        # T4 needs an entry here, this is not the desktop size
        SubSection "Display"
            Virtual 1920 1080
        EndSubSection
    EndSection

    Section "Device"
        Identifier "dGPU"
        Driver "nvidia"
        # T4 may need to comment out next line
        # Option "UseDisplayDevice" "none"
        Option "ConnectToAcpid" "false"
        BusID "PCI:0:30:0"
    EndSection

Troubleshooting approach and log locations

First check that the xrdp service status is healthy. Then check /var/log/xrdp.log, which records the xrdp startup process. If there are no errors there, check /var/log/xrdp-sesman.log, which records the command used to start Xorg and the state of Xorg. You can copy the command from that log and run it by hand to see its output directly, or look at the Xorg log to see which errors it recorded.

    [2023-12-21T16:50:17.711+0800] [INFO ] Socket 13: connection ac ...
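One detail in the config above is easy to get wrong: lspci prints the PCI address in hexadecimal (00:1e.0), while the Xorg BusID is written in decimal (PCI:0:30:0). A minimal sketch of the conversion follows; the function name is hypothetical, and it assumes the usual lspci slot formats bus:device.function or domain:bus:device.function.

```python
def lspci_to_xorg_busid(slot: str) -> str:
    """Convert an lspci slot such as '00:1e.0' to an Xorg BusID string."""
    bus_and_device, function = slot.rsplit(".", 1)
    parts = bus_and_device.split(":")
    # The last two fields are bus and device; a leading PCI domain is ignored here.
    bus, device = parts[-2], parts[-1]
    return f"PCI:{int(bus, 16)}:{int(device, 16)}:{int(function, 16)}"


if __name__ == "__main__":
    # The T4 from the lspci output in this post.
    print(lspci_to_xorg_busid("00:1e.0"))
```

For the Tesla T4 in this post, device 0x1e becomes 30, which matches the BusID line in the final config.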

Mapping Containers to ENIs on EKS Cluster Nodes

Posted 2023-12-08 | Updated 2025-09-23 | EKS

Version information

    root@ip-172-31-35-61 ~ [1]# containerd -v
    containerd github.com/containerd/containerd v1.6.16 31aa4358a36870b21a992d3ad2bef29e1d693bec.m
    root@ip-172-31-35-61 ~# uname -a
    Linux ip-172-31-35-61.cn-north-1.compute.internal 6.1.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 06 Feb 2023 09:28:04 +0000 x86_64 GNU/Linux
    root@ip-172-31-35-60 ~ [1]# kubelet --version
    Kubernetes v1.24.9-eks-49d8fe8

Relationship between the container's virtual NIC and the node's NIC

The main idea is to match the two ends of the veth pair by interface index.

Find the Pod's container. Note that grep -v pause filters the pause containers out of the listing, so what remains are the application containers:

    root@ip-172-31-35-61 ~# nerdctl -n k8s.io ps | grep -v pause
    CONTAINER ID    IMAGE                                                                                          COMMAND                   CREATED          STATUS    PORTS    NAMES
    218279edbee1    918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/kube-proxy:v1.24.9-minimal-eksbuild.1    "kube-proxy --v=2 --…"    14 hours ago     Up                 k8s://kube-system/kube-proxy-kqlj2/kube-proxy
    3a8aaf963501    123456123456.dkr.ecr.cn-north-1.amazonaws.com.cn/haydenarchlinux:latest                       "sleep 365d"              8 minutes ago    Up                 k8s://default/haydenarch-77c4f7cff9-dxsps/haydenarch

Enter the container and check the index of its peer interface:

    root@ip-172-31-35-61 ~# nerdctl -n k8s.io exec -it 3a8aaf963501 fish
    root@haydenarch-77c4f7cff9-dxsps /# cat /sys/class/net/eth0/iflink
    54
    root@haydenarch-77c4f7cff9-dxsps /# ip ad
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    3: eth0@if54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
        link/ether 66:3a:fb:f2:00:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 172.31.39.84/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::643a:fbff:fef2:b2/64 scope link
           valid_lft forever preferred_lft forever

On the instance, list the interfaces and find the veth peer with index 54:

    root@ip-172-31-35-61 ~# ip ad | grep 54
    54: enid573ff579e6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default

Show the full details of this virtual interface. The link-netns field is the container's network namespace.

    root@ip-172-31-35-61 ~ [1]# ip link show enid573ff579e6
    54: enid573ff579e6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
        link/ether 36:06:dc:85:42:a4 brd ff:ff:ff:ff:ff:ff link-netns cni-bb451da7-00be-2d59-b5e0-1dd4e77565e8

The following command lists all namespaces so you can find the matching id:

    root@ip-172-31-35-61 ~# ip netns list
    cni-2e29d022-09f4-6626-f416-29cfb9652e78 (id: 13)
    cni-e5be29a1-d111-53ec-8f88-94118eb1466c (id: 10)
    cni-bb451da7-00be-2d59-b5e0-1dd4e77565e8 (id: 8) # this is the one
    cni-673e1c4b-acd1-ab7e-b374-1a4192eea86e (id: 6)
    cni-1207e72f-e68e-a7e7-161e-979e9e9ada7a (id: 5)
    cni-a8b7807f-a37e-dd4e-98cb-9d3e7cd42f11 (id: 4)
    cni-08a7a0e2-8e33-b669-83ee-75957676ffbb (id: 3)
    cni-8a26c501-2ea9-f4be-36ae-745a5c307fbc (id: 0)
    cni-2598e89a-d2ba-38a8-6412-92fbff32871b (id: 9)
    cni-817961be-8d28-b32c-816f-a842b9d243fe (id: 2)
    cni-a84561aa-8668-8b7c-4b5e-56568e14dc83 (id: 12)
    cni-5de27d6e-c311-6ee3-4ed5-85b2a05b6943 (id: 11)
    cni-d102a5e6-d6ea-fb90-2ac8-a97cd8df9c85 (id: 7)
    cni-88bf7480-bdd6-e063-3103-894d29b16302 (id: 1)

You can enter the Pod's network namespace directly to see the container's network information. It matches what was seen from inside the container above.

    root@ip-172-31-35-61 ~ [1]# ip netns exec cni-bb451da7-00be-2d59-b5e0-1dd4e77565e8 ip ad
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    3: eth0@if54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
        link/ether 66:3a:fb:f2:00:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 172.31.39.84/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::643a:fbff:fef2:b2/64 scope link
           valid_lft forever preferred_lft forever

How to determine which ENI a container's interface is bound to

Because of how the VPC CNI is implemented, an EKS node ends up with multiple ENIs. Whether a Pod's traffic leaves through eth0, eth1, or another interface has to be determined from the routing entries. This part was recorded at a different time from the part above, so it covers a different container on a different node and the IP addresses do not match.

Check the container's IP address:

    > kubectl get pods -o wide
    NAME                          READY   STATUS    RESTARTS   AGE   IP              NODE                                          NOMINATED NODE   READINESS GATES
    haydenarch-7d9ff55cbd-4b7pl   1/1     Running   0          18h   172.31.55.218   ip-172-31-55-30.cn-north-1.compute.internal   <none>           <none>

Check the node's routing policy for this IP address:

    [root@ip-172-31-55-30 ~]# ip rule list | grep 172.31.55.218
    512: from all to 172.31.55.218 lookup main
    1536: from 172.31.55.218 lookup 2

The result shows that packets from this IP address are looked up in table 2. Check table 2:

    [root@ip-172-31-55-30 ~]# ip r s table 2
    default via 172.31.48.1 dev eth1
    172.31.48.1 dev eth1 scope link

So the container being observed currently sits on eth1. Check the node's interface information:

    [root@ip-172-31-55-30 ~]# ip ad s eth1 17 ...
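The rule-then-table lookup above can be sketched as two small parsers over the captured command output. This is an illustrative sketch only: the function names are hypothetical, and it assumes the output format shown in this post (a "from <ip> lookup <table>" rule and a "default via ... dev <iface>" route).

```python
import re
from typing import Optional

# Matches rules of the form "1536: from 172.31.55.218 lookup 2".
RULE_PATTERN = re.compile(r"^\d+:\s+from\s+(\S+)\s+lookup\s+(\S+)$")


def table_for_source(ip_rule_output: str, pod_ip: str) -> Optional[str]:
    """Return the routing table used for traffic sent from pod_ip."""
    for line in ip_rule_output.splitlines():
        match = RULE_PATTERN.match(line.strip())
        if match and match.group(1) == pod_ip:
            return match.group(2)
    return None


def default_device(ip_route_output: str) -> Optional[str]:
    """Return the interface of the default route in a table listing."""
    for line in ip_route_output.splitlines():
        parts = line.split()
        if parts[:1] == ["default"] and "dev" in parts[:-1]:
            return parts[parts.index("dev") + 1]
    return None


if __name__ == "__main__":
    rules = ("512: from all to 172.31.55.218 lookup main\n"
             "1536: from 172.31.55.218 lookup 2")
    routes = "default via 172.31.48.1 dev eth1\n172.31.48.1 dev eth1 scope link"
    print(table_for_source(rules, "172.31.55.218"), default_device(routes))
```

The "from all to ..." rule is deliberately not matched, since it describes inbound traffic to the Pod rather than the egress path through an ENI.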

Joining a Self-Managed Node to the Cluster

Posted 2023-12-08 | Updated 2025-09-23 | EKS

Adding a self-managed node

Create the cluster, launch a new EC2 instance, and log in to an already running EKS-optimized OS instance to copy some scripts over from it.

Add the EC2 tag: kubernetes.io/cluster/clusterName owned
Configure the EC2 instance profile.
Get the Kubernetes API server endpoint URL from the console.
Get the API server's Base64 CA: it can be found in ~/.kube/config (cat ~/.kube/config), or on the EKS console.
Edit the user data, or ssh to the EC2 instance and create a bash script that calls bootstrap.sh. ${ClusterName} below is a placeholder for the cluster name.

    mkdir ~/eks; touch ~/eks/start.sh
    ---
    #!/bin/bash
    set -ex
    B64_CLUSTER_CA=
    API_SERVER_URL=
    K8S_CLUSTER_DNS_IP=10.100.0.10
    /etc/eks/bootstrap.sh ${ClusterName} --b64-cluster-ca ${B64_CLUSTER_CA} --apiserver-endpoint ${API_SERVER_URL}

The cluster has no node group, so the aws-auth ConfigMap is not created either and the node cannot join the cluster. It has to be created by hand.

    apiVersion: v1
    data:
      mapRoles: |
        - groups:
          - system:bootstrappers
          - system:nodes
          rolearn: [CLUSTER_ROLE]
          username: system:node:{{EC2PrivateDNSName}}
      mapUsers: |
        []
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system

Files to copy

    sudo pacman -S containerd # install containerd
    scp /etc/eks/bootstrap.sh root@54.222.253.235:/etc/eks/bootstrap.sh # copy bootstrap
    scp /usr/bin/imds root@54.222.253.235:/usr/bin/imds # shell script that helps query EC2 metadata for instance and VPC subnet information
    scp -pr /etc/eks/ root@54.222.253.235:/etc/eks/ # copy the EKS scripts and config templates as they are
    scp -pr /var/lib/kubelet/kubeconfig root@54.222.253.235:/var/lib/kubelet/kubeconfig # copy the kubelet kubeconfig template
    scp -pr /etc/kubernetes/ root@54.222.253.235:/etc/kubernetes/ # copy the kubernetes config files
    scp -pr /etc/kubernetes/kubelet/ root@54.222.253.235:/etc/kubernetes/kubelet/ # the command above did not copy recursively, so this directory is specified explicitly

    # Set the following kernel parameters; otherwise kubelet reports that they do not meet its requirements.
    kernel.panic = 10
    kernel.panic_on_oops = 1
    vm.overcommit_memory = 1

What the bootstrap script does

These are notes on what the script configures automatically. Roughly: it collects variables, the AWS service domain, and the EC2 metadata address, then substitutes the variables in the templates to generate the kubelet and containerd config files. The substitution is one-shot: bootstrap can replace the template variables only once, and a second run only refreshes the cluster information and restarts the services.

Read the arguments passed to bootstrap and set variables, for example ClusterName.

Check the kubelet version to decide the runtime, containerd or dockerd. The condition is whether the kubelet version is 1.24 or greater.

    ++ kubelet --version
    ++ grep -Eo '[0-9]\.[0-9]+\.[0-9]+'
    + KUBELET_VERSION=1.24.9
    ---
    + IS_124_OR_GREATER=true
    + DEFAULT_CONTAINER_RUNTIME=containerd

Set the ECR and pause container addresses:

    # Get the region and the AWS service domain
    + AWS_DEFAULT_REGION=cn-north-1
    + AWS_SERVICES_DOMAIN=amazonaws.com.cn
    # Call the script /etc/eks/get-ecr-uri.sh cn-north-1 amazonaws.com.cn ''
    + ECR_URI=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn
    + PAUSE_CONTAINER_IMAGE=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause
    + PAUSE_CONTAINER=918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause:3.5
    + CA_CERTIFICATE_DIRECTORY=/etc/kubernetes/pki
    + CA_CERTIFICATE_FILE_PATH=/etc/kubernetes/pki/ca.crt

Create the certificate directory:

    + mkdir -p /etc/kubernetes/pki
    + sed -i s,MASTER_ENDPOINT,https://CE0253A94B6B14215AE3282580CFA5E3.yl4.cn-north-1.eks.amazonaws.com.cn,g /var/lib/kubelet/kubeconfig
    + sed -i s,AWS_REGION,cn-north-1,g /var/lib/kubelet/kubeconfig
    + sed -i s,CLUSTER_NAME,NewClusterForManualJoin,g /var/lib/kubelet/kubeconfig

Get the VPC CIDR:

    # imds shell script help to get metadata.
    imds: /usr/bin/imds
    ++ imds latest/meta-data/local-ipv4
    ++ imds latest/meta-data/network/interfaces/macs/02:66:06:2e:48:08/vpc-ipv4-cidr-blocks

Create the kubelet config at /etc/kubernetes/kubelet/kubelet-config.json, computing values such as the reserved resources and MaxPods.

    + mkdir -p /etc/systemd/system/kubelet.service.d
    + sudo mkdir -p /etc/containerd
    + sudo mkdir -p /etc/cni/net.d
    + sudo mkdir -p /etc/systemd/system/containerd.service.d

Create the containerd config file:

    + printf '[Service]\nSlice=runtime.slice\n'
    + sudo tee /etc/systemd/system/containerd.service.d/00-runtime-slice.conf
    + sudo sed -i s,SANDBOX_IMAGE,918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/eks/pause:3.5,g /etc/eks/containerd/containerd-config.toml

Configure and start kubelet:

    + sudo cp -v /etc/eks/containerd/kubelet-containerd.service /etc/systemd/system/kubelet.service
    + sudo chown root:root /etc/systemd/system/kubelet.service
    + sudo containerd config dump
    + systemctl enable kubelet
    + systemctl start kubelet
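The runtime decision in the trace above (extract the version from kubelet --version, then compare against 1.24) can be sketched as follows. This is not the code from bootstrap.sh, whose comparison logic is not shown in the trace; the function names are hypothetical and only the regular expression is taken from the grep in the trace.

```python
import re
from typing import Tuple

# Same pattern as the `grep -Eo` call in the bootstrap trace.
VERSION_PATTERN = re.compile(r"[0-9]\.[0-9]+\.[0-9]+")


def parse_kubelet_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `kubelet --version` output."""
    match = VERSION_PATTERN.search(output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    major, minor, patch = (int(part) for part in match.group(0).split("."))
    return major, minor, patch


def default_runtime(output: str) -> str:
    """Pick containerd for kubelet 1.24 or newer, dockerd otherwise."""
    major, minor, _ = parse_kubelet_version(output)
    return "containerd" if (major, minor) >= (1, 24) else "dockerd"


if __name__ == "__main__":
    print(default_runtime("Kubernetes v1.24.9-eks-49d8fe8"))
```

Comparing (major, minor) as a tuple avoids the string-comparison pitfall where "1.9" would sort after "1.24".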

Shipping Logs from Filebeat to OpenSearch

Posted 2023-12-08 | Updated 2025-09-23 | Docker

In the end this can basically be confirmed as a compatibility problem. Testing showed that even with OpenSearch compatibility mode enabled, filebeat in the OpenSearch + filebeat combination still restarts intermittently.

Background

The requirement: logs are currently collected with ES + filebeat. Supervisor is the main process manager inside the container and starts the application (nginx stands in for it here) plus filebeat. The goal is to move to ECS Fargate while keeping this model, changing the existing architecture as little as possible and replacing ES with OpenSearch. The test follows that plan.

Create OpenSearch

Version: OpenSearch 2.11 (latest), OpenSearch_2_11_R20231113-P1 (latest)
Availability Zone(s): 1-AZ without standby

Build the Supervisor-managed container

Create the Dockerfile

The hard part of the Dockerfile is finding a suitable filebeat version. Reference page: Agents and ingestion tools. The remaining steps are plain download and install.

    # Use the official Nginx image as the base image
    FROM reg.liarlee.site/docker.io/nginx
    # Install Supervisor
    RUN apt-get update && apt-get install -y supervisor
    RUN mkdir -p /var/log/supervisor
    RUN mkdir -p /etc/filebeat/
    #RUN curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.11.3-amd64.deb && dpkg -i filebeat-8.11.3-amd64.deb
    RUN curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-oss-7.12.1-amd64.deb && dpkg -i filebeat-oss-7.12.1-amd64.deb
    COPY filebeat.yml /etc/filebeat/filebeat.yml
    COPY nginx.conf /etc/nginx/nginx.conf
    # Copy the Supervisor config file into the container
    COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
    # Start Supervisor to manage the Nginx process
    CMD [ "/usr/bin/supervisord", "-n" ]

Prepare the config files

Three config files are needed.

supervisord.conf: the Supervisor config, which decides which processes are managed.

    > cat ./supervisord.conf
    [unix_http_server]
    file=/var/run/supervisor.sock ; (the path to the socket file)
    chmod=0700 ; socket file mode (default 0700)
    chown=nobody:nogroup ; socket file uid:gid owner

    [supervisord]
    logfile_maxbytes=50MB ; maximum size of the log file
    logfile_backups=10 ; number of log file backups
    loglevel=info ; log level
    nodaemon=false ; whether to start Supervisor in daemon mode
    minfds=1024 ; minimum number of file descriptors that can be opened
    minprocs=200 ; minimum number of processes that can be created

    [program:nginx]
    command=/usr/sbin/nginx -g "daemon off;" ; command that starts Nginx
    autostart=true ; start automatically when Supervisor starts
    autorestart=true ; restart automatically after an abnormal exit
    stderr_logfile=/var/log/nginx/error.log ; error log path
    stdout_logfile=/var/log/access.log ; access log path

    [program:filebeat]
    command=/usr/bin/filebeat -e -c /etc/filebeat/filebeat.yml ; command that starts Filebeat
    autostart=true
    autorestart=true
    stderr_logfile=/var/log/filebeat.err.log
    stdout_logfile=/var/log/filebeat.out.log

filebeat.yml: the filebeat config file. GPT will happily write a config that uses output.opensearch:, but that does not work; only the original output config can be used. (Perhaps the filebeat version I chose is wrong and that is why.) filebeat is part of the ES product family, so it is reasonable that it does not support OpenSearch. With opensearch as the output, filebeat cannot find the output definition, which also shows the field is not supported.

    2023-12-14T12:03:12.560Z INFO [publisher_pipeline_output] pipeline/output.go:145 Attempting to reconnect to backoff(elasticsearch(https://vpc-ecs-nginx-opensearch-qt7m5rmhddggkiuapyybcmz5oe.cn-north-1.es.amazonaws.com.cn:443)) with 7 reconnect attempt(s)

    > cat ./filebeat.yml
    filebeat.inputs:
    - type: filestream
      id: nginxaccesslog
      paths:
        - /var/log/access.log
      fields:
        log_type: access
    seccomp.enabled: false # leaving this on may interfere
    logging.level: debug # set to DEBUG to make debugging easier
    # This block disables xpack. xpack features are only provided in ES, in the commercial edition.
    ilm.enabled: false
    setup.ilm.enabled: false
    setup.pack.security.enabled: false
    setup.xpack.graph.enabled: false
    setup.xpack.watcher.enabled: false
    setup.xpack.monitoring.enabled: false
    setup.xpack.reporting.enabled: false
    # The output is still es
    output.elasticsearch:
      enable: true
      hosts: ["vpc-ecs-nginx-opensearch-qt7m5rmhddggkiuapyybcmz5oe.cn-north-1.es.amazonaws.com.cn:443"] # 443 must be specified explicitly; the es default is 9200, which cannot connect
      protocol: "https"

The xpack error in the log looks roughly like this:

    2023-12-14T12:03:12.560Z ERROR [publisher_pipeline_output] pipeline/output.go:154 Failed to connect to backoff(elasticsearch(https://vpc-ecs-nginx-opensearch-qt7m5rmhddggkiuapyybcmz5oe.cn-north-1.es.amazonaws.com.cn:443)): Connection marked as failed because the onConnect callback failed: request checking for ILM availability failed: 401 Unauthorized: {"Message":"Your request: '/_xpack' is not allowed."}
    2023-12-14T12:03:12.560Z INFO [publisher_pipeline_output] pipeline/output.go:145 Attempting to reconnect to backoff(elasticsearch(https://vpc-ecs-nginx-opensearch-qt7m5rmhddggkiuapyybcmz5oe.cn-north-1.es.amazonaws.com.cn:443)) with 7 reconnect attempt(s)

nginx.conf: the nginx application config. nginx simulates an application that provides a web server. It is the standard config file with the log output path changed.

    access_log /var/log/access.log main;

Because the base image is the nginx image, the default nginx log files are symlinks to /dev/stdout. filebeat does not collect symlinked files; with DEBUG enabled you can see log entries saying the file was skipped.

Build stage

Now the image can be built and tested.

    > dive build -t reg.liarlee.site/library/superv-nginx:v31 .
    > docker push reg.liarlee.site/library/superv-nginx:v31
    > docker run -it --name superv-nginx --rm reg.liarlee.site/library/superv-nginx:v31

After the container starts, the log output is:

    2023-12-14 14:03:31,093 INFO success: filebeat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
    2023-12-14 14:03:31,093 INFO success: nginx entered ...
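Since the symlinked log files were the reason the default nginx logs never got collected, a quick pre-flight check of the configured paths can save a debugging round. The sketch below is a hypothetical helper and not part of filebeat; it only reports which of the given paths are symlinks.

```python
import os
from typing import Iterable, List


def symlinked_logs(paths: Iterable[str]) -> List[str]:
    """Return the paths that are symlinks and would therefore be skipped."""
    return [path for path in paths if os.path.islink(path)]


if __name__ == "__main__":
    # Paths are examples: the nginx default log and the one used in this post.
    candidates = ["/var/log/nginx/access.log", "/var/log/access.log"]
    for path in symlinked_logs(candidates):
        print(f"{path} is a symlink to {os.readlink(path)}")
```

Run inside the container, this would be expected to flag the nginx image's default access log, while the custom /var/log/access.log path from nginx.conf should pass.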

Management Commands for Windows Core EKS Nodes

Posted 2023-12-05 | Updated 2025-09-23 | Kubernetes

Check disk space usage on Windows:

    Get-PSDrive -Name C | Select-Object Name, Free

Extend the disk on a Windows instance:

    diskpart
    list volume
    select volume 0
    extend

Pull an image from ECR (ecr link stands for the image URI):

    $ecrCreds = (Get-ECRLoginCommand).password
    Write-Host $ecrCreds
    ctr -n k8s.io image pull -u AWS:$ecrCreds ecr link

Pulling an image uses space on the node's C: drive. If the node is short on disk space, the pull fails with:

    rpc error: code = Unknown desc = failed to pull and unpack image failed to extract layer sha256:9ee7a25f1f619685e0c27cd1f08b34fd7a567f8f0fa789gf9aeb79c72169afa: hcsshim::ImportLayer failed in Win32: There is not enough space on the disk. (0x70): unknown

TrueNAS Scale: Letting ZFS Use More Memory for ARC

Posted 2023-10-08 | Updated 2025-09-23 | Application

I installed TrueNAS Scale to play with it and was somewhat put off. As the Linux-based edition, TrueNAS Scale is cautious about memory: ARC does not use all of the memory, which is too conservative a design. Memory should of course be used as fully as possible.

Config the ARC Memory to 75% Total Memory:

    $ echo 2995556352 > /sys/module/zfs/parameters/zfs_arc_max
    $ echo 268435456 > /sys/module/zfs/parameters/zfs_arc_sys_free

A shell script like the following can also compute and apply the setting. This one sets ARC to 90% of total memory.

    #!/bin/sh
    PATH="/bin:/sbin:/usr/bin:/usr/sbin:${PATH}"
    export PATH
    ARC_PCT="90"
    ARC_BYTES=$(grep '^MemTotal' /proc/meminfo | awk -v pct=${ARC_PCT} '{printf "%d", $2 * 1024 * (pct / 100.0)}')
    echo ${ARC_BYTES} > /sys/module/zfs/parameters/zfs_arc_max
    SYS_FREE_BYTES=$((8*1024*1024*1024))
    echo ${SYS_FREE_BYTES} > /sys/module/zfs/parameters/zfs_arc_sys_free
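The arithmetic behind zfs_arc_max can be checked separately from the script. The sketch below mirrors the MemTotal-times-percentage calculation in Python; the function names are hypothetical, and it only computes the number without writing to /sys. The 75% value used earlier (2995556352) is consistent with a MemTotal of 3900464 kB, which is inferred from the number and not stated in the post.

```python
def read_mem_total_kib(meminfo_text: str) -> int:
    """Extract the MemTotal value (in kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def arc_max_bytes(mem_total_kib: int, pct: int) -> int:
    """Bytes to write to zfs_arc_max for the given share of total memory."""
    if not 0 < pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    return mem_total_kib * 1024 * pct // 100


if __name__ == "__main__":
    sample = "MemTotal:        3900464 kB\nMemFree:          123456 kB"
    print(arc_max_bytes(read_mem_total_kib(sample), 75))
```

Integer arithmetic is used here so the result does not depend on floating-point rounding, which the awk version handles by truncating with %d.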

TrueNAS Core: Current Benchmark Results

Posted 2023-10-08 | Updated 2025-09-23 | Application

In daily use I never notice any disk performance problem with sequential files; that has always been fine. Random access to small files, on the other hand, is painful. That made me realize I had never benchmarked the disks, so here are the records.

Monitoring of the TrueNAS read and write tests. Each test ran for 5 minutes, starting at 9:55. All disks show writes because fio was creating the test file used for the read test. ada0 is a SATA SSD; ada1 and ada2 are two HDDs.

[Figures: throughput and IOPS monitoring charts]

Read test

This is as expected: all read requests were served from memory.

    root@haydentruenas[/mnt/root_pool]# fio --name=seqread --rw=read --bs=1M --size=1G --numjobs=1 --runtime=300 --time_based --group_reporting
    seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
    fio-3.28
    Starting 1 process
    seqread: Laying out IO file (1 file / 1024MiB)
    Jobs: 1 (f=1): [R(1)][100.0%][r=806MiB/s][r=805 IOPS][eta 00m:00s]
    seqread: (groupid=0, jobs=1): err= 0: pid=72054: Tue Jul 16 10:01:10 2024
      read: IOPS=861, BW=861MiB/s (903MB/s)(252GiB/300001msec)
        clat (usec): min=1059, max=28988, avg=1149.82, stdev=154.88
         lat (usec): min=1060, max=28989, avg=1150.88, stdev=154.93
        clat percentiles (usec):
         | 1.00th=[ 1090], 5.00th=[ 1090], 10.00th=[ 1106], 20.00th=[ 1106],
         | 30.00th=[ 1123], 40.00th=[ 1123], 50.00th=[ 1123], 60.00th=[ 1123],
         | 70.00th=[ 1139], 80.00th=[ 1156], 90.00th=[ 1237], 95.00th=[ 1287],
         | 99.00th=[ 1500], 99.50th=[ 1614], 99.90th=[ 2114], 99.95th=[ 2409],
         | 99.99th=[ 6194]
       bw ( KiB/s): min=507904, max=907772, per=100.00%, avg=882606.25, stdev=36668.31, samples=593
       iops : min= 496, max= 886, avg=861.50, stdev=35.77, samples=593
      lat (msec) : 2=99.87%, 4=0.12%, 10=0.01%, 20=0.01%, 50=0.01%
      cpu : usr=1.58%, sys=98.14%, ctx=20974, majf=0, minf=257
      IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=258344,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency : target=0, window=0, percentile=100.00%, depth=1

    Run status group 0 (all jobs):
       READ: bw=861MiB/s (903MB/s), 861MiB/s-861MiB/s (903MB/s-903MB/s), io=252GiB (271GB), run=300001-300001msec

Write test

The write results look less rosy. I suspect psync is the reason: the data bypassed memory and was written straight to the HDDs, and the speed looks poor. On a 6 Gbps bus the disk only reaches 83.3 MiB/s, which still seems off. The shortfall probably comes from the I/O board of the 蜗牛星际 box, which may not be up to the job.

    root@haydentruenas[/mnt/root_pool]# fio --name=seqread --rw=write --bs=1M --size=1G --numjobs=1 --runtime=300 --time_based --group_reporting
    seqread: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
    fio-3.28
    Starting 1 process
    Jobs: 1 (f=1): [W(1)][100.0%][w=78.2MiB/s][w=78 IOPS][eta 00m:00s]
    seqread: (groupid=0, jobs=1): err= 0: pid=72202: Tue Jul 16 10:09:24 2024
      write: IOPS=83, BW=83.3MiB/s (87.4MB/s)(24.4GiB/300007msec); 0 zone resets
        clat (usec): min=642, max=956978, avg=11914.63, stdev=7270.55
         lat (usec): min=684, max=957053, avg=11987.96, stdev=7270.61
        clat percentiles (usec):
         | 1.00th=[ 766], 5.00th=[ 9765], 10.00th=[ 11076], 20.00th=[ 11338],
         | 30.00th=[ 11469], 40.00th=[ 11469], 50.00th=[ 11600], 60.00th=[ 11863],
         | 70.00th=[ 11994], 80.00th=[ 12256], 90.00th=[ 13042], 95.00th=[ 14222],
         | 99.00th=[ 22938], 99.50th=[ 35390], 99.90th=[ 49546], 99.95th=[ 54264],
         | 99.99th=[225444]
       bw ( KiB/s): min= 7501, max=686786, per=100.00%, avg=85553.58, stdev=31496.12, samples=594
       iops : min= 7, max= 670, avg=83.08, stdev=30.76, samples=594
      lat (usec) : 750=0.93%, 1000=0.21%
      lat (msec) : 2=0.84%, 4=0.51%, 10=2.88%, 20=93.54%, 50=1.03%
      lat (msec) : 100=0.06%, 250=0.01%, 500=0.01%, 1000=0.01%
      cpu : usr=0.71%, sys=7.19%, ctx=197395, majf=0, minf=0
      IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=0,25001,0,0 ...
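The headline numbers in both fio reports are related by simple arithmetic, which makes for a quick sanity check when reading such output. The sketch below uses hypothetical helper names and the figures quoted above: bandwidth is IOPS times block size, MB/s is the decimal form of MiB/s, and total I/O is bandwidth times runtime.

```python
MIB = 1024 * 1024


def bandwidth_mib_s(iops: float, block_size_bytes: int) -> float:
    """Bandwidth in MiB/s implied by an IOPS figure and a block size."""
    return iops * block_size_bytes / MIB


def mib_to_mb(mib: float) -> float:
    """Convert binary MiB to decimal MB, as fio prints in parentheses."""
    return mib * MIB / 1_000_000


def total_gib(bw_mib_s: float, seconds: float) -> float:
    """Total data moved in GiB over the run."""
    return bw_mib_s * seconds / 1024


if __name__ == "__main__":
    read_bw = bandwidth_mib_s(861, MIB)  # --bs=1M, IOPS=861
    print(read_bw, round(mib_to_mb(read_bw)), round(total_gib(read_bw, 300)))
```

With a 1 MiB block size, IOPS and MiB/s are numerically the same, which is why the read test reports 861 for both.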

Running User Data on Every EC2 Instance Boot

Posted 2023-09-06 | Updated 2025-09-23 | Linux

cloud-init provides the ability to read user data. By default user scripts run only on the first boot; to run them on every boot, add a MIME header to the user data that overrides the default behavior.

https://cloudinit.readthedocs.io/en/latest/topics/format.html#mime-multi-part-archive
https://repost.aws/zh-Hans/knowledge-center/execute-user-data-ec2

The MIME part that needs to be added:

    Content-Type: multipart/mixed; boundary="//"
    MIME-Version: 1.0

    --//
    Content-Type: text/cloud-config; charset="us-ascii"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Content-Disposition: attachment; filename="cloud-config.txt"

    #cloud-config
    cloud_final_modules:
    - [scripts-user, always]

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Content-Disposition: attachment; filename="userdata.txt"

The shell script itself goes after this last header block. Then restart the EC2 instance.

Troubleshooting Guide

First check the user data itself, in the console or with the AWS CLI.

Check the logs inside the instance or the console log:

    /var/log/cloud-init.log
    /var/log/cloud-init-output.log

Check the script content injected by the user data, which is under /var/lib/cloud/instances/i-09e08d362af7fa401/scripts. File tree:

    [root@ip-172-31-50-3 i-09e08d362af7fa401]# tree
    .
    ├── boot-finished
    ├── cloud-config.txt
    ├── datasource
    ├── handlers
    ├── obj.pkl
    ├── scripts # this one
    │   ├── part-001
    │   └── part-002
    ├── sem
    │   ├── config_amazonlinux_repo_https
    │   ├── config_disk_setup
    │   ├── config_keys_to_console
    │   ├── config_locale
    │   ├── config_mounts
    │   ├── config_phone_home
    │   ├── config_power_state_change
    │   ├── config_resolv_conf
    │   ├── config_rsyslog
    │   ├── config_runcmd
    │   ├── config_scripts_per_instance
    │   ├── config_scripts_user
    │   ├── config_set_hostname
    │   ├── config_set_passwords
    │   ├── config_ssh
    │   ├── config_ssh_authkey_fingerprints
    │   ├── config_timezone
    │   ├── config_users_groups
    │   ├── config_write_files
    │   ├── config_write_metadata
    │   ├── config_yum_add_repo
    │   ├── config_yum_configure
    │   └── consume_data
    ├── user-data.txt
    ├── user-data.txt.i
    ├── vendor-data.txt
    └── vendor-data.txt.i

    3 directories, 33 files

If you need to change something temporarily, put it in the user data and delete the user data once the operation is done.
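Hand-writing MIME boundaries is error-prone, so the multipart user data can also be generated. The sketch below uses Python's standard email package to assemble the same two parts; the function name and the sample script are hypothetical, while the cloud-config body, boundary, and filenames are taken from the MIME text in this post.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# cloud-config part from the post: run user scripts on every boot.
CLOUD_CONFIG = "#cloud-config\ncloud_final_modules:\n- [scripts-user, always]\n"


def build_user_data(script: str) -> str:
    """Assemble multipart user data with a cloud-config part and a shell script."""
    message = MIMEMultipart("mixed", boundary="//")

    config_part = MIMEText(CLOUD_CONFIG, "cloud-config", "us-ascii")
    config_part.add_header("Content-Disposition", "attachment",
                           filename="cloud-config.txt")
    message.attach(config_part)

    script_part = MIMEText(script, "x-shellscript", "us-ascii")
    script_part.add_header("Content-Disposition", "attachment",
                           filename="userdata.txt")
    message.attach(script_part)

    return message.as_string()


if __name__ == "__main__":
    print(build_user_data("#!/bin/bash\necho 'runs on every boot'\n"))
```

The printed text can be pasted into the instance's user data field; header order may differ slightly from the hand-written version, which should not matter to a MIME parser.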