Steps to Upgrade and Clean Up Kernels
Typical steps to upgrade the kernel version. CentOS upgrade steps:
yum makecache -y
yum update -y
grub2-editenv list
grub2-set-default 'CentOS Linux (3.10.xxxxx.el7.elrepo.x86_64) 7 (Core)' # entry_name
systemctl reboot
Steps to clean up old versions (RHEL or CentOS):
rpm -qa kernel*
# This lists every kernel version currently installed; then remove the corresponding packages manually.
You can remove the unwanted versions directly with yum:
yum remove -y kernel-devel-5.10.216-204.855.amzn2.x86_64 kernel-devel-5.10.218-208.862.amzn2.x86_64 kernel-5.10.216-204.855.amzn2.x86_64 kernel-5.10.218-208.862.amzn2.x86_64
rpm -qa | grep kernel
kernel-tools-5.10.219-208.866.amzn2.x86_64
kernel-headers-5.10.219-208.866.amzn2.x86_64
kernel-devel-5.10.219-208.866.amzn2.x86_64
kernel-5.10.219-208.866.amzn2.x86_64
List the /boot directory to confirm the cleanup:
ls -alh /boot/
total 29M
dr-xr-xr-x 4 root root 4.0K Jul 19 15:02 ./
dr-xr-xr-x 19 root root 268 Jul 1 17:32 ../
-rw-r--r-- 1 root root 174 Jun 18 22:04 .vmlinuz-5.10.219-208.866.amzn2.x86_64.hmac
-rw------- 1 root root 4.5M Jun 18 22:04 System.map-5.10.219-208.866.amzn2.x86_64
-rw-r--r-- 1 root root 141K Jun 18 22:04 config-5.10.219-208.866.amzn2.x86_64
drwxr-xr-x 3 root root 17 Oct 14 2022 efi/
drwx------ 5 root root 79 Jul 19 15:02 grub2/
-rw------- 1 root root 14M Jul 9 15:03 initramfs-5.10.219-208.866.amzn2.x86_64.img
-rw-r--r-- 1 root root 643K Oct 14 2022 initrd-plymouth.img
-rw-r--r-- 1 root root 268K Jun 18 22:05 symvers-5.10.219-208.866.amzn2.x86_64.gz
-rwxr-xr-x 1 root root 9.7M Jun 18 22:04 vmlinuz-5.10.219-208.866.amzn2.x86_64*
Of course, if you removed all of them, you can also reinstall (doge):
yum groupinstall -y "Development Tools"
yum install -y kernel kernel-devel kernel-debug
Ubuntu downgrade. The kernel Ubuntu is currently running cannot be removed directly: install the old version, switch to it, then remove the newer one.
root@ip-172-31-59-13:~# update-initramfs -k all -c
update-initramfs: Generating /boot/initrd.img-5.15.0-1048-aws
update-initramfs: Generating /boot/initrd.img-5.4.0-1126-aws
root@ip-172-31-59-13:~# update-grub
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/40-force-partuuid.cfg'
Sourcing file `/etc/default/grub.d/50-cloudimg-settings.cfg'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
GRUB_FORCE_PARTUUID is set, will attempt initrdless boot
Found linux image: /boot/vmlinuz-5.15.0-1048-aws
Found initrd image: /boot/microcode.cpio /boot/initrd.img-5.15.0-1048-aws
Found linux image: /boot/vmlinuz-5.4.0-1126-aws
Found initrd image: /boot/microcode.cpio /boot/initrd.img-5.4.0-1126-aws
Found Ubuntu 20.04.6 LTS (20.04) on /dev/nvme0n1p1
done
Check the available kernel versions:
root@ip-172-31-59-13:$ apt search linux-image | grep 5.4.0 | grep linux-image | grep aws
List all installed kernels:
root@ip-172-31-59-13:~$ dpkg --get-selections | grep linux
console-setup-linux install
libselinux1:amd64 install
linux-aws install
linux-aws-5.15-headers-5.15.0-1048 install
linux-aws-headers-5.4.0-1126 install
linux-base install
linux-headers-5.15.0-1048-aws install
linux-headers-5.4.0-1126-aws install
linux-headers-aws install
linux-image-5.15.0-1048-aws install
linux-image-5.4.0-1126-aws install
linux-image-aws install
linux-modules-5.15.0-1048-aws install
linux-modules-5.4.0-1126-aws install
util-linux install
Install the kernel:
root@ip-172-31-59-13:~$ apt install -y linux-image-5.4.0-1126-aws/focal-updates linux-headers-5.4.0-1126-aws
Set the GRUB entry:
root@ip-172-31-59-13:~$ vim /etc/default/grub
The entry variable should be set in the following format:
Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-1126-aws
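As a sketch (the entry title is the one above; adjust for your installed kernel), the relevant /etc/default/grub line and the follow-up commands might look like this:

# /etc/default/grub -- select a submenu entry by "parent>child" title
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-1126-aws"
# regenerate grub.cfg and reboot into the selected kernel
update-grub
systemctl reboot

GRUB accepts the submenu>entry title syntax shown here; using a numeric menu index instead also works.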
Kernel cleanup steps - Version 2. Deb package-manager cleanup steps:
List all installed kernel versions: dpkg --list | grep linux-image
List old kernels and automatically remove all except the current one: sudo apt-get autoremove --purge
To remove an old kernel manually, use: sudo apt-get remove --purge linux-image-X.X.X-X-generic (see the sketch below for finding candidates)
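As a sketch, a one-liner for finding removal candidates; note that meta packages such as linux-image-aws will also show up and should be kept:

# list installed kernel images other than the running one
dpkg --list 'linux-image-*' | awk '/^ii/ {print $2}' | grep -v "$(uname -r)"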
Rpm package-manager cleanup steps
List installed kernels: rpm -qa | grep kernel
Install yum-utils, which provides package-cleanup: sudo yum install yum-utils
Keep only the two newest kernels: sudo package-cleanup --oldkernels --count=2
Linux Memory Management Notes
Notes on the memory management subsystem. Using the crash command. This command requires the debuginfo and kernel-debug packages, and possibly gdb as well.
Enable this repository in the repo configuration: rhel-8-baseos-rhui-debug-rpms
The detailed steps are also in this official Red Hat document: https://access.redhat.com/solutions/9907
yum install -y kernel-debuginfo
# This command installs it, but the package is very large.
crash /boot/vmlinuz-$(uname -r)
Use the crash command to inspect the mapping between physical memory (PM) and virtual memory (VM).
The kernel debug files live in /usr/lib/debug/lib/modules/<kernel-version>/.
Run the crash command:
~ # ❯❯❯ crash
crash 7.3.2-4.el8
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel relocated [592MB]: patching 107327 gdb minimal_symbol values
KERNEL: /usr/lib/debug/lib/modules/4.18.0-477.13.1.el8_8.x86_64/vmlinux [TAINTED]
DUMPFILE: /proc/kcore
CPUS: 2
DATE: Tue Jul 11 17:07:01 CST 2023
UPTIME: 01:04:34
LOAD AVERAGE: 0.15, 0.03, 0.01
TASKS: 226
NODENAME: center
RELEASE: 4.18.0-477.13.1.el8_8.x86_64
VERSION: #1 SMP Thu May 18 10:27:05 EDT 2023
MACHINE: x86_64 (2199 Mhz)
MEMORY: 7.9 GB
PID: 7657
COMMAND: "crash"
TASK: ffff9ce7d835a800 [THREAD_INFO: ffff9ce7d835a800]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
crash> vm -p [pid]
PID: 913 TASK: ffff9ce7c75fd000 CPU: 0 COMMAND: "sshd"
MM PGD RSS TOTAL_VM
ffff9ce7c11b8000 ffff9ce7c75ae000 7604k 76644k
VMA START END FLAGS FILE
ffff9ce7c759f828 55a7fcc7b000 55a7fcd4c000 8000875 /usr/sbin/sshd
VIRTUAL PHYSICAL
55a7fcc7b000 12026b000
55a7fcc7c000 1201df000
55a7fcc7d000 1201ec000
55a7fcc7e000 1200c7000
55a7fcc7f000 120c43000
55a7fcc80000 10fa79000
55a7fcc81000 11fdd3000
55a7fcc82000 11087f000
55a7fcc83000 11fa8d000
55a7fcc84000 10fe05000
55a7fcc85000 110870000
55a7fcc86000 10fa2c000
55a7fcc87000 10f9fc000
55a7fcc88000 10fdab000
55a7fcc89000 11f296000
55a7fcc8a000 1117ec000
55a7fcc8b000 10fdac000
55a7fcc8c000 120c65000
55a7fcc8d000 12011b000
55a7fcc8e000 110714000
55a7fcc8f000 110c83000
55a7fcc90000 110c90000
55a7fcc91000 110d2b000
55a7fcc92000 120730000
55a7fcc93000 12076f000
55a7fcc94000 1207e8000
55a7fcc95000 110c2f000
55a7fcc96000 110c3c000
55a7fcc97000 120650000
55a7fcc98000 1206c1000
55a7fcc99000 120c67000
55a7fcc9a000 120c0f000
55a7fcc9b000 FILE: /usr/sbin/sshd OFFSET: 20000
55a7fcc9c000 11d46d000
55a7fcc9d000 10fe01000
55a7fcc9e000 10fdb9000
55a7fcc9f000 10fde7000
55a7fcca0000 FILE: /usr/sbin/sshd OFFSET: 25000
# The rest of the output is omitted; it is far too long.
This shows the virtual-to-physical mappings; entries without a PHYSICAL column (shown with FILE/OFFSET instead) are pages not currently mapped to physical memory.
Normally the last three hex digits of the virtual and physical addresses match (the offset within a 4 KiB page); with THP (2 MiB huge pages), the last five match.
vtop shows the stored translation and mapping details for a single virtual address:
crash> vtop 55d5473fc000
VIRTUAL PHYSICAL
55d5473fc000 (not accessible)
The rd command reads memory starting at a given virtual address, for a given number of words:
crash> rd 55d54879d000 100
rd: invalid user virtual address: 55d54879d000 type: "64-bit UVADDR"
Using more memory than was allocated causes an out-of-bounds access. For example, if you allocate 1 GB but write more than that, the data lands in address space that does not belong to the process's allocation, and the kernel raises a segfault to terminate the process.
The fault is not necessarily immediate; part of the overflow may actually succeed before an unmapped page is touched.
Anonymous pages are virtual addresses mapped via mmap with the MAP_ANONYMOUS flag. On the first write to an anonymous page, the kernel maps a physical page into the virtual address and zero-fills it.
vm.overcommit_memory = 0: heuristic overcommit (most reasonable allocations are allowed); 1: always overcommit, with no limit on virtual memory; 2: never overcommit beyond a limit computed from a fixed ratio.
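A quick sketch for checking and switching the policy (the mode values are as described above; CommitLimit only takes effect in mode 2):

# current policy and commit accounting
cat /proc/sys/vm/overcommit_memory
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
# mode 2: refuse allocations beyond swap + overcommit_ratio% of RAM
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=80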
Notes on Using the GDB Debugger
First, write a program like this:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int *data;
    data = (int *)malloc(100 * sizeof(int)); // allocate a block of 100 ints
    if (data == NULL) {
        // handle allocation failure
        return 1;
    }
    //data[100] = 0;
    // use the data array...
    // e.g., initialize it
    for (int i = 0; i < 100; ++i) {
        data[i] = 0; // initialize each element to 0
    }
    free(data);
    printf("%d\n", data[2]); // use-after-free: reads freed memory, undefined behavior
}
Compile the program:
╰─>$ gcc ./123.c -g -Og
Run the program to be debugged:
╰─>$ ./a.out
971012533
Try debugging it with gdb:
╰─>$ gdb ./a.out -d .
╰─>$ gdb ./a.out -c ./COREDUMP_FILE
Command notes
list - prints 10 lines of source code; repeat it to page through, 10 lines at a time.
(gdb) list
11 return 1;
12 }
13
14 //data[100] = 0;
15
16 // 使用 data 数组...
17 // 例如,初始化数组
18 for (int i = 0; i < 100; ++i) {
19 data[i ...
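Beyond list, a minimal non-interactive sketch for pulling a backtrace out of a core dump (assumes core dumps are enabled; the core file name depends on kernel.core_pattern and may instead be managed by systemd-coredump):

ulimit -c unlimited          # allow core dumps in this shell
./a.out                      # re-run the crashing program
gdb -batch -ex bt -ex 'info locals' ./a.out ./core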
Adding a Red Hat Node to an EKS Cluster, Based on the Packer Steps
Copied from zhojiew's private repository docs, with permission ~
AWS officially provides build scripts based on the Packer tool.
Here we run the relevant steps manually to build a custom AMI based on RHEL 9; the EKS-optimized AMI is reportedly produced with the same steps.
Build the AMI manually. Clone the repository:
cd /home/ec2-user
sudo yum install git -y
git clone https://github.com/awslabs/amazon-eks-ami.git
Set the environment variables:
KUBERNETES_VERSION=1.26.4
KUBERNETES_BUILD_DATE=2023-05-11
BINARY_BUCKET_NAME=amazon-eks
BINARY_BUCKET_REGION=cn-north-1
DOCKER_VERSION=20.10.23-1.amzn2.0.1
CONTAINERD_VERSION=1.6.*
RUNC_VERSION=1.1.5-1.amzn2
CNI_PLUGIN_VERSION=v0.8.6
PULL_CNI_FROM_GITHUB=true
SONOBUOY_E2E_REGISTRY=""
PAUSE_CONTAINER_VERSION=3.5
CACHE_CONTAINER_IMAGES=false
WORKING_DIR=/tmp/worker
TEMPLATE_DIR=/home/ec2-user/amazon-eks-ami
Copy the files and update the kernel (optional):
mkdir -p $WORKING_DIR
mkdir -p $WORKING_DIR/log-collector-script
mkdir -p $WORKING_DIR/bin
mv $TEMPLATE_DIR/files/* $WORKING_DIR/
mv $TEMPLATE_DIR/log-collector-script/linux/eks-log-collector.sh $WORKING_DIR/log-collector-script/
sudo chmod -R a+x $WORKING_DIR/bin/
sudo mv /tmp/worker/bin/* /usr/bin/
# sudo bash $TEMPLATE_DIR/scripts/upgrade_kernel.sh
KERNEL_VERSION=5.10
sudo grubby \
--update-kernel=ALL \
--args="psi=1"
sudo grubby \
--update-kernel=ALL \
--args="clocksource=tsc tsc=reliable"
sudo reboot
The main build logic lives in the install-worker.sh script:
# sudo bash $TEMPLATE_DIR/scripts/install-worker.sh
export AWS_DEFAULT_OUTPUT="json"
ARCH="amd64"
sudo yum update -y
sudo yum install -y \
chrony \
conntrack \
curl \
ethtool \
ipvsadm \
jq \
nfs-utils \
socat \
unzip \
wget \
yum-utils \
yum-plugin-versionlock \
mdadm \
pigz
# Remove any old kernel versions.
sudo package-cleanup --oldkernels --count=1 -y
# Remove the ec2-net-utils package
if yum list installed | grep ec2-net-utils; then sudo yum remove ec2-net-utils -y -q; fi
sudo mkdir -p /etc/eks/
sudo mv $WORKING_DIR/configure-clocksource.service /etc/eks/configure-clocksource.service
# iptables
sudo mv $WORKING_DIR/iptables-restore.service /etc/eks/iptables-restore.service
# awscli
sudo yum install less unzip jq -y
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install --update
complete -C '/usr/local/bin/aws_completer' aws
# systemd
sudo mv "${WORKING_DIR}/runtime.slice" /etc/systemd/system/runtime.slice
Build and install runc:
# install runc and lock version
# sudo yum install -y runc-${RUNC_VERSION}
sudo yum install libseccomp-devel.x86_64 golang -y
go env -w GOPROXY=https://goproxy.io,direct
git clone https://github.com/opencontainers/runc
cd runc
make
sudo make install
Install containerd:
# install containerd and lock version
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
# sudo yum install -y containerd-${CONTAINERD_VERSION}
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sudo yum install -y containerd # 1.6.21
Configure containerd:
sudo mkdir -p /etc/eks/containerd
sudo mv $WORKING_DIR/containerd-config.toml /etc/eks/containerd/containerd-config.toml
# containerd and related service
sudo mv $WORKING_DIR/kubelet-containerd.service /etc/eks/containerd/kubelet-containerd.service
sudo mv $WORKING_DIR/sandbox-image.service /etc/eks/containerd/sandbox-image.service
sudo mv $WORKING_DIR/pull-sandbox-image.sh /etc/eks/containerd/pull-sandbox-image.sh
sudo mv $WORKING_DIR/pull-image.sh /etc/eks/containerd/pull-image.sh
sudo chmod +x /etc/eks/containerd/pull-sandbox-image.sh
sudo chmod +x /etc/eks/containerd/pull-image.sh
sudo mkdir -p /etc/systemd/system/containerd.service.d
cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/10-compat-symlink.conf
[Service]
ExecStartPre=/bin/ln -sf /run/containerd/containerd.sock /run/dockershim.sock
EOF
cat << EOF | sudo tee -a /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
cat << EOF | sudo tee -a /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
# skip docker
Configure log rotation:
# logrotate
sudo mv $WORKING_DIR/logrotate-kube-proxy /etc/logrotate.d/kube-proxy
sudo mv $WORKING_DIR/logrotate.conf /etc/logrotate.conf
sudo chown root:root /etc/logrotate.d/kube-proxy
sudo chown root:root /etc/logrotate.conf
sudo mkdir -p /var/log/journal
Download kubelet and aws-iam-authenticator:
## download binaries in the China region
S3_DOMAIN="amazonaws.com.cn"
S3_PATH="s3://amazon-eks/1.26.4/2023-05-11/bin/linux/amd64"
# Verify that the aws-iam-authenticator is at least v0.5.9
BINARIES=(
kubelet
aws-iam-authenticator
)
for binary in ${BINARIES[*]}; do
aws s3 cp $S3_PATH/$binary . --region cn-north-1
sudo chmod +x $binary
sudo mv $binary /usr/bin/
done
Continue configuring services:
# kubernetes
sudo mkdir -p /etc/kubernetes/manifests
sudo mkdir -p /var/lib/kubernetes
sudo mkdir -p /var/lib/kubelet
sudo mkdir -p /opt/cni/bin
CNI_PLUGIN_FILENAME="cni-plugins-linux-${ARCH}-${CNI_PLUGIN_VERSION}"
aws s3 cp --region $BINARY_BUCKET_REGION $S3_PATH/${CNI_PLUGIN_FILENAME}.tgz .
su ...
Comparing Buffered I/O and Direct I/O
Test method. Write a file using buffered I/O:
#!/bin/bash
perf record -T -C 0 -- taskset -c 0 dd if=/dev/zero of=./a.dat bs=4k count=16384
Write a file using direct I/O:
#!/bin/bash
perf record -T -C 0 -- taskset -c 0 dd if=/dev/zero of=./a.dat bs=4k count=16384 oflag=direct
Results. Buffered I/O:
[root@ip-172-31-53-200 perf_records]# ./start_test_bufferio.sh
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.118848 s, 565 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.225 MB perf.data (485 samples) ]
ll -h
-rw-r--r--. 1 root root 64M Jun 1 13:45 a.dat
[root@ip-172-31-53-200 ~]# dstat -tf
----system---- -----cpu0-usage----------cpu1-usage----------cpu2-usage----------cpu3-usage---- dsk/nvme0n1 ---net/lo-----net/eth0- ---paging-- ---system--
time |usr sys idl wai stl:usr sys idl wai stl:usr sys idl wai stl:usr sys idl wai stl| read writ| recv send: recv send| in out | int csw
01-06 13:35:48| 2 0 99 0 0: 1 0 99 0 0: 0 1 98 0 0: 1 0 99 0 0|8192B 35k|1096B 1096B: 968B 828B| 0 0 | 712 971
01-06 13:35:49| 25 9 60 5 0: 0 0 99 0 0: 4 12 84 0 0: 4 8 90 0 0| 0 64M|1096B 1096B: 576B 756B| 0 0 |2283 1311
01-06 13:35:50| 6 1 94 0 0: 1 1 99 0 0: 16 2 83 0 0: 0 1 99 0 0| 0 0 |1096B 1096B: 156B 418B| 0 0 | 954 1018
Direct I/O:
[root@ip-172-31-53-200 perf_records]# ./start_test_directio.sh
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 10.4225 s, 6.4 MB/s
[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 2.417 MB perf.data (41489 samples) ]
[root@ip-172-31-53-200 ~]# dstat -tf
----system---- -----cpu0-usage----------cpu1-usage----------cpu2-usage----------cpu3-usage---- dsk/nvme0n1 ---net/lo-----net/eth0- ---paging-- ---system--
time |usr sys idl wai stl:usr sys idl wai stl:usr sys idl wai stl:usr sys idl wai stl| read writ| recv send: recv send| in out | int csw
01-06 13:36:36| 0 1 99 0 0: 1 1 99 0 0: 0 1 100 0 0: 0 1 100 0 0| 0 0 |1097B 1097B: 688B 624B| 0 0 | 622 930
01-06 13:36:37| 3 4 32 61 0: 4 14 81 0 0: 1 0 97 0 0: 0 5 94 0 0| 0 4277k|1095B 1095B: 332B 338B| 0 0 |6434 3133
01-06 13:36:38| 3 3 0 92 0: 1 1 99 0 0: 3 1 96 0 0: 0 0 99 0 0| 0 6421k|1096B 1096B: 52B 174B| 0 0 |8767 4148
01-06 13:36:39| 4 4 0 92 0: 0 0 99 0 0: 4 1 96 0 0: 2 0 100 0 0| 0 6431k|1096B 1096B: 52B 150B| 0 0 |8790 4191
01-06 13:36:40| 4 4 0 91 0: 0 1 99 0 0: 2 1 96 0 0: 0 1 99 0 0| 0 6320k|1096B 1096B: 52B 142B| 0 0 |8744 4092
01-06 13:36:41| 4 4 0 92 0: 1 0 99 0 0: 3 0 96 0 0: 0 0 100 0 0| 0 6216k|1096B 1096B: 52B 142B| 0 0 |8662 4103
01-06 13:36:42| 3 4 0 92 0: 1 1 99 0 0: 2 2 96 0 0: 0 0 99 0 0| 0 7492k|1576B 1576B: 52B 134B| 0 0 |8756 4099
01-06 13:36:43| 3 3 0 91 0: 1 0 99 0 0: 4 1 96 0 0: 0 0 100 0 0| 0 6284k|1096B 1096B: 52B 134B| 0 0 |8720 4077
01-06 13:36:44| 4 2 0 92 0: 0 0 99 0 0: 2 1 96 0 0: 0 0 99 0 0| 0 6296k|1096B 1096B: 52B 134B| 0 0 |8788 4067
01-06 13:36:45| 4 5 0 91 0: 1 0 99 0 0: 4 0 96 0 0: 1 0 99 0 0| 0 6368k|1096B 1096B: 52B 134B| 0 0 |8792 4071
01-06 13:36:46| 3 5 0 92 0: 1 1 99 0 0: 4 1 96 0 0: 0 0 100 0 0| 0 5904k|1096B 1096B: 52B 134B| 0 0 |8576 3893
01-06 13:36:47| 25 7 0 69 0: 0 0 99 0 0: 2 1 96 0 0: 0 0 100 0 0| 0 4811k|1097B 1097B: 364B 763B| 0 0 |7035 3360
01-06 13:36:48| 4 0 96 0 0: 1 0 99 0 0: 22 3 75 1 0: 0 1 100 0 0|2642k 109k|1095B 1095B: 208B 472B| 0 0 | 977 1008
01-06 13:36:49| 0 1 99 0 0: 0 0 100 0 0: 0 0 98 0 0: 0 0 99 0 0| 0 0 |1096B 1096B: 104B 276B| 0 0 | 640 903
Perf sampling results (the recorded profiles are not reproduced here), buffered I/O vs. direct I/O; see the report sketch below.
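To actually compare the two runs, a sketch for reading the recorded data (run in the directory where perf record wrote perf.data); buffered I/O should be dominated by page-cache copy paths, direct I/O by block-layer submission and waiting:

perf report --stdio -i perf.data | head -30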
Parsing VPC Flow Logs
How to read VPC flow logs:
https://docs.amazonaws.cn/vpc/latest/userguide/flow-logs.html#flow-log-records
https://docs.amazonaws.cn/vpc/latest/userguide/flow-logs-records-examples.html#flow-log-example-tcp-flag
The tcp-flags field in a VPC flow log records not the flags of any single TCP packet header, but the bitwise OR of the TCP flags seen across all packets of that flow within the observation window.
TCP flags can be OR-ed during the aggregation interval. For short connections, the flags might be set on the same line in the flow log record, for example, 19 for SYN-ACK and FIN, and 3 for SYN and FIN. For an example, see TCP flag sequence. For general information about TCP flags (such as the meaning of flags like FIN, SYN, and ACK), see TCP segment structure on Wikipedia.
The value in the record is computed as follows, reading the flags from right to left, starting at the 0th power (a decoding sketch follows the table):
FIN 2^0
SYN 2^1
RST 2^2
PSH 2^3
ACK 2^4
URG 2^5
ECE 2^6
CWR 2^7
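A small decoding sketch following the table above; for example 19 = 16 + 2 + 1 = ACK + SYN + FIN:

#!/bin/bash
# decode a VPC flow log tcp-flags value into flag names
val=${1:-19}
flags=(FIN SYN RST PSH ACK URG ECE CWR)
for i in "${!flags[@]}"; do
  (( val & (1 << i) )) && echo "${flags[$i]}"
done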
Linux OS Debugging Notes
Triggering an NMI Unknown interrupt on EC2 Linux. Send a diagnostic request to the EC2 instance to raise an NMI Unknown event in the OS; this event triggers kdump, which records the state of the machine at that moment.
aws ec2 send-diagnostic-interrupt --region cn-north-1 --instance-id i-********************
The captured dump files are stored under /var/crash/:
[root@mysql 5.14.0-284.11.1.el9_2.x86_64]# ll /var/crash/
total 0
drwxr-xr-x. 2 root root 67 Jun 6 05:22 127.0.0.1-2023-06-06-05:22:11
drwxr-xr-x. 2 root root 67 Jun 6 08:58 127.0.0.1-2023-06-06-08:58:20
drwxr-xr-x. 2 root root 67 Jun 9 09:39 127.0.0.1-2023-06-09-09:39:56
Analyzing with the crash command requires installing kernel-debug, kernel-debuginfo, and kernel-devel:
[root@mysql 5.14.0-284.11.1.el9_2.x86_64]# crash /usr/lib/debug/lib/modules/5.14.0-284.11.1.el9_2.x86_64/vmlinux /var/crash/127.0.0.1-2023-06-09-09\:39\:56/vmcore
Related documentation:
New – Trigger a Kernel Panic to Diagnose Unresponsive EC2 Instances; Send a diagnostic interrupt (for advanced users)
Browsing kernel source with cscope
# Download the source code
yum install -y yum-utils
yum download --source kernel
# Extract the source package
rpm2cpio ./kernel-5.14.0-284.11.1.el9_2.src.rpm | cpio -div
tar xf ./linux-5.14.0-284.11.1.el9_2.tar.xz
# Build the cscope database for browsing the source
make cscope ARCH=x86
# Generate the tags file
make tags ARCH=x86
# Browse
cscope -d
Dracut usage and commands
# Add drivers to the initramfs
]$ dracut -f -v --add-drivers "nvme ena" /boot/initramfs-$(uname -r).img $(uname -r)
# Check whether the modules are present in the initramfs
]$ lsinitrd /boot/initramfs-$(uname -r).img | grep -E "nvme|ena"
User-space processes hanging because of security software. Red Hat documentation on this issue:
https://access.redhat.com/solutions/5201171
https://access.redhat.com/solutions/2838901
How to use ftrace, plus a few related commands:
[root@ip-172-31-51-167 ~]$ echo 'func fanotify_get_response +p' > /sys/kernel/debug/dynamic_debug/control
This traces the call and outputs a call graph. It is the kernel's dynamic tracing facility, an old mechanism that predates kprobes; the trace results go directly to dmesg.
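For the ftrace side, a minimal function_graph sketch for the same function (assumes tracefs is mounted at /sys/kernel/tracing, or /sys/kernel/debug/tracing on older setups, and that the function is traceable):

cd /sys/kernel/tracing
echo fanotify_get_response > set_graph_function
echo function_graph > current_tracer
head -50 trace
# reset when done
echo nop > current_tracer
echo > set_graph_function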
[root@ip-172-31-51-167 ~]$ perf trace -s -p 2688
[root@ip-172-31-51-167 ~]$ cd /var/crash/127.0.0.1-2023-08-11-06:53:10
[root@ip-172-31-51-167 ~]$ crash /usr/lib/debug/lib/modules/6.1.34-59.116.amzn2023.x86_64/vmlinux vmcore
[root@ip-172-31-51-167 127.0.0.1-2023-08-11-06:53:10]$ ll /var/crash
total 0
drwxr-xr-x. 2 root root 67 Aug 10 05:52 127.0.0.1-2023-08-10-05:52:16
drwxr-xr-x. 2 root root 67 Aug 10 06:13 127.0.0.1-2023-08-10-06:13:44
drwxr-xr-x. 2 root root 67 Aug 11 05:45 127.0.0.1-2023-08-10-13:03:27
drwxr-xr-x. 2 root root 91 Aug 12 15:05 127.0.0.1-2023-08-11-04:57:13
drwxr-xr-x. 2 root root 67 Aug 11 08:41 127.0.0.1-2023-08-11-06:53:10
drwxr-xr-x. 2 root root 67 Aug 11 20:56 badstop
drwxr-xr-x. 2 root root 41 Aug 11 20:46 crash
Basic grubby usage. Setting kernel parameters:
# Show the parameters of all kernels
$ grubby --info=ALL
# Set the default boot kernel
$ grubby --set-default-index=1
# Show the current default boot kernel
$ grubby --default-kernel
# Remove parameters from all kernels
$ grubby --update-kernel=ALL --remove-args="systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M loglevel=8 crashkernel=512M"
# Add parameters to all kernels
$ grubby --update-kernel=ALL --args="systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M loglevel=8 crashkernel=512M"
# Add parameters to a specific kernel
$ grubby --update-kernel=/boot/vmlinuz-5.9.1-1.el8.elrepo.x86_64 --args="systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M loglevel=8 crashkernel=512M"
[root@ip-172-31-0-170 ~]# sudo kdumpctl status
kdump: Kdump is operational
[root@ip-172-31-0-170 ~]# sudo kdumpctl showmem
kdump: Reserved 256MB memory for crash kernel
[root@ip-172-31-0-170 ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt1)/boot/vmlinuz-6.1.34-59.116.amzn2023.x86_64 root=UUID=483d7075-a0f8-4ba8-a951-a668fa079cac ro console=tty0 console=ttyS0,115200n8 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 selinux=1 security=selinux quiet systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M loglevel=8 crashkernel=512M
Quickly Launch Prometheus and Grafana
Quickly spin up a working Prometheus and Grafana for testing, keeping their data in the current directory so it survives restarts.
Create the directories:
mkdir /opt/monitor
mkdir /opt/monitor/grafana
mkdir /opt/monitor/grafana_data
mkdir /opt/monitor/prometheus
mkdir /opt/monitor/prometheus_data
touch /opt/monitor/docker-compose.yaml
Create the docker-compose file:
---
version: "3"
services:
prometheus:
container_name: prometheus
image: reg.liarlee.site/docker.io/prom/prometheus:latest
restart: always
network_mode: host
environment:
- TZ=Asia/Shanghai
volumes:
# - /opt/monitor/prometheus/prometheus.yaml:/etc/prometheus/prometheus.yml
- /opt/monitor/prometheus_data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
- '--storage.tsdb.retention.time=90d'
grafana:
container_name: grafana
image: reg.liarlee.site/docker.io/grafana/grafana-oss:main-ubuntu
restart: always
network_mode: host
environment:
- TZ=Asia/Shanghai
volumes:
- /opt/monitor/grafana_data:/var/lib/grafana
- /opt/monitor/grafana/datasource:/etc/grafana/provisioning/datasources
# - /opt/monitor/grafana/grafana.ini:/etc/grafana/grafana.ini
- /etc/localtime:/etc/localtime:ro
user: '472'
Bootstrap the base configuration files:
docker compose up -d
docker cp grafana:/etc/grafana/grafana.ini /opt/monitor/grafana/grafana.ini
docker cp prometheus:/etc/prometheus/prometheus.yml /opt/monitor/prometheus/prometheus.yaml
chown -R 472:472 /opt/monitor/grafana_data
chown -R 472:472 /opt/monitor/grafana
chown -R nobody:nobody /opt/monitor/prometheus_data
chown -R nobody:nobody /opt/monitor/prometheus
docker compose down --remove-orphans
Provision Prometheus as the default datasource:
mkdir -p /opt/monitor/grafana/datasource
touch /opt/monitor/grafana/datasource/datasource.yml
---
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://localhost:9090
isDefault: true
access: proxy
editable: true
Edit the needed parameters in the copied config files, uncomment the corresponding mounts in the compose file, then restart:
docker compose down --remove-orphans && docker compose up -d
Single-Table Database Testing
Testing prompted by this question: why should a single MySQL table stay under 20 million rows? (A back-of-the-envelope sketch of the usual reasoning follows.)
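For context, a sketch of the reasoning usually cited for the 20M figure, assuming a three-level InnoDB B+tree, 16 KiB pages, ~14 bytes per non-leaf entry (8-byte bigint key plus 6-byte page pointer), and ~1 KiB rows:

# fanout per internal page ~ 16384/14 ~ 1170; rows per leaf page ~ 16384/1024 = 16
echo $(( (16384/14) * (16384/14) * (16384/1024) ))   # => 21902400, i.e. ~20M rows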
Test procedure:
CREATE TABLE test(
id int NOT NULL AUTO_INCREMENT PRIMARY KEY comment 'primary key',
person_id int not null comment 'user id',
person_name VARCHAR(200) comment 'user name',
gmt_create datetime comment 'creation time',
gmt_modified datetime comment 'modification time'
) comment 'person info table';
Insert data:
insert into test values(1,1,'user_1', NOW(), now());
insert into test (person_id, person_name, gmt_create, gmt_modified)
select (@i:=@i+1) as rownum, person_name, now(), now() from test, (select @i:=100) as init;
set @i=1;
-- Test SQL; record the run time of each (a shell timing sketch follows):
select count(*) from test;
select count(*) from test where id=XXX;
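A sketch for collecting the timings from the shell; the mysql client's -vvv flag echoes each result with its elapsed time (assumes a local server and the test database above):

for i in 1 2 3; do
  mysql -vvv test -e 'select count(*) from test;' | grep 'in set'
done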
Check the table's size:
show table status like 'test'\G
Table with 2M rows:
mysql> describe table test;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 2092640 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
MySQL [test]> select count(*) from test;
1 row in set (0.045 sec)
1 row in set (0.050 sec)
1 row in set (0.050 sec)
4M:
mysql> describe table test;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 4185280 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
MySQL [test]> select count(*) from test;
1 row in set (0.126 sec)
1 row in set (0.120 sec)
1 row in set (0.119 sec)
8M:
mysql> describe table test;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 8370120 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
MySQL [test]> select count(*) from test;
1 row in set (0.266 sec)
1 row in set (0.266 sec)
1 row in set (0.253 sec)
16M:
mysql> describe table test;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 16337090 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------+
1 row in set, 1 warning (0.00 sec)
MySQL [test]> select count(*) from test;
1 row in set (0.544 sec)
1 row in set (0.524 sec)
1 row in set (0.523 sec)
32M:
mysql> describe table test;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 32665301 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------+
1 row in set, 1 warning (0.00 sec)
MySQL [test]> select count(*) from test;
1 row in set (1.068 sec)
1 row in set (1.057 sec)
1 row in set (1.044 sec)
The results scale essentially linearly; the data volume here feels far too small to reveal any inflection point.
mysql> show table status like 'test'\G
*************************** 1. row ***************************
Name: test
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 4182365
Avg_row_length: 48
Data_length: 202063872
Max_data_length: 0
...
Miscellaneous Network Knowledge (Nowhere Better to Put It)
Original version: https://datatracker.ietf.org/doc/html/rfc1180
Chinese version: http://arthurchiao.art/blog/rfc1180-a-tcp-ip-tutorial-zh/
Tuning initcwnd for optimum Performance:
https://www.cdnplanet.com/blog/tune-tcp-initcwnd-for-optimum-performance/
https://www.kawabangga.com/posts/5217
Other Linux network stack explanations (Linux kernel networking): https://www.clockblog.life/article/2023/7/4/44.html
https://blogs.runsunway.com/
buffer/cache That Cannot Be Freed
The problem: I came across a case asking why, after echo 3 > /proc/sys/vm/drop_caches, buffer/cache still was not reclaimed and memory usage stayed high.
The commands:
root@ip-172-31-47-174 ~# free -h
total used free shared buff/cache available
Mem: 7.5Gi 1.3Gi 3.5Gi 2.1Gi 2.6Gi 3.8Gi
Swap: 0B 0B 0B
root@ip-172-31-47-174 ~# echo 3 > /proc/sys/vm/drop_caches
root@ip-172-31-47-174 ~# free -h
total used free shared buff/cache available
Mem: 7.5Gi 1.3Gi 3.7Gi 2.1Gi 2.4Gi 3.9Gi
Swap: 0B 0B 0B
root@ip-172-31-47-174 ~# free
total used free shared buff/cache available
Mem: 7833520 1376608 3917368 2160524 2539544 4059944
Swap: 0 0 0
Analysis and answer. Analysis: at first I did not see anything specific wrong and assumed the application genuinely could not give up the cache space because it was in use.
Firefox, for example, uses cache space to store data when it starts.
Answer: the next day I saw the expert's update. The cause: the kernel also counts shared (shmem) memory inside buff/cache, so free's output is correct. Part of the cache had in fact been released; the number did not drop much because the remainder is shmem space, which... cannot be dropped.
I was somewhat shocked by this conclusion at first; I had always assumed the memory in the shared column was counted separately. Looking carefully at the output above, the shared space is indeed almost equal to buff/cache.
Test environment:
OS: Arch Linux x86_64
Kernel: 6.3.1-arch2-1
Software Version: free from procps-ng 3.3.17
Test method: I only wanted to demonstrate how free accounts for this, so I simplified things and just inflated shared memory directly.
# Create a temporary mount point
# /dev/shm would also work, but its capacity is limited to half of physical RAM
sudo mkdir /mnt/tmpfs/
# Mount a tmpfs at /mnt/tmpfs/
sudo mount -t tmpfs -o size=5000m shared /mnt/tmpfs/
# Create a file to occupy that memory
sudo fallocate -l 4G /mnt/tmpfs/file
Testing with the steps above shows that shared memory is indeed also counted inside buff/cache, exactly matching the customer's observation; the command simply works this way.
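A quick cross-check sketch: the Shmem value in /proc/meminfo should track free's shared column, and that amount also sits inside buff/cache:

grep -E '^(Shmem|Cached|Buffers):' /proc/meminfo
free -k | awk 'NR==2 {print "shared =", $5, "kB; buff/cache =", $6, "kB"}'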
Following this thread, the next step is to check /proc/meminfo and the kernel docs; noting that down as the plan.
I checked free's manpage right away and found, sure enough, that it has not been updated to call this out. Summary:
shared      Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)
buffers     Memory used by kernel buffers (Buffers in /proc/meminfo)
cache       Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo)
buff/cache  Sum of buffers and cache
Looking at /proc/meminfo confirms the value does include Shmem, and the value itself is correct. The question is now why the kernel reports it this way.
sudo cat /proc/meminfo | grep -Ei "mem|cache|buffer|active"
MemTotal: 7833520 kB
MemFree: 4417692 kB
MemAvailable: 4615060 kB
Buffers: 0 kB
Cached: 2550368 kB # this counter includes shmem... free is right
SwapCached: 0 kB
Active: 830560 kB
Inactive: 2397160 kB
Active(anon): 444220 kB
Inactive(anon): 2385016 kB
Active(file): 386340 kB
Inactive(file): 12144 kB
Shmem: 2151884 kB # this reflects the file we created; converted to a file size it is roughly 2 GB
The question now becomes: when exactly did meminfo change its accounting, and why count it this way?
I am not going to hunt for the exact commit; let's just read the code. Hopefully I am reading the right thing.
// https://elixir.bootlin.com/linux/latest/source/fs/proc/meminfo.c
static int meminfo_proc_show(struct seq_file *m, void *v)
{
struct sysinfo i;
...
cached = global_node_page_state(NR_FILE_PAGES) -
total_swapcache_pages() - i.bufferram;
if (cached < 0)
cached = 0;
...
show_val_kb(m, "MemTotal: ", i.totalram);
show_val_kb(m, "MemFree: ", i.freeram);
show_val_kb(m, "MemAvailable: ", available);
show_val_kb(m, "Buffers: ", i.bufferram);
show_val_kb(m, "Cached: ", cached);
show_val_kb(m, "SwapCached: ", total_swapcache_pages());
show_val_kb(m, "Active: ", pages[LRU_ACTIVE_ANON] +
pages[LRU_ACTIVE_FILE]);
show_val_kb(m, "Inactive: ", pages[LRU_INACTIVE_ANON] +
pages[LRU_INACTIVE_FILE]);
show_val_kb(m, "Active(anon): ", pages[LRU_ACTIVE_ANON]);
show_val_kb(m, "Inactive(anon): ", pages[LRU_INACTIVE_ANON]);
show_val_kb(m, "Active(file): ", pages[LRU_ACTIVE_FILE]);
show_val_kb(m, "Inactive(file): ", pages[LRU_INACTIVE_FILE]);
show_val_kb(m, "Unevictable: ", pages[LRU_UNEVICTABLE]);
show_val_kb(m, "Mlocked: ", global_zone_page_state(NR_MLOCK));
You can see the values are pulled from a sysinfo struct, the result is computed into the cached variable, and then printed:
// https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/sysinfo.h#L14
__kernel_ulong_t bufferram; /* Memory used by buffers */
OK, I cannot read any further. The final Cached value is: all file-backed pages in the system, minus SwapCached, minus Buffers. Roughly that; computed this way it does indeed include the shmem portion. Mail archive and explanation:
That’s a reasonable position to take.
Another point of view is that everything in tmpfs is part of the page cache and can be written out to swap, so keeping it as part of Cached is not misleading.
I can see it both ways, and personally, I'd lean towards clarifying the documentation about how shmem is accounted rather than changing how the memory usage is reported.
The quote above, from the linked mail thread, explains why shmem is counted in Cached; it sounds like a new, separate metric should handle this now.
For more, see the reference links; I also watched an expert's walkthrough, which explains it clearly.
References:
https://zhuanlan.zhihu.com/p/586107891
https://www.cnblogs.com/tsecer/p/16290025.html