Liarlee's Notebook
SELinux Troubleshooting
Posted on 2024-07-13 | Updated on 2025-05-10 | Linux

Reference: https://docs.redhat.com/zh_hans/documentation/red_hat_enterprise_linux/9/html/using_selinux/analyzing-an-already-found-selinux-denial_troubleshooting-problems-related-to-selinux

Install the related packages.

```shell
root@ip-172-31-54-198:/var/mnt/docker_root# rpm-ostree install policycoreutils-python-utils setroubleshoot-server
Checking out tree 1873357... done
Enabled rpm-md repositories: fedora-cisco-openh264 updates fedora updates-archive
Updating metadata for 'updates'... done
Updating metadata for 'updates-archive'... done
Importing rpm-md... done
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2024-03-12T11:45:42Z solvables: 3
rpm-md repo 'updates'; generated: 2024-07-12T04:06:14Z solvables: 21267
rpm-md repo 'fedora' (cached); generated: 2024-04-14T18:51:11Z solvables: 74881
rpm-md repo 'updates-archive'; generated: 2024-06-30T05:31:17Z solvables: 22869
Resolving dependencies... done
Will download: 21 packages (6.0 MB)
Downloading from 'updates'... done
Downloading from 'fedora'... done
Downloading from 'updates-archive'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Added:
  checkpolicy-3.6-3.fc40.x86_64
  gobject-introspection-1.80.1-1.fc40.x86_64
  ima-evm-utils-1.5-4.fc40.x86_64
  initscripts-service-10.23-1.fc40.noarch
  libfsverity-1.4-12.fc40.x86_64
  policycoreutils-python-utils-3.6-3.fc40.noarch
  python3-audit-4.0.1-1.fc40.x86_64
  python3-dasbus-1.7-6.fc40.noarch
  python3-dbus-1.3.2-6.fc40.x86_64
  python3-gobject-base-3.48.2-1.fc40.x86_64
  python3-libselinux-3.6-4.fc40.x86_64
  python3-libsemanage-3.6-3.fc40.x86_64
  python3-libxml2-2.12.7-1.fc40.x86_64
  python3-policycoreutils-3.6-3.fc40.noarch
  python3-rpm-4.19.1.1-1.fc40.x86_64
  python3-setools-4.5.1-2.fc40.x86_64
  python3-systemd-235-9.fc40.x86_64
  rpm-build-libs-4.19.1.1-1.fc40.x86_64
  rpm-sign-libs-4.19.1.1-1.fc40.x86_64
  setroubleshoot-plugins-3.3.14-9.fc40.noarch
  setroubleshoot-server-3.3.33-1.fc40.x86_64
Changes queued for next boot. Run "systemctl reboot" to start a reboot
root@ip-172-31-54-198:/var/mnt/docker_root# systemctl reboot
```

View the suggested fix. sealert prints a suggested security context, which should be enough for the usual cases. In my case a mysqld process running inside a container was denied write access to a directory under the user's home. Even after adjusting the labels I could not get the context to match completely, so the access was still denied, and in the end I switched to permissive mode. Fedora CoreOS ships with SELinux enabled by default; disabling it entirely would require turning off both the system's SELinux and Docker's SELinux support, which is more hassle than it is worth, so permissive it is.

```shell
root@ip-172-31-54-198:~# sealert -l "*"
SELinux is preventing mysqld from write access on the directory mysql.

***** Plugin catchall_labels (83.8 confidence) suggests *******************

If you want to allow mysqld to have write access on the mysql directory
Then you need to change the label on mysql
Do
# semanage fcontext -a -t FILE_TYPE 'mysql'
where FILE_TYPE is one of the following: bpf_t, cifs_t, container_file_t, container_var_lib_t, fusefs_t, hugetlbfs_t, nfs_t, svirt_home_t, tmpfs_t, virt_home_t.
Then execute:
restorecon -v 'mysql'

***** Plugin catchall (17.1 confidence) suggests **************************

If you believe that mysqld should be allowed write access on the mysql directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'mysqld' --raw | audit2allow -M my-mysqld
# semodule -X 300 -i my-mysqld.pp

Additional Information:
Source Context                system_u:system_r:container_t:s0:c114,c1019
Target Context                system_u:object_r:mnt_t:s0
Target Objects                mysql [ dir ]
Source                        mysqld
Source Path                   mysqld
Port                          <Unknown>
Host                          ip-172-31-54-198
Source RPM Packages
Target RPM Packages
SELinux Policy RPM            selinux-policy-targeted-40.22-1.fc40.noarch
Local Policy RPM              selinux-policy-targeted-40.22-1.fc40.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ip-172-31-54-198
Platform                      Linux ip-172-31-54-198 6.8.11-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024 x86_64
Alert Count                   1
First Seen                    2024-07-13 11:16:08 CST
Last Seen                     2024-07-13 11:16:08 CST
Local ID                      b331ca4f-5699-4d11-94d2-84638e0f0f8a

Raw Audit Messages
type=AVC msg=audit(1720840568.80:300): avc: denied { write } for pid=3596 comm="mysqld" name="mysql" dev="nvme0n1p4" ino=9437365 scontext=system_u:system_r:container_t:s0:c114,c1019 tcontext=system_u:object_r: ...
```
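Since the post ends with falling back to permissive mode, here is a minimal sketch of how that switch is usually made on a Fedora-family system; treat it as an assumption about the exact steps rather than the author's verified procedure.

```shell
# Switch to permissive at runtime (immediate, not persistent across reboots).
sudo setenforce 0
getenforce                      # should now report "Permissive"

# Make it persistent by editing the SELinux config; denials are still logged,
# just no longer enforced, so sealert/ausearch keep working for analysis.
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config
```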
MySQL Primary/Replica Replication
Posted on 2024-07-13 | Updated on 2025-05-10 | Database

I was asked a database question: does setting up primary/replica replication require locking tables on the primary or restarting the primary? Following "Setting Up Binary Log File Position Based Replication" from the MySQL manual, I walked through the setup with Docker, tested it, and recorded the steps. For some things the official MySQL documentation really is the best resource.

What the primary needs

It needs a unique server id. This is a variable that can be set at runtime, no restart required.

```sql
MySQL [(none)]> SHOW variables LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 198   |
+---------------+-------+
1 row in set (0.001 sec)
```

Create a user for the replica to pull data with. This does not need a restart either.

```sql
MySQL [(none)]> CREATE USER 'replication-user'@'172.31.62.236' IDENTIFIED BY '123123';
Query OK, 0 rows affected (0.006 sec)

MySQL [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'replication-user'@'172.31.62.236';
Query OK, 0 rows affected (0.004 sec)

MySQL [(none)]> SHOW GRANTS FOR 'replication-user'@'172.31.62.236';
+----------------------------------------------------------------------+
| Grants for replication-user@172.31.62.236                            |
+----------------------------------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO `replication-user`@`172.31.62.236` |
+----------------------------------------------------------------------+
1 row in set (0.000 sec)

MySQL [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.002 sec)

MySQL [(none)]> SELECT Host,User,authentication_string FROM mysql.user;
+---------------+------------------+------------------------------------------------------------------------+
| Host          | User             | authentication_string                                                    |
+---------------+------------------+------------------------------------------------------------------------+
| %             | root             | $A$005$CK=G-6GdO6+*f4IpBYW8v6zGb62QvkVJIztfMwOYRKYK4VHdrnOcmM/jOzB       |
| 172.31.62.236 | replication-user | $A$005$t4642jI)eV[+@6NFBplxAFoO/a1p1sHIdDMiRHvtyvKD4mUFajpDiZKUPA        |
| localhost     | mysql.infoschema | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED   |
| localhost     | mysql.session    | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED   |
| localhost     | mysql.sys        | $A$005$THISISACOMBINATIONOFINVALIDSALTANDPASSWORDTHATMUSTNEVERBRBEUSED   |
| localhost     | root             | $A$005$y,)(7](i&=Mm+kO9Re/i.ywV/7DTELTS9.DN9Or6yCVIPlaVm/NOkg/RA         |
+---------------+------------------+------------------------------------------------------------------------+
6 rows in set (0.001 sec)
```

Make sure the primary writes the binlog, or at least that the relevant database is covered by it. Since MySQL 8.0 the binlog is enabled by default, so in practice this does not require a restart either.

The documentation says that at this point you should take a read lock on the primary, so that the binlog position you are about to record stays put; on a busy system the position keeps moving.

```sql
mysql> FLUSH TABLES WITH READ LOCK;
```

Query the binlog status on the primary. This used to be SHOW MASTER STATUS; the command has since been replaced by:

```sql
MySQL [test]> SHOW BINARY LOG STATUS\G
*************************** 1. row ***************************
             File: binlog.000002
         Position: 1086
     Binlog_Do_DB: test
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.000 sec)
```

Once you have the position, you can unlock.

```sql
mysql> UNLOCK TABLES;
```

What the replica needs

Confirm that it can reach the primary (the network part).

Create a replication task with the following settings.

```sql
mysql> CHANGE REPLICATION SOURCE TO
    SOURCE_HOST='172.31.54.198',
    SOURCE_USER='replication-user',
    SOURCE_PASSWORD='123123',
    SOURCE_LOG_FILE='binlog.000002',
    SOURCE_LOG_POS=4,
    GET_SOURCE_PUBLIC_KEY=1;
# If the replication user account authenticates with the caching_sha2_password plugin (the default)
# and you are not using a secure connection, you must specify this option or SOURCE_PUBLIC_KEY_PATH
# to supply the RSA public key to the replica.
```

Start replication.

```sql
mysql> START REPLICA;

mysql> SHOW PROCESSLIST;
+----+-----------------+---------------------+------+---------+------+----------------------------------------------------------+------------------+
| Id | User            | Host                | db   | Command | Time | State                                                    | Info             |
+----+-----------------+---------------------+------+---------+------+----------------------------------------------------------+------------------+
|  8 | event_scheduler | localhost           | NULL | Daemon  | 2062 | Waiting on empty queue                                   | NULL             |
| 16 | root            | 172.31.47.174:56736 | NULL | Query   |    0 | init                                                     | SHOW PROCESSLIST |
| 29 | system user     | connecting host     | NULL | Connect |  447 | Waiting for source to send event                         | NULL             |
| 35 | system user     |                     | NULL | Query   |   79 | Replica has read all relay log; waiting for more updates | NULL             |
| 36 | system user     |                     | NULL | Query   | 2149 | Waiting for an event from Coordinator                    | NULL             |
| 37 | system user     |                     | NULL | Connect |   79 | Waiting for an event from Coordinator                    | NULL             |
| 38 | system ...
```
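To confirm the replica is actually caught up, a check along the following lines can be added (a minimal sketch using the host and credentials from above; SHOW REPLICA STATUS is the MySQL 8.0.22+ replacement for SHOW SLAVE STATUS):

```shell
# On the replica: both threads should report "Yes" and the lag should be at or near 0.
mysql -h127.0.0.1 -uroot -p123 -e "SHOW REPLICA STATUS\G" | \
  grep -E 'Replica_IO_Running|Replica_SQL_Running|Seconds_Behind_Source'
```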
Booting Fedora CoreOS on PVE with an Ignition File
Posted on 2024-06-15 | Updated on 2025-05-10 | Linux

PVE approach: https://forum.proxmox.com/threads/howto-startup-vm-using-an-ignition-file.63782/
CoreOS Butane Config Spec: https://coreos.github.io/butane/config-fcos-v1_5/

I previously used a different approach that let CoreOS on PVE inject keys through cloud-init. That no longer feels necessary: just turn cloud-init off and use an Ignition file directly.

Following the document linked above, edit the PVE VM's configuration file and append an args line to the QemuServer config.

```shell
vim /etc/pve/qemu-server/101.conf
```

Add the args entry; it is recorded as-is and passed straight to QEMU.

```shell
agent: 1
# This is the key line; adding it is all that is needed.
args: -fw_cfg name=opt/com.coreos/config,file=/mnt/pve/public_share/fedora_coreos/hayden-coreos.ign
balloon: 0
bios: ovmf
boot: order=virtio0
cores: 1
cpu: host
efidisk0: local:101/vm-101-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
memory: 2048
meta: creation-qemu=8.0.2,ctime=1718427351
name: fcos-template
net0: virtio=F6:65:0F:7B:FF:23,bridge=vmbr0
numa: 0
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=45ee31ce-bf31-4e18-ae68-803bbc8c31b2
sockets: 1
virtio0: local-lvm:vm-101-disk-0,iothread=1,size=10G
vmgenid: c309f3f2-0bf0-42d6-bdaa-641122b82b54
```
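The .ign file referenced by the args line is normally transpiled from a Butane config (see the spec linked above). A minimal sketch, assuming a hypothetical hayden-coreos.bu that only injects an SSH key:

```shell
# A minimal Butane config (fcos variant, v1.5.0 per the spec linked above).
cat > hayden-coreos.bu <<'EOF'
variant: fcos
version: 1.5.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA... user@host   # placeholder key
EOF

# Transpile it into the Ignition file that the VM consumes via -fw_cfg.
butane --pretty --strict hayden-coreos.bu > /mnt/pve/public_share/fedora_coreos/hayden-coreos.ign
```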
Managing Snapshots with Snapper
Posted on 2024-06-08 | Updated on 2025-05-10 | Linux

Snapper is a snapshot management tool that can create filesystem snapshots automatically and clean them up on a schedule. Snapper's config files live in /etc/snapper/configs, and its log is /var/log/snapper.log.

Creating a Snapper config

The first step is to create a snapper config for every subvolume that should be snapshotted automatically.

```shell
# Create the config
snapper -c harbor_data create-config /mnt/btrfs/root/@harbor_data/

# List all configs
snapper list-configs
```

Managing snapshots

Snapper can manage snapshots automatically, and they can also be managed by hand. The usual operations are creating a new snapshot, deleting snapshots, and listing all snapshots.

```shell
snapper -c harbor_data list
Type   | # | Pre # | Date                     | User | Cleanup  | Description | Userdata
-------+---+-------+--------------------------+------+----------+-------------+---------
single | 0 |       |                          | root |          | current     |
single | 1 |       | Sat Jun  8 11:48:24 2024 | root |          | init        |
single | 2 |       | Sat Jun  8 11:56:05 2024 | root | timeline | timeline    |
single | 3 |       | Sat Jun  8 12:01:01 2024 | root | timeline | timeline    |

snapper list -a
Type   | # | Pre # | Date                     | User | Cleanup  | Description | Userdata
-------+---+-------+--------------------------+------+----------+-------------+---------
single | 0 |       |                          | root |          | current     |
single | 1 |       | Sat Jun  8 11:48:24 2024 | root |          | init        |
single | 2 |       | Sat Jun  8 11:56:05 2024 | root | timeline | timeline    |
single | 3 |       | Sat Jun  8 12:01:01 2024 | root | timeline | timeline    |

snapper -c harbor_data delete 1-3

snapper list -a
Type   | # | Pre # | Date | User | Cleanup | Description | Userdata
-------+---+-------+------+------+---------+-------------+---------
single | 0 |       |      | root |         | current     |
```

Enabling the systemd timers for timeline backups

By default the systemd timers do not appear to be activated, so they may need to be enabled manually:

```shell
systemctl enable snapper-cleanup.timer
Created symlink from /etc/systemd/system/basic.target.wants/snapper-cleanup.timer to /usr/lib/systemd/system/snapper-cleanup.timer.

systemctl enable snapper-timeline.timer
Created symlink from /etc/systemd/system/basic.target.wants/snapper-timeline.timer to /usr/lib/systemd/system/snapper-timeline.timer.

systemctl list-timers --all
NEXT                        LEFT          LAST                        PASSED  UNIT                          ACTIVATES
Sat 2024-06-08 17:29:56 CST 5h 14min left Fri 2024-06-07 17:29:56 CST 18h ago systemd-tmpfiles-clean.timer  systemd-tmpfiles-clean.service
Sun 2024-06-09 00:00:00 CST 11h left      Sat 2024-06-08 00:00:00 CST 12h ago atop-rotate.timer             atop-rotate.service
n/a                         n/a           n/a                         n/a     snapper-cleanup.timer         snapper-cleanup.service
n/a                         n/a           n/a                         n/a     snapper-timeline.timer        snapper-timeline.service
n/a                         n/a           n/a                         n/a     systemd-readahead-done.timer  systemd-readahead-done.service

root@reg /m/b/r/@harbor_data# systemctl start snapper-cleanup.timer
root@reg /m/b/r/@harbor_data# systemctl start snapper-timeline.timer
root@reg /m/b/r/@harbor_data# systemctl list-timers --all
NEXT                        LEFT          LAST                        PASSED  UNIT                          ACTIVATES
Sat 2024-06-08 13:00:00 CST 43min left    n/a                         n/a     snapper-timeline.timer        snapper-timeline.service
Sat 2024-06-08 17:29:56 CST 5h 13min left Fri 2024-06-07 17:29:56 CST 18h ago systemd-tmpfiles-clean.timer  systemd-tmpfiles-clean.service
Sun 2024-06-09 00:00:00 CST 11h left      Sat 2024-06-08 00:00:00 CST 12h ago atop-rotate.timer             atop-rotate.service
Sun 2024-06-09 12:16:04 CST 23h left      Sat 2024-06-08 12:16:04 CST 6s ago  snapper-cleanup.timer         snapper-cleanup.service
n/a                         n/a           n/a                         n/a     systemd-readahead-done.timer  systemd-readahead-done.service
```

The snapshots are lightweight CoW snapshots and do not take up much space. A snapshot can also be sent to another device with btrfs send; this is done through the hooks Snapper provides. [[Linux/Linux_btrfs#^08cd21|Linux_btrfs]] has some notes on this as well.

I have only just started using Snapper; I will add more here as I learn more about it.
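As a small addition of my own (not from the original notes): one-off snapshots can also be tagged with a cleanup algorithm so the timers above eventually prune them, and two snapshots can be diffed. The snapshot numbers below are hypothetical.

```shell
# Take a one-off snapshot before a risky change; the "number" cleanup algorithm
# lets snapper-cleanup.timer expire it automatically later.
snapper -c harbor_data create --description "before harbor upgrade" --cleanup-algorithm number

# Compare two snapshots (here 4 and 5) to see which files changed in between.
snapper -c harbor_data status 4..5
```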
Abnormal CLOSE_WAIT Connections from a Containerized Node Exporter
Posted on 2024-05-30 | Updated on 2025-05-10 | Linux

I run Node Exporter on my little home server to collect metrics and ship them to Prometheus for long-term storage. For the last couple of days the temperature of this box has been bothering me: it feels hot to the touch, and htop shows roughly 50+ °C. Before heading out this morning I figured there is hardly any load on it anyway, so I turned the CPU frequency down on the PVE host and left.

```shell
cpupower frequency-set -g poweroff
cpupower frequency-set -u 2GHz
```

Because of that change, when I got back in the evening I found that Grafana was missing the metrics for three PVE VMs. My first thought was that the node exporter Docker container had exited, but after logging into the VMs the container was still running. Checking the connection count and states with ss showed that every connection to port 9100 was sitting in CLOSE_WAIT.

CLOSE_WAIT means that this end, as the passively closed side, has not closed its socket in time: the peer has already sent its FIN and torn the connection down, but the application has not properly closed the socket for that connection.

At that point I still had not realized where the problem was and suspected a bug in Node Exporter. Restarting the Node Exporter container did not help; CLOSE_WAIT connections kept slowly piling up. As more connections drifted into CLOSE_WAIT, curl requests against the metrics endpoint on port 9100 started failing outright: new connections could not be established and curl would block until it timed out. The Node Exporter log contained this line:

Maximum allowed concurrent requests threshold(40) was breached

Next I assumed the problem was in Node Exporter itself and went to review my container setup. Everything is started with docker-compose, so if the application code were at fault it would have to be the image; I pulled a fresh image, and the problem remained. Then, while reading through the docker-compose file, I saw the answer: I had set a limit and a reservation for Node Exporter (which really is a good habit...), with the CPU limit at 0.1.

```yaml
deploy:
  resources:
    limits:
      cpus: '0.10'
      memory: 100M
    reservations:
      cpus: '0.10'
      memory: 100M
```

Seeing that config, I felt I had probably found it: too little CPU time for the application can also produce CLOSE_WAIT. After raising the CPU limit, the CLOSE_WAIT connections disappeared, Prometheus scrapes were healthy again, and curl returned data normally.

So the root cause: previously the host CPU ran at a high enough frequency that, even with a 0.1 CPU limit, the container could finish handling each connection and close the socket in time. Once I downclocked the CPU on the PVE host, 0.1 CPU was no longer enough. While one response was still going out, the next scrape arrived and opened another connection, so sockets were being created faster than they were being closed; after Node Exporter's concurrent-request limit was exceeded, it simply stopped accepting new requests.
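For reference, a check along these lines (my own sketch, not the exact command from the incident) makes the symptom easy to spot and to watch over time:

```shell
# Count connections on the Node Exporter port that are stuck in CLOSE_WAIT.
ss -tan state close-wait '( sport = :9100 )' | tail -n +2 | wc -l

# Watch whether the count keeps growing after a restart.
watch -n 5 "ss -tan state close-wait '( sport = :9100 )' | tail -n +2 | wc -l"
```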
Understanding the Linux open() Call
Posted on 2024-05-19 | Updated on 2025-05-10 | Linux

The original question

I have been working through OSTEP slowly, writing small bits of code along the way, and it is quite fun. While reading I try to answer the exercise questions at the end of each chapter. In my environment the program kept failing with read failed: Bad file descriptor.

2. Write a program that opens a file (with the open() system call) and then calls fork() to create a new process. Can both the child and the parent access the file descriptor returned by open()? What happens when they write to the file concurrently?

The first version of the code

This first version was written by Copilot from my prompt, so I did not really understand it and its explanation was rather rough. Even when I fed it the error output, it kept insisting the code was fine and that the child can access the parent's file descriptor...

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

/*
 * 2. Write a program that opens a file (with the open() system call) and then
 *    calls fork() to create a new process. Can both the child and the parent
 *    access the file descriptor returned by open()? What happens when they
 *    write to the file concurrently?
 */

void fork_process(int fd) {
    int rc = fork();
    if (rc < 0) {
        fprintf(stderr, "fork failed.\n");
        exit(1);
    } else if (rc == 0) {
        // Child: read back what the parent wrote
        lseek(fd, 0, SEEK_SET);  // reset the file offset to the beginning
        char buffer[1024];
        ssize_t bytes_read = read(fd, buffer, sizeof(buffer) - 1);
        printf("Output FD Content: %d\n", fd);
        if (bytes_read < 0) {
            perror("read failed");
            // fprintf(stderr, "read failed in child.\n");
            exit(1);
        }
        buffer[bytes_read] = '\0';  // make sure the string is terminated
        // printf("Child (pid:%d) read: %s\n", (int)getpid(), buffer);
    } else {
        char buffer[1024];
        ssize_t bytes_read = read(fd, buffer, sizeof(buffer) - 1);
        printf("Output FD Content: %d\n", fd);
        if (bytes_read < 0) {
            perror("read failed");
            // fprintf(stderr, "read failed in child.\n");
            exit(1);
        }
        buffer[bytes_read] = '\0';  // make sure the string is terminated
        // Parent: wait for the child to finish
        int wc = waitpid(rc, NULL, 0);
        printf("Parent (pid:%d) of child %d\n", (int)getpid(), rc);
    }
}

int main(int argc, char *argv[]) {
    printf("Hello world (pid:%d)\n", (int) getpid());

    // Parent: open /etc/hostname and read its content
    int fd_passwd = open("/etc/hostname", O_RDONLY);
    if (fd_passwd < 0) {
        fprintf(stderr, "open /etc/hostname failed.\n");
        exit(1);
    }
    char buffer[1024];
    ssize_t bytes_read = read(fd_passwd, buffer, sizeof(buffer) - 1);
    if (bytes_read < 0) {
        fprintf(stderr, "read /etc/hostname failed.\n");
        exit(1);
    }
    buffer[bytes_read] = '\0';  // make sure the string is terminated
    close(fd_passwd);

    // Parent: create open_test.log and write the content that was just read
    int fd_log = open("./open_test.log", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
    if (fd_log < 0) {
        fprintf(stderr, "open open_test.log failed.\n");
        exit(1);
    }
    write(fd_log, buffer, strlen(buffer));

    // Parent: fork a child and pass the file descriptor down
    fork_process(fd_log);

    // Parent: close the file descriptor
    close(fd_log);
    return 0;
}
```

Running it fails:

```shell
[root@arch ostep-hm-code]$ ./a.out
Hello world (pid:68492)
Output FD Content: 3
read failed: Bad file descriptor
Output FD Content: 3
read failed: Bad file descriptor
```

Tracing it with perf trace

```shell
[root@arch ostep-hm-code]$ perf trace -e syscalls:sys_enter_open* -e syscalls:sys_enter_close* -T -f ./a.out 2>&1
Hello world (pid:67852)
129919107.285 a.out/67852 syscalls:sys_enter_openat(dfd: CWD, filename: "", flags: RDONLY|CLOEXEC)
129919107.302 a.out/67852 syscalls:sys_enter_close(fd: 3</home/ec2-user/ostep-hm-code/open_test.log>)
\\ The open/close above are clearly a pair; this should be where I open the log file. But fd 3 is closed afterwards —
\\ the close is visible at the kernel level, though this alone does not tell us why it was closed.
129919107.310 a.out/67852 syscalls:sys_enter_openat(dfd: CWD, filename: "", flags: RDONLY|CLOEXEC)
129919107.354 a.out/67852 syscalls:sys_enter_close(Output FD Content: 3 fd: 3</home/ec2-user/ostep-hm-code/open_test.log>)
\\ Here is the file descriptor id printed by the program — it is 3 — and the close() on that descriptor is visible too.
129919107.497 a.out/67852 syscalls:sys_enter_openat(dfd: CWD, filename: "")
129919107.503 a.out/67852 syscalls:sys_enter_close(fd: 3</home/ec2-user/ostep-hm-code/open_test.log>)
\\ This is the test output as well: the printed fd is 3, and it also gets closed.
129919107.506 a.out/67852 syscalls:sys_enter_openat(read failed: Bad file descriptor dfd: CWD, filename: "", flags: CREAT|TRUNC|WRONLY, mode: IRWXU)
\\ The error appears here, and it effectively spells out the cause: the log file was opened WRONLY (write-only), so it cannot be read.
129919107.950 a.out/67852 syscalls:sys_enter_close(fd: 4)
Output FD Content: 3
read failed: Bad file descriptor
```

The corrected version

```c
void fork_process(int fd) {

int main(int argc, char *argv[]) {
    printf("Hello world (pid:%d)\n", (int) getpid());

    // Parent: open the log file with O_RDWR so the inherited descriptor can be read after fork()
    int fd_hostname = open("./open_test.log", O_RDWR, S_IRWXU);
    ...
```
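A quick way to confirm the point above — that the descriptor inherited across fork() refers to the same open file description, including its write-only access mode — is to look at procfs. A minimal shell sketch of my own, not part of the original exercise:

```shell
# Open fd 3 write-only in the shell; children inherit it across fork()/exec().
exec 3> /tmp/open_test.log

# cat is a child process here, yet it sees the same fd with the same flags.
# The low octal digits of the "flags:" field encode the access mode
# (O_WRONLY = 01), which is why read() on this descriptor fails with EBADF.
cat /proc/self/fdinfo/3

exec 3>&-   # close it again
```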
Harbor Upgrade Notes
Posted on 2024-05-06 | Updated on 2025-05-10 | Kubernetes

The goal of this upgrade is to switch the original LVM storage over to btrfs (so good!). Two parts of the storage are being migrated onto [[Linux_btrfs|btrfs]]:

- the docker daemon's working directory
- Harbor's data, logs and certificates

Download the release

Download the offline installer for this version from the official project. The reason for not using the online installer: it still needs to pull images from Docker Hub, which from inside China is a pain.

https://github.com/goharbor/harbor/releases/tag/v2.9.4

```shell
cd /opt/
wget https://github.com/goharbor/harbor/releases/download/v2.9.4/harbor-offline-installer-v2.9.4.tgz
```

Extract it

```shell
tar zxvf ./harbor-offline-installer-v2.10.2.tgz
```

Copy the config file from the previous version

```shell
cp /opt/harbor-2.10.0/harbor.yml .
```

Confirm that these parameters in the config file have been changed to the correct new paths; everything else stays the same:

```yaml
https:
  # https port for harbor, default is 443
  port: 443
  # The path of cert and key files for nginx
  certificate: /mnt/btrfs/harbor_data/certs/111.pem
  private_key: /mnt/btrfs/harbor_data/certs/111.key

data_volume: /mnt/btrfs/harbor_data

log:
  local:
    location: /mnt/btrfs/harbor_data/harbor_log
```

Load the images for the new version

```shell
docker load < ./harbor.v2.10.2.tar.gz

btrfs]$ docker images
REPOSITORY                      TAG       IMAGE ID       CREATED       SIZE
goharbor/harbor-exporter        v2.10.2   9befcab0cee2   4 weeks ago   111MB
goharbor/redis-photon           v2.10.2   9d1db211d49a   4 weeks ago   170MB
goharbor/trivy-adapter-photon   v2.10.2   8f9e0b6b43ce   4 weeks ago   509MB
goharbor/harbor-registryctl     v2.10.2   e5a807ba1f59   4 weeks ago   155MB
goharbor/registry-photon        v2.10.2   850d2b3f27f3   4 weeks ago   89MB
goharbor/nginx-photon           v2.10.2   9282c21c2fee   4 weeks ago   159MB
goharbor/harbor-log             v2.10.2   f288fe2baa96   4 weeks ago   168MB
goharbor/harbor-jobservice      v2.10.2   a3247b57a920   4 weeks ago   146MB
goharbor/harbor-core            v2.10.2   6cd434d62456   4 weeks ago   174MB
goharbor/harbor-portal          v2.10.2   7e5a522c7853   4 weeks ago   167MB
goharbor/harbor-db              v2.10.2   cd385df354d4   4 weeks ago   274MB
goharbor/prepare                v2.10.2   bf4632d26b65   4 weeks ago   214MB
```

Create the storage and migrate the data

```shell
mkfs.btrfs -L harbor -d raid1 -m raid1 -n 16k /dev/nvme1n1 /dev/nvme2n1 -f

# Create the subvolumes
btrfs su cr @docker_data
btrfs su cr @harbor_data

# Create the mount points
mkdir -v /mnt/btrfs/harbor_data
mkdir -v /mnt/btrfs/harbor_data/certs/
mkdir -v /mnt/btrfs/docker_data

# Sync the existing data
rsync -aP ./harbor_data/ /mnt/btrfs/harbor_data/
rsync -aP ./docker_data/ /mnt/btrfs/docker_data/
rsync -aP ./harbor_log/ /mnt/btrfs/harbor_data/harbor_log
cp -prv ./certs /mnt/btrfs/harbor_data/certs

### Add the mounts to fstab
UUID=519abb44-a6a3-4ed1-b99d-506e9443e73f  /mnt/btrfs/docker_data  btrfs  defaults,compress=zstd,autodefrag,ssd,subvol=@docker_data
UUID=519abb44-a6a3-4ed1-b99d-506e9443e73f  /mnt/btrfs/harbor_data  btrfs  defaults,compress=zstd,autodefrag,ssd,subvol=@harbor_data
```

Verify the directory tree

```shell
mnt]$ tree -L 2 /mnt/btrfs/
/mnt/btrfs/
├── docker_data
│   ├── buildkit
│   ├── containers
│   ├── image
│   ├── network
│   ├── overlay2
│   ├── plugins
│   ├── runtimes
│   ├── swarm
│   ├── tmp
│   ├── trust
│   └── volumes
└── harbor_data
    ├── ca_download
    ├── certs
    ├── database
    ├── harbor_log
    ├── job_logs
    ├── redis
    ├── registry
    └── secret

21 directories, 0 files
```

Install Harbor

```shell
./install.sh
```

Verify the result

```shell
harbor]$ docker-compose ps
NAME                COMMAND                  SERVICE       STATUS              PORTS
harbor-core         "/harbor/entrypoint.…"   core          running (healthy)
harbor-db           "/docker-entrypoint.…"   postgresql    running (healthy)
harbor-jobservice   "/harbor/entrypoint.…"   jobservice    running (healthy)
harbor-log          "/bin/sh -c /usr/loc…"   log           running (healthy)   127.0.0.1:1514->10514/tcp
harbor-portal       "nginx -g 'daemon of…"   portal        running (healthy)
nginx               "nginx -g 'daemon of…"   proxy         running (healthy)   0.0.0.0:80->8080/tcp, :::80->8080/tcp, 0.0.0.0:443->8443/tcp, :::443->8443/tcp
registry            "/home/harbor/entryp…"   registry      running (healthy)
registryctl         "/home/harbor/start.…"   registryctl   running (healthy)
```

The system reboots cleanly, and Harbor is reachable afterwards.
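One extra safety step that fits this setup (my own sketch, not part of the original procedure): since harbor_data now lives on a btrfs subvolume, a read-only snapshot taken right before ./install.sh gives a cheap rollback point. The snapshot name below is hypothetical.

```shell
# Read-only snapshot of the Harbor data subvolume, nested inside the subvolume itself,
# taken just before running the installer.
btrfs subvolume snapshot -r /mnt/btrfs/harbor_data /mnt/btrfs/harbor_data/.pre_upgrade_snapshot

# Sanity-check the health and free space of the new raid1 filesystem.
btrfs filesystem usage /mnt/btrfs/harbor_data
```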
Connecting to a GKE Private Cluster's API Server via a Bastion Host
Posted on 2024-05-03 | Updated on 2025-05-10 | Kubernetes

Requirement

A GKE private cluster was created with Autopilot. After creation, how can the cluster be managed with kubectl from an instance in a different VPC?

There are two instances:

- instance-20240430-111240: in the same VPC as GKE; it can reach the control plane directly and manage the cluster.
- instance-20240430-053004: in another VPC; it cannot reach the control plane directly.

Exposing the API server port with kubectl proxy

Start a new instance in the same VPC as GKE and configure kubectl on it so that it can reach the control plane normally. Then use kubectl to start a proxy listening on all local addresses:

```shell
root@instance-20240430-111240:~ kubectl proxy --address=0.0.0.0 --kubeconfig .kube/config --accept-hosts "^.*" &
```

On the node in the other VPC, point kubectl at that proxy:

```shell
root@instance-20240430-053004:~ kubectl get pods -o wide -A -s instance-20240430-111240:8001
```

Using Tinyproxy

Refer to https://cloud.google.com/kubernetes-engine/docs/tutorials/private-cluster-bastion?hl=zh-cn&cloudshell=false#connect

Follow the steps in that document; everything up to this point is the same, then replace the following command from the document:

```shell
root@instance-20240430-053004:~ gcloud compute ssh INSTANCE_NAME \
    --tunnel-through-iap \
    --project=PROJECT_ID \
    --zone=COMPUTE_ZONE \
    --ssh-flag="-4 -L8888:localhost:8888 -N -q -f"
```

with

```shell
root@instance-20240430-053004:~ ssh -i .ssh/google_compute_engine root@instance-20240430-111240 -4 -L8888:localhost:8888 -N -q -f
```

This is a plain SSH tunnel that forwards port 8888 on the 111240 instance to port 8888 on the current instance. Then set an environment variable so that localhost:8888 is used as the proxy for HTTPS traffic, and test kubectl by listing the namespaces:

```shell
root@instance-20240430-053004:~ export HTTPS_PROXY=localhost:8888
root@instance-20240430-053004:~ kubectl get ns
NAME                       STATUS   AGE
default                    Active   3d
gke-gmp-system             Active   3d
gke-managed-cim            Active   3d
gke-managed-filestorecsi   Active   3d
gke-managed-system         Active   3d
gmp-public                 Active   3d
kube-node-lease            Active   3d
kube-public                Active   3d
kube-system                Active   3d
```

Related Kubernetes documentation:

kubectl proxy: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_proxy/
Use a SOCKS5 proxy to access the Kubernetes API: https://kubernetes.io/zh-cn/docs/tasks/extend-kubernetes/socks5-proxy-access-api/
Use an HTTP proxy to access the Kubernetes API: https://kubernetes.io/zh-cn/docs/tasks/extend-kubernetes/http-proxy-access-api/
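Following the SOCKS5 document linked above, the same SSH tunnel idea also works as a dynamic SOCKS proxy instead of Tinyproxy. A minimal sketch reusing the instance names from this post (assumed, not taken from the original write-up):

```shell
# Open a dynamic (SOCKS5) forward on local port 1080 through the bastion.
ssh -i .ssh/google_compute_engine -D 1080 -N -q -f root@instance-20240430-111240

# kubectl honors HTTPS_PROXY with a socks5:// scheme.
export HTTPS_PROXY=socks5://localhost:1080
kubectl get ns
```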
Portainer Usage Notes
Posted on 2024-04-14 | Updated on 2025-05-10 | Linux

Creating and starting Portainer

Here Portainer is run directly with docker-compose; the compose file below is my own. Other services can be handed over to Portainer to manage, but Portainer does not seem to be able to manage itself.

Create the compose file:

```shell
touch /opt/portainer/docker-compose.yaml
```

Write the configuration:

```yaml
---
version: "3.8"
services:
  portainer:
    image: portainer/portainer-ce:latest
    restart: always
    environment:
      - UUID=0
      - GUID=0
      - TZ=Asia/Shanghai
    volumes:
      - /run/docker.sock:/var/run/docker.sock
      - /etc/localtime:/etc/localtime:ro
      - /opt/Portainer/portainer_data:/data
    network_mode: host
    cap_add:
      - ALL
```

Run docker-compose up:

```shell
docker-compose down --remove-orphans && \
docker-compose up -d
```

Starting the Portainer agent

On every other node that needs to be managed, run:

```shell
docker run -d \
  -p 9001:9001 \
  --name portainer_agent \
  --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  reg.liarlee.site/docker.io/portainer/agent:2.19.4
```
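A quick sanity check after starting the agent (my own sketch): confirm the container is up and that port 9001 is listening before adding the endpoint in the Portainer UI.

```shell
# The agent container should be running with restart=always.
docker ps --filter name=portainer_agent

# Port 9001 should be listening on the node.
ss -ltnp | grep 9001

# Recent agent logs, useful if the endpoint shows as down in the UI.
docker logs --tail 20 portainer_agent
```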
Analyzing a MySQL Reconnection Failure
Posted on 2024-04-12 | Updated on 2025-05-10 | Database

How to reproduce

My test environment is entirely container-based, and I still ran into a few small differences. The case comes from the write-up "一次故障的诊断过程 – 实验重现" (one of the must-do experiments for 2024); I reproduced it to see where my own approach falls short.

Run and test with the commands below. The server and the client are split onto different instances; they started out on the same box, but I separated them later to make it easier to narrow down the scope.

Create a Docker container running MySQL:

```shell
docker run -it -d --net=host -e MYSQL_ROOT_PASSWORD=123 --name=mysql-server regprox.liarlee.site/docker.io/mysql
```

Connect and create the database:

```shell
mysql -h127.1 --ssl-mode=DISABLED -utest -p123 -e "create database test"
```

sysbench:

```shell
docker run --net=host --privileged -it regprox.liarlee.site/docker.io/phantooom/plantegg:sysbench-lab bash

sysbench --mysql-user='root' --mysql-password='123' --mysql-db='test' --mysql-host='127.0.0.1' --mysql-port='3306' --tables='16' --table-size='10000' --range-size='5' --db-ps-mode='disable' --skip-trx='on' --mysql-ignore-errors='all' --time='1180' --report-interval='1' --histogram='on' --threads=1 oltp_read_only prepare

sysbench --mysql-user='root' --mysql-password='123' --mysql-db='test' --mysql-host='127.0.0.1' --mysql-port='3306' --tables='16' --table-size='10000' --range-size='5' --db-ps-mode='disable' --skip-trx='on' --mysql-ignore-errors='all' --time='1180' --report-interval='1' --histogram='on' --threads=1 oltp_read_only run
```

Check the client connections (this output is from when everything was still running on the same node):

```sql
MySQL [(none)]> show processlist;
+----+-----------------+---------------------+------+---------+------+------------------------+------------------+
| Id | User            | Host                | db   | Command | Time | State                  | Info             |
+----+-----------------+---------------------+------+---------+------+------------------------+------------------+
|  5 | event_scheduler | localhost           | NULL | Daemon  |  336 | Waiting on empty queue | NULL             |
| 11 | root            | 127.0.0.1:40666     | test | Sleep   |    0 |                        | NULL             |
| 12 | root            | 172.31.47.174:53264 | NULL | Query   |    0 | init                   | show processlist |
+----+-----------------+---------------------+------+---------+------+------------------------+------------------+
3 rows in set, 1 warning (0.000 sec)
```

Kill the sysbench connection:

```sql
MySQL [(none)]> kill 11;
Query OK, 0 rows affected (0.001 sec)

MySQL [(none)]> show processlist;
+-----+----------------------+---------------------+------+---------+------+------------------------+------------------+
| Id  | User                 | Host                | db   | Command | Time | State                  | Info             |
+-----+----------------------+---------------------+------+---------+------+------------------------+------------------+
|   5 | event_scheduler      | localhost           | NULL | Daemon  |  435 | Waiting on empty queue | NULL             |
|  14 | root                 | 172.31.47.174:56052 | NULL | Query   |    0 | init                   | show processlist |
|  15 | unauthenticated user | 127.0.0.1:48256     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  16 | unauthenticated user | 127.0.0.1:48258     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  17 | unauthenticated user | 127.0.0.1:48274     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  18 | unauthenticated user | 127.0.0.1:48284     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  19 | unauthenticated user | 127.0.0.1:48294     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  20 | unauthenticated user | 127.0.0.1:48298     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  21 | unauthenticated user | 127.0.0.1:48308     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  22 | unauthenticated user | 127.0.0.1:48310     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  23 | unauthenticated user | 127.0.0.1:48316     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  24 | unauthenticated user | 127.0.0.1:48332     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  25 | unauthenticated user | 127.0.0.1:48338     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  26 | unauthenticated user | 127.0.0.1:48346     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  27 | unauthenticated user | 127.0.0.1:48360     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  28 | unauthenticated user | 127.0.0.1:48366     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  29 | unauthenticated user | 127.0.0.1:48372     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  30 | unauthenticated user | 127.0.0.1:48386     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  31 | unauthenticated user | 127.0.0.1:48394     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  32 | unauthenticated user | 127.0.0.1:48398     | NULL | Connect |    3 | Receiving from client  | NULL             |
|  33 | unauthenticated user | 127.0.0.1:48404     | NULL | Connect ...
```
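While the excerpt above stops at the processlist, the network side can be watched in parallel; a sketch of the kind of check I would pair with it (my addition, not from the original write-up):

```shell
# On the sysbench client: TCP connections to port 3306 with process info,
# timers (-o) and send/receive queues, to tell a network-level stall apart
# from a server-side stall while sessions sit as "unauthenticated user".
ss -tnpo state established '( dport = :3306 )'

# The same view on the MySQL server side, keyed on the listening port.
ss -tnpo state established '( sport = :3306 )'
```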