Zabbix: Custom Monitoring in Depth

Email alerts

img

First, create a trigger.
When the trigger fires, it starts an action.
The action calls the specified alert media type.

Configure the alert media type

image-20230816100703246

image-20230816100938355

image-20230816101501083

Configure the recipient

image-20230816101715160

image-20230816101733233

image-20230816101850465

image-20230816101904876

Configure the action

image-20230816101557245

image-20230816102049262

image-20230816102120431

image-20230816102140482

image-20230816102209758

image-20230816102232235

image-20230816102257392

Email alert example

image-20230816102354115

Improve the alert content

image-20230816103436268

image-20230816103511944

# Problem alert
Problem {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} has failed!

Alert address: {HOST.IP}

Alert host: {HOSTNAME1}

Alert time: {EVENT.DATE} {EVENT.TIME}

Severity: {TRIGGER.SEVERITY}

Alert message: {TRIGGER.NAME}

Alert item: {TRIGGER.KEY1}

Problem details: {ITEM.NAME}: {ITEM.VALUE}

Current status: {TRIGGER.STATUS}: {ITEM.VALUE1}

Event ID: {EVENT.ID}

# Recovery alert
Recovery {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} has recovered!

Alert address: {HOST.IP}

Alert host: {HOSTNAME1}

Alert time: {EVENT.DATE} {EVENT.TIME}

Severity: {TRIGGER.SEVERITY}

Alert message: {TRIGGER.NAME}

Alert item: {TRIGGER.KEY1}

Problem details: {ITEM.NAME}: {ITEM.VALUE}

Current status: {TRIGGER.STATUS}: {ITEM.VALUE1}

Event ID: {EVENT.ID}

image-20230816103638567

image-20230816103727899
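
To get a feel for how a message built from the macros above might read, the macros can be replaced with sample values locally. This is only a rough sketch: Zabbix itself expands the macros when it sends the alert, `render_template` is a hypothetical helper, and the sample values (web01, 10.0.0.7, the trigger name, event ID 1024) are placeholders.

```shell
#!/bin/bash
# render_template: read a template on stdin and replace a few Zabbix macros
# with hypothetical sample values, to preview the wording of an alert.
render_template() {
    sed -e 's/{HOSTNAME1}/web01/g' \
        -e 's/{HOST\.IP}/10.0.0.7/g' \
        -e 's/{TRIGGER\.STATUS}/PROBLEM/g' \
        -e 's/{TRIGGER\.NAME}/Available memory is low/g' \
        -e 's/{EVENT\.ID}/1024/g'
}

# Preview a two-line excerpt of the problem template
printf '%s\n' \
    'Problem {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME}' \
    'Event ID: {EVENT.ID}' | render_template
```

Macros that the helper does not know about are left untouched, which makes it easy to spot the ones still missing a sample value.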

HTML email alerts

Prerequisites

# Give apache and the zabbix group ownership of the directory
[root@zabbix-server conf.d]# chown -R apache:zabbix /var/opt/rh

Problem

<head>
    <style type="text/css">
        body{
            background:url('https://seopic.699pic.com/photo/40007/7490.jpg_wh1200.jpg');
        }
    </style>
</head>
<body>
    <img src="https://blog.driverzeng.com/zenglaoshi/sos.png" alt="">
    <style type="text/css">
        table .guzhang {color: red;}
    </style>
    <table border="3"  bordercolor="black" cellspacing="0px" cellpadding="4px" width="500px">
        <tr class="guzhang" bgcolor="#0C1B3D" ><th colspan=2>
            {TRIGGER.STATUS} PROBLEM!!! PROBLEM!!! 
            <div>&#128514; &#128514; &#128514;</div>
        </th></tr>
        <tr >
            <td bgcolor="#F9B602" width="20%">Alert host</td>
            <td bgcolor="#F9B602">{HOSTNAME1}</td>
        </tr>
        <tr >
            <td bgcolor="#F9B602">Host alias</td>
            <td bgcolor="#F9B602">{HOST.NAME} </td>
        </tr>
        <tr >
            <td bgcolor="#F9B602">Alert address</td>
            <td bgcolor="#F9B602">{HOST.IP}</td>
        </tr>
        <tr>
            <td bgcolor="#F9B602">Alert time</td>
            <td bgcolor="#F9B602">{EVENT.DATE} {EVENT.TIME}</td>
        </tr>

        <tr>
            <td bgcolor="#F9B602">Severity</td>
            <td bgcolor="#F9B602">{TRIGGER.SEVERITY}</td>
        </tr>

        <tr>
            <td bgcolor="#F9B602">Alert message</td>
            <td bgcolor="#F9B602">{TRIGGER.NAME}</td>
        </tr>

        <tr>
            <td bgcolor="#F9B602">Alert item</td>
            <td bgcolor="#F9B602">{TRIGGER.KEY1}</td>
        </tr>
        <tr >
            <td class='guzhang2' bgcolor="#FF3333">Problem details</td>
            <td class='guzhang3' bgcolor="#FF3333">{ITEM.NAME}: {ITEM.VALUE} &#128520; &#128520; &#128520;</td>
        </tr>
        <tr>
            <td bgcolor="#F9B602">Current status</td>
            <td bgcolor="#F9B602">{TRIGGER.STATUS}: {ITEM.VALUE1}</td>
        </tr>
        <tr>
            <td bgcolor="#F9B602">Event ID</td>
            <td bgcolor="#F9B602">{EVENT.ID}</td>
        </tr>
    </table>
</body>

image-20230816112735797

Recovery

<head>
    <style type="text/css">
        table .guzhang {
            color: red;
        }
        body{
            background:url('https://seopic.699pic.com/photo/40007/7490.jpg_wh1200.jpg');
        }
    </style>
</head>
<body>
<img src="https://blog.driverzeng.com/zenglaoshi/huifu.png" alt="">
    <table border="1"  bordercolor="black" cellspacing="0px" cellpadding="4px" width="500px">
        <tr bgcolor="#49c208"><th colspan=2>
        {TRIGGER.STATUS} Hooray, it's fixed! 
        <div>&#128512; &#128512; &#128512;</div>
        </th></tr>

        <tr >
            <td bgcolor="lightgreen" width="20%">Recovered host</td>
            <td bgcolor="yellow">{HOSTNAME1}</td>
        </tr>
        <tr>
            <td bgcolor="lightgreen">Host alias</td>
            <td bgcolor="yellow">{HOST.NAME} </td>
        </tr>
        <tr >
            <td bgcolor="lightgreen">Host address</td>
            <td bgcolor="yellow">{HOST.IP}</td>
        </tr>
        <tr>
            <td bgcolor="lightgreen">Recovery time</td>
            <td bgcolor="yellow">{EVENT.RECOVERY.DATE} {EVENT.RECOVERY.TIME}</td>
        </tr>

        <tr>
            <td bgcolor="lightgreen">Severity</td>
            <td bgcolor="yellow">{TRIGGER.SEVERITY}</td>
        </tr>

        <tr>
            <td bgcolor="lightgreen">Recovery message</td>
            <td bgcolor="yellow">{TRIGGER.NAME}</td>
        </tr>

        <tr>
            <td bgcolor="lightgreen">Recovered item</td>
            <td bgcolor="yellow">{TRIGGER.KEY1}</td>
        </tr>
        <tr >
            <td bgcolor="#49c208">Recovery details</td>
            <td bgcolor="#49c208">{ITEM.NAME}: {ITEM.VALUE} &#9889; &#9889; &#9889;</td>
        </tr>
        <tr>
            <td bgcolor="lightgreen">Current status</td>
            <td bgcolor="yellow">{TRIGGER.STATUS}: {ITEM.VALUE1}</td>
        </tr>
        <tr>
            <td bgcolor="lightgreen">Event ID</td>
            <td bgcolor="yellow">{EVENT.ID}</td>
        </tr>
    </table>
</body>

image-20230816112802859

Multi-condition triggers

[root@web01 ~]# vim /etc/zabbix/zabbix_agentd.d/mem.conf
UserParameter=mem.state,free -m|awk '/^Mem/{print $NF*100/$2}'
UserParameter=swap.state,free -m|awk '/^Swap/{print $NF*100/$2}'
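
The two user parameters above print the last column of the `Mem` and `Swap` rows of `free -m` as a percentage of the total column (available memory and free swap, respectively). A minimal sketch of the same arithmetic follows, run against a hypothetical sample of `free -m` output so it can be tried without an agent. The helper names `sample_free` and `percent_of_total` are made up for illustration, and the guard against a zero total (a host without swap) is an addition that the original one-liners do not have.

```shell
#!/bin/bash
# Hypothetical sample of `free -m` output (numbers invented for the example)
sample_free() {
cat <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:           1980         300         900          10         780        1485
Swap:          2047           0        2047
EOF
}

# percent_of_total ROW: read `free -m` output on stdin and print the last
# column of the row starting with ROW as a percentage of its total column.
# Prints 0 when the total is 0 instead of dividing by zero.
percent_of_total() {
    awk -v row="^$1" '$0 ~ row { if ($2 > 0) print $NF * 100 / $2; else print 0 }'
}

sample_free | percent_of_total Mem     # available memory, percent
sample_free | percent_of_total Swap    # free swap, percent
```

On a real host the same helper could be fed with `free -m | percent_of_total Mem`.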

Memory item

image-20230816174331723

Swap item

image-20230816174912653

Combined memory trigger

image-20230816175129113

image-20230816175211347

image-20230816175312759

image-20230816175408720

image-20230816175428582
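
The trigger expression itself is only visible in the screenshots, so the sketch below is an assumption about its shape rather than a copy of it: a multi-condition trigger fires only when every condition holds at once. Here the two conditions are "available memory below 20%" and "free swap below 50%"; both thresholds and the function name `combined_problem` are hypothetical.

```shell
#!/bin/bash
# combined_problem MEM_PCT SWAP_PCT
# Prints 1 (problem) only when BOTH values are below their thresholds,
# otherwise 0. awk is used so that fractional percentages compare correctly.
combined_problem() {
    awk -v mem="$1" -v swap="$2" -v mem_min=20 -v swap_min=50 \
        'BEGIN { print ((mem < mem_min && swap < swap_min) ? 1 : 0) }'
}

combined_problem 15.2 30    # both low: problem
combined_problem 15.2 80    # swap still healthy: no problem
```

Requiring both conditions avoids alerting on low available memory alone while plenty of swap is still free.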

Zabbix self-healing

Add an SSH item

image-20230816180111637

Add a trigger

image-20230816180245038

Create the self-healing action

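# 1. On the client, edit the agent configuration to allow remote commands
#    (Zabbix 5.0 deprecates EnableRemoteCommands in favor of AllowKey=system.run[*], but still accepts it)
[root@web01 ~]# vim /etc/zabbix/zabbix_agentd.conf 
EnableRemoteCommands=1

# 2. On the client, grant the zabbix group passwordless sudo
visudo
%zabbix ALL=(ALL) NOPASSWD:ALL

# 3. On the client, give the zabbix user a /bin/bash login shell
[root@web01 ~]# usermod zabbix -s /bin/bash

# 4. Test
su - zabbix -c 'sudo ls -l /'

# 5. Restart the agent on the client
[root@web01 ~]# systemctl restart zabbix-agent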

image-20230816180337545

image-20230816180454059

image-20230816180710437
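
The remote command configured in the action is only shown in the screenshots. As a rough sketch of the idea, a self-healing command checks a service and restarts it only when the check fails. `heal_if_down` is a hypothetical helper, and the `sshd` commands in the usage comment are an assumption, not taken from the screenshots.

```shell
#!/bin/bash
# heal_if_down CHECK_CMD RESTART_CMD
# Runs CHECK_CMD; if it fails, runs RESTART_CMD and reports the outcome.
heal_if_down() {
    local check_cmd=$1 restart_cmd=$2
    if eval "$check_cmd" >/dev/null 2>&1; then
        echo "ok"
    elif eval "$restart_cmd" >/dev/null 2>&1; then
        echo "restarted"
    else
        echo "restart failed"
    fi
}

# Hypothetical usage on the client, relying on the sudo rule set up earlier:
# heal_if_down 'systemctl is-active --quiet sshd' 'sudo systemctl restart sshd'
```

Checking first keeps the command harmless if the action runs while the service is already back up.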

Monitoring the PHP-FPM status page

# Install php-fpm
[root@web01 ~]# yum -y install php-fpm.x86_64 

## Edit the pool configuration (PHP-FPM usually runs behind Nginx; the status path is set in www.conf)
[root@web01 ~]# vim /etc/php-fpm.d/www.conf
pm.status_path = /phpfpm_status

[root@web01 ~]# vim /etc/nginx/nginx.conf
## Add a location that matches pm.status_path
        location /phpfpm_status {
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
        }

----------------------

## Where it sits in nginx.conf

...

        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
        }
        location /phpfpm_status {
            include fastcgi_params;
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }

.....

# Restart the services
[root@web01 ~]# systemctl restart nginx.service php-fpm.service 

## Open in a browser
http://10.0.0.7/phpfpm_status

[root@web01 ~]# curl 127.0.0.1/phpfpm_status
pool:                 www
process manager:      dynamic
start time:           16/Aug/2023:18:34:06 +0800
start since:          83
accepted conn:        2
listen queue:         0
max listen queue:     0
listen queue len:     128
idle processes:       4
active processes:     1
total processes:      5
max active processes: 1
max children reached: 0
slow requests:        0

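# PHP-FPM status fields:
pool                 # name of the FPM pool, usually www
process manager      # process management mode: dynamic, static, or ondemand
start time           # when FPM started; updated when FPM is reloaded
start since          # seconds since FPM started
accepted conn        # number of requests accepted by the pool
listen queue         # requests waiting in the queue; if this is not 0, increase the number of FPM processes
max listen queue     # highest number of queued requests since FPM started
listen queue len     # size of the socket's pending-connection queue
idle processes       # number of idle processes
active processes     # number of active processes
total processes      # total number of processes
max active processes # highest number of active processes since FPM started
max children reached # number of times the process limit was reached; if this is not 0, pm.max_children is too small and can be raised
slow requests        # number of requests that exceeded request_slowlog_timeout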

image-20230816183511736
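
Each field in the status text can be pulled out by matching its label exactly. The sketch below does this with one small helper, using the `curl` output shown earlier as sample input. `sample_status` and `fpm_field` are hypothetical names; the helper maps underscores in the requested key to spaces so that `listen_queue` and `listen_queue_len` resolve to different labels.

```shell
#!/bin/bash
# Sample status text, copied from the curl output shown earlier
sample_status() {
cat <<'EOF'
pool:                 www
process manager:      dynamic
start since:          83
accepted conn:        2
listen queue:         0
max listen queue:     0
listen queue len:     128
idle processes:       4
active processes:     1
total processes:      5
max active processes: 1
max children reached: 0
slow requests:        0
EOF
}

# fpm_field KEY: read status text on stdin and print the value whose label
# equals KEY with underscores replaced by spaces.
fpm_field() {
    local label
    label=$(printf '%s' "$1" | tr '_' ' ')
    awk -F': *' -v label="$label" '$1 == label { print $2 }'
}

sample_status | fpm_field idle_processes
sample_status | fpm_field listen_queue_len
```

Against a live pool the input would come from `curl -s http://127.0.0.1/phpfpm_status` instead of the sample.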

## Write the php-fpm shell script
[root@web01 ~]# cd /etc/zabbix/scripts/
[root@web01 scripts]# cat phpfpm_status.sh 
#!/bin/bash
############################################################
# $Name:         phpfpm_status.sh
# $Version:      v1.0
# $Function:     PHP-FPM Status
# $Author:       DriverZeng
# $organization: blog.driverzeng.com
# $Create Date:  2016-06-23
# $Description:  Monitor PHP-FPM Service Status
############################################################

PHPFPM_COMMAND=$1
PHPFPM_PORT=80  # adjust to the port Nginx listens on

start_since(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^start since:/ {print $NF}'
}

accepted_conn(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^accepted conn:/ {print $NF}'
}

listen_queue(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^listen queue:/ {print $NF}'
}

max_listen_queue(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^max listen queue:/ {print $NF}'
}

listen_queue_len(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^listen queue len:/ {print $NF}'
}

idle_processes(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^idle processes:/ {print $NF}'
}

active_processes(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^active processes:/ {print $NF}'
}

total_processes(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^total processes:/ {print $NF}'
}

max_active_processes(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^max active processes:/ {print $NF}'
}

max_children_reached(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^max children reached:/ {print $NF}'
}

slow_requests(){
    /usr/bin/curl -s "http://127.0.0.1:"$PHPFPM_PORT"/phpfpm_status" |awk '/^slow requests:/ {print $NF}'
}

case $PHPFPM_COMMAND in
    start_since)
        start_since;
        ;;
    accepted_conn)
        accepted_conn;
        ;;
    listen_queue)
        listen_queue;
        ;;
    max_listen_queue)
        max_listen_queue;
        ;;
    listen_queue_len)
        listen_queue_len;
        ;;
    idle_processes)
        idle_processes;
        ;;
    active_processes)
        active_processes;
        ;;
    total_processes)
        total_processes;
        ;;
    max_active_processes)
        max_active_processes;
        ;;
    max_children_reached)
        max_children_reached;
        ;;
    slow_requests)
        slow_requests;
        ;;
    *)
        echo $"USAGE:$0 {start_since|accepted_conn|listen_queue|max_listen_queue|listen_queue_len|idle_processes|active_processes|total_processes|max_active_processes|max_children_reached|slow_requests}"
        ;;
esac

# Make the script executable
[root@web01 scripts]# chmod +x phpfpm_status.sh

# The item configuration file phpfpm_status.conf looks like this
[root@web01 scripts]# cat /etc/zabbix/zabbix_agentd.d/phpfpm_status.conf 
UserParameter=phpfpm_status[*],/bin/bash /etc/zabbix/scripts/phpfpm_status.sh "$1"

# Restart the agent
[root@web01 ~]# systemctl restart zabbix-agent.service 

# Test from the Zabbix server
[root@zabbix ~]# zabbix_get -s 172.16.1.7 -k "phpfpm_status[start_since]"
393
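
Before creating the items one by one, it can help to confirm that every key returns a value. The sketch below loops over the eleven keys handled by the script and flags any that come back empty. `check_keys` and `remote_get` are hypothetical helpers; the getter is passed in as a parameter so the loop can be tried with a stub before pointing it at a real agent.

```shell
#!/bin/bash
# The keys handled by phpfpm_status.sh
PHPFPM_KEYS="start_since accepted_conn listen_queue max_listen_queue listen_queue_len idle_processes active_processes total_processes max_active_processes max_children_reached slow_requests"

# check_keys GETTER
# GETTER is the name of a command or function that takes one key and prints
# its value. Prints key=value per key, or "EMPTY key" when nothing came back.
# Returns non-zero if any key was empty.
check_keys() {
    local getter=$1 key value failed=0
    for key in $PHPFPM_KEYS; do
        value=$("$getter" "$key")
        if [ -z "$value" ]; then
            echo "EMPTY $key"
            failed=1
        else
            echo "$key=$value"
        fi
    done
    return $failed
}

# Against the agent tested above:
# remote_get() { zabbix_get -s 172.16.1.7 -k "phpfpm_status[$1]"; }
# check_keys remote_get
```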
OK. Next, configure the items in Zabbix.

start_since

image-20230816184856589

accepted_conn

image-20230816184923264

listen_queue

image-20230816184945256

max_listen_queue

image-20230816185014267

listen_queue_len

image-20230816185037036

idle_processes

image-20230816185114862

active_processes

image-20230816185144043

total_processes

image-20230816184427709

max_active_processes

image-20230816185212599

max_children_reached

image-20230816185231630

slow_requests

image-20230816185249470

Result

image-20230816185349773

Create a graph

image-20230816185520056

image-20230816185703345

Self-healing monitoring for MHA

# Install the Zabbix agent on the MHA host
rpm -ivh https://mirrors.tuna.tsinghua.edu.cn/zabbix/zabbix/5.0/rhel/7/x86_64/zabbix-agent-5.0.36-1.el7.x86_64.rpm

# Edit the configuration file
[root@db04 ~]# vim /etc/zabbix/zabbix_agentd.conf 
Server=127.0.0.1,172.16.1.71
Hostname=db04

# Start the agent and enable it at boot
[root@db04 ~]# systemctl start zabbix-agent
[root@db04 ~]# systemctl enable zabbix-agent

# On the client, edit the agent configuration to allow remote commands
[root@db04 ~]# vim /etc/zabbix/zabbix_agentd.conf 
EnableRemoteCommands=1

# On the client, grant the zabbix group passwordless sudo
visudo
%zabbix ALL=(ALL) NOPASSWD:ALL

# On the client, give the zabbix user a /bin/bash login shell
[root@db04 ~]# usermod zabbix -s /bin/bash

# Test
su - zabbix -c 'sudo ls -l /'

# Restart the agent on the client
[root@db04 ~]# systemctl restart zabbix-agent

# Create a directory for the scripts
[root@db04 ~]# mkdir /etc/zabbix/scripts

# Write the status script: prints 1 if masterha_manager is running, 0 otherwise
[root@db04 ~]# vim /etc/zabbix/scripts/mha_status.sh
#!/bin/bash
if /usr/bin/ps aux | /usr/bin/grep -v grep | /usr/bin/grep -q masterha_manager; then
    echo '1'
else
    echo '0'
fi

# Write the agent item configuration file
[root@db04 ~]# vim /etc/zabbix/zabbix_agentd.d/mha.conf
UserParameter=mha.status,sh /etc/zabbix/scripts/mha_status.sh

# Write the MHA start script
[root@db04 ~]# vim manager.sh
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /etc/mha/logs/manager.log 2>&1 &

# Restart the agent on the client
[root@db04 ~]# systemctl restart zabbix-agent

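Create the host

image-20230816201506542

Configure the host

image-20230816201614096

Create the item

image-20230816201716232

image-20230816201817350

Configure the item

image-20230816202123931

Create the trigger

image-20230816202343269

Configure the trigger

image-20230816202425719

Create the self-healing action

image-20230816202540761

Configure the self-healing action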

image-20230816203107715

image-20230816211838682
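
The command that the action runs is only visible in the screenshots. As a hedged sketch of what such a command could look like, the helper below starts the manager only when `masterha_manager` is missing from the process list. `mha_state` and `heal_mha` are hypothetical names, and the path `/root/manager.sh` in the usage comment assumes the start script was saved in root's home directory.

```shell
#!/bin/bash
# mha_state: read a process listing on stdin; print 1 if masterha_manager
# appears in it (ignoring grep's own entry), otherwise 0.
mha_state() {
    if grep -v grep | grep -q masterha_manager; then
        echo 1
    else
        echo 0
    fi
}

# heal_mha START_CMD: run START_CMD only when masterha_manager is not running.
heal_mha() {
    if [ "$(ps aux | mha_state)" = "1" ]; then
        echo "masterha_manager already running"
    else
        sh -c "$1"
    fi
}

# Hypothetical usage as the action's remote command:
# heal_mha 'sudo sh /root/manager.sh'
```

Reading the process list from stdin keeps `mha_state` easy to try with canned input before wiring it into an action.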
