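Counting visits per IP in logs with shell and awk on Linux
Date: 2015-04-17  Editor: 简简单单  Source: 一聚教程网
Counting visits per IP with awk
I have a file with a little over 2 million records and want to compute statistics on it with awk in the shell. The file format is as follows:
#keyword#URL#IP address#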
test|123|1
test|123|1
test|123|2
test2|12|1
test2|123|1
test2|123|2
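The desired result: for each keyword and URL pair, the total number of visits and the number of distinct IPs, written to a file.
In SQL this is simple: select keyword ,url ,count(1),count(distinct IP) group by keyword ,url. But the data set is too large and the report never finishes, so I would like to do it in the shell. I am not fluent in shell, though, and do not know a quick way to do it, especially the distinct part.
The ideal result is:
#keyword#URL#distinct IPs#searches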
test 123 2 3
test2 123 2 2
test2 12 1 1
awk -F"|" '{a[$1" "$2]++;b[$1" "$2" "$3]++}(b[$1" "$2" "$3]==1){++c[$1" "$2]}END{ for (i in a) print i,c[i],a[i]}' file
test2 123 2 2
test2 12 1 1
test 123 2 3
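The one-liner above is dense, so here is a more readable sketch of the same idea: one array holds the totals, and a "seen" array makes sure each IP is counted only once per keyword and URL pair. The function name count_keyword_url and the array names are made up for illustration. The sketch also assumes that lines with fewer than three |-separated fields (such as a header line) can be skipped, and it sorts the output only to make the order stable.

```shell
# Hypothetical helper: reads "keyword|url|ip" records from files or stdin.
count_keyword_url() {
    awk -F'|' '
        # Assumption: header lines and lines with fewer than 3 fields are ignored.
        NF < 3 || /^#/ { next }
        {
            pair = $1 " " $2
            total[pair]++              # every visit for this keyword and URL
            if (!seen[pair, $3]++)     # first time this IP shows up for the pair
                distinct[pair]++
        }
        END {
            # Columns: keyword, URL, distinct IPs, total visits
            for (pair in total)
                print pair, distinct[pair], total[pair]
        }
    ' "$@" | LC_ALL=C sort
}
```

Called as count_keyword_url file > result.txt (result.txt is a placeholder name), it writes the report to a file. On the six sample records above it prints "test 123 2 3", "test2 12 1 1" and "test2 123 2 2".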
Counting visits per hour per IP in one day of Apache logs
The log format is as follows:
127.0.0.1 - - [03/Feb/2013:14:18:10 +0800] "GET /ucenterrvicecenter/SCenterRequest.php HTTP/1.0" 302 242
127.0.0.1 - - [03/Feb/2013:14:18:10 +0800] "GET /ucenterrvicecenter/SCenterRequest.php HTTP/1.0" 200 -
111.111.111.35 - - [03/Feb/2013:14:18:32 +0800] "GET /myadmin/ HTTP/1.1" 401 933
111.111.111.35 - root [03/Feb/2013:14:18:33 +0800] "GET /myadmin/ HTTP/1.1" 200 1826
111.111.111.35 - root [03/Feb/2013:14:18:34 +0800] "GET /myadmin/main.php?token=67b1c9d29f9ac9107627bb991c8d2ca6 HTTP/1.1" 200 7633
111.111.111.35 - - [03/Feb/2013:14:18:34 +0800] "GET /myadmin/css/print.css?token=67b1c9d29f9ac9107627bb991c8d2ca6 HTTP/1.1" 200 1063
111.111.111.35 - root [03/Feb/2013:14:18:34 +0800] "GET /myadmin/css/phpmyadmin.css.php?token=67b1c9d29f9ac9107627bb991c8d2ca6&js_frame=right&nocache=1359872314 HTTP/1.1" 200 20322
111.111.111.35 - root [03/Feb/2013:14:18:34 +0800] "GET /myadmin/navigation.php?token=67b1c9d29f9ac9107627bb991c8d2ca6 HTTP/1.1" 200 1362
111.111.111.35 - root [03/Feb/2013:14:18:36 +0800] "GET /myadmin/css/phpmyadmin.css.php?token=67b1c9d29f9ac9107627bb991c8d2ca6&js_frame=left&nocache=1359872314 HTTP/1.1" 200 3618
111.111.111.35 - root [03/Feb/2013:14:18:38 +0800] "GET /myadmin/navigation.php?server=1&db=ucenter&table=&lang=zh-utf-8&collation_connection=utf8_unicode_ci HTTP/1.1" 200 9631
The code is as follows:
[root@localhost sampdb]# awk -vFS="[:]" '{gsub("-.*","",$1);num[$2" "$1]++}END{for(i in num)print i,num[i]}' data1
14 127.0.0.1 2
14 111.111.111.35 8
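An alternative sketch leaves the default field separator alone. It assumes the layout shown above, where the fourth whitespace-separated field is the timestamp (for example [03/Feb/2013:14:18:10), so splitting that field on ":" yields the hour. The name count_hour_ip is hypothetical, and the approach assumes the user field never contains spaces.

```shell
# Hypothetical helper: prints "hour ip count" for access logs given as files or stdin.
count_hour_ip() {
    awk '
        {
            # $4 looks like "[03/Feb/2013:14:18:10"; the piece after the first ":" is the hour.
            split($4, t, ":")
            hits[t[2] " " $1]++
        }
        END {
            for (key in hits)
                print key, hits[key]
        }
    ' "$@" | LC_ALL=C sort
}
```

Usage would be count_hour_ip data1. The trailing sort is there only so that rows for the same hour end up next to each other.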
Counting visits from the same IP in a log with awk
Given a log, we need to count how many times each IP visited.
180.153.114.199 - - [03/Jul/2013:14:44:43 +0800] GET /wp-login.php?redirect_to=http%3A%2F%2Fdemo.catjia.com%2Fwp-admin%2Fplugin-install.php%3Ftab%3Dsearch%26s%3DVasiliki%26plugin-search-input%3D%25E6%2590%259C%25E7%25B4%25A2%25E6%258F%2592%25E4%25BB%25B6&reauth=1 HTTP/1.1 200 2355 - Mozilla/4.0 -
101.226.33.200 - - [03/Jul/2013:14:45:52 +0800] GET /wp-admin/plugin-install.php?tab=search&type=term&s=Photogram&plugin-search-input=%E6%90%9C%E7%B4%A2%E6%8F%92%E4%BB%B6 HTTP/1.1 302 0 - Mozilla/4.0 -
101.226.33.200 - - [03/Jul/2013:14:45:52 +0800] GET /wp-login.php?redirect_to=http%3A%2F%2Fdemo.catjia.com%2Fwp-admin%2Fplugin-install.php%3Ftab%3Dsearch%26type%3Dterm%26s%3DPhotogram%26plugin-search-input%3D%25E6%2590%259C%25E7%25B4%25A2%25E6%258F%2592%25E4%25BB%25B6&reauth=1 HTTP/1.1 200 2370 - Mozilla/4.0 -
113.110.176.131 - - [03/Jul/2013:15:03:57 +0800] GET /wp-content/themes/catjia-lio/images/menu_hover_bg.png HTTP/1.1 304 0 http://demo.catjia.com/wp-content/themes/catjia-lio/style.css Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0 -
180.153.205.103 - - [03/Jul/2013:15:13:59 +0800] GET /wp-admin/options-general.php HTTP/1.1 302 0 - Mozilla/4.0 -
180.153.205.103 - - [03/Jul/2013:15:13:59 +0800] GET /wp-login.php?redirect_to=http%3A%2F%2Fdemo.catjia.com%2Fwp-admin%2Foptions-general.php&reauth=1 HTTP/1.1 200 2269 - Mozilla/4.0 -
101.226.51.227 - - [03/Jul/2013:15:14:07 +0800] GET /wp-admin/options-general.php?settings-updated=true HTTP/1.1 302 0 - Mozilla/4.0 -
101.226.51.227 - - [03/Jul/2013:15:14:07 +0800] GET /wp-login.php?redirect_to=http%3A%2F%2Fdemo.catjia.com%2Fwp-admin%2Foptions-general.php%3Fsettings-updated%3Dtrue&reauth=1 HTTP/1.1 200 2291 - Mozilla/4.0 -
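At first glance the log records a lot of information. Where to start?
Many people know that awk can extract the first column, which is the IP address.
But what then? How do we count how many times each IP appears?
It looks complicated, but it becomes simple with practice: use the IP as the key of an awk array and increment its counter on every line.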
# awk '{a[$1]+=1;}END{for(i in a){print a[i]" " i;}}' demo.catjia.com_access.log
2 180.153.206.26
120 113.110.176.131
2 101.226.33.200
2 101.226.66.175
2 112.65.193.16
2 101.226.51.227
2 112.64.235.86
2 101.226.33.223
1 101.227.252.23
2 180.153.205.103
2 101.226.33.216
2 112.64.235.89
4 180.153.114.199
2 112.64.235.254
2 180.153.206.34
To keep the result, redirect the output to a text file.
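As a minimal sketch of that redirection, the helper below wraps the same awk command and writes its output to a file. The name save_ip_counts and the output file name in the usage note are placeholders, not part of the original article.

```shell
# Hypothetical helper: count requests per IP in log file $1 and write the result to $2.
save_ip_counts() {
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" > "$2"
}
```

For example, save_ip_counts demo.catjia.com_access.log ip_counts.txt. Using ">" overwrites an existing file, while ">>" would append to it.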
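We now have the number of visits for each IP, but with more data the output is hard to read. For example, which IP visited most often?
Let's pipe the output through sort.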
# awk '{a[$1]+=1;}END{for(i in a){print a[i]" " i;}}' demo.catjia.com_access.log |sort
1 101.227.252.23
120 113.110.176.131
2 101.226.33.200
2 101.226.33.216
2 101.226.33.223
2 101.226.51.227
2 101.226.66.175
2 112.64.235.254
2 112.64.235.86
2 112.64.235.89
2 112.65.193.16
2 180.153.205.103
2 180.153.206.26
2 180.153.206.34
4 180.153.114.199
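This looks sorted, but on closer inspection the IP with 120 visits is in second place. Shouldn't it be last?
The reason is that sort compares lines as strings by default, character by character, so "120" sorts before "2". A numeric option is needed: -g (general numeric sort) is used here, and -n also works for plain integer counts like these.
Here is the result with -g added: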
# awk '{a[$1]+=1;}END{for(i in a){print a[i]" " i;}}' demo.catjia.com_access.log |sort -g
1 101.227.252.23
2 101.226.33.200
2 101.226.33.216
2 101.226.33.223
2 101.226.51.227
2 101.226.66.175
2 112.64.235.254
2 112.64.235.86
2 112.64.235.89
2 112.65.193.16
2 180.153.205.103
2 180.153.206.26
2 180.153.206.34
4 180.153.114.199
120 113.110.176.131
Now that is the result we wanted.
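To answer the "which IP visits most often" question directly, one option is to sort in descending numeric order and keep only the first few lines. This is a minimal sketch: top_ips is a hypothetical helper, its first argument is the number of lines to keep, and ties are broken by comparing the IP as a string.

```shell
# Hypothetical helper: top_ips N [logfile...]
# Prints the N most frequent values of the first field, highest count first.
top_ips() {
    n=$1
    shift
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$@" |
        LC_ALL=C sort -k1,1nr -k2,2 |
        head -n "$n"
}
```

For example, top_ips 3 demo.catjia.com_access.log would list the three busiest IPs in the log above.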