Computing Sums and Averages with AWK and Python

Date: 2016-04-28  Editor: 简简单单  Source: 一聚教程网

The input file looks like this:

# cat cesc
a,1
a,2
b,3
b,4
c,2
d,5

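The goal is to report how many times each of a, b, c, and d appears, along with the sum and the average of the numbers after the comma.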

With shell:

# grep -E ^a cesc |awk -F ',' '{sum+=$2} END {print "a, Count:" NR " Sum: " sum " Average: " sum/NR}'
a, Count:2 Sum: 3 Average: 1.5
# grep -E ^b cesc |awk -F ',' '{sum+=$2} END {print "b, Count:" NR " Sum: " sum " Average: " sum/NR}'
b, Count:2 Sum: 7 Average: 3.5
# grep -E ^c cesc |awk -F ',' '{sum+=$2} END {print "c, Count:" NR " Sum: " sum " Average: " sum/NR}'
c, Count:1 Sum: 2 Average: 2
# grep -E ^d cesc |awk -F ',' '{sum+=$2} END {print "d, Count:" NR " Sum: " sum " Average: " sum/NR}'
d, Count:1 Sum: 5 Average: 5

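Alternatively, write it as a for loop, which generalises better because the keys are no longer hard-coded. There are two ways to reference a shell variable inside awk. One is to step out of the single-quoted awk program so that the shell expands the variable, as in "'$var'". The other is to declare it up front with awk's -v option, as in awk -v var="$var".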

# for i in `cat cesc |cut -d, -f1|sort|uniq`;do grep -E ^$i cesc |awk -F ',' '{sum+=$2} END {print "'$i'" " Count: " NR ", Sum: " sum ", Average: " sum/NR}';done
a Count: 2, Sum: 3, Average: 1.5
b Count: 2, Sum: 7, Average: 3.5
c Count: 1, Sum: 2, Average: 2
d Count: 1, Sum: 5, Average: 5
Or:
# for i in `cat cesc |cut -d, -f1|sort|uniq`;do grep -E ^$i cesc |awk -v i="$i" -F ',' '{sum+=$2} END {print i " Count: " NR ", Sum: " sum ", Average: " sum/NR}';done
a Count: 2, Sum: 3, Average: 1.5
b Count: 2, Sum: 7, Average: 3.5
c Count: 1, Sum: 2, Average: 2
d Count: 1, Sum: 5, Average: 5
 


With python: (In Python 2, dividing two integers floors by default and returns an integer. Use from __future__ import division to get true division with the / operator.)

from __future__ import division
alist = []
blist = []
clist = []
dlist = []
for i in open('cesc'):
    ss = i.split(',')
    if ss[0] == 'a':
        alist.append(int(ss[1]))
    elif ss[0] == 'b':
        blist.append(int(ss[1]))
    elif ss[0] == 'c':
        clist.append(int(ss[1]))
    elif ss[0] == 'd':
        dlist.append(int(ss[1]))
# Use / for the average: // floors the result even with the __future__ import.
print('a, Count: ' + str(len(alist)) + ', Sum: ' + str(sum(alist)) + ', Average: ' + str(sum(alist) / len(alist)))
print('b, Count: ' + str(len(blist)) + ', Sum: ' + str(sum(blist)) + ', Average: ' + str(sum(blist) / len(blist)))
print('c, Count: ' + str(len(clist)) + ', Sum: ' + str(sum(clist)) + ', Average: ' + str(sum(clist) / len(clist)))
print('d, Count: ' + str(len(dlist)) + ', Sum: ' + str(sum(dlist)) + ', Average: ' + str(sum(dlist) / len(dlist)))
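
The script above keeps one hard-coded list per key. As a rough sketch of a more general variant (not the original author's code), the values can be grouped in a dictionary in a single pass. The function name `summarize` and the inline `sample` list are assumptions made for illustration; in practice the lines would come from open('cesc').

```python
from collections import OrderedDict


def summarize(lines):
    """Return {key: (count, total, average)} for lines shaped like 'key,number'."""
    groups = OrderedDict()  # keeps keys in first-seen order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(',')
        groups.setdefault(key, []).append(int(value))
    # float() forces true division on both Python 2 and Python 3.
    return OrderedDict(
        (key, (len(vals), sum(vals), float(sum(vals)) / len(vals)))
        for key, vals in groups.items()
    )


# Sample data copied from the cesc file shown earlier.
sample = ["a,1", "a,2", "b,3", "b,4", "c,2", "d,5"]
stats = summarize(sample)
for key, (count, total, average) in stats.items():
    print('%s, Count: %d, Sum: %d, Average: %s' % (key, count, total, average))
```

Because the keys are discovered while reading, a new key in the file needs no change to the code.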

AWK: Sum, Average, and Min/Max


A few commands worth recording: (archive each entry in the current directory into its own .tar.gz)
ls | awk '{ print "tar zcvf "$0".tar.gz " $0|"/bin/bash" }'

(Extract the text between two delimiters)
[root@VM-202 zhuo]# echo "abc#1233+232@jjjj?===" |awk -F '[#@]' '{print $2}'
1233+232

[root@VM-202 zhuo]# echo "abc#1233+232@jjjj?===" |awk -F '[@?]' '{print $2}'
jjjj
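
Since this article pairs AWK with Python, here is a hedged sketch of the same bracket-expression splitting using Python's `re.split`. The variable names are illustrative only.

```python
import re

text = "abc#1233+232@jjjj?==="

# Like awk -F '[#@]': split on either '#' or '@', then take the 2nd field.
fields_hash_at = re.split(r'[#@]', text)
print(fields_hash_at[1])

# Like awk -F '[@?]': split on either '@' or '?', then take the 2nd field.
fields_at_q = re.split(r'[@?]', text)
print(fields_at_q[1])
```

Note that awk fields are numbered from 1 while Python lists are indexed from 0, so awk's $2 corresponds to index 1.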

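awk '/./ {print $0}' test.txt        matches non-empty lines
awk '!/zhuo/ {print $0}' test.txt        matches lines that do not contain zhuo

Note: a bracket expression matches a single character, so /^[^zhuo]/ only tests whether the first character is something other than z, h, u, or o, and /^[^$]/ only tests that the first character is not a literal $.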

Substitution (replace : with #)

[root@VM-202 zhuo]# echo "zhuo:x:503:504::/home/zhuo:/bin/bash" |awk 'gsub(/:/,"#") {print $0}'
zhuo#x#503#504##/home/zhuo#/bin/bash

Contents of you.txt:

1
2
3
4

Column sum: cat you.txt |awk '{a+=$1}END{print a}'
Column average: cat you.txt |awk '{a+=$1}END{print a/NR}'
Column maximum: cat you.txt |awk 'BEGIN{a=0}{if ($1>a) a=$1}END{print a}'

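Start with a variable set to 0; whenever a larger value is encountered, assign it to the variable, and continue until the input ends. This works as long as the column has at least one value above 0.
Column minimum: cat you.txt |awk 'BEGIN{a=11111}{if ($1<a) a=$1}END{print a}'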
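
The same column statistics can be cross-checked in Python. This is a minimal sketch that assumes the four values of you.txt are embedded as a string rather than read from disk. The built-in `max` and `min` need no starting sentinel such as 0 or 11111.

```python
# Contents of you.txt, embedded here so the sketch is self-contained.
data = "1\n2\n3\n4\n"

# One integer per non-blank line.
values = [int(line) for line in data.splitlines() if line.strip()]

total = sum(values)
average = float(total) / len(values)  # float() gives true division on Python 2 too
largest = max(values)
smallest = min(values)

print('Sum: %d, Average: %s, Max: %d, Min: %d' % (total, average, largest, smallest))
```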

Finding the min and max of an entire file

Example: find the smallest and largest values in test.txt
12 34 56 78
24 65 87 90
76 11 67 87
100 89 78 99

for i in `cat test.txt` ;do echo $i; done |sort -n |sed -n '1p;$p'

Example 2: the same test.txt
Sum of all values: for i in `cat test.txt`;do echo $i ;done |awk '{a+=$1}END{print a}'
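
As a hedged Python counterpart to the two whole-file examples, the sketch below flattens the grid of numbers into one list and then takes the minimum, maximum, and sum. The file contents are embedded as a string so the example runs on its own; reading test.txt from disk is left out.

```python
# Contents of test.txt, embedded so the sketch runs on its own.
data = """12 34 56 78
24 65 87 90
76 11 67 87
100 89 78 99
"""

# split() with no argument breaks on any whitespace, including newlines,
# which flattens the grid into one list, much like the shell for-loop does.
numbers = [int(token) for token in data.split()]

print('Min: %d, Max: %d, Sum: %d' % (min(numbers), max(numbers), sum(numbers)))
```

Converting each token with int() matters for the same reason sort needs -n: compared as strings, "100" would sort before "11".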
________

Example 3:
A     88
B     78
B     89
C     44
A     98
C     433
Required output: A:88;98
                 B:78;89
                 C:44;433

awk '{a[$1] = ($1 in a) ? a[$1] ";" $2 : $2} END {for (i in a) print i ":" a[i]}' test.txt |sort
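
The grouping in Example 3 can also be sketched in Python, again to mirror the AWK and Python pairing used earlier. This is an illustrative version with the input embedded as a string; the names `groups` and `formatted` are assumptions, not part of the original.

```python
from collections import OrderedDict

# Input rows from Example 3, embedded so the sketch is self-contained.
data = """A     88
B     78
B     89
C     44
A     98
C     433
"""

groups = OrderedDict()  # remembers the order in which keys first appear
for line in data.splitlines():
    parts = line.split()
    if len(parts) != 2:
        continue  # ignore blank or malformed lines
    key, value = parts
    groups.setdefault(key, []).append(value)

# Join each key's values with ';' to get rows such as 'A:88;98'.
formatted = ['%s:%s' % (key, ';'.join(vals)) for key, vals in groups.items()]
for row in formatted:
    print(row)
```

An ordered dictionary gives a predictable key order, whereas awk's for (i in a) visits keys in an unspecified order.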
