
Scriptome: scripts for systems biology


The whole package is 6 MB; excluding documentation it is actually under 1.4 MB.

The attachment contains an even smaller one, though it is not as full-featured as gawk.

mawk-1.3.3-1-bin.zip (94.71k)
this thread is very nice.

mediocrebeing wrote:
I would like to recommend "awk" for most of the tasks given in that URL. It is much simpler than perl.

Examples:

1a. select rows where column 3 is larger than column 5:
awk '$3>$5' input.txt > output.txt

1b. calculate the sum of columns 2 and 3 and append it to the end of each row:
awk '{print $0,$2+$3}' input.txt
or replace the first column with the sum:
awk '{$1=$2+$3;print}' input.txt

2. print rows 20 through 80 (like sed -n '20,80p'):
awk 'NR>=20&&NR<=80' input.txt > output.txt

3. calculate the average of column 2:
awk '{x+=$2}END{print x/NR}' input.txt

4. extract columns 2, 4, and 5:
awk '{print $2,$4,$5}' input.txt > output.txt
or, with tab as the output field separator:
awk 'BEGIN{OFS="\t"}{print $2,$4,$5}' input.txt

5. (more complicated) join two files on column 1:
awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt > output.txt
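
As a quick sanity check of that join (the sample data below is invented; the file names follow the one-liner): the BEGIN block loads file1.txt into an array keyed by column 1, then each line of file2.txt whose key exists is printed followed by the matching file1.txt line.

```shell
# Invented sample data to demonstrate the join one-liner from the post.
printf 'a\t1\nb\t2\nc\t3\n' > file1.txt
printf 'a\tX\nc\tY\nd\tZ\n' > file2.txt

# Load file1.txt into l[] keyed by column 1; for each file2.txt line whose
# key exists, print the file2 line, a tab, then the matching file1 line.
awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt
```

Only keys present in both files survive, so the unmatched "b" and "d" rows are dropped, as with the UNIX join command on sorted input.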

6. count the number of occurrences of column 2 (uniq -c):
awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt

7. apply "uniq" on column 2, printing only the first occurrence:
awk '!($2 in l){print;l[$2]=1}' input.txt

8. word count (frequency of each word):
awk '{for(i=1;i<=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' input.txt
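
One subtlety in that loop: the bound must be i<=NF, because i!=NF stops before the last field and drops it from the count. A quick demonstration on made-up input (piped through sort only to make the output order deterministic, since awk's for-in order is unspecified):

```shell
# Invented two-line sample: "b" appears three times across both lines.
printf 'a b\nb c b\n' > words.txt

# i<=NF visits every field, including the last one on each line.
awk '{for(i=1;i<=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' words.txt | sort
```

With i!=NF instead, the trailing "b" of each line would be missed.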

9. deal with simple CSV:
awk 'BEGIN{FS=","}{print $1,$2}'

10. egrep:
awk '/^test[0-9]+/' input.txt

11. substitution (sed is simpler):
awk '{sub(/test/, "no"); print}' input.txt

All these examples need only 'awk' by itself. Note that awk is not strong at regular expressions or sorting; it becomes more powerful when combined with other UNIX commands such as 'sort', 'tr', and 'sed'.
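
As a small sketch of that combination (sample data is invented): the counting one-liner from example 6 can be piped through sort to rank keys by frequency, which awk alone cannot do easily.

```shell
# Invented sample: column 2 holds "a" three times and "b" twice.
printf 'x a\ny b\nz a\nw a\nv b\n' > input.txt

# awk counts occurrences of column 2; sort -k2,2nr orders the result
# by the count field (field 2), numerically, descending.
awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt | sort -k2,2nr
```
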
