Here is a situation I met:
I have 3 files with same format (5 columns: res, T2,T2-Error, R2, R2-Error) and I want to get: 1. the average value of the R2, 2 the standard deviation of the three R2 values and 3 the biggest error range among the three errors and the standard deviation.
I decide to use awk and shell tools to help me finish this kind of work. Excel definitely helps out, however, I don’t want to spend 2 hours to do the routine work when I deal with the similar issue. Also, if the script is perfect, I don’t need to worry the human error when I manually compare the 4 errors by eyes.
Here is the example file and the solution:
res T T_SD Rate R_err 3 104.40 11.30 9.58 1.04 5 120.50 5.79 8.30 0.40 6 87.16 3.69 11.47 0.49 8 121.10 7.08 8.26 0.48 9 143.80 8.00 6.95 0.39 11 93.51 4.79 10.69 0.55
The output is:
res R2-1 Err-1 R2-2 Err-2 R2-3 Err-3 Avg Std Big_sd 3 9.580 1.040 9.100 0.990 9.100 0.990 9.260 0.277 1.040 5 8.300 0.400 8.770 0.490 8.770 0.490 8.613 0.271 0.490 6 11.470 0.490 10.500 0.860 10.500 0.860 10.823 0.560 0.860 8 8.260 0.480 9.040 0.340 9.040 0.340 8.780 0.450 0.480 9 6.950 0.390 7.130 0.220 7.130 0.220 7.070 0.104 0.390
solutions:
- use paste to combine 3 files, the product is a file with 15 columns
- use grep -v res to filter out the first line (awk can’t do calculation with the first line)
- use awk to calculate the average, standard deviation and find out the biggest error values
- generate a final output
The content of script is:
————————————————————————–
#!/bin/sh
## Calculate the standard deviation of 3 R2 files from
## the fitting data which R2 was converted by get_rates.sh
## output: avg of 3 R2, std of 3 R2, the biggest errors
##
## avg = (R2-1 + R2-2 + R2-3)/3
## std= SQRT ( ( (r2-1 – avg)^2 + (r2-2 -avg)^2 + (r2-3 -avg)^2) /2)
##
## Date: March 14, 2007
## kpwu
## assign the location the 3 R2 file
file1=analyzed/r2-analyzed-1.txt
file2=analyzed/r2-analyzed-2.txt
file3=analyzed/r2-analyzed-2.txt
# make the first line, to indicate the info of each column
echo “res R2-1 Err-1 R2-2 Err-2 R2-3 Err-3 Avg Std Big_sd” > _temp
#step 1, 2, and 3
paste $file1 $file2 $file3 |grep -v res \
|awk ‘{avg=($4+$9+$14)/3; std=sqrt((($4-avg)^2 + ($9-avg)^2 + ($14-avg)^2 )/2);
i=0; if ($5 > $10) i=$5; else i=$10;
if ( i > $15) i = i; else i = $15;
printf “%1s%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f\n”,
$1,$4,$5,$9,$10,$14,$15,avg,std,i}’ >>_temp
mv _temp average_relaxation.txt
echo “DONE, saved as average_relaxation.txt”
———————————————————————–