Here is a situation I met:

I have 3 files with same format (5 columns: res, T2,T2-Error, R2, R2-Error) and I want to get: 1. the average value of the R2, 2 the standard deviation of the three R2 values and 3 the biggest error range among the three errors and the standard deviation.

I decide to use awk and shell tools to help me finish this kind of work. Excel definitely helps out, however, I don’t want to spend 2 hours to do the routine work when I deal with the similar issue. Also, if the script is perfect, I don’t need to worry the human error when I manually compare the 4 errors by eyes.

Here is the example file and the solution:

res T T_SD Rate R_err 3 104.40 11.30 9.58 1.04 5 120.50 5.79 8.30 0.40 6 87.16 3.69 11.47 0.49 8 121.10 7.08 8.26 0.48 9 143.80 8.00 6.95 0.39 11 93.51 4.79 10.69 0.55

The output is:

res R2-1 Err-1 R2-2 Err-2 R2-3 Err-3 Avg Std Big_sd 3 9.580 1.040 9.100 0.990 9.100 0.990 9.260 0.277 1.040 5 8.300 0.400 8.770 0.490 8.770 0.490 8.613 0.271 0.490 6 11.470 0.490 10.500 0.860 10.500 0.860 10.823 0.560 0.860 8 8.260 0.480 9.040 0.340 9.040 0.340 8.780 0.450 0.480 9 6.950 0.390 7.130 0.220 7.130 0.220 7.070 0.104 0.390

solutions:

- use paste to combine 3 files, the product is a file with 15 columns
- use grep -v res to filter out the first line (awk can’t do calculation with the first line)
- use awk to calculate the average, standard deviation and find out the biggest error values
- generate a final output

The content of script is:

————————————————————————–

#!/bin/sh

## Calculate the standard deviation of 3 R2 files from

## the fitting data which R2 was converted by get_rates.sh

## output: avg of 3 R2, std of 3 R2, the biggest errors

##

## avg = (R2-1 + R2-2 + R2-3)/3

## std= SQRT ( ( (r2-1 – avg)^2 + (r2-2 -avg)^2 + (r2-3 -avg)^2) /2)

##

## Date: March 14, 2007

## kpwu

## assign the location the 3 R2 file

file1=analyzed/r2-analyzed-1.txt

file2=analyzed/r2-analyzed-2.txt

file3=analyzed/r2-analyzed-2.txt

# make the first line, to indicate the info of each column

echo “res R2-1 Err-1 R2-2 Err-2 R2-3 Err-3 Avg Std Big_sd” > _temp

#step 1, 2, and 3

paste $file1 $file2 $file3 |grep -v res \

|awk ‘{avg=($4+$9+$14)/3; std=sqrt((($4-avg)^2 + ($9-avg)^2 + ($14-avg)^2 )/2);

i=0; if ($5 > $10) i=$5; else i=$10;

if ( i > $15) i = i; else i = $15;

printf “%1s%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f\n”,

$1,$4,$5,$9,$10,$14,$15,avg,std,i}’ >>_temp

mv _temp average_relaxation.txt

echo “DONE, saved as average_relaxation.txt”

———————————————————————–

## Leave a Reply