My software notes

March 15, 2007

Standard deviations and biggest errors

Filed under: softwares and scripts — kpwu @ 3:43 am

Here is a situation I ran into:
I have 3 files with the same format (5 columns: res, T2, T2-Error, R2, R2-Error), and I want to get: 1. the average of the three R2 values, 2. the standard deviation of the three R2 values, and 3. the biggest error among the three R2 errors and the standard deviation.

I decided to use awk and other shell tools for this kind of work. Excel would certainly do the job, but I don’t want to spend 2 hours on routine work every time I deal with a similar problem. Also, once the script is correct, I don’t need to worry about human error when comparing the 4 errors by eye.

Here is an example input file, followed by the output and the solution:

res         T      T_SD      Rate     R_err
  3    104.40     11.30      9.58      1.04
  5    120.50      5.79      8.30      0.40
  6     87.16      3.69     11.47      0.49
  8    121.10      7.08      8.26      0.48
  9    143.80      8.00      6.95      0.39
 11     93.51      4.79     10.69      0.55

The output is:

res      R2-1     Err-1      R2-2     Err-2      R2-3     Err-3       Avg       Std    Big_sd
  3     9.580     1.040     9.100     0.990     9.100     0.990     9.260     0.277     1.040
  5     8.300     0.400     8.770     0.490     8.770     0.490     8.613     0.271     0.490
  6    11.470     0.490    10.500     0.860    10.500     0.860    10.823     0.560     0.860
  8     8.260     0.480     9.040     0.340     9.040     0.340     8.780     0.450     0.480
  9     6.950     0.390     7.130     0.220     7.130     0.220     7.070     0.104     0.390
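As a hand check of the first output row (res 3), here is the arithmetic worked out with the formulas from the script's header comment. The standard deviation is the sample standard deviation, dividing by n - 1 = 2:

$$
\bar{R} = \frac{9.58 + 9.10 + 9.10}{3} = 9.26
$$

$$
s = \sqrt{\frac{(9.58 - 9.26)^2 + (9.10 - 9.26)^2 + (9.10 - 9.26)^2}{2}}
  = \sqrt{\frac{0.1024 + 0.0256 + 0.0256}{2}}
  = \sqrt{0.0768} \approx 0.277
$$

$$
\text{Big\_sd} = \max(1.04,\ 0.99,\ 0.99,\ 0.277) = 1.04
$$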

Solution:

  1. use paste to combine the 3 files side by side; each line of the result has 15 columns (5 from each file)
  2. use grep -v res to filter out the header line (it contains text, not numbers, so awk would turn it into a meaningless row of zeros)
  3. use awk to calculate the average and the standard deviation of R2, and to find the biggest error value
  4. write the final output file
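Steps 1 and 2 can be tried on their own before running the full script. The sketch below builds three tiny stand-in files (the mktemp directory and the names a.txt, b.txt and c.txt are hypothetical; the two data rows are copied from the example file) and prints the residue number and field count of every line that survives grep. paste joins the files with tabs, and awk's default field splitting treats tabs like spaces, which is why R2 from the second and third files lands in $9 and $14.

```shell
#!/bin/sh
# Steps 1 and 2 only, on three small stand-in files.
# The temporary directory and the names a.txt, b.txt, c.txt are hypothetical;
# the two data rows are copied from the example file.
dir=$(mktemp -d)
for f in a b c; do
    printf 'res T T_SD Rate R_err\n3 104.40 11.30 9.58 1.04\n5 120.50 5.79 8.30 0.40\n' \
        > "$dir/$f.txt"
done

# Step 1: paste joins line N of each file into one tab-separated line (15 columns).
# Step 2: grep -v res removes the pasted header line.
# Then print the residue number and the field count of each remaining line.
result=$(paste "$dir/a.txt" "$dir/b.txt" "$dir/c.txt" | grep -v res \
    | awk '{ print $1, NF }')
echo "$result"

rm -r "$dir"
```

Each remaining line should report 15 fields. If the header line ever stops containing the word res, awk 'NR > 1' is another way to skip the first pasted line.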

Here is the script:
————————————————————————–
#!/bin/sh

## Calculate the average and standard deviation of R2 from 3 R2 files
## of fitting data (the R2 values were converted by get_rates.sh)
## output: avg of 3 R2, std of 3 R2, the biggest error
##
## avg = (R2-1 + R2-2 + R2-3)/3
## std = SQRT( ( (R2-1 - avg)^2 + (R2-2 - avg)^2 + (R2-3 - avg)^2 ) / 2 )
## Big_sd = the largest of Err-1, Err-2, Err-3 and std
##
## Date: March 14, 2007
## kpwu

## locations of the 3 R2 files
file1=analyzed/r2-analyzed-1.txt
file2=analyzed/r2-analyzed-2.txt
file3=analyzed/r2-analyzed-3.txt

# make the first line, to indicate the content of each column
printf "%3s%10s%10s%10s%10s%10s%10s%10s%10s%10s\n" \
    res R2-1 Err-1 R2-2 Err-2 R2-3 Err-3 Avg Std Big_sd > _temp

# steps 1, 2 and 3: paste the 3 files side by side (15 columns),
# drop the header line, then calculate avg, std and the biggest error.
# R2 values are in $4, $9, $14; their errors are in $5, $10, $15.
paste "$file1" "$file2" "$file3" | grep -v res \
| awk '{
    avg = ($4 + $9 + $14) / 3
    std = sqrt((($4 - avg)^2 + ($9 - avg)^2 + ($14 - avg)^2) / 2)
    big = $5
    if ($10 > big) big = $10
    if ($15 > big) big = $15
    if (std > big) big = std
    printf "%3s%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f\n",
        $1, $4, $5, $9, $10, $14, $15, avg, std, big
}' >> _temp

mv _temp average_relaxation.txt

echo "DONE, saved as average_relaxation.txt"
———————————————————————–
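To check the awk step without the real files, the sketch below pipes two hand-made 15-column lines through the calculation described at the top of this post (average, sample standard deviation, and the biggest of the three errors and the standard deviation) and prints only res, Avg, Std and Big_sd. The first line reuses residue 3 from the example output; the T and T_SD columns of files 2 and 3 do not appear in the post, so they are placeholder zeros, which the calculation never reads. The second line, residue 99, is purely hypothetical and is chosen so that the standard deviation is larger than all three errors.

```shell
#!/bin/sh
# Feed hand-made pasted lines (15 columns) to the awk calculation.
# Only $1, $4, $5, $9, $10, $14 and $15 are used.
# Line 1: residue 3 from the example output; T and T_SD of files 2 and 3
#         are placeholder zeros (not shown in the post, not used here).
# Line 2: hypothetical residue 99 whose R2 values spread more than their
#         errors, so the standard deviation becomes the biggest "error".
result=$(printf '%s\n' \
    '3 104.40 11.30 9.58 1.04 3 0 0 9.10 0.99 3 0 0 9.10 0.99' \
    '99 0 0 8.00 0.20 99 0 0 9.00 0.20 99 0 0 10.00 0.20' |
awk '{
    avg = ($4 + $9 + $14) / 3
    std = sqrt((($4 - avg)^2 + ($9 - avg)^2 + ($14 - avg)^2) / 2)
    big = $5
    if ($10 > big) big = $10
    if ($15 > big) big = $15
    if (std > big) big = std
    printf "%3s%10.3f%10.3f%10.3f\n", $1, avg, std, big
}')
echo "$result"
```

The first line should reproduce Avg 9.260, Std 0.277 and Big_sd 1.040 from the output table, and the second should give Avg 9.000, Std 1.000 and Big_sd 1.000.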

Blog at WordPress.com.