My software notes

March 15, 2007

Standard deviations and biggest errors

Filed under: softwares and scripts — kpwu @ 3:43 am

Here is a situation I ran into:
I have 3 files in the same format (5 columns: res, T2, T2-Error, R2, R2-Error), and I want to get: 1. the average of the three R2 values, 2. the standard deviation of the three R2 values, and 3. the biggest value among the three R2 errors and that standard deviation.

I decided to use awk and shell tools for this kind of work. Excel would certainly do the job, but I don't want to spend 2 hours on routine work every time a similar problem comes up. Also, once the script is correct, I don't need to worry about human error from comparing the 4 error values by eye.

Here is an example input file (the Rate and R_err columns hold R2 and R2-Error), followed by the output and the solution:

res    T        T_SD      Rate     R_err
3    104.40     11.30      9.58      1.04
5    120.50      5.79      8.30      0.40
6     87.16      3.69     11.47      0.49
8    121.10      7.08      8.26      0.48
9    143.80      8.00      6.95      0.39
11    93.51      4.79     10.69      0.55

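The output is (in this sample run the second and third input files were the same, so the R2-2/Err-2 and R2-3/Err-3 columns are identical):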

res    R2-1    Err-1      R2-2       Err-2     R2-3      Err-3      Avg     Std    Big_sd
3     9.580     1.040     9.100     0.990     9.100     0.990     9.260     0.277     1.040
5     8.300     0.400     8.770     0.490     8.770     0.490     8.613     0.271     0.490
6    11.470     0.490    10.500     0.860    10.500     0.860    10.823     0.560     0.860
8     8.260     0.480     9.040     0.340     9.040     0.340     8.780     0.450     0.480
9     6.950     0.390     7.130     0.220     7.130     0.220     7.070     0.104     0.390

Solution:

  1. use paste to combine the 3 files line by line; the result is a file with 15 columns
  2. use grep -v res to filter out the header line (it holds column names, not numbers, so awk should not do arithmetic on it)
  3. use awk to calculate the average and the standard deviation, and to find the biggest error value
  4. write the final output file
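
Steps 1 and 2 can be tried on their own before running the full script. The following is only a minimal sketch: demo.txt is a hypothetical file name, its two data rows are copied from the example file above, and the same file is pasted three times just to keep the sketch short.

```shell
#!/bin/sh
# Sketch of steps 1 and 2 only. demo.txt is a hypothetical file name;
# its two data rows are copied from the example input file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.txt" <<'EOF'
res T T_SD Rate R_err
3 104.40 11.30 9.58 1.04
5 120.50 5.79 8.30 0.40
EOF

# Step 1: paste joins the files line by line (5 + 5 + 5 columns).
# Step 2: grep -v res drops the header line.
combined=$(paste "$tmpdir/demo.txt" "$tmpdir/demo.txt" "$tmpdir/demo.txt" | grep -v res)

# NF is the number of fields awk sees on each line.
ncols=$(printf '%s\n' "$combined" | awk '{print NF}' | sort -u)
nrows=$(printf '%s\n' "$combined" | wc -l | tr -d ' ')
echo "columns per line: $ncols, data lines: $nrows"
rm -rf "$tmpdir"
```

With this input the sketch should report 15 columns per line and 2 data lines, which is why the awk step reads R2 from $4, $9 and $14 and the errors from $5, $10 and $15.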

The content of the script is:
--------------------------------------------------------------------------
#!/bin/sh

## Calculate the standard deviation of 3 R2 files from
## the fitting data, where R2 was converted by get_rates.sh
## output: avg of 3 R2, std of 3 R2, the biggest error
##
## avg = (R2-1 + R2-2 + R2-3)/3
## std = SQRT( ((R2-1 - avg)^2 + (R2-2 - avg)^2 + (R2-3 - avg)^2) / 2 )
##
## Date: March 14, 2007
## kpwu

## assign the locations of the 3 R2 files
file1=analyzed/r2-analyzed-1.txt
file2=analyzed/r2-analyzed-2.txt
file3=analyzed/r2-analyzed-3.txt

# make the first line, to indicate the info of each column
echo "res R2-1 Err-1 R2-2 Err-2 R2-3 Err-3 Avg Std Big_sd" > _temp
# steps 1, 2, and 3
paste "$file1" "$file2" "$file3" | grep -v res \
| awk '{avg=($4+$9+$14)/3; std=sqrt((($4-avg)^2 + ($9-avg)^2 + ($14-avg)^2)/2);
# keep the largest of the three errors and the standard deviation
big=$5; if ($10 > big) big=$10;
if ($15 > big) big=$15;
if (std > big) big=std;
printf "%1s%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f%10.3f\n",
$1,$4,$5,$9,$10,$14,$15,avg,std,big}' >> _temp

mv _temp average_relaxation.txt

echo "DONE, saved as average_relaxation.txt"
--------------------------------------------------------------------------
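
As a quick check of the formulas in the script header, the first output row (residue 3) can be recomputed on its own. This is only a sketch: the three R2 values and three errors are typed in from the output table above, and the variable names (avg, std, big, result) are mine.

```shell
#!/bin/sh
# Recompute Avg, Std and the biggest error for residue 3.
# Input: R2-1 Err-1 R2-2 Err-2 R2-3 Err-3, copied from the output table.
result=$(echo "9.58 1.04 9.10 0.99 9.10 0.99" | awk '{
    avg = ($1 + $3 + $5) / 3
    # sample standard deviation: divide by n - 1 = 2
    std = sqrt((($1 - avg)^2 + ($3 - avg)^2 + ($5 - avg)^2) / 2)
    # biggest of the three fitting errors and the standard deviation
    big = $2
    if ($4 > big) big = $4
    if ($6 > big) big = $6
    if (std > big) big = std
    printf "%.3f %.3f %.3f\n", avg, std, big
}')
echo "$result"
```

This should print 9.260 0.277 1.040, matching the Avg, Std and Big_sd columns of the first output row. Dividing by 2 rather than 3 gives the sample standard deviation, which is the usual choice for a small number of repeated measurements.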
