My software notes

August 29, 2007

[web] online predictions of natively unfolded regions in proteins

Filed under: servers — kpwu @ 8:36 am

Natively unfolded proteins, also called intrinsically disordered or natively disordered proteins, are a growing topic for protein chemists. Here I try to collect several online servers that perform sequence-based prediction of natively unfolded regions in a protein.

  • Tango: A computer algorithm for prediction of aggregating regions in unfolded polypeptide chains

    Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004 Oct;22(10):1302-6. Epub 2004 Sep 12. Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L.

  • AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides.

    reference: AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinformatics. 2007 Feb 27;8:65. Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S.

  • The Ucon server

    Natively unstructured regions in proteins identified from contact predictions. Bioinformatics. 2007 Aug 20. Schlessinger A, Punta M, Rost B.

  • PreLink: Prediction of Linker Regions in Protein Sequence

    Prediction of unfolded segments in a protein sequence based on amino acid composition. Bioinformatics. 2005 May 1;21(9):1891-900. Epub 2005 Jan 18. Coeytaux K, Poupon A.

More sites will be added soon.


August 20, 2007

[book] Dive Into Python

Filed under: book_list — kpwu @ 9:01 am


official site:

This book is available as an online document for download, and a printed copy can also be bought from bookstores.

August 15, 2007

.vimrc for Mac users

Filed under: mac — kpwu @ 8:11 pm

I really don’t know why Apple doesn’t do what other Linux distribution makers do. The default vim setup is very bare: many settings, such as syntax coloring and the ruler, are not switched on.
For Mac users, just edit .vimrc in your home directory and add two lines:
syntax enable
set ruler

Vim will then color the syntax when you are editing program code, and the bottom-right corner will show the cursor position (line and column). This way I no longer need to run “set nu” to see which line I am on.
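If you prefer to add these settings from the Terminal, a small shell function can append each line only when it is not already in the file, so running it twice does not create duplicates. This is a minimal sketch; the function name add_vimrc_line is hypothetical and not part of vim or Mac OS X.

```shell
#!/bin/sh
# add_vimrc_line FILE LINE  (hypothetical helper)
# Append LINE to FILE unless FILE already contains exactly that line.
add_vimrc_line() {
    file=$1
    line=$2
    touch "$file"    # create the file if it does not exist yet
    # -q: quiet, -x: match whole lines only, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, running add_vimrc_line "$HOME/.vimrc" "syntax enable" and then add_vimrc_line "$HOME/.vimrc" "set ruler" adds the two settings; running the same commands again leaves the file unchanged.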

Here are the screenshots:
The simple mode: no syntax coloring and no position ruler.

The better mode: the syntax is colored and the ruler is shown!


August 14, 2007

[DOC] Python course in bioinformatics

Filed under: book_list,news — kpwu @ 11:38 am

I am learning Python now. It’s a nice tool and more powerful than the UN*X tools. For example, awk has no built-in “absolute value” function, so I had to hack a function that removes or adds a “-” sign to do this kind of calculation. Python definitely provides more numeric functions. Like shell scripts and Perl, Python is easy to learn and useful for my daily work. I found a great example on this website.
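The awk workaround mentioned above can be wrapped in a small user-defined awk function that negates negative values. A minimal sketch, assuming a POSIX awk; the shell wrapper name abs_column is hypothetical. (In Python, the built-in abs() does this directly.)

```shell
#!/bin/sh
# abs_column N  (hypothetical helper)
# Print the absolute value of column N of every input line.
abs_column() {
    awk -v col="$1" '
        # awk has no built-in abs(), so negate negative values by hand
        function abs(v) { return (v < 0) ? -v : v }
        { print abs($col) }'
}
```

For example, printf '%s\n' -3.5 2 -7 | abs_column 1 prints 3.5, 2 and 7 on separate lines.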

Click here to see the full content of “python course in bioinformatics”. The author also provides a PDF file. I read the first two chapters and found many excellent examples that gave me lots of ideas for my current projects. I should spend a few weeks learning Python and rewrite most of my shell scripts in Python.

August 7, 2007

calculate proton-proton distances in a PDB

Filed under: softwares and scripts — kpwu @ 12:00 pm

A few months ago, I posted a blog article about using MOLMOL plus a shell script to get the distance table I wanted. However, I always forget the MOLMOL syntax, so I wrote a shell script that does the whole job on its own.

The script calculates the distances between an assigned proton (e.g. HB) and all other protons, and prints those shorter than a cutoff. The output can be sorted either by distance or by residue number.
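The core of the calculation is the Euclidean distance between two sets of x, y, z coordinates, d = sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2). As a minimal sketch (the function name dist is hypothetical), awk can do this arithmetic for two atoms:

```shell
#!/bin/sh
# dist X1 Y1 Z1 X2 Y2 Z2  (hypothetical helper)
# Print the Euclidean distance between two points with two decimals.
dist() {
    awk -v x1="$1" -v y1="$2" -v z1="$3" -v x2="$4" -v y2="$5" -v z2="$6" \
        'BEGIN {
            dx = x2 - x1; dy = y2 - y1; dz = z2 - z1
            printf "%.2f\n", sqrt(dx * dx + dy * dy + dz * dz)
        }'
}
```

For example, dist 0 0 0 3 4 0 prints 5.00.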

The current script DOES NOT handle PDB files with multiple chains.
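Until multiple chains are supported, one possible workaround (a sketch, not part of the script) is to extract a single chain into its own file first. In the fixed-column PDB format, the record name occupies columns 1-6 and the chain identifier is the single character in column 22 of each ATOM record. The function name extract_chain is hypothetical.

```shell
#!/bin/sh
# extract_chain CHAIN_ID  (hypothetical helper)
# Read a PDB file on stdin and print only the ATOM records of one chain.
# Record name: columns 1-6; chain identifier: column 22.
extract_chain() {
    awk -v chain="$1" 'substr($0, 1, 6) == "ATOM  " && substr($0, 22, 1) == chain'
}
```

For example, extract_chain A < complex.pdb > complex_A.pdb (hypothetical file names) writes chain A to its own file, which can then be passed to the script.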

Here are the outputs (using 2GB1.pdb as an example):
1. sorting by distance:
~:>./ 2GB1.pdb 34 HA 3.6 yes
Res_atom <—> Res AA ATOM DIST
34-ALA-HA    <—>     34   ALA   3HB    2.25
34-ALA-HA    <—>     33   TYR   HD1    2.48
34-ALA-HA    <—>     34   ALA     H    2.78
34-ALA-HA    <—>     37   ASN   1HB    2.79
34-ALA-HA    <—>     34   ALA   2HB    2.80
34-ALA-HA    <—>     34   ALA   1HB    2.91
34-ALA-HA    <—>     37   ASN  2HD2    2.95
34-ALA-HA    <—>      7   LEU  2HD2    3.21
34-ALA-HA    <—>      7   LEU  3HD2    3.30
34-ALA-HA    <—>     39   VAL    HB    3.31
34-ALA-HA    <—>     35   ASN     H    3.53
2. sorting by residues:
~:>./ 2GB1.pdb 34 HA 3.6 no
Res_atom <—> Res AA ATOM DIST
34-ALA-HA    <—>      7   LEU  2HD2    3.21
34-ALA-HA    <—>      7   LEU  3HD2    3.30
34-ALA-HA    <—>     33   TYR   HD1    2.48
34-ALA-HA    <—>     34   ALA     H    2.78
34-ALA-HA    <—>     34   ALA   1HB    2.91
34-ALA-HA    <—>     34   ALA   2HB    2.80
34-ALA-HA    <—>     34   ALA   3HB    2.25
34-ALA-HA    <—>     35   ASN     H    3.53
34-ALA-HA    <—>     37   ASN   1HB    2.79
34-ALA-HA    <—>     37   ASN  2HD2    2.95
34-ALA-HA    <—>     39   VAL    HB    3.31
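Both listings contain the same rows; only the order differs. As a minimal sketch (the function name sort_by_distance is hypothetical), the residue-ordered data lines (without the two header lines) can be re-sorted by distance with sort, using the distance in the sixth whitespace-separated field as a numeric key:

```shell
#!/bin/sh
# sort_by_distance  (hypothetical helper)
# Re-sort output lines of the distance script by their distance (field 6).
# -k 6,6n: use only field 6 as the key and compare it numerically, so that
# e.g. 10.05 sorts after 9.80 instead of before it.
sort_by_distance() {
    sort -k 6,6n
}
```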

Here is the shell script. If it is helpful to you, please leave me a response. Thanks.
Any comments are also welcome.

#!/bin/sh
## A program to calculate the distances between an assigned proton
## and the other protons of the same molecule
## see USAGE to know how to run it
## USAGE : ./this_script [PDB] [residue_number] [proton_name] [cutoff] [yes|no]
## e.g. ./ 2GB1.pdb 53 H 6 no
## this will calculate all distances between the amide H of residue 53
## and the other protons of 2GB1.pdb; distances within 6 Ang are printed,
## sorted by distance ("yes") or in residue order ("no")
## VERSION 0.3 Aug. 7th, 2007
## Kuen-Phon Wu
## See update:
##
## NOTE: parts of the posted script were lost (the variable assignments
## and the distance formula); they are reconstructed below, assuming the
## whitespace-separated ATOM fields the original used:
## $3 = atom name, $4 = residue type, $6 = residue number, $7-$9 = x, y, z

usage="Usage: $0 [inputfile] [res_number] [proton] [cutoff] [yes|no]"
eg="Example: $0 2GB1.pdb 53 HA 6.0 yes"

if [ $# -lt 5 ] ; then
    echo "$usage"
    echo "$eg"
    exit 1
fi

pdb=$1
RES=$2
ATOM=$3
cutoff=$4

## "yes" sorts by distance (field 4, numeric); "no" keeps residue order
case "$5" in
    yes) order="sort -k 4,4n" ;;
    no)  order="cat" ;;
    *)   echo "$usage"; exit 1 ;;
esac

## find x, y, z coordinates and residue type of the assigned atom
## (exact field matches, so HA does not also match 1HA or 2HA)
coords=$(awk -v r="$RES" -v a="$ATOM" \
    '$1 == "ATOM" && $6 == r && $3 == a {print $7, $8, $9, $4; exit}' "$pdb")
if [ -z "$coords" ] ; then
    echo "Atom $ATOM of residue $RES not found in $pdb"
    exit 1
fi
set -- $coords
x=$1; y=$2; z=$3; Res_type=$4

## calculate all distances related to the assigned atom;
## keep only protons (H, HA, 1HB, HD1, 2HD2, ...) and skip the atom itself
echo "Res_atom <—> Res AA ATOM DIST"
echo "============================================="
awk -v x1="$x" -v y1="$y" -v z1="$z" -v r="$RES" -v a="$ATOM" \
    '$1 == "ATOM" && $3 ~ /^[0-9]*H/ && !($6 == r && $3 == a) {
        d = sqrt(($7 - x1)^2 + ($8 - y1)^2 + ($9 - z1)^2)
        printf "%5s%6s%6s%8.2f\n", $6, $4, $3, d
    }' "$pdb" |
$order |
awk -v cut="$cutoff" -v res="$RES" -v aa="$Res_type" -v atom="$ATOM" \
    '$4 < cut {printf "%s-%s-%s <—> %s\n", res, aa, atom, $0}'
