Copy the following lines into a shell script file and make sure the file is executable. So far it only works in one direction, from one-letter code to three-letter code; I plan to cover both directions and DNA/RNA in the future.
The script was designed for the NMRView sequence format, and it works for the Xplor sequence format, too.
————————————————————————————–
#!/bin/sh
# Convert a protein sequence from one-letter code to three-letter code.
# Pass your sequence file name as the first argument.
# Kuen-Phon Wu, 06/11/2004 version 0.2
# 06/17/2004 version 0.3
#
# This program ignores any characters that are not one of the 20 standard
# amino acids, such as B, O, .... Either lowercase or uppercase input can
# be processed by this program, have fun!
#
# Just use as: ./seq1to3.sh [FILENAME] > [OUT_FILE]
usage="Usage: seq1to3.sh [inputfile] > [OUTPUT]"
if [ $# -lt 1 ] ; then
    echo "$usage"
    exit 1
fi
# Lowercase the input, turn punctuation, digits and newlines into "b",
# delete the non-standard letters, then emit one three-letter code per line.
# The "\n" in the replacements is written as a newline by GNU sed.
tr 'A-Z' 'a-z' < "$1" | tr '[:punct:]' 'b' | tr '0-9' 'b' | tr '\n' 'b' | \
sed -e 's/[oxzujb ]//g' \
    -e 's/a/ALA\n/g' -e 's/c/CYS\n/g' -e 's/d/ASP\n/g' \
    -e 's/e/GLU\n/g' -e 's/f/PHE\n/g' -e 's/g/GLY\n/g' \
    -e 's/h/HIS\n/g' -e 's/i/ILE\n/g' -e 's/k/LYS\n/g' \
    -e 's/l/LEU\n/g' -e 's/m/MET\n/g' -e 's/p/PRO\n/g' \
    -e 's/r/ARG\n/g' -e 's/q/GLN\n/g' -e 's/n/ASN\n/g' \
    -e 's/s/SER\n/g' -e 's/t/THR\n/g' -e 's/w/TRP\n/g' \
    -e 's/y/TYR\n/g' -e 's/v/VAL\n/g'
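The pipeline above depends on sed writing a newline for "\n" in the replacement text, which GNU sed does but other sed implementations may not. As a minimal sketch of the same mapping without that dependency, the function below splits the sequence with fold and matches each residue as a whole line. The name one_to_three is only an illustration and is not part of the original script; it reads the sequence from standard input.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads a one-letter
# sequence on stdin and prints one three-letter code per line.
# Steps: lowercase, keep only the 20 standard residue letters (this also
# drops newlines), split into one letter per line, map each letter.
one_to_three() {
    tr 'A-Z' 'a-z' |
    tr -cd 'acdefghiklmnpqrstvwy' |
    fold -w 1 |
    sed -e 's/^a$/ALA/' -e 's/^c$/CYS/' -e 's/^d$/ASP/' \
        -e 's/^e$/GLU/' -e 's/^f$/PHE/' -e 's/^g$/GLY/' \
        -e 's/^h$/HIS/' -e 's/^i$/ILE/' -e 's/^k$/LYS/' \
        -e 's/^l$/LEU/' -e 's/^m$/MET/' -e 's/^n$/ASN/' \
        -e 's/^p$/PRO/' -e 's/^q$/GLN/' -e 's/^r$/ARG/' \
        -e 's/^s$/SER/' -e 's/^t$/THR/' -e 's/^v$/VAL/' \
        -e 's/^w$/TRP/' -e 's/^y$/TYR/'
}

# Example: punctuation, digits, "b" and the line break are ignored.
printf 'MKV-b2\nla\n' | one_to_three
```

The example call should print MET, LYS, VAL, LEU and ALA, one per line. To use it on a file, redirect the file into the function: one_to_three < [FILENAME] > [OUT_FILE].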
You should warn people about encoding, since your HTML wrongly translates characters such as ' and " into styled beginning and ending quotes.
To make the script work under Unix, you should also make sure to save it with the appropriate line endings and clean ASCII encoding.
The script below does the opposite conversion. You may publish it on your page.
Best wishes,
Rogelio
————-
#!/bin/sh
# Convert a protein sequence from three-letter code to one-letter code.
# Just write down your sequence to an ASCII file with
# three-letter words, upper or lower case,
# separated by a punctuation sign like comma, space, dash or
# other. Avoid headings, comments or other information.
# Rogelio Rodríguez-Sotres, 24/09/2007 adapted from
# Kuen-Phon Wu, 06/11/2004 version 0.3
#
# This program handles only the standard 20 amino
# acids. Either lowercase or uppercase input can be processed by this
# program.
#
# Just use as: ./seq3to1.sh [FILENAME] > [OUT_FILE]
usage='Usage: seq3to1.sh [inputfile] > [OUTPUT]'
if [ $# -lt 1 ] ; then
    echo "$usage"
    exit 1
fi
#### NOTE: if "====" appears in seq1to3.sh, change it to "<" (input redirection) before running
# Separators are doubled and the line is padded at both ends so that every
# code has its own leading and trailing space, including the first code and
# repeats such as "ala ala".
cat "$1" | tr '[:punct:]' ' ' | tr '0-9' ' ' | tr '\n' ' ' | tr 'A-Z' 'a-z' | \
sed -e 's/[xzjb ]/  /g' -e 's/^/ /' -e 's/$/ /' \
    -e 's/ ala / A /g' -e 's/ cys / C /g' -e 's/ asp / D /g' \
    -e 's/ glu / E /g' -e 's/ phe / F /g' -e 's/ gly / G /g' \
    -e 's/ his / H /g' -e 's/ ile / I /g' -e 's/ lys / K /g' \
    -e 's/ leu / L /g' -e 's/ met / M /g' -e 's/ pro / P /g' \
    -e 's/ arg / R /g' -e 's/ gln / Q /g' -e 's/ asn / N /g' \
    -e 's/ ser / S /g' -e 's/ thr / T /g' -e 's/ trp / W /g' \
    -e 's/ tyr / Y /g' -e 's/ val / V /g' -e 's/ //g'
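As an alternative sketch of the three-to-one direction, the function below first puts every word on its own line and then matches each three-letter code as a whole line, so it does not depend on the spaces around a word. The name three_to_one is hypothetical; it reads from standard input and, unlike the script above, silently drops any word that is not one of the 20 standard codes.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads three-letter
# codes on stdin, separated by any non-letter characters, and prints the
# one-letter sequence on a single line.
# Steps: lowercase, turn every run of non-letters into one newline,
# print only the lines that are a known code, then join the letters.
three_to_one() {
    tr 'A-Z' 'a-z' |
    tr -cs 'a-z' '\n' |
    sed -n -e 's/^ala$/A/p' -e 's/^cys$/C/p' -e 's/^asp$/D/p' \
           -e 's/^glu$/E/p' -e 's/^phe$/F/p' -e 's/^gly$/G/p' \
           -e 's/^his$/H/p' -e 's/^ile$/I/p' -e 's/^lys$/K/p' \
           -e 's/^leu$/L/p' -e 's/^met$/M/p' -e 's/^pro$/P/p' \
           -e 's/^arg$/R/p' -e 's/^gln$/Q/p' -e 's/^asn$/N/p' \
           -e 's/^ser$/S/p' -e 's/^thr$/T/p' -e 's/^trp$/W/p' \
           -e 's/^tyr$/Y/p' -e 's/^val$/V/p' |
    tr -d '\n'
    echo    # finish the output with a newline
}

# Example: mixed case, mixed separators and a repeated residue.
printf 'Ala-ALA, ala\nCys 12 trp\n' | three_to_one
```

The example call should print AAACW. Dropping unknown words is a design choice of this sketch; if you would rather see them, remove the -n option and the p flags so unmatched words pass through unchanged.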
Comment by rogelio rodriguez — August 24, 2007 @ 10:13 am |
Thanks for the comment.
I know I am not a good programmer, and I am glad to learn of ways to make the script better and better.
This script was the second or third shell script I wrote after I read “Teach yourself Shell Programming” in 2004. I am just trying to share some of my scripts. I will add warnings about encoding when I release new scripts, or I will attach the script to the post.
Kuen-Phon Wu
Comment by kpwu — August 24, 2007 @ 11:48 am |
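For anyone who copies a script from a web page and gets styled quotes or Windows line endings, as discussed in the comments above, a small filter can undo the damage before the script is run. This is only a sketch: the function name clean_pasted_script and the file name pasted.txt in the usage note are made up for illustration, and it only handles the curly single quotes, the prime sign, the curly double quotes and carriage returns.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads pasted text on
# stdin and writes a cleaned copy to stdout.
# tr removes carriage returns (Windows line endings); sed replaces each
# styled quote character with its plain ASCII equivalent.
clean_pasted_script() {
    tr -d '\r' |
    sed -e "s/‘/'/g" -e "s/’/'/g" -e "s/′/'/g" \
        -e 's/“/"/g' -e 's/”/"/g'
}

# Example: a pasted line with styled quotes and a CRLF ending.
printf 'echo “hi” ‘x’\r\n' | clean_pasted_script
```

A typical use would be: clean_pasted_script < pasted.txt > seq1to3.sh, followed by chmod +x seq1to3.sh. Other substitutions made by the web page, such as "====" in place of "<", still have to be fixed by hand.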