My software notes

July 19, 2012

[Xplor-NIH] modified residues (protein)

Filed under: xplor/xplor-nih/cns — kpwu @ 1:02 am

I searched the Xplor-NIH mailing list to find out which modified amino acids Xplor-NIH supports.

So far, N-terminal acetylation and spin-labeled (MTSL) cysteine are the only ones I have found documented on the mailing list. I wonder whether acetylated lysine, phosphorylated serine and threonine, and methylated residues are also supported.

Here is an example with N-terminal acetylation and MTSL-labeled cysteine:

Codes: ACE = N-terminal acetylation. Xplor-NIH treats it as an individual residue.
CYSP = spin-labeled cysteine.

Steps:

  1. Edit your sequence file (3-letter codes, residues separated by spaces): add ACE as the first residue and change the desired CYS to CYSP.
  2. In a typical Xplor-NIH script (see the example scripts in xplor-nih home/eginputs), around lines 35-45, change the PSF generation to:
    # generate PSF data from sequence and initialize the correct parameters.
    from psfGen import seqToPSF
    seqToPSF("xyza.seq", seqType='prot', startResid=0)

    –> I commented out the original lines and used the similar ones shown here. I also set the starting residue number to 0, which keeps my residue numbers unshifted and makes NOESY assignment easier.

  3. Then run “xplor -py” to check that the generated template PDB has the correct residue numbers and the modified amino acids.
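
Step 1 can also be scripted. Below is a minimal Python sketch, assuming the sequence file holds whitespace-separated 3-letter codes; the function names `modify_sequence` and `rewrite_sequence_file` are my own and are not part of Xplor-NIH.

```python
def modify_sequence(residues, cys_position):
    """Return a new residue list with ACE prepended and one CYS renamed to CYSP.

    residues: list of 3-letter codes in sequence order.
    cys_position: 1-based position, in the unmodified sequence, of the CYS to label.
    """
    current = residues[cys_position - 1]
    if current != "CYS":
        raise ValueError(f"residue {cys_position} is {current}, not CYS")
    modified = list(residues)
    modified[cys_position - 1] = "CYSP"
    return ["ACE"] + modified


def rewrite_sequence_file(in_path, out_path, cys_position):
    """Read a whitespace-separated sequence file and write the modified version."""
    with open(in_path) as handle:
        residues = handle.read().split()
    with open(out_path, "w") as handle:
        handle.write(" ".join(modify_sequence(residues, cys_position)) + "\n")
```

With startResid=0, the prepended ACE becomes residue 0 and the original residues keep their numbers, so a CYS at position 2 ends up as CYSP at residue 2.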

Here is the snapshot of my generated template PDB. (ACE as residue 0, CYSP as residue 2)

February 22, 2012

Things must be changed before Protein-RNA docking using Haddock

Filed under: xplor/xplor-nih/cns — kpwu @ 8:10 pm

Just a note for RNA-protein docking using Haddock 2.1

System: Mac OS 10.6.8, CNS 1.3, Haddock 2.1

1. RNA (or DNA) residue nomenclature in CNS is not the same as in PDB.

e.g. ACU in PDB = ADE CYT URI in CNS

The “nucleic acid builder” developed in the David Case group can generate template structures that fully conform to the CNS format.
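
For a quick check of a sequence, the renaming can be sketched in Python. The A, C and U entries come from the example above; the G and T entries (GUA, THY) are my assumption about the CNS names and should be checked against your topology file. The names `PDB_TO_CNS` and `to_cns_names` are made up for this note.

```python
# One-letter PDB-style base codes mapped to CNS residue names.
# A, C and U follow the example in this note; G and T are assumed.
PDB_TO_CNS = {"A": "ADE", "C": "CYT", "U": "URI", "G": "GUA", "T": "THY"}


def to_cns_names(bases):
    """Translate a string of one-letter base codes into CNS residue names."""
    return [PDB_TO_CNS[base] for base in bases.upper()]


print(" ".join(to_cns_names("ACU")))
```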

2. change all parameter/toppar files from version 5.3 to 5.4 (if using Mac/CNS 1.3)

3. Assume molecule B is RNA/DNA, then in

A. “filenames”, change {===>} dna_B=false; to {===>} dna_B=true (for DNA);

B. “DNA-RNA restraints”, change {===>} dnarest_on=false;  to {===>} dnarest_on=true;
also copy the file “dna-rna_restraints.def” to run1/data/sequence (if it’s run1 in your path)

The “dna-rna_restraints.def” file can be found in the Haddock example folder (see 3cro)
–> change the resid numbers at lines 30, 35
=== example ===
{========================== base planarity ===========================}

{* Nucleic acid residues to have base planarity restrained. This selection
must only include nucleotide residues *}
{===>} bases_planar=(resid 1:11 and segid B);

{========================== sugar puckers ============================}

{* residues with sugar pucker restrained – group 1 *}
{===>} pucker_1=(resid 1:11 and segid B);


C. “topology and parameter files”, change

C1. prot_top_B=”” to prot_top_B=”
C2. prot_link_B=”topallhdg5.4.pep to prot_link_B=”
C3.  prot_par_B=”” to prot_par_B=”dna-rna-allatom-hj-opls-1.3.param
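
The resid ranges in dna-rna_restraints.def (step B) can be changed by hand, or with a small script. This is a minimal Python sketch that assumes the lines look exactly like the bases_planar and pucker_1 lines in the example above; `set_resid_range` is a hypothetical helper, not something shipped with Haddock.

```python
import re


def set_resid_range(text, key, first, last):
    """Replace the 'resid M:N' range on the '{===>} key=(resid M:N ...' line."""
    pattern = re.compile(
        r"(\{===>\}\s*" + re.escape(key) + r"=\(resid\s+)\d+:\d+"
    )
    # A function is used as the replacement so the new digits cannot be
    # mistaken for part of a group reference.
    new_text, count = pattern.subn(
        lambda match: f"{match.group(1)}{first}:{last}", text
    )
    if count == 0:
        raise ValueError(f"no '{key}' selection found")
    return new_text


line = "{===>} bases_planar=(resid 1:11 and segid B);"
print(set_resid_range(line, "bases_planar", 1, 25))
```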

February 8, 2012

water refinement in Xplor-NIH

Filed under: xplor/xplor-nih/cns — kpwu @ 1:23 pm

Starting with Xplor-NIH 2.26 (the current version is 2.29 as of Feb. 8th, 2012), the official Xplor-NIH package provides an example script for explicit solvent refinement. The short description by the Xplor-NIH author:

refinement with explicit solvent and full electrostatics. Includes rdcs, noes, jcoupling terms and dihedral restraints. This is a work-in-progress. Please compare with other protocols. In particular, this protocol seems to result in structures for which the DIHE term is large.

Note on J-coupling violations:

Calculated structures exhibit at least four consistently violated J-coupling restraints. These restraints are at or near mobile loop regions where the single structure approximation breaks down. In these regions an ensemble of structures must be used to fit all experimental data. Please see the gb3_ensemble directory for an example of ensemble refinement.

The script can be found (from version 2.26 on) at: XPLOR-NIH HOME/eginput/gb1_rdc

I gave it a quick try on a 90-aa protein, a small protein whose solution structure I am determining. The input structure had been refined with the refinement script provided by Xplor-NIH (with NOE, CDIH, RDC) from simulated-annealing templates. That regular refinement already improved the structural quality a lot (judging by the Ramachandran plot: from ~60% to 85%). The water refinement makes the solution structure better still (85% –> 95%), although Xplor-NIH reports a lot of DIHE violations (the same issue the author noted).

Here is the analysis done by Molprobity, showing improved scores and structural quality.

1. Ramachandran plot of structure refined by NOE, CDIH and RDC only.

2. Structure in #1 after water refinement

3. Molprobity analysis of the structures in 1 (top) and 2 (bottom) (click to see full sized snapshots)

March 29, 2011

HADDOCK on iMac and a quick benchmark

Filed under: softwares and scripts,xplor/xplor-nih/cns — kpwu @ 11:52 am

Recently I requested a copy of HADDOCK from Dr. Alexandre Bonvin in order to generate some docked dimers for my colleagues. They are working on some dimeric proteins but do not know how to obtain the dimeric conformation from homologous known structures.

I spent a few days reading the threads in the HADDOCK discussion group at Yahoo and got HADDOCK 2.1 working well on my iMac. Here is the system environment I used:

iMac:  3.06GHz Intel i3 CPU
OS: 10.6.5
CNS: 1.3

Quick notes:

1. Always change the toppar files from 5.3 (the default in run.cns) to 5.4 (it must be 5.4 if you use CNS 1.3).
2. The CNS path in run.cns is probably not the same as your own. Always modify it before running HADDOCK.
3. The number of CPUs used for the calculation is 2 by default; it can be changed, too.

I used the AIR restraint file to generate a dimeric structure. The results generated by HADDOCK and Modeller are shown here.

The best 10 structures are superimposed.

The HADDOCK and Modeller generated dimer conformations are colored in orange and blue, respectively.

blue is from Haddock and orange is from Modeller 9v8

January 27, 2011

Names of Atoms of Amino acids

Filed under: softwares and scripts,xplor/xplor-nih/cns — kpwu @ 4:03 pm

I really hate the inconsistent atom nomenclature for amino acids between different programs/databases. I finished all my NOESY assignments in Sparky using PDB nomenclature, and the Sparky XPLOR constraint plugin (shortcut xf) doesn’t handle the differences between XPLOR and PDB. Thus I had to find a table showing the naming differences between XPLOR and PDB.

I found that BMRB provides such a table covering several programs (XPLOR, DIANA, PDB, BMRB, SYBYL…), and I extracted the BMRB/PDB/XPLOR/DIANA columns and reorganized the information. The new table is saved as a PDF and attached.

A snapshot of part of this table is shown here.

Attached PDF: AA_atoms-nomemclature

December 27, 2010

Test reports of CS-Rosetta (3.1)

Filed under: mac,softwares and scripts,xplor/xplor-nih/cns — kpwu @ 1:08 pm

On Dec 23rd, I successfully installed Rosetta 3.1 and CS-Rosetta on my iMac and ran the test using the GB3 protein example provided with CS-Rosetta.

Here are the rough steps I used to get a simulated GB3 protein.

  1. run “” as instructed in the online manual
  2. The has to be modified first to perform ab initio simulation by Rosetta3.
    (basically, just replace all double colon “::” in the by single colon “:”).
  3. After ~70 hours simulation using a single CPU (3.06 GHz) on my iMac, I got the simulation done.
  4. As instructed by CS-Rosetta, one has to generate a “silent output” (the
  5. Before running “”, you have to edit 2 files:
    1. go to the CS-Rosetta folder; in the “com” subfolder, edit “” and make sure no double colons are left in the script (as in step 2).
    2. edit,:
    at line 28, it was “mkdir output -p” and the new one should be “mkdir -p output”,
    at line 101, it was “rm ./pred/………  -f” and the new one should be “rm -f ./pred…….”.
    Then you can run “” and no error/warning messages will be shown.
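
The double-colon replacement from steps 2 and 5.1 can be done in any editor, or scripted. A minimal Python sketch follows; `collapse_double_colons` and `fix_file` are names I made up, and the file to pass in is whichever script you are fixing.

```python
import shutil


def collapse_double_colons(text):
    """Replace every '::' with a single ':'."""
    return text.replace("::", ":")


def fix_file(path):
    """Rewrite a file in place, keeping a backup copy as path + '.bak'."""
    shutil.copyfile(path, path + ".bak")
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(collapse_double_colons(text))
```

Keeping the .bak copy makes it easy to compare against the original script if something else breaks.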

I did not give the program a defined seg.txt in this test.

The error messages at step 5.2 look like the following lines:


userssxx:rosetta> new_silent.out
./output directory generated
mkdir: output: File exists
mkdir: -p: File exists

extracting PDB coodinates…

generate decoys raw score table file : ./output/name.rawscore.txt
decoy with the lowest energy score: S_j001_00000289
calculate rmsd to the lowest raw score decoy ( S_j001_00000289 ) and generate file : ./output/rms2LowRawScore.txt
generate rms_toLowestRawscoreModel verus score table file : ./output/name.rms.rawscore.txt
labadmin:rosetta> new_silent.out
./output directory generated
mkdir: output: File exists
mkdir: -p: File exists

extracting PDB coodinates…

generate decoys raw score table file : ./output/name.rawscore.txt
rescoring decoys using sparta calculated chemical shifts…
(input chemical shift shift )
calculating chemical shift score for decoy S_j001_00000001
rm: -f: No such file or directory
calculating chemical shift score for decoy S_j001_00000002
rm: -f: No such file or directory
calculating chemical shift score for decoy S_j001_00000003
rm: -f: No such file or directory


Now I am waiting for the back-prediction of chemical shifts by Sparta to finish so I can get the scores. I checked several simulated PDB coordinates of the GB3 protein, and the structures are quite similar to the published one. Here is a random one.

Once the back CS prediction by Sparta is complete, the script is supposed to collect all the information and generate a few text files containing the CS_chi2, rms and other information. I don’t know why the script was looking for S_j001_0001000.pdb (I don’t have it, only 0-999.pdb), so “pdbrms” doesn’t work properly. Therefore “name.rms.rescore.txt” does not have the complete information (it should have 3 columns).

I did the following steps to get the complete information. All the following commands are extracted from the

  1. set best_model = `cat name.rescore.txt | sort -g -k4 | head -n1 | awk '{print $1}'`
  2. pdbrms $best_model.pdb S*.pdb | awk '{print $2,$1}' | sort -g > rms2LowReScore.txt
    if you have seg.txt, then the commands are:
    set seg = `cat ../seg.txt`
    pdbrms $best_model.pdb S*.pdb -seg $seg | awk '{print $2,$1}' | sort -g > rms2LowReScore.txt
  3. paste name.rescore.txt rms2LowReScore.txt | awk '{ print $1,$6,$4 }' > name.rms.rescore.txt
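
For reference, steps 1 and 3 can be mirrored in Python. This is only a sketch of the sort/paste/awk logic: it assumes whitespace-separated columns, that name.rescore.txt has four columns (so the pasted rms table supplies columns 5 and 6), and that both tables list the decoys in the same order. The function names are mine.

```python
def best_model(rescore_lines):
    """Name (column 1) of the row with the smallest value in column 4."""
    rows = [line.split() for line in rescore_lines if line.strip()]
    return min(rows, key=lambda fields: float(fields[3]))[0]


def merge_tables(rescore_lines, rms_lines):
    """Pair the two tables line by line and keep columns 1, 6 and 4."""
    merged = []
    for left, right in zip(rescore_lines, rms_lines):
        fields = left.split() + right.split()
        merged.append(" ".join((fields[0], fields[5], fields[3])))
    return merged
```

Like paste, `merge_tables` joins rows purely by position, so it is worth checking that the decoy names on each pasted row actually agree.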

The 1000 simulated GB3 structures seem to show a trend toward convergence (see the figure below). With more structures, as suggested by the CS-Rosetta manual (e.g. 10,000 structures), the convergence should be clearer.

The 20 lowest Calpha-RMSD structures (left) and the 20 lowest-energy structures were superimposed with the NMR plugin in PyMOL; the backbone-atom RMSDs (residues 1-56) are 0.6 and 0.68 angstrom, respectively.

December 24, 2010

Install Xplor-NIH 2.26 on new iMac (10.6)

Filed under: mac,softwares and scripts,xplor/xplor-nih/cns — kpwu @ 9:52 am

A short note on installing Xplor-NIH 2.26 on an iMac.

A few months ago, I downloaded XPLOR-NIH 2.26 for Mac OS and installed it on my new iMac (Core i3, 3.06 GHz). The default configuration script does not include “Intel Core i3” in the processor list (at xplor-nih-xxx/arch/getDarwinCPU).

The fix is to edit the “getDarwinCPU” script. I replaced i5 with i3 at line 21 and kept everything else unchanged. The configuration script then worked, and an extensive test ran without interruption.

Second note on 2/5/2011.
For Xplor-NIH 2.27, the same edit has to be made to get it running on my iMac. Some Mac users may have the same problem.

June 2, 2006

NOE constraints counter

Filed under: softwares and scripts,xplor/xplor-nih/cns — kpwu @ 7:12 pm

Yesterday I spent 1.5 hours writing a shell script that takes Xplor-NIH NOE table files as input, outputs the number of NOEs for each residue, and, after sorting, reports the numbers of intra-residual, sequential, short-, medium- and long-range NOEs. The current script doesn’t accept ambiguous or hydrogen-bond constraints, so users have to remove those first. Also, since I don’t calculate complex structures, the script doesn’t support constraints for complexes (that is, files with segid “a”, segid “b” in each line).

The example results are like:

residue 5 has 1 NOEs
residue 6 has 2 NOEs
residue 7 has 1 NOEs
residue 8 has 4 NOEs
residue 9 has 0 NOEs
residue 10 has 1 NOEs
residue 11 has 4 NOEs
residue 12 has 7 NOEs
residue 13 has 9 NOEs
residue 14 has 5 NOEs
residue 15 has 6 NOEs
residue 16 has 7 NOEs
residue 17 has 7 NOEs
residue 18 has 13 NOEs
residue 19 has 8 NOEs
residue 20 has 11 NOEs
and the statistics look like:

total counted NOEs are: 717
intra_residual (j-i = 0): 205
sequential (j-i = 1): 263
medium range (j-i = 2 ): 34
medium range (j-i = 3 ): 6
medium range (j-i = 4 ): 3
long range (j-i >= 5 ): 206
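
The counting logic can be sketched in Python as well. This is not the shell script described above, just an illustration under my own assumptions: each restraint sits on one line starting with "assign" (or the abbreviation "assi"), contains exactly two "resid N" selections, and each restraint is counted once for every distinct residue it touches.

```python
import re
from collections import Counter

# Matches the residue number inside selections such as "(resid 5 and name HA)".
RESID = re.compile(r"resid\s+(-?\d+)", re.IGNORECASE)


def classify(i, j):
    """Name the range class of a restraint between residues i and j."""
    gap = abs(j - i)
    if gap == 0:
        return "intra_residual"
    if gap == 1:
        return "sequential"
    if gap < 5:
        return f"medium range (j-i = {gap})"
    return "long range"


def count_noes(lines):
    """Return (NOEs per residue, NOEs per range class) as two Counters."""
    per_residue = Counter()
    per_class = Counter()
    for line in lines:
        if not line.lstrip().lower().startswith("assi"):
            continue  # comments, blank lines, continuation lines
        resids = [int(number) for number in RESID.findall(line)]
        if len(resids) != 2:
            continue  # ambiguous or otherwise unsupported restraint
        i, j = resids
        per_class[classify(i, j)] += 1
        for residue in {i, j}:
            per_residue[residue] += 1
    return per_residue, per_class
```

Restraints with more than two selections are skipped here, which matches the limitation of the shell script that ambiguous constraints must be removed first.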


February 3, 2006

making dimer PSF

Filed under: xplor/xplor-nih/cns — kpwu @ 10:34 pm

Taken from the Xplor-NIH discussion mailing list.

Two ways to generate a dimer PSF:

  1. If you have a PSF file for the monomer, it’s easy to create one for the dimer using the “classic” xplor interface. Try this:
    struct @monomer.psf end
    vector do (segid = "A") (all)
    struct @monomer.psf end
    vector do (segid = "B") (not segid A)
    write struct output = "dimer.psf" end
    (by JK)
  2. In the Python interface, easiest is:
    seq=''' string containing three character residue names'''
    import psfGen

    #if you wish to save psf info:
    xplor.command("write psf output=new.psf end")
    (by CS)

January 25, 2006

convert protein sequence from 1 to 3 letter code for each residue

Filed under: NMRPipe and NMRview,xplor/xplor-nih/cns — kpwu @ 3:37 am

Copy the following lines into a shell script file and make sure the file is executable. So far it only works in one direction, from single-letter code to 3-letter code; I plan to cover both directions and DNA/RNA in the future.

The script was designed for the NMRView sequence format, and it works for the Xplor sequence format, too.

# convert protein sequence from one letter code to three letter code
# Just set up your sequence file name for "xseq= ??"
# Kuen-Phon Wu, 06/11/2004 version 0.2
# 06/17/2004 version 0.3
# This program will ignore any other characters which are not standard 20 Amino
# acids, like: B,O.... Either lowercase or uppercase input can be processed by
# this program, have fun!
# Just use as: ./ [FILENAME] > [OUT_FILE]

usage="Usage: [inputfile] > [OUTPUT]"

# print the usage message and stop if no input file was given
if [ $# -lt 1 ] ; then
echo "$usage"
exit 1
fi

# lowercase the input, turn punctuation, digits and newlines into 'b',
# delete every letter that is not a standard amino acid, then expand each
# remaining letter to its 3-letter code, one residue per line
# (the \n in the replacements needs a sed that expands it, e.g. GNU sed)
tr 'A-Z' 'a-z' < "$1" | tr '[:punct:]' 'b' | tr '0-9' 'b' | tr '\n' 'b' | \
sed -e 's/[o|x|z|u|j|b| ]//g' \
-e 's/a/ALA\n/g' -e 's/c/CYS\n/g' -e 's/d/ASP\n/g' \
-e 's/e/GLU\n/g' -e 's/f/PHE\n/g' -e 's/g/GLY\n/g' \
-e 's/h/HIS\n/g' -e 's/i/ILE\n/g' -e 's/k/LYS\n/g' \
-e 's/l/LEU\n/g' -e 's/m/MET\n/g' -e 's/p/PRO\n/g' \
-e 's/r/ARG\n/g' -e 's/q/GLN\n/g' -e 's/n/ASN\n/g' \
-e 's/s/SER\n/g' -e 's/t/THR\n/g' -e 's/w/TRP\n/g' \
-e 's/y/TYR\n/g' -e 's/v/VAL\n/g'
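
The same conversion is short in Python, which avoids the differences between sed versions. A minimal sketch with names of my own choosing; like the shell script, it silently ignores anything that is not one of the 20 standard amino acids.

```python
# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
}


def one_to_three(sequence):
    """Convert a one-letter sequence to a list of 3-letter codes.

    Case is ignored; digits, punctuation, whitespace and non-standard
    letters (B, J, O, U, X, Z) are skipped.
    """
    return [ONE_TO_THREE[c] for c in sequence.upper() if c in ONE_TO_THREE]


print("\n".join(one_to_three("mk-x1 A")))
```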
