我有一个文本文件,如下图所示
ATOM 920 CA GLN A 203 39.292 -13.354 17.416 1.00 55.76 C
ATOM 929 CA HIS A 204 38.546 -15.963 14.792 1.00 29.53 C
ATOM 939 CA ASN A 205 39.443 -17.018 11.206 1.00 54.49 C
ATOM 947 CA GLU A 206 41.454 -13.901 10.155 1.00 26.32 C
ATOM 956 CA VAL A 207 43.664 -14.041 13.279 1.00 40.65 C
.
.
.
ATOM 963 CA GLU A 208 45.403 -17.443 13.188 1.00 40.25 C
我想计算两个α碳原子间的距离,即计算的第一和第二原子之间,然后和之间的第二和第三原子等的距离.....两个原子之间的距离可以被表示为: distance = sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2) .
列7,8和9表示在x,y和z坐标respectively.I需要打印的距离和相应的残余物对(第4栏),如下所示。(距离的值不是真实)
GLN-HIS 4.5
HIS-ASN 3.2
ASN-GLU 2.5
我怎样才能做到这一点的计算与Perl或Python?
Answer 1:
如果你的数据是由空格隔开,简单的split
可以做的工作。 缓冲线将它们相互比较顺序。
use strict;
use warnings;
my @line;
while (<>) {
push @line, $_; # add line to buffer
next if @line < 2; # skip unless buffer is full
print proc(@line), "\n"; # process and print
shift @line; # remove used line
}
sub proc {
my @a = split ' ', shift; # line 1
my @b = split ' ', shift; # line 2
my $x = ($a[6]-$b[6]); # calculate the diffs
my $y = ($a[7]-$b[7]);
my $z = ($a[8]-$b[8]);
my $dist = sprintf "%.1f", # format the number
sqrt($x**2+$y**2+$z**2); # do the calculation
return "$a[3]-$b[3]\t$dist"; # return the string for printing
}
输出(与样品数据):
GLN-HIS 3.8
HIS-ASN 3.8
ASN-GLU 3.9
GLU-VAL 3.8
如果您的数据是制表符分隔,可以拆分上/\t/
,而不是' '
。
Answer 2:
请在空格不拆
这里给出其他答案做一个有缺陷的假设 - 即坐标将是空间分隔的。 每的PDB规范ATOM
,这不是 necessarilly的情况下:记录PDB值由列索引指定,并且可以流入彼此。 例如,你的第一个ATOM
记录上写着:
ATOM 920 CA GLN A 203 39.292 -13.354 17.416 1.00 55.76 C
但是,这是完全合法的,以及:
ATOM 920 CA GLN A 203 39.292-13.3540 17.416 1.00 55.76 C
更好的办法
由于指定的列项指数,以及可能发生在PDB文件中的其他问题的数量,你不应该写自己的解析器。 在PDB格式是凌乱的,并且有很多的特殊情况和错误格式的文件来处理。 相反,使用已经为你写一个解析器。
我喜欢Biopython的PDB.PDBParser
。 它将分析结构为你为Python对象,完成与方便的功能。 如果你喜欢Perl中,检查出的BioPerl 。
PDB.Residue
对象允许通过名称原子键访问,并PDB.Atom
对象重载-
操作员两个原子之间的距离返回。 我们可以用它来写干净,简洁的代码:
码
from Bio import PDB
parser = PDB.PDBParser()
# Parse the structure into a PDB.Structure object
pdb_code = "1exm"
pdb_path = "pdb1exm.ent"
struct = parser.get_structure(pdb_code, pdb_path)
# Grab the first two residues from the structure
residues = struct.get_residues()
res_one = residues.next()
res_two = residues.next()
try:
alpha_dist = res_one['CA'] - res_two['CA']
except KeyError:
print "Alpha carbon missing, computing distance impossible!"
Answer 3:
假设你的数据在“atoms.txt”,这由线读取它的线并拆分项目分成列表:
import csv
with open("atoms.txt") as f:
reader = csv.reader(f)
for line, in reader:
entries = line.split()
print entries
现在,每个列表提取所需的列,并计算距离等(记住,Python中的列表是从零开始)。
Answer 4:
理论上,应该使用MDAnalysis包的“切片”原子和细分,并计算它们之间的距离测量的Python的方式。 事实上,MDAnalysis支持多种MD模拟和化学结构格式。
对于一个小更详细的例子,也看到了上Biostars.org以下条目 。
Answer 5:
如果你有兴趣在短短的一对,bash的工作就好了。 这是一个脚本,我使用,我有它设置为在年底重新启动(如果你想关闭这个功能)。 它会提示你哪些原子。 PDB文件可以有不同的列设置,因此对于AWK线确保列匹配。 用新的PDB文件在使用前做手工测试用例。 这是微不足道的,但在我的剧本PDB文件更改为你的。
#!/usr/bin/env bash
echo " "
echo "Use one letter abbreviations. Case doesn't matter."
echo "Example: A 17 CA or n 162 cg"
echo " - - - - - - - - - - - - - - - - - -"
#------------First Selection------------
read -e -p "What first atom? " sel1
# echo $sel1
sel1caps=${sel1^^}
# echo "sel1caps="$sel1caps
arr1=($sel1caps)
# echo "arr1[0]= "${arr1[0]}
# echo "arr1[1]= "${arr1[1]}
# echo "arr1[2]= "${arr1[2]}
#To convert one to three letters
if [ ${arr1[0]} = A ] ; then
SF1=ALA
elif [ ${arr1[0]} = H ] ; then
SF1=HIS
elif [ ${arr1[0]} = R ] ; then
SF1=ARG
elif [ ${arr1[0]} = K ] ; then
SF1=LYS
elif [ ${arr1[0]} = I ] ; then
SF1=ILE
elif [ ${arr1[0]} = F ] ; then
SF1=PHE
elif [ ${arr1[0]} = L ] ; then
SF1=LEU
elif [ ${arr1[0]} = W ] ; then
SF1=TRP
elif [ ${arr1[0]} = M ] ; then
SF1=MET
elif [ ${arr1[0]} = P ] ; then
SF1=PRO
elif [ ${arr1[0]} = C ] ; then
SF1=CYS
elif [ ${arr1[0]} = N ] ; then
SF1=ASN
elif [ ${arr1[0]} = V ] ; then
SF1=VAL
elif [ ${arr1[0]} = G ] ; then
SF1=GLY
elif [ ${arr1[0]} = S ] ; then
SF1=SER
elif [ ${arr1[0]} = Q ] ; then
SF1=GLN
elif [ ${arr1[0]} = Y ] ; then
SF1=TYR
elif [ ${arr1[0]} = D ] ; then
SF1=ASP
elif [ ${arr1[0]} = E ] ; then
SF1=GLU
elif [ ${arr1[0]} = T ] ; then
SF1=THR
else
echo "use one letter codes"
echo "exiting"
exit
fi
# echo "SF1 ="$SF1
#If there is nothing printing for line 1, check the expression for your pdb file. The spaces may need adjustment at the end.
line1=$(grep -E "${arr1[2]} *?${SF1}(.*?) ${arr1[1]} " 1A23.pdb)
# echo $line1
ar_l1=($line1)
# echo "ar_l1[1]="${ar_l1[1]}
echo " - - - - - - - - - - - - - - - - - -"
#------------Second Selection------------
read -e -p "What second atom? " sel2
# echo $sel2
sel2caps=${sel2^^}
# echo "sel2caps="$sel2caps
arr2=($sel2caps)
# echo "arr2[0]= "${arr2[0]}
# echo "arr2[1]= "${arr2[1]}
# echo "arr2[2]= "${arr2[2]}
#To convert one to three letters
if [ ${arr2[0]} = A ] ; then
SF2=ALA
elif [ ${arr2[0]} = H ] ; then
SF2=HIS
elif [ ${arr2[0]} = R ] ; then
SF2=ARG
elif [ ${arr2[0]} = K ] ; then
SF2=LYS
elif [ ${arr2[0]} = I ] ; then
SF2=ILE
elif [ ${arr2[0]} = F ] ; then
SF2=PHE
elif [ ${arr2[0]} = L ] ; then
SF2=LEU
elif [ ${arr2[0]} = W ] ; then
SF2=TRP
elif [ ${arr2[0]} = M ] ; then
SF2=MET
elif [ ${arr2[0]} = P ] ; then
SF2=PRO
elif [ ${arr2[0]} = C ] ; then
SF2=CYS
elif [ ${arr2[0]} = N ] ; then
SF2=ASN
elif [ ${arr2[0]} = V ] ; then
SF2=VAL
elif [ ${arr2[0]} = G ] ; then
SF2=GLY
elif [ ${arr2[0]} = S ] ; then
SF2=SER
elif [ ${arr2[0]} = Q ] ; then
SF2=GLN
elif [ ${arr2[0]} = Y ] ; then
SF2=TYR
elif [ ${arr2[0]} = D ] ; then
SF2=ASP
elif [ ${arr2[0]} = E ] ; then
SF2=GLU
elif [ ${arr2[0]} = T ] ; then
SF2=THR
else
echo "use one letter codes"
echo "exiting"
exit
fi
# echo "SF2 ="$SF2
line2=$(grep -E "${arr2[2]} *?${SF2}(.*?) ${arr2[1]} " 1A23.pdb)
# echo $line2
ar_l2=($line2)
# echo "ar_l2[1]="${ar_l2[1]}
# echo "ar_l2[1]="${ar_l2[1]}
atom1=${ar_l1[1]}
atom2=${ar_l2[1]}
echo "==========================="
echo ${arr1[0]}${arr1[1]}${arr1[2]}" to "${arr2[0]}${arr2[1]}${arr2[2]}":"
# 6, 7, 8 are column numbers in the pdb file.
# If there are multiple molecules it should be 7, 8, 9.
awk '$2=='$atom1'{x1=$7;y1=$8;z1=$9} # get the ATOM 1
$2=='$atom2'{x2=$7;y2=$8;z2=$9} # get the ATOM 2
END{print sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2))}' 1A23.pdb # calculate the distance.
echo "Angstroms"
echo "==========================="
echo " "
echo "-_-_-_-_Running Script Again_-_-_-_-"
./distance_soln.sh
Answer 6:
一个简单的Python代码可以做的工作。 我假设你所有的内容都在文件“input.txt中”。
def process(line1, line2):
content1 = line1.split()
content2 = line2.split()
x1, y1, z1 = float(content1[6]), float(content1[7]), float(content1[8])
x2, y2, z2 = float(content2[6]), float(content2[7]), float(content2[8])
distance = math.sqrt(math.pow(x1-x2, 2) + math.pow(y1-y2, 2) + math.pow(z1-z2, 2))
return content1[3]+"-"+content2[3]+" "+ "{0:.2f}".format(distance)
with open("input.txt") as f:
line1 = f.readline()
for line in f:
line2 = line
print(process(line1, line2))
line1 = line2
请不要让我知道如果你发现在使用该脚本任何差异或问题。
文章来源: Calculating the distance between atomic coordinates