I would like to extract chains from pdb files. I have a file named pdb.txt which contains pdb IDs as shown below. The first four characters represent PDB IDs and last character is the chain IDs.
1B68A
1BZ4B
4FUTA
I would like to 1) read the file line by line
2) download the atomic coordinates of each chain from the corresponding PDB files.
3) save the output to a folder.
I used the following script to extract chains. But this code prints only A chains from pdb files.
for i in 1B68 1BZ4 4FUT
do
wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
grep ATOM $i.pdb | grep 'A' > $i\_A.pdb
done
It is probably a little late for asnwering this question, but I will give my opinion. Biopython has some really handy features that would help you achieve such a think easily. You could use something like a custom selection class and then call it for each one of the chains you want to select inside a for loop with the original pdb file.
The following BioPython code should suit your needs well.
It uses
PDB.Select
to only select the desired chains (in your case, one chain) andPDBIO()
to create a structure containing just the chain.One final note: don't write your own parser for PDB files. The format specification is ugly (really ugly), and the amount of faulty PDB files out there is staggering. Use a tool like BioPython that will handle parsing for you!
Furthermore, instead of using
wget
, you should use tools that interact with the PDB database for you. They take FTP connection limitations into account, the changing nature of the PDB database, and more. I should know - I updatedBio.PDBList
to account for changes in the database. =)Lets say you have the following file pdb_structures
Then have your code in load_pdb.sh
uncomment the last line if you don't need the original pdb's.
execute