How to extract specific columns from a space-separated file

I'm trying to process a file from the Protein Data Bank, which is separated by spaces (not \t). I have a .txt file and I want to extract specific rows and, from those rows, only a few columns.

I need to do it in Python. I first tried on the command line with awk and had no problem, but I have no idea how to do the same in Python.

Here is an extract of my file:

[...]
SEQRES   6 B   80  ALA LEU SER ILE LYS LYS ALA GLN THR PRO GLN GLN TRP          
SEQRES   7 B   80  LYS PRO                                                      
HELIX    1   1 THR A   68  SER A   81  1                                  14    
HELIX    2   2 CYS A   97  LEU A  110  1                                  14    
HELIX    3   3 ASN A  122  SER A  133  1                                  12    
[...]

For example, I'd like to take only the 'HELIX' rows and then the 4th, 6th, 7th and 9th columns. I started reading the file line by line with a for loop and then extracted those rows starting with 'HELIX'... and that's all.

EDIT: This is the code I have right now, but the print doesn't work properly: it only prints the first line of each block (HELIX, SHEET and DBREF).

#!/usr/bin/python
import sys

for line in open(sys.argv[1]):
    if 'HELIX' in line:
        helix = line.split()
    elif 'SHEET' in line:
        sheet = line.split()
    elif 'DBREF' in line:
        dbref = line.split()

print (helix), (sheet), (dbref)

Tags: python extract pdb

4 answers

三岁会撩人

#2 · 2019-02-16 01:48

If you have already extracted the line, you can split it with line.split(). This gives you a list from which you can take all the elements you need:

>>> test='HELIX 2 2 CYS A 97'
>>> test.split()
['HELIX', '2', '2', 'CYS', 'A', '97']
>>> test.split()[3]
'CYS'
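`operator.itemgetter` from the standard library can pick several columns in one call. A minimal sketch for the 4th, 6th, 7th and 9th columns from the question (zero-based indices 3, 5, 6 and 8), assuming each HELIX line splits into at least nine fields like the lines in the extract; `helix_columns` and `sample` are illustrative names, not part of the original answer:

```python
from operator import itemgetter

# Zero-based indices of the 4th, 6th, 7th and 9th whitespace-separated columns
pick = itemgetter(3, 5, 6, 8)


def helix_columns(lines):
    """Return the selected columns of every line whose first field is HELIX."""
    rows = []
    for line in lines:
        fields = line.split()  # any run of spaces counts as one separator
        if fields and fields[0] == 'HELIX':
            rows.append(pick(fields))
    return rows


sample = [
    "SEQRES   7 B   80  LYS PRO",
    "HELIX    1   1 THR A   68  SER A   81  1",
    "HELIX    2   2 CYS A   97  LEU A  110  1",
]
for row in helix_columns(sample):
    print(row)
# ('THR', '68', 'SER', '81')
# ('CYS', '97', 'LEU', '110')
```

To run it on the real file, pass the open file instead of `sample`, for example `with open(sys.argv[1]) as f: rows = helix_columns(f)`.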

兄弟一词,经得起流年.

#3 · 2019-02-16 01:51

You can expand the list of keywords as needed. The result is a list of the lines that contain any of the keywords; you can then process that list further to get what you want.

keyWords = ['HELIX', 'SHEET', 'DBREF']
with open("your file") as f:
    # keep each line once if it contains any of the keywords
    result = [line for line in f if any(key in line for key in keyWords)]
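Collecting the matches in a list is also what the question's EDIT is missing: there, each assignment such as `helix = line.split()` replaces the previous value, so at most one line per record type is left when the loop ends. A minimal sketch that keeps every matching line, grouped by record name, assuming the record name is always the first whitespace-separated field (as in the PDB extract); `group_records` is a hypothetical helper name:

```python
from collections import defaultdict

KEYWORDS = ('HELIX', 'SHEET', 'DBREF')


def group_records(lines, keywords=KEYWORDS):
    """Collect the split fields of every matching line, keyed by record name."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Compare the whole first field, so a keyword appearing elsewhere
        # in a line does not count as a match
        if fields and fields[0] in keywords:
            groups[fields[0]].append(fields)  # append instead of overwriting
    return groups


sample = [
    "HELIX    1   1 THR A   68  SER A   81  1",
    "HELIX    2   2 CYS A   97  LEU A  110  1",
    "SEQRES   7 B   80  LYS PRO",
]
groups = group_records(sample)
print(len(groups['HELIX']))       # 2
print(groups.get('SHEET', []))    # []
```

Using `groups.get(...)` for a record type that may be absent avoids the `NameError` the original loop would hit when, say, no SHEET line exists.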

\"骚年 ilove

#4 · 2019-02-16 01:52

Have a look at the csv module (https://docs.python.org/2/library/csv.html). The following code should do the trick:

>>> import csv
>>> with open('my-file.txt', 'rb') as myfile:
...     spamreader = csv.reader(myfile, delimiter=' ', )
...     for row in spamreader:
...         print row[3]
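One caveat with `delimiter=' '`: the csv module treats every single space as a separator, so each run of spaces produces empty fields, and for lines laid out like the question's extract `row[3]` is an empty string rather than the residue name. A small comparison on one HELIX line (written for Python 3; `csv.reader` accepts any iterable of strings, so a one-element list stands in for the file):

```python
import csv

line = "HELIX    1   1 THR A   68  SER A   81  1"

# csv.reader splits on every single space, so each extra space adds an empty field
csv_row = next(csv.reader([line], delimiter=' '))
print(csv_row[:6])   # ['HELIX', '', '', '', '1', '']

# str.split() with no argument collapses runs of whitespace
split_row = line.split()
print(split_row[3])  # THR
```

Filtering out the empty strings (`[f for f in row if f]`) would also work, but `line.split()` gets there directly.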

SAY GOODBYE

#5 · 2019-02-16 01:54

Is there a reason you can't just use split?

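for line in open('myfile'):
    if line.startswith('HELIX'):
        # split() without an argument treats any run of whitespace as one separator;
        # split(' ') would produce empty strings between consecutive spaces
        cols = line.split()
        process(cols[3], cols[5], cols[6], cols[8])  # process() stands for your own code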
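Whitespace splitting relies on every field being present. PDB records are defined by fixed column positions, so if an optional field such as the chain identifier is blank, one token disappears and every later index shifts. Slicing by column avoids that. A minimal sketch, assuming the standard PDB layout of the HELIX record (initial residue name in columns 16-18, initial sequence number in 22-25, terminal residue name in 28-30, terminal sequence number in 34-37, all 1-based), which matches the extract in the question; `parse_helix` is a hypothetical name:

```python
def parse_helix(line):
    """Slice residue names and numbers out of a fixed-width HELIX record.

    Column positions (1-based, inclusive) assumed from the PDB format:
    16-18 initial residue name, 22-25 initial sequence number,
    28-30 terminal residue name, 34-37 terminal sequence number.
    Python slices are 0-based and exclude the end index.
    """
    return (
        line[15:18].strip(),  # columns 16-18
        int(line[21:25]),     # columns 22-25; int() ignores surrounding spaces
        line[27:30].strip(),  # columns 28-30
        int(line[33:37]),     # columns 34-37
    )


record = "HELIX    1   1 THR A   68  SER A   81  1"
if record.startswith('HELIX'):
    print(parse_helix(record))  # ('THR', 68, 'SER', 81)
```

Converting the sequence numbers to `int` is a design choice; drop the `int()` calls to keep them as strings like the split-based answers do.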
