Search for specific characters in specific positio

Posted 2019-09-22 10:47

I'm fairly new to the Linux world and I need your help. I need code to search for specific characters at specific positions in a text file. For example:

The file sequences.txt looks like this:

ACGTCAGTCAG**T**CAGCATC**G**ATCGACTACGACCGTAGCTAGCTATACGACT**G**ATCAGCTACGATCAGCTACGATCAGCTACGAT
ACGTCAGTCAG**A**CAGCATC**C**ATCGACCATGCTAGCCGTACGATTAGCGACT**C**ATCAGCTACGATCAGCTACGATCAGCTACGAT
ACGTCAGTCAG**T**CAGCATCATCGACTACGACTACGATCGATCGATCGGACT**G**ATCAGCTACGATCAGCTACGATCAGCTACGATG
ACGTCAGTCAG**A**CAGCATC**G**ATCGACTACGACGATCGATCGATCTACGACT**C**ATCAGCTACGATCAGCTACGATCAGCTACGAT

What I want is to split the dataset into different output files, grouping together the lines that have the same characters at those specific positions.

Hope someone can help me. All the best.

Tags: linux shell

2 answers

我欲成王，谁敢阻挡

Reply #2 · 2019-09-22 11:18

To search for "foo" immediately after the first 42 characters of a line (that is, starting at the 43rd character):

grep -E '^.{42}foo'

You can run a command like this multiple times on your input:

grep -E '^.{42}foo' inputfile.txt > lineswithfoo.txt
grep -E '^.{42}bar' inputfile.txt > lineswithbar.txt
...

or as a loop:

for pattern in foo bar qux; do
  grep -E "^.{42}$pattern" inputfile.txt > "lineswith$pattern.txt"
done
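The same idea extends to several positions in one pattern. Here is a minimal sketch, assuming you want one character at position 3 and another at position 8; the sample lines, the positions, and the file names are all made up for illustration and are not taken from the question's data:

```shell
# Hypothetical sample data: short lines so the positions are easy to check by eye.
workdir=$(mktemp -d)
printf '%s\n' 'ACGTACGT' 'ACTTACGA' 'ACGTACGA' > "$workdir/sample.txt"

# Keep lines with G as the 3rd character AND T as the 8th character.
#   ^.{2}G : two arbitrary characters, then G  -> G is character 3
#   .{4}T  : four more characters, then T      -> T is character 8
grep -E '^.{2}G.{4}T' "$workdir/sample.txt" > "$workdir/G3_T8.txt"
```

The count inside each {n} is the number of characters to skip, so it is always one less than the 1-based position of the character that follows (or, for later positions, the gap between two fixed characters).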


男人必须洒脱

Reply #3 · 2019-09-22 11:20

awk's substring operations might be useful here. Something along these lines:

awk '{ x=substr($0, 42, 3); print > ("output" x ".txt") }' inputfile.txt

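This takes the 3-character substring of each line starting at position 42 (awk's substr() uses 1-based indexing, so that is the 42nd character), forms an output file name "outputXYZ.txt" from that substring, and writes the line to that file. Each output file is truncated when awk first opens it and is then appended to for the rest of the run.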
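For the grouping the question asks about, the key can be built from single characters at several positions instead of one contiguous substring. A rough sketch, assuming positions 3 and 8 and made-up sample lines (the positions, the "group_" file prefix, and the sample data are placeholders, not something from the question):

```shell
# Hypothetical sample data; replace with your own file.
workdir=$(mktemp -d)
printf '%s\n' 'ACGTACGT' 'ACTTACGA' 'ACGTACGA' 'TTGAAAAT' > "$workdir/sample.txt"

# Build a key from the characters at positions 3 and 8 (1-based), then
# write each line to a file named after its key, e.g. group_GT.txt.
awk -v dir="$workdir" '{
    key = substr($0, 3, 1) substr($0, 8, 1)
    out = dir "/group_" key ".txt"
    print > out
}' "$workdir/sample.txt"
```

Lines that share the same characters at those positions end up in the same file, and no list of patterns has to be written in advance. By my count the marked characters in the question's sequences.txt sit at positions 12, 20 and 52, so the key there would probably be substr($0, 12, 1) substr($0, 20, 1) substr($0, 52, 1); please verify the positions against your real data.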

