Search for specific characters in specific positions

Posted 2019-09-22 10:47

I'm fairly new to the Linux world and I need your help. I need some code to search for specific characters at specific positions in a text file. For example:

The file sequences.txt looks like this:

ACGTCAGTCAG**T**CAGCATC**G**ATCGACTACGACCGTAGCTAGCTATACGACT**G**ATCAGCTACGATCAGCTACGATCAGCTACGAT
ACGTCAGTCAG**A**CAGCATC**C**ATCGACCATGCTAGCCGTACGATTAGCGACT**C**ATCAGCTACGATCAGCTACGATCAGCTACGAT
ACGTCAGTCAG**T**CAGCATCATCGACTACGACTACGATCGATCGATCGGACT**G**ATCAGCTACGATCAGCTACGATCAGCTACGATG
ACGTCAGTCAG**A**CAGCATC**G**ATCGACTACGACGATCGATCGATCTACGACT**C**ATCAGCTACGATCAGCTACGATCAGCTACGAT

What I want is to split the dataset into different output files, grouping together the lines that contain the same characters at those specific positions.

I hope someone can help me. All the best.

Tags: linux shell
2 answers
我欲成王,谁敢阻挡
Post #2 · 2019-09-22 11:18

To search for "foo" immediately after the first 42 characters of a line (that is, starting at the 43rd character):

grep -E '^.{42}foo'

You can run a command like this multiple times on your input:

grep -E '^.{42}foo' inputfile.txt > lineswithfoo.txt
grep -E '^.{42}bar' inputfile.txt > lineswithbar.txt
...

or as a loop:

for pattern in foo bar qux; do
  # one output file per pattern, e.g. lineswithfoo.txt
  grep -E "^.{42}${pattern}" inputfile.txt > "lineswith${pattern}.txt"
done
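The same anchoring idea extends to several single characters at fixed positions in one pattern, which is closer to what the question asks for. Below is a minimal sketch on made-up data: the file name demo.txt, the sample lines, and the positions (3rd and 6th character) are all assumptions chosen for illustration, so count the positions on your own data before adapting it.

```shell
# Hypothetical demo input: short made-up lines, not the asker's data.
workdir=$(mktemp -d)
printf '%s\n' 'AAXBBYCC' 'AAZBBYCC' 'AAXBBWCC' 'AAXBBYGG' > "$workdir/demo.txt"

# ^.{2} skips the first 2 characters, so X must be the 3rd character;
# the next .{2} skips characters 4-5, so Y must be the 6th character.
grep -E '^.{2}X.{2}Y' "$workdir/demo.txt" > "$workdir/lines_X_Y.txt"

# Show the lines that matched both positions.
cat "$workdir/lines_X_Y.txt"
```

Each combination of characters needs its own pattern with this approach, so it works best when there are only a few combinations to extract.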
男人必须洒脱
Post #3 · 2019-09-22 11:20

awk's substring operations might be useful here. Something along these lines:

awk '{ x = substr($0, 42, 3); print > ("output" x ".txt") }'

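This takes the 3-character substring of each line starting at position 42 (awk's substr uses 1-based indexing, so that is the 42nd character), forms an output file name "outputXYZ.txt" from that substring, and writes the line to that file. Within one awk run, the file is truncated the first time it is opened, and later lines with the same substring are added after it.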
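For the case in the question, where the interesting characters sit at several separate positions, the key can be built from several one-character substr calls. This is only a sketch on made-up data: demo.txt, the sample lines, the output_ prefix, and the positions 3 and 6 are assumptions for illustration. In the question's sample the marked characters appear to be at positions 12, 20, and 52, but check that against the real file.

```shell
# Hypothetical demo input: short made-up lines, not the asker's data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'AAXBBYCC' 'AAZBBYCC' 'AAXBBWCC' 'AAXBBYGG' > demo.txt

awk '{
    # Build the grouping key from the 3rd and 6th characters (1-based).
    key = substr($0, 3, 1) substr($0, 6, 1)
    out = "output_" key ".txt"
    # Append and close each time so many distinct keys do not
    # exhaust the limit on simultaneously open files.
    print >> out
    close(out)
}' demo.txt

# This creates output_XY.txt (2 lines), output_ZY.txt and output_XW.txt (1 line each).
ls
```

Because this version appends with >>, running it twice in the same directory would duplicate lines, so remove old output files first or run it in a fresh directory as above.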
