Read string from one file, grep the first occurrence in another file

I am reading strings from a file, appliances_list.txt.

appliances_list.txt contains

fridge
dryer
ironbox
microwave

The file I am searching is myappliances.txt. Its contents:

I have a fridge
I have another fridge
I have a refridgerator
I have a microwave
I have ironbox at home
I have another microwave
I have a hairdryer

I am using

grep -o -m1 -f appliances_list.txt myappliances.txt

Output

fridge

My expected output is the first occurrence of each string (exact match):

fridge
microwave
ironbox

Can someone point me in the right direction?

Answer 1:

$ cat tst.awk
NR==FNR { strings[$0]; ++numStrings; next }
{
    for (i=1;i<=NF;i++) {
        if ($i in strings) {
            print $i
            delete strings[$i]
            if (--numStrings == 0) {
                exit
            }
            break
        }
    }
}

$ awk -f tst.awk appliances_list.txt myappliances.txt
fridge
microwave
ironbox

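This is very efficient for two reasons. It deletes each string from the list as soon as it is found, so fewer comparisons are needed on each subsequent line. And when no strings are left in the list it exits, so it does not waste time reading the remaining lines of the second file.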
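To see the early exit in action, here is a minimal sketch. It assumes a hypothetical list file, found_list.txt, which is the question's list without dryer, and adds an END rule (not part of the script above) that reports how many lines of the second file were read. With the question's own list the exit never fires, because dryer never appears as a separate word in myappliances.txt, so all 7 lines get read.

```shell
# Recreate the sample data in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
# Hypothetical list: the question's list minus "dryer", so every string can be found.
printf '%s\n' fridge ironbox microwave > found_list.txt
printf '%s\n' 'I have a fridge' 'I have another fridge' 'I have a refridgerator' \
  'I have a microwave' 'I have ironbox at home' 'I have another microwave' \
  'I have a hairdryer' > myappliances.txt

out=$(awk '
NR==FNR { strings[$0]; ++numStrings; next }
{
    for (i=1;i<=NF;i++) {
        if ($i in strings) {
            print $i
            delete strings[$i]
            if (--numStrings == 0) exit   # END still runs after exit
            break
        }
    }
}
# Added for illustration only: FNR still holds the last line number read.
END { print "lines read from second file: " FNR }
' found_list.txt myappliances.txt)
printf '%s\n' "$out"
```

The last string (ironbox) is found on line 5, so the script stops there and lines 6 and 7 are never read.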

Answer 2:

An awk solution:

awk 'NR==FNR{ a[$0]; next }{ 
              gsub(/<\/?[^<>]+>/,"",$0); for(i=1;i<=NF;i++) 
              if ($i in a && !a[$i]){ a[$i]++; print $i; break } 
    }' appliances_list.txt myappliances.txt

a[$0]; - captures the pattern words from appliances_list.txt
for(i=1;i<=NF;i++) - iterates through the fields/words of each line from myappliances.txt

Output:

fridge
microwave
ironbox

Answer 3:

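Change your code as below. This should work efficiently even on a large file (your file is 2GB), because grep -m1 stops reading as soon as it finds a match.

while read -r appliance; do grep -m1 -ow "$appliance" myappliances.txt; done < appliances_list.txt

-w : match whole words only

Output: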

fridge
ironbox
microwave

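Explanation:
First, in your code, -m1 makes grep stop as soon as it finds the first matching line. It stops reading the file and exits, so the other strings are never looked for.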
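A small sketch of that behaviour, assuming GNU grep and the sample files from the question (recreated here in a scratch directory): the -m limit counts matching lines across all patterns given with -f, not matches per pattern, so raising it only returns more lines from the top of the file.

```shell
# Recreate the question's sample files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' fridge dryer ironbox microwave > appliances_list.txt
printf '%s\n' 'I have a fridge' 'I have another fridge' 'I have a refridgerator' \
  'I have a microwave' 'I have ironbox at home' 'I have another microwave' \
  'I have a hairdryer' > myappliances.txt

# -m1: stop after the first matching line, whichever pattern matched it.
m1=$(grep -o -m1 -f appliances_list.txt myappliances.txt)
# -m2: stop after the second matching line, which is again a "fridge" line.
m2=$(grep -o -m2 -f appliances_list.txt myappliances.txt)
printf 'with -m1:\n%s\nwith -m2:\n%s\n' "$m1" "$m2"
```

With -m2 the result is fridge twice, not fridge plus a second appliance.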

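What you need to do is iterate over file1 and grep for each word from it in file2, using your own logic.

Another solution:

Pipe grep into head -1, which keeps only the first match. (grep itself stops only once it next tries to write to the closed pipe.)

while read -r appliance; do grep -ow "$appliance" myappliances.txt | head -1; done < appliances_list.txt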

Answer 4:

Remove -m1 and pipe to sort -u:

grep -owf appliances_list.txt myappliances.txt | sort -u

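sort -u sorts the lines and then removes duplicates. If sorting is not wanted, you may need to use something like awk, Perl or Python.

Note that to get only dryer and not hairdryer you need grep -w, which is why the command above uses -ow.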
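For the unsorted case, one possible sketch replaces sort -u with the common awk idiom '!seen[$0]++', which prints a line only the first time it appears and so keeps the order of first occurrence. This assumes GNU grep and the question's sample files (recreated in a scratch directory); the array name seen is arbitrary. The same run without -w shows why the flag matters.

```shell
# Recreate the question's sample files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' fridge dryer ironbox microwave > appliances_list.txt
printf '%s\n' 'I have a fridge' 'I have another fridge' 'I have a refridgerator' \
  'I have a microwave' 'I have ironbox at home' 'I have another microwave' \
  'I have a hairdryer' > myappliances.txt

# Whole-word matches, de-duplicated in order of first appearance.
with_w=$(grep -owf appliances_list.txt myappliances.txt | awk '!seen[$0]++')
# Same without -w: "dryer" now also matches inside "hairdryer".
no_w=$(grep -of appliances_list.txt myappliances.txt | awk '!seen[$0]++')
printf 'with -w:\n%s\nwithout -w:\n%s\n' "$with_w" "$no_w"
```

With -w this gives fridge, microwave, ironbox, the order asked for in the question. Without -w an extra dryer appears at the end.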

Source: Read string from one file, grep the first occurrence in another file