Extract words from a file

I'm trying to create a dictionary of words from a collection of files. Is there a simple way to print all the words in a file, one per line?

Answer 1:

You can use grep:

-E '\w+' searches for words
-o prints only the part of the line that matches

% cat temp
Some examples use "The quick brown fox jumped over the lazy dog,"
rather than "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
for example text.
# if you don't care whether words repeat
% grep -o -E '\w+' temp
Some
examples
use
The
quick
brown
fox
jumped
over
the
lazy
dog
rather
than
Lorem
ipsum
dolor
sit
amet
consectetur
adipiscing
elit
for
example
text

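If you want to print each word only once, regardless of case, you can use sort:

-u prints each word only once
-f tells sort to ignore case when comparing words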

# if you only want each word once
% grep -o -E '\w+' temp | sort -u -f
adipiscing
amet
brown
consectetur
dog
dolor
elit
example
examples
for
fox
ipsum
jumped
lazy
Lorem
over
quick
rather
sit
Some
text
than
The
use
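
With -f, words that differ only in case count as duplicates, and the output keeps whichever spelling sort retains for each group. If a uniformly lowercased list is preferable, one option is to fold case before sorting. The following is a minimal sketch under that assumption; the function name unique_words_lower is made up for illustration and does not come from the answer.

```shell
# Hypothetical helper: print each word of a file once, lowercased.
# grep -o -E '\w+' emits one matched word per line,
# tr folds upper case to lower case so "The" and "the" collapse,
# sort -u sorts the list and drops duplicates.
unique_words_lower() {
    grep -o -E '\w+' "$1" | tr '[:upper:]' '[:lower:]' | sort -u
}
```

Calling unique_words_lower temp on the sample file would list "the", "some" and "lorem" in lower case alongside the other words.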

Answer 2:

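A good start is simply to use sed to replace every space with a newline, strip out the empty lines (again with sed), and then sort with the -u (uniquify) flag to remove duplicates, as in this example: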

$ echo "the quick brown dog and fox jumped
over the lazy   dog" | sed 's/ /\n/g' | sed '/^$/d' | sort -u

and
brown
dog
fox
jumped
lazy
over
quick
the

After that you can start worrying about punctuation and the like.
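
One possible way to handle punctuation, offered as a sketch rather than as part of the original answer: let tr turn every run of non-alphanumeric characters into a single newline. That covers spaces and punctuation in one pass and does not rely on sed interpreting \n in a replacement, which is a GNU sed behavior. The function name split_words is hypothetical.

```shell
# Hypothetical helper: read text on stdin, print its unique words.
# tr -c complements the set (everything that is NOT alphanumeric),
# tr -s squeezes each run of replaced characters into one newline,
# sed removes the empty line left when the input starts with punctuation,
# sort -u sorts the words and drops duplicates.
split_words() {
    tr -cs '[:alnum:]' '\n' | sed '/^$/d' | sort -u
}
```

For example, piping the echo from the example above into split_words instead of the sed commands should give the same nine words even if commas or quotes are added to the input.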

Answer 3:

Assuming the words are separated by whitespace:

awk '{for(i=1;i<=NF;i++)print $i}' file

or

 tr ' ' "\n" < file

If you want unique words:

awk '{for(i=1;i<=NF;i++)_[$i]++}END{for(i in _) print i}' file

tr ' ' "\n" < file | sort -u

With some punctuation removed:

awk '{
    gsub(/["*^&()#@$,?~]/,"")
    for(i=1;i<=NF;i++){  _[$i]  }
}
END{    for(o in _){ print o }  }' file

Answer 4:

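The tr command can do this:

tr '[:blank:]' '\n' < test.txt

This asks the tr program to replace blanks with a newline. The output goes to stdout, but it can be redirected to another file, result.txt:

tr '[:blank:]' '\n' < test.txt > result.txt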
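
Two practical points are worth noting here. It is safest to quote the character class, because a bare [:blank:] is also a shell glob pattern and may be expanded if a matching one-character filename exists in the current directory. In addition, without -s each blank in a run becomes its own newline, which leaves empty lines in the output. A minimal sketch that addresses both, with words_per_line as a made-up name:

```shell
# Hypothetical helper: print the words of a file, one per line.
# '[:space:]' covers blanks, tabs and newlines; -s squeezes runs of the
# resulting newlines so no empty lines are produced between words.
words_per_line() {
    tr -s '[:space:]' '\n' < "$1"
}
```

As with the original command, the output can be redirected, for instance words_per_line test.txt > result.txt.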


Answer 5:

Ken Church's "Unix(TM) for Poets" (PDF) describes exactly this type of application: extracting words from text files, sorting and counting them, and so on.
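
The extract, sort and count workflow mentioned here can be sketched as a short pipeline. This is an illustration in the same spirit, not a quotation from that text, and the function name word_counts is hypothetical.

```shell
# Hypothetical helper: print "count word" pairs, most frequent first.
# 1. tr: turn every run of non-letters into one newline (one word per line)
# 2. sed: drop the empty line produced when the input starts with a non-letter
# 3. sort | uniq -c: group identical words and count each group
# 4. sort -rn: order the counts numerically, largest first
word_counts() {
    tr -cs '[:alpha:]' '\n' < "$1" | sed '/^$/d' | sort | uniq -c | sort -rn
}
```

Note that this sketch is case-sensitive, so "The" and "the" are counted separately unless the text is lowercased first.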

Source: extract words from a file
