Find Unique Characters in a File (answers, page 4)

I have a file with 450,000+ rows of entries, each about 7 characters long. I want to find the unique characters in this file.

For instance, if my file were the following:

Entry
-----
Yabba
Dabba
Doo

Then the result would be:

Unique characters: {abdoy}

Notice I don't care about case and don't need to order the results. Something tells me this is very easy for the Linux folks to solve.

Update

I'm looking for a very fast solution. I really don't want to write code that loops over each entry, then loops through each character, and so on. I'm looking for a nice script solution.

Update 2

By fast, I mean fast to implement, not necessarily fast to run.

Tags: search parsing scripting

22 answers

你好瞎i

Post #2 · 2020-06-01 02:04

Here's a PowerShell example:

gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | sort -CaseSensitive -Unique

which produces:

D
Y
a
b
o

I like that it's easy to read.
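
For comparison, here is a rough Python analogue of that first pipeline. It is a sketch that assumes the input is already split into lines with the two header lines first; the function name is made up for illustration. It collects every character, sorts them, and keeps one from each run of equal neighbours, which is the sort-then-deduplicate idea behind `sort -Unique`. Python compares characters by code point, so uppercase letters sort before lowercase ones, which matches the output above.

```python
from itertools import groupby


def sorted_unique_chars(lines, skip=2):
    """Sort-then-deduplicate, case-sensitive (hypothetical helper name).

    Assumes the first `skip` lines are the "Entry"/"-----" header,
    mirroring `select -Skip 2` in the PowerShell pipeline.
    """
    chars = []
    for line in lines[skip:]:
        chars.extend(line)  # one item per character, like ToCharArray()
    chars.sort()  # equal characters end up next to each other...
    return [ch for ch, _ in groupby(chars)]  # ...so keep one per run


sample = ["Entry", "-----", "Yabba", "Dabba", "Doo"]
print(sorted_unique_chars(sample))  # ['D', 'Y', 'a', 'b', 'o']
```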

EDIT: Here's a faster version:

$letters = @{} ; gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | % { $letters[$_] = $true } ; $letters.Keys
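
The faster version replaces the sort with a hashtable. Presumably it is faster because each character costs one insert instead of taking part in a sort of all the characters (roughly three million here: 450,000 rows × about 7). A comparable Python sketch uses a dict in the role of `$letters`; the function name is illustrative, and the two header lines are assumed as before.

```python
def first_seen_chars(lines, skip=2):
    """Hash-based deduplication (hypothetical helper name).

    Each character becomes a dict key with a throwaway value, so a repeated
    character just overwrites its own entry. Nothing is sorted.
    """
    letters = {}
    for line in lines[skip:]:
        for ch in line:
            letters[ch] = True
    return list(letters)  # Python dicts keep insertion order


sample = ["Entry", "-----", "Yabba", "Dabba", "Doo"]
print(first_seen_chars(sample))  # ['Y', 'a', 'b', 'D', 'o']
```

Because Python dicts preserve insertion order, the keys come out in first-seen order; sort them afterwards if you need ordered output.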

你好瞎i

Post #3 · 2020-06-01 02:05

Try this script with JSDB JavaScript (it includes the JavaScript engine from the Firefox browser):

var seenAlreadyMap={};
var seenAlreadyArray=[];
while (!system.stdin.eof)
{
  var L = system.stdin.readLine();
  for (var i = L.length; i-- > 0; )
  {
    var c = L[i].toLowerCase();
    if (!(c in seenAlreadyMap))
    {
      seenAlreadyMap[c] = true;
      seenAlreadyArray.push(c);
    }
  }
}
system.stdout.writeln(seenAlreadyArray.sort().join(''));
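
The JSDB script streams its input one line at a time, so apart from the current line it only keeps the characters seen so far in memory. Here is a Python sketch of the same streaming approach. The function name is illustrative, and like the script above it does not skip an "Entry"/"-----" header, so remove those lines first if your file has them.

```python
import io


def unique_lowercase_chars(stream):
    """Record each lowercased character once while reading line by line
    (hypothetical helper name), then return them sorted as one string."""
    seen = set()
    for line in stream:
        # Drop the line break so it isn't reported, then fold case.
        seen.update(line.rstrip("\r\n").lower())
    return "".join(sorted(seen))


# Demo on an in-memory file; pass sys.stdin instead to read piped input.
print(unique_lowercase_chars(io.StringIO("Yabba\nDabba\nDoo\n")))  # abdoy
```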

劳资没心，怎么记你

Post #4 · 2020-06-01 02:05

Use a set data structure. Most programming languages and standard libraries provide one in some form. If yours doesn't, use a hash table (or, more generally, a dictionary) implementation and simply ignore the value field, using your characters as keys. These data structures discard duplicate keys, hence the name set, from its mathematical usage: a set has no particular order and contains only unique values.
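
In Python, for instance, both flavours are built in. The sketch below hard-codes the sample entries from the question for brevity (an assumption; in practice you would read them from the file) and folds case first, since case doesn't matter.

```python
entries = ["Yabba", "Dabba", "Doo"]
text = "".join(entries).lower()  # case doesn't matter, so fold it first

# Built-in set: adding a character that is already present is a no-op.
as_set = set(text)

# Dictionary with the value ignored: dict.fromkeys maps every key to None,
# so duplicate characters collapse onto a single key.
as_dict = dict.fromkeys(text)

print(sorted(as_set))  # ['a', 'b', 'd', 'o', 'y']
print(sorted(as_dict) == sorted(as_set))  # True
```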

三岁会撩人

Post #5 · 2020-06-01 02:07

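# Python 3: count how often each character appears in the file.
# The question ignores case, so fold everything to lowercase first.
with open("text.txt", "r") as f:
    s = f.read().lower()
unique = {}
for ch in s:
    unique[ch] = unique.get(ch, 0) + 1
print(unique)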
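
The standard library's `collections.Counter` does the "have I seen this key?" check and the increment in one call. Here is a sketch under the same case-folding assumption; the function names are illustrative. The unique characters are simply the Counter's keys, and note that line breaks are counted too.

```python
from collections import Counter


def count_chars(text):
    """Case-folded character counts (hypothetical helper name)."""
    return Counter(text.lower())


def count_chars_in_file(path):
    # Hypothetical helper: read the whole file at once, as the answer does.
    with open(path, "r") as f:
        return count_chars(f.read())


counts = count_chars("Yabba\nDabba\nDoo\n")
print(sorted(counts))  # ['\n', 'a', 'b', 'd', 'o', 'y']
print(counts["a"], counts["d"])  # 4 2
```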
