Find Unique Characters in a File

Posted 2020-06-01 01:33

I have a file with 450,000+ rows of entries. Each entry is about 7 characters long. I want to find the unique characters in this file.

For instance, if my file were the following:

Entry
-----
Yabba
Dabba
Doo

Then the result would be

Unique characters: {abdoy}

Notice I don't care about case and don't need to order the results. Something tells me this is very easy for the Linux folks to solve.

Update

I'm looking for a very fast solution. I really don't want to have to create code to loop over each entry, loop through each character...and so on. I'm looking for a nice script solution.

Update 2

By fast, I mean fast to implement, not necessarily fast to run.

22 answers
你好瞎i
#2 · 2020-06-01 02:04

Here's a PowerShell example:

gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | sort -CaseSensitive -Unique

which produces:

D
Y
a
b
o

I like that it's easy to read.

EDIT: Here's a faster version:

$letters = @{} ; gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | % { $letters[$_] = $true } ; $letters.Keys
你好瞎i
#3 · 2020-06-01 02:05

Try this script with JSDB JavaScript (which includes the JavaScript engine used in the Firefox browser):

var seenAlreadyMap={};
var seenAlreadyArray=[];
while (!system.stdin.eof)
{
  var L = system.stdin.readLine();
  for (var i = L.length; i-- > 0; )
  {
    var c = L[i].toLowerCase();
    if (!(c in seenAlreadyMap))
    {
      seenAlreadyMap[c] = true;
      seenAlreadyArray.push(c);
    }
  }
}
system.stdout.writeln(seenAlreadyArray.sort().join(''));
劳资没心,怎么记你
#4 · 2020-06-01 02:05

Use a set data structure. Most programming languages and standard libraries provide one in some form. If yours doesn't, use a hash table (or, more generally, a dictionary) with your characters as keys and simply ignore the value field. These data structures discard duplicate entries (hence the name set, from the mathematical usage: a set has no particular order and contains only unique values).
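
As a rough illustration of the set approach, here is a minimal Python sketch. The function name `unique_chars` is made up for this example, and keeping only letters (via `str.isalpha`) is an assumption about what counts as a "character" in the question; drop that filter if digits or punctuation should be reported too.

```python
def unique_chars(lines):
    """Return the set of distinct letters found in lines, ignoring case.

    `unique_chars` is a hypothetical helper name, not from the question.
    """
    seen = set()
    for line in lines:
        # Lowercase first so 'D' and 'd' count as the same character.
        # Assumption: only letters are of interest, so others are skipped.
        seen.update(ch for ch in line.lower() if ch.isalpha())
    return seen


sample = ["Yabba", "Dabba", "Doo"]
print("".join(sorted(unique_chars(sample))))  # abdoy
```

Because an open file iterates line by line, the same function should also accept a file object, for example `unique_chars(open("file.txt"))`, although header rows such as the "Entry" line in the question would then have to be skipped first.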

三岁会撩人
#5 · 2020-06-01 02:07
# Count how often each character occurs; the dict keys are the unique characters.
with open("text.txt", "r") as f:
    s = f.read()
unique = {}
for ch in s:
    if ch in unique:
        unique[ch] += 1
    else:
        unique[ch] = 1
print(unique)