I have a file with 450,000+ rows of entries. Each entry is about 7 characters long. I want to find the set of unique characters that appear in this file.
For instance, if my file were the following:
Entry
-----
Yabba Dabba Doo
Then the result would be
Unique characters: {abdoy}
Note that I don't care about case and don't need the results in any order. Something tells me this is very easy for the Linux folks to solve.
Update
I'm looking for a very fast solution. I really don't want to have to create code to loop over each entry, loop through each character...and so on. I'm looking for a nice script solution.
Update 2
By "fast", I mean fast to implement, not necessarily fast to run.
Here's a PowerShell example:
which produces:
I like that it's easy to read.
EDIT: Here's a faster version:
Try this file with JSDB JavaScript (which includes the JavaScript engine used in the Firefox browser):
Use a set data structure. Most programming languages and standard libraries come with one flavour or another. If yours doesn't, use a hash table (or, more generally, a dictionary) implementation and just omit the value field, using your characters as keys. These data structures filter out duplicate entries (hence the name "set", from its mathematical usage: sets have no particular order and hold only unique values).
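For illustration, here is a minimal sketch of the set approach in Python (my choice of language, not something the question asked for). The function name `unique_chars` is made up for this example, and I'm assuming that only letters count, since the expected result in the question leaves out the spaces.

```python
def unique_chars(lines):
    """Return the set of distinct letters found in lines, ignoring case."""
    seen = set()
    for line in lines:
        # Assumption: only letters matter, so spaces, digits and newlines are skipped.
        # set.update() adds each character; duplicates are discarded automatically.
        seen.update(ch for ch in line.lower() if ch.isalpha())
    return seen


if __name__ == "__main__":
    sample = ["Yabba Dabba Doo"]
    # Sorting is only for display; the set itself has no order.
    print("".join(sorted(unique_chars(sample))))  # prints "abdoy"
```

Since a file object yields its lines one at a time, something like `unique_chars(open("entries.txt"))` should work on the real data without loading it all into memory (`entries.txt` being a placeholder file name).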