I have a file with 450,000+ rows of entries. Each entry is about 7 characters long. What I want to know is the set of unique characters in this file.
For instance, if my file were the following:
Entry
-----
Yabba Dabba Doo
Then the result would be
Unique characters: {abdoy}
Notice I don't care about case and don't need to order the results. Something tells me this is very easy for the Linux folks to solve.
Update
I'm looking for a very fast solution. I really don't want to have to write code that loops over each entry, then over each character, and so on. I'm looking for a nice script solution.
Update 2
By fast, I mean fast to implement... not necessarily fast to run.
A very fast solution would be to make a small C program that reads its standard input, does the aggregation and spits out the result.
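The program itself isn't reproduced here, but a minimal sketch of that idea, reading standard input in large blocks and flagging each byte value seen, might look like this:

    /* Sketch only: flag every byte value seen on stdin, then print them.
     * Note it prints every byte seen, including unprintable ones. */
    #include <stdio.h>

    int main(void) {
        unsigned char buf[1 << 16];
        int seen[256] = {0};
        size_t n, i;

        while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
            for (i = 0; i < n; i++)
                seen[buf[i]] = 1;   /* the aggregation: mark the byte as seen */

        for (i = 0; i < 256; i++)
            if (seen[i])
                putchar((int)i);
        putchar('\n');
        return 0;
    }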
Why the arbitrary limitation that you need a "script" that does it?
What exactly is a script anyway?
Would Python do?
If so, then this is one solution:
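The code that followed is not preserved in this copy; presumably something close to this minimal sketch, which reads the file from standard input and prints the set of its characters:

    import sys

    # Fold case, since the question doesn't care about it,
    # and let set() do the deduplication.
    print(set(sys.stdin.read().lower()))

Run it with the file on standard input, e.g. python solution.py < file (the script name is arbitrary).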
In C++ I would first loop through the letters of the alphabet, then run strchr() on each with the file contents as a string. That tells you whether the letter exists; if it does, just add it to the list.
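A sketch of that approach (written in plain C here, since strchr() comes from the C string library; it assumes the whole file fits in one buffer, which at ~450,000 short rows it comfortably does):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Buffer sized generously for a ~3 MB file; a sketch, not robust code. */
        static char buf[8 * 1024 * 1024];
        size_t n = fread(buf, 1, sizeof buf - 1, stdin);
        char c;
        buf[n] = '\0';   /* strchr() needs a NUL-terminated string */

        for (c = 'a'; c <= 'z'; c++)
            /* Check both cases, since the question ignores case. */
            if (strchr(buf, c) || strchr(buf, c - 'a' + 'A'))
                putchar(c);
        putchar('\n');
        return 0;
    }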
Python w/sets (quick and dirty)
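The snippet itself is missing above; the quick-and-dirty set version presumably boils down to something like this ("file" is a placeholder name):

    # Read the whole file into a set; lower() first so case is ignored.
    print(set(open("file").read().lower()))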
Python w/sets (with nicer output)
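Again the original code is missing; a version whose output matches the question's format could look like this:

    chars = set(open("file").read().lower())   # "file" is a placeholder name
    # Sort and join, skipping whitespace, for brace-wrapped output
    # like the question's {abdoy} example.
    print("Unique characters: {%s}" % "".join(sorted(c for c in chars if not c.isspace())))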
An answer above mentions using a dictionary. If so, the code presented there can be streamlined a bit, since the Python documentation notes that keys within a dictionary are unique, and that assigning to a key that already exists simply replaces its value. Therefore the line that tests for membership before each assignment can be removed, since the dictionary keys will always be unique anyway. And that should make it a little faster.
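The dictionary answer itself isn't preserved above, but a hypothetical reconstruction of the idea, with the redundant membership test removed, would be:

    chars = {}
    with open("file") as f:            # "file" is a placeholder name
        for line in f:
            for c in line.lower():
                chars[c] = 1           # assigning to an existing key just overwrites it
    print("Unique characters: {%s}" % "".join(sorted(chars)))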
Print unique characters (ASCII and Unicode UTF-8)
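The script itself didn't survive in this copy; a minimal sketch that handles both ASCII and UTF-8 input, printing each distinct character with its code point, might be:

    # unique.py -- sketch: print each distinct character with its code point.
    import io
    import sys

    with io.open(sys.argv[1], encoding="utf-8") as f:
        unique = sorted(set(f.read()))

    for c in unique:
        print("U+%04X %r" % (ord(c), c))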
Save as unique.py, and run as python unique.py.
Quick and dirty C program that's blazingly fast:
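The program itself is missing from this copy; a quick-and-dirty sketch in the same spirit (a single pass over stdin with a 256-entry table of seen bytes) would be:

    #include <stdio.h>
    #include <ctype.h>

    int main(void) {
        int seen[256] = {0};
        int c;

        /* Mark every byte value that appears on stdin. */
        while ((c = getchar()) != EOF)
            seen[c] = 1;

        /* Print only the printable ones. */
        for (c = 0; c < 256; c++)
            if (seen[c] && isprint(c))
                putchar(c);
        putchar('\n');
        return 0;
    }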
Compile it, then run it with the file on standard input (e.g. ./unique < file, if the sketch above was compiled to unique) to get a list of the unique printable characters in file.