Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.
Here's a streaming approach that should incur less overhead than reading all unique strings into memory.
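One way to sketch this idea, assuming that "less overhead" means remembering a fixed-size digest per line rather than the line itself. The class name DigestDeduplicator and the choice of SHA-256 are assumptions made for illustration, not something the answer specifies. The sketch treats two lines with the same digest as identical, and it only saves memory when lines are typically longer than the 44-character Base64 digest.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class DigestDeduplicator
{
    // Hypothetical helper: streams lines from reader to writer, keeping only
    // the first occurrence of each line.
    public static void Copy(TextReader reader, TextWriter writer)
    {
        // Only the digests are kept in memory, never the lines themselves.
        var seenDigests = new HashSet<string>();
        using (var sha = SHA256.Create())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(line));
                // HashSet<T>.Add returns false if the digest was already present.
                if (seenDigests.Add(Convert.ToBase64String(digest)))
                {
                    writer.WriteLine(line);
                }
            }
        }
    }
}
```

The caller owns the reader and writer, so for files it would wrap a StreamReader and a StreamWriter in using statements around the call.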
For a long file (with non-consecutive duplicates) I'd copy the file line by line, building a hash-to-position lookup table as I went.
As each line is copied, check for its hashed value; if there is a collision, double-check that the line really is the same and, if so, skip it and move on to the next.
Only worth it for fairly large files though.
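The hash-to-position idea above could look roughly like the following. This is a minimal sketch under several assumptions: the text is UTF-8, the positions recorded are byte offsets into the output file (so the output is opened for both reading and writing), and the name HashPositionDeduplicator and its helper methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class HashPositionDeduplicator
{
    // Hypothetical helper: copies inputPath to outputPath, skipping repeated lines.
    public static void Copy(string inputPath, string outputPath)
    {
        var encoding = new UTF8Encoding(false); // assumption: UTF-8, no BOM written
        byte[] newline = encoding.GetBytes("\n");
        // Maps a line's hash code to the (offset, length) of every distinct
        // line with that hash code already written to the output file.
        var seen = new Dictionary<int, List<(long Offset, int Length)>>();

        using (var reader = new StreamReader(inputPath, encoding))
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.ReadWrite))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                byte[] bytes = encoding.GetBytes(line);
                int hash = line.GetHashCode();
                if (!seen.TryGetValue(hash, out var candidates))
                {
                    candidates = new List<(long Offset, int Length)>();
                    seen[hash] = candidates;
                }
                else if (AlreadyWritten(output, candidates, bytes))
                {
                    continue; // a true duplicate, not just a hash collision
                }
                candidates.Add((output.Position, bytes.Length));
                output.Write(bytes, 0, bytes.Length);
                output.Write(newline, 0, newline.Length);
            }
        }
    }

    // Re-reads each candidate line from the output and compares it byte by byte.
    private static bool AlreadyWritten(FileStream output,
        List<(long Offset, int Length)> candidates, byte[] bytes)
    {
        long end = output.Position;
        try
        {
            foreach (var candidate in candidates)
            {
                if (candidate.Length != bytes.Length) continue;
                output.Position = candidate.Offset;
                if (MatchesAtCurrentPosition(output, bytes)) return true;
            }
            return false;
        }
        finally
        {
            output.Position = end; // resume appending where we left off
        }
    }

    private static bool MatchesAtCurrentPosition(Stream stream, byte[] expected)
    {
        var buffer = new byte[expected.Length];
        int read = 0;
        while (read < buffer.Length)
        {
            int n = stream.Read(buffer, read, buffer.Length - read);
            if (n == 0) return false;
            read += n;
        }
        for (int i = 0; i < buffer.Length; i++)
        {
            if (buffer[i] != expected[i]) return false;
        }
        return true;
    }
}
```

Memory use grows with the number of distinct lines rather than with their total length, at the cost of a seek and a re-read whenever a hash code repeats.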
This should do (and will cope with large files).
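A minimal sketch of such a copy, assuming file paths as input and UTF-8 text. The class and method names are hypothetical. It holds only the previous line in memory, which is why file size is not a concern.

```csharp
using System.IO;
using System.Text;

public static class ConsecutiveDuplicateRemover
{
    // Hypothetical helper: copies inputPath to outputPath, dropping any line
    // that is identical to the line immediately before it.
    public static void Copy(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath, Encoding.UTF8))
        using (var writer = new StreamWriter(outputPath, false, Encoding.UTF8))
        {
            string previous = null;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line != previous)
                {
                    writer.WriteLine(line);
                    previous = line;
                }
            }
        }
    }
}
```

With this sketch, an input of the four lines a, a, b, a would be written out as a, b, a: the repeated a at the start collapses, but the final a stays because a different line separates it from the earlier ones.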
Note that it only removes consecutive duplicate lines, i.e. a line that appears again later, after some different line, is kept.
If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.
Note that this assumes Encoding.UTF8, and that you want to use files. It's easy to generalize as a method, though. (Note that such a method doesn't close anything; the caller should do that.)
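A generalized method might look like the sketch below, assuming it takes a TextReader and a TextWriter so that the encoding and the source are the caller's choice. The names are hypothetical.

```csharp
using System.IO;

public static class LineFilters
{
    // Hypothetical helper: copies lines from reader to writer, dropping
    // consecutive duplicates. It neither closes nor disposes its arguments.
    public static void CopyWithoutConsecutiveDuplicates(TextReader reader, TextWriter writer)
    {
        string previous = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line != previous)
            {
                writer.WriteLine(line);
                previous = line;
            }
        }
    }
}
```

Because the method works on the abstract reader and writer types, the same code can be exercised with a StringReader and StringWriter, or with Console.In and Console.Out.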
Here's a version that will remove all duplicates, rather than just consecutive ones:
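A minimal sketch of such a version, assuming the same reader/writer shape and a HashSet<string> holding every distinct line seen so far. The names are hypothetical, and memory use grows with the total size of the distinct lines.

```csharp
using System.Collections.Generic;
using System.IO;

public static class DuplicateLineRemover
{
    // Hypothetical helper: writes each distinct line once, in order of first
    // appearance. The caller is responsible for closing reader and writer.
    public static void CopyDistinct(TextReader reader, TextWriter writer)
    {
        var seen = new HashSet<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // HashSet<T>.Add returns true only the first time a value is added.
            if (seen.Add(line))
            {
                writer.WriteLine(line);
            }
        }
    }
}
```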
I am new to .NET and have written something simpler, which may not be very efficient. Please feel free to share your thoughts.
For small files:
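When the whole file fits comfortably in memory, a LINQ-based sketch along these lines may be enough. The class and method names are hypothetical. Enumerable.Distinct yields first occurrences in their original order in current implementations, though its documentation describes the result as unordered, so treat the ordering as an assumption.

```csharp
using System.IO;
using System.Linq;

public static class SmallFileDeduplicator
{
    // Hypothetical helper: reads every line into memory, drops repeats,
    // and writes the remaining lines to outputPath.
    public static void Copy(string inputPath, string outputPath)
    {
        string[] lines = File.ReadAllLines(inputPath);
        File.WriteAllLines(outputPath, lines.Distinct());
    }
}
```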