After referring to many blogs and articles, I have arrived at the following code for searching for a string in all files inside a folder. It works fine in my tests.
QUESTIONS
- Is there a faster approach for this (using C#)?
- Is there any scenario where this code will fail?
Note: I tested with very small files, and with only a few of them.
CODE
using System;
using System.Collections.Generic;
using System.IO;

static void Main()
{
    string sourceFolder = @"C:\Test";
    string searchWord = ".class1";

    // Collect every file under the source folder, then scan each one.
    List<string> allFiles = new List<string>();
    AddFileNamesToList(sourceFolder, allFiles);

    foreach (string fileName in allFiles)
    {
        string contents = File.ReadAllText(fileName);
        if (contents.Contains(searchWord))
        {
            Console.WriteLine(fileName);
        }
    }

    Console.WriteLine(" ");
    Console.ReadKey();
}
public static void AddFileNamesToList(string sourceDir, List<string> allFiles)
{
    string[] fileEntries = Directory.GetFiles(sourceDir);
    foreach (string fileName in fileEntries)
    {
        allFiles.Add(fileName);
    }

    // Recurse into subdirectories.
    string[] subdirectoryEntries = Directory.GetDirectories(sourceDir);
    foreach (string item in subdirectoryEntries)
    {
        // Avoid "reparse points" (junctions/symlinks), which can cause infinite recursion.
        if ((File.GetAttributes(item) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
        {
            AddFileNamesToList(item, allFiles);
        }
    }
}
REFERENCE
- Using StreamReader to check if a file contains a string
- Splitting a String with two criteria
- C# detect folder junctions in a path
- Detect Symbolic Links, Junction Points, Mount Points and Hard Links
- FolderBrowserDialog SelectedPath with reparse points
- C# - High Quality Byte Array Conversion of Images
I wrote something very similar, and there are a couple of changes I would recommend. I was creating a binary search tool; here are some snippets of what I wrote, to give you a hand.
It uses two nested parallel loops. This design is terribly inefficient, and could be greatly improved by using the Boyer-Moore search algorithm, but I could not find a binary implementation and I did not have the time, when I wrote it originally, to implement it myself.
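The snippets themselves did not survive in this post. Purely as an illustration of the shape described above (a parallel outer loop over files with a naive inner byte scan, simplified from the two nested loops mentioned, and not the answerer's original code):

    using System;
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;

    static class BinarySearchTool
    {
        // Naive byte-by-byte scan; this inner loop is the part
        // a Boyer-Moore implementation would replace.
        static bool ContainsBytes(byte[] haystack, byte[] needle)
        {
            for (int i = 0; i <= haystack.Length - needle.Length; i++)
            {
                int j = 0;
                while (j < needle.Length && haystack[i + j] == needle[j]) j++;
                if (j == needle.Length) return true;
            }
            return false;
        }

        static void Main()
        {
            byte[] needle = Encoding.UTF8.GetBytes(".class1");

            // Parallel outer loop over all files under the folder.
            // (Unlike the question's code, this does not skip reparse points.)
            Parallel.ForEach(
                Directory.EnumerateFiles(@"C:\Test", "*", SearchOption.AllDirectories),
                file =>
                {
                    if (ContainsBytes(File.ReadAllBytes(file), needle))
                        Console.WriteLine(file);
                });
        }
    }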
Instead of Contains, better to use the Boyer-Moore search algorithm.

Fail scenario: a file for which you do not have read permission.
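For reference, a minimal sketch of the Boyer-Moore-Horspool variant (Boyer-Moore reduced to just the bad-character shift table; the full algorithm adds a second, "good suffix" table). Treat this as one possible C# rendering, not vetted library code:

    using System.Collections.Generic;

    static class Horspool
    {
        // Returns the first index of pattern in text, or -1.
        public static int IndexOf(string text, string pattern)
        {
            if (pattern.Length == 0) return 0;

            // Bad-character table: how far the window may shift
            // when its last character mismatches.
            var shift = new Dictionary<char, int>();
            for (int i = 0; i < pattern.Length - 1; i++)
                shift[pattern[i]] = pattern.Length - 1 - i;

            int pos = 0;
            while (pos <= text.Length - pattern.Length)
            {
                int j = pattern.Length - 1;
                while (j >= 0 && text[pos + j] == pattern[j]) j--;
                if (j < 0) return pos; // full match

                char last = text[pos + pattern.Length - 1];
                pos += shift.TryGetValue(last, out int s) ? s : pattern.Length;
            }
            return -1;
        }
    }

In the question's loop, contents.Contains(searchWord) would then become Horspool.IndexOf(contents, searchWord) >= 0.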
The main problem here is that you are searching all the files in real time, for every search. There is also the possibility of file-access conflicts if two or more users are searching at the same time.

To dramatically improve performance, I would index the files ahead of time, and as they are edited/saved. Store the index using something like Lucene.Net, then query the index (again using Lucene.Net) and return the file names to the user, so the user never queries the files directly.

If you follow the links in this SO post you may have a head start on implementing the indexing. I didn't follow the links, but it's worth a look.

Just a heads up: this will be an intense shift from your current approach, and will require extra infrastructure to build the index and keep it current; a rough sketch follows.
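A hedged sketch of that division of labor against the Lucene.Net 3.0.3 API (the index path and the field names "path" and "contents" are illustrative assumptions):

    using System;
    using System.IO;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version;

    static class FileIndex
    {
        static readonly Lucene.Net.Store.Directory IndexDir =
            FSDirectory.Open(new DirectoryInfo(@"C:\Test\index")); // illustrative path

        // Run ahead of time, and again whenever a file is edited/saved.
        public static void IndexFile(string fileName)
        {
            var analyzer = new StandardAnalyzer(Version.LUCENE_30);
            using (var writer = new IndexWriter(IndexDir, analyzer,
                                                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                doc.Add(new Field("path", fileName,
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("contents", File.ReadAllText(fileName),
                                  Field.Store.NO, Field.Index.ANALYZED));
                // Replace any stale entry for the same file.
                writer.UpdateDocument(new Term("path", fileName), doc);
            }
        }

        // The user's search hits the index, never the files themselves.
        public static void Search(string searchWord)
        {
            var analyzer = new StandardAnalyzer(Version.LUCENE_30);
            using (var searcher = new IndexSearcher(IndexDir, readOnly: true))
            {
                var parser = new QueryParser(Version.LUCENE_30, "contents", analyzer);
                var hits = searcher.Search(parser.Parse(searchWord), 100).ScoreDocs;
                foreach (var hit in hits)
                    Console.WriteLine(searcher.Doc(hit.Doc).Get("path"));
            }
        }
    }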
I think your code will fail with an exception if you lack permission to open a file. Compare it with the code here: http://bgrep.codeplex.com/releases/view/36186
That latter code supports additional features, things you should probably consider.
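One way to guard the question's loop against that failure (a hedged sketch, reusing allFiles and searchWord from the original code):

    foreach (string fileName in allFiles)
    {
        string contents;
        try
        {
            contents = File.ReadAllText(fileName);
        }
        catch (UnauthorizedAccessException)
        {
            continue; // no read permission: skip instead of crashing
        }
        catch (IOException)
        {
            continue; // locked or otherwise unreadable: skip as well
        }

        if (contents.Contains(searchWord))
        {
            Console.WriteLine(fileName);
        }
    }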
Instead of File.ReadAllText(), better to use File.ReadLines(). It returns IEnumerable<string> (yielded), so you will not have to read the whole file if your string is found before the last line of the text file is reached.
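Applied to the question's loop, that looks something like this (note the assumption that the search word never spans a line break):

    foreach (string fileName in allFiles)
    {
        // File.ReadLines yields one line at a time, so enumeration
        // stops as soon as the first match is found.
        foreach (string line in File.ReadLines(fileName))
        {
            if (line.Contains(searchWord))
            {
                Console.WriteLine(fileName);
                break;
            }
        }
    }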