Multithread file reading

2019-08-11 15:59发布

I'm trying to write code in Java to read a file by couple of threads and count the words in them. Each thread should read different lines. It counts words well (when I let 1 thread run) but my threads are reading same line and increments line counter at the same time. I was sure that the synchronized keyword in read method will fix it,but it didn't. What should I do to fix it?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;


public class WordCounterr implements Runnable {
    private static Hashtable<String, Integer> ht = new Hashtable<String, Integer>();
    private int lineCounter;
    private String path;
    private int tNumber;
    //private final AtomicInteger whichLine = new AtomicInteger();
    private static int whichLine;
    private static boolean flag;

    public WordCounterr(String path,int num){
        lineCounter = 0;
        //whichLine = 0;
        flag= false;
        this.path=path;
        tNumber = num;
    }

    public void countWords(String s) throws IOException{
        char[] c = s.toCharArray();
        String str="";  
        char ch;        
        for(int k=0;k<c.length;k++){                        

            ch=c[k];                    
            if((ch>40 && ch<91) ||(ch>96 && ch<123)){       
                if(ch>40 && ch<91)
                    ch+=32;             
                str+=ch;
            }           
            else if(ch==32 ||k==c.length-1){
                if(str.length()>1){ //sprawdzamy czy funkcja znalazla juz 
                    if(ht.containsKey(str))     //takie slowo               
                        ht.put(str,ht.get(str)+1); //znalazla - powiekszamy wartosc przy kluczu
                    else
                        ht.put(str,1);  //nie znalazla - dodajemy slowo do Hashtable            

                }
                str="";
            }
        }
    }

    public synchronized void read(String path) throws IOException{  
        BufferedReader buf=new BufferedReader(new FileReader(path));

        String linia ;
        for(int i=0;i<whichLine;i++){
            linia=buf.readLine();
        }

        if((linia=buf.readLine())!=null){
            System.out.println(linia);
            countWords(linia);
            lineCounter++;
            System.out.println("watek nr:"+tNumber+"ktora linia:"+whichLine);               
            whichLine++;
            /*try{
                    Thread.sleep(100);

                }catch(InterruptedException el){
                    System.out.println(el.toString());
                }*/
        } else
            setFlag(true);

        buf.close();    //pamietamy o zamknieciu pliku

    }

    public synchronized void print(){
        if(getFlag()){
            setFlag(false);         
            System.out.println(ht);
        }   
        System.out.println("watek nr: "+tNumber+", przeanalizowano "+ lineCounter+ "linii tekstu");
    }

    public void setFlag(boolean val){
        flag=val;
    }

    public boolean getFlag(){
        return flag;
    }

    @Override
    public void run() {
        try{    

            while(getFlag()==false) {   
                read(path);
                Thread.yield(); //let other thread read
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }catch(IOException ex){
            System.out.println(ex.toString());
        }//catch(InterruptedException el){
        //  System.out.println(el.toString());
        //}     
        print();
    }   

    public static void main(String[] args) throws IOException, InterruptedException{
        String path = args[0];
        int tNum = Integer.parseInt(args[1]);

        Thread[] thread = new Thread[tNum]; // tablica w?tków
        for (int i = 0; i < tNum; i++){
            thread[i] =new Thread(new WordCounterr(path,i));
        }   

        for (int i = 0; i < tNum; i++) 
            thread[i].start();
        }
}

2条回答
手持菜刀,她持情操
2楼-- · 2019-08-11 16:11

I am guessing that it still will be inefficiently reading file content. Try change the synchronization point. It should be to placed in read method. This method reads whole file content. Rather that try synchronized just reading next line of this file. You can achieve it by putting to each WordCounterr instance the same reader file instance and synchronized only process of moving pointer to next line read content of this line. Counting words in the line can be done without synchronization and only updating HashTable should be synchronized. Reading file content in parallel can be synchronized as below:

static class Reader implements Runnable {
    int lineReaded = 0;
    final Scanner scanner;

    Reader(Scanner scanner) {
        this.scanner = scanner;
    }

    public void run() {
        boolean hasNext = true;
        while (hasNext) {
            hasNext = false;
            synchronized (scanner) {
                if (scanner.hasNext()) {
                    hasNext = true;
                    String line = scanner.nextLine();
                    ++lineReaded;
                }
            }
            try {
                Thread.sleep((long) (Math.random() * 100));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
查看更多
女痞
3楼-- · 2019-08-11 16:23

The synchronized modifier is defined so: it is not possible for two invocations of synchronized methods on the same object to interleave.

You are calling the method read in each of your Threads.

However you are not calling the same read method because you are passing new instances of WordCounterr to each new Thread. This means you are calling the method on different objects which will not be effected by the synchronized modifier.

To fix this try:

WordCounterr reader = new WordCounterr(path,0); //I changed i to 0 because it can't differentiate between threads with a simple int. This is because each Thread now references the same object.
Thread[] thread = new Thread[tNum]; // tablica w?tków
for (int i = 0; i < tNum; i++){
    thread[i] =new Thread(reader);
} 

Rather than:

Thread[] thread = new Thread[tNum]; // tablica w?tków
for (int i = 0; i < tNum; i++){
    thread[i] =new Thread(new WordCounterr(path,i));
} 

I hope this helps :)

查看更多
登录 后发表回答