How can I create and access an index to go in a sp

Posted 2019-09-15 23:06

Question:

I have a large file with the following format:

Unique String \t Information

In my program I need to read this file to get the Information through the Unique String key. Since performance is important, I can't scan every line looking for the key each time, and I can't load the file into memory because it is too large. So I'd like to read the file only once and build an index from each String key to its position (in bytes) in the file. This index would be something like a HashMap, with the key being the Unique String and the value being the byte offset in the file where that key appears.

It seems that RandomAccessFile could do this, but I don't know how.

So, how can I build this index and then access a specific line through it?

Answer 1:

The approach I suggest is to read the file once while keeping track of the position, and to store each position in a map so you can look it up later.

The first way to do this is to treat your file as a DataInput and use RandomAccessFile#readLine:

RandomAccessFile raf = new RandomAccessFile("filename.txt", "r");
Map<String, Long> index = new HashMap<>();

Now, how is your data stored? If it is stored line by line, and the encoding conforms to the DataInput standards, then you can use:

long start = raf.getFilePointer();
String line = raf.readLine();
String key = extractKeyFromLine(line);
index.put(key, start);

Now, any time you need to go back and get the data:

long position = index.get(key);
raf.seek(position);
String line = raf.readLine();
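
The lookup step above can be wrapped in a small helper. This is only a sketch under stated assumptions: the class name IndexedLookup and the method lineFor are hypothetical, and the file is assumed to be UTF-8. RandomAccessFile#readLine widens each byte to one char, so the helper maps those chars back to bytes and decodes them as UTF-8; it also returns null instead of throwing when the key is not in the index.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class IndexedLookup {

    // Hypothetical helper: returns the full line stored for the key,
    // or null when the key is not in the index.
    static String lineFor(RandomAccessFile raf, Map<String, Long> index, String key)
            throws IOException {
        Long position = index.get(key);
        if (position == null) {
            return null; // unknown key: avoid unboxing a null Long
        }
        raf.seek(position);
        String raw = raf.readLine();
        if (raw == null) {
            return null; // offset was at end of file
        }
        // readLine() produces one char per byte (values 0..255), so encoding
        // with ISO-8859-1 recovers the original bytes exactly.
        // Assumption: those bytes are UTF-8 text.
        return new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

If the file is plain ASCII, the re-decoding step changes nothing, so the helper is safe to use in that case as well.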

Here is a complete example:

package helloworld;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/**
 * Created by matt on 07/02/2017.
 */
public class IndexedFileAccess {
    static String getKey(String line){
        return line.split(":")[0];
    }
    public static void main(String[] args) throws IOException {
        Map<String, Long> index = new HashMap<>();
        RandomAccessFile file = new RandomAccessFile("junk.txt", "r");
        //populate index and read file.
        String s;
        do{
            long start = file.getFilePointer();
            s = file.readLine();
            if(s!=null){
                String key = getKey(s);
                index.put(key, start);
            }
        }while(s!=null);

        for(String key: index.keySet()){
            System.out.printf("key %s has a pos of %s\n", key, index.get(key));
            file.seek(index.get(key));
            System.out.println(file.readLine());
        }
        file.close();

    }
}

junk.txt contains:

dog:1, 2, 3
cat:4, 5, 6
zebra: p, z, t

Finally the output is:

key zebra has a pos of 24
zebra: p, z, t
key cat has a pos of 12
cat:4, 5, 6
key dog has a pos of 0
dog:1, 2, 3

There are many caveats to this. For example, if you need a more robust encoding, then on the first pass you'll want a reader that can handle that encoding, using the RandomAccessFile only as the underlying input stream. The readLine() method may also run into trouble if the lines are very large; in that case you would have to devise your own strategy for extracting the key/data pair.
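
One possible "own strategy" for the first pass is to count raw bytes yourself rather than rely on readLine(). The sketch below is not from the answer above: the class ByteOffsetIndexer and its methods are hypothetical names, and it assumes a UTF-8 file in the question's "key<TAB>information" layout with "\n" or "\r\n" line endings. Because tab, CR and LF never occur inside a UTF-8 multi-byte sequence, scanning byte by byte is safe.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class ByteOffsetIndexer {

    // Hypothetical helper: maps each key (the bytes before the first tab)
    // to the byte offset at which its line starts.
    static Map<String, Long> buildIndex(Path file) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            ByteArrayOutputStream keyBytes = new ByteArrayOutputStream();
            long offset = 0;      // number of bytes consumed so far
            long lineStart = 0;   // offset of the first byte of the current line
            boolean inKey = true; // true until the first tab on the current line
            int b;
            while ((b = in.read()) != -1) {
                offset++;
                if (b == '\n') {
                    putKey(index, keyBytes, lineStart);
                    lineStart = offset; // next line starts right after the newline
                    inKey = true;
                } else if (b == '\t') {
                    inKey = false;      // everything after the first tab is information
                } else if (inKey && b != '\r') {
                    keyBytes.write(b);
                }
            }
            putKey(index, keyBytes, lineStart); // last line may lack a trailing newline
        }
        return index;
    }

    // Stores the collected key (if any) and clears the buffer for the next line.
    private static void putKey(Map<String, Long> index, ByteArrayOutputStream keyBytes, long lineStart) {
        if (keyBytes.size() > 0) {
            index.put(new String(keyBytes.toByteArray(), StandardCharsets.UTF_8), lineStart);
            keyBytes.reset();
        }
    }
}
```

The offsets it returns are true byte positions, so they can be passed straight to RandomAccessFile#seek even when keys or values contain multi-byte characters.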



Answer 2:

I need to read this file to get the Information through the Unique String key.

Regarding this part of your question: you have to read the file line by line, split each line with split(), and put the values in a Map as follows:

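import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Map values must be an object type (Long), not a primitive.
Map<String, Long> map = new HashMap<>();
long offset = 0;  // byte offset of the current line ("byte" is a reserved word)

try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        String[] arr = line.split("\t");  // make sure your file contains data as you specified
        map.put(arr[0], offset);

        // +1 for the line terminator; this assumes "\n" line endings and a
        // single-byte encoding, where one char is one byte
        offset += line.length() + 1;
    }
} catch (IOException ex) {
    System.out.println("unable to read file '" + fileName + "'");
}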

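Now, when you have a specific key, you can look up where its line starts:

 map.get("specificString"); // returns the byte offset of that line as a Long, or null if the key is absent

That offset can then be used to seek into the file and read the corresponding information.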
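
To turn a stored offset back into the Information field, one option is to position a FileChannel at that offset and read a single line through a decoding reader. This is a sketch, not part of the answer above: OffsetLookup and readInformation are hypothetical names, and the file is assumed to be UTF-8 with a tab between key and information.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OffsetLookup {

    // Hypothetical helper: returns the text after the first tab of the line
    // that starts at byteOffset, or null if there is no such line or no tab.
    static String readInformation(Path file, long byteOffset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            channel.position(byteOffset); // jump straight to the start of the line
            // The reader decodes from the channel's current position onward.
            BufferedReader reader = new BufferedReader(Channels.newReader(channel, "UTF-8"));
            String line = reader.readLine();
            if (line == null) {
                return null; // offset was at or past the end of the file
            }
            int tab = line.indexOf('\t');
            return tab < 0 ? null : line.substring(tab + 1);
        }
    }
}
```

This only gives correct results if the stored offsets are real byte positions. Opening the file on every call keeps the sketch simple; a program doing many lookups would probably keep one channel or RandomAccessFile open instead.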


Tags: java indexing