-->

Jena TDB: see how many triples are stored during TDB creation

Posted 2019-09-17 02:10

Question:

Hi, is it possible to see the number of triples being stored during TDB creation with the Java API? I run the TDBFactory with a rar file in Turtle, but while the files are being created in my directory I can't see how many triples have been stored. How can I solve this problem?

Answer 1:

You can call the bulk loader from Java code, which reports the triples it has introduced as it goes, as follows:

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try( final InputStream in = /*get input stream for your large file*/ ) {
    // The last argument (true) asks the loader to show progress while loading
    TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph(), in, true );
}
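
If the loader's own progress output is not visible in your setup, one rough alternative is to count lines as they pass through the input stream. The sketch below assumes N-Triples input (one triple per line), so comment and blank lines are counted too and the figure is only an approximation. `LineCountingInputStream` is a hypothetical helper written for this illustration, not a Jena class.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a Jena class): counts '\n' bytes read through it. */
class LineCountingInputStream extends FilterInputStream {
    // Written only by the reading thread; volatile so a monitoring thread sees fresh values
    private volatile long lines = 0;

    LineCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == '\n') {
            lines++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop body never runs
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '\n') {
                lines++;
            }
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // re-reading after reset() would count lines twice
    }

    long getLines() {
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<a> <b> <c> .\n<a> <b> <d> .\n<a> <b> <e> .\n"
                .getBytes(StandardCharsets.UTF_8);
        try (LineCountingInputStream in =
                 new LineCountingInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[16];
            while (in.read(buf, 0, buf.length) != -1) {
                // drain the stream, as a loader would
            }
            System.out.println(in.getLines() + " lines read"); // prints "3 lines read"
        }
    }
}
```

To use it with the snippet above, wrap `in` in a `LineCountingInputStream` before handing it to the loader and poll `getLines()` from another thread, or print it once the load returns.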

If your archive contains multiple files (for simplicity I'll use a zip rather than a rar), then, as per an answer to this question, you can get better performance by concatenating the files into a single file or stream before passing them to the bulk loader. The gain comes from delaying index creation until all triples have been introduced. Other formats are probably supported as well, but I have only tested N-TRIPLES.

The following example utilizes IOUtils from commons-io for copying streams:

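// Imports assume Jena 3.x package names (2.x releases used com.hp.hpl.jena.*) and commons-io
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Enumeration;
import java.util.concurrent.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import org.apache.commons.io.IOUtils;
import org.apache.jena.query.Dataset;
import org.apache.jena.tdb.TDBFactory;
import org.apache.jena.tdb.TDBLoader;
import org.apache.jena.tdb.transaction.DatasetGraphTransaction;

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
// Whatever is written to concatOut becomes readable from concatIn on another thread
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);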

final ExecutorService workers = Executors.newFixedThreadPool(2);
final Future<Long> submitter = workers.submit(new Callable<Long>(){
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try( final ZipFile zipFile = new ZipFile( /* Archive Location */ ) ) {
            final Enumeration< ? extends ZipEntry> zipEntries = zipFile.entries();
            while( zipEntries.hasMoreElements() ) {
                final ZipEntry entry = zipEntries.nextElement();
                try( final InputStream singleIn = zipFile.getInputStream(entry) ) {
                    // If your file is in a supported format already
                    IOUtils.copy(singleIn, concatOut);
                    // Otherwise, convert it first ("lang" stands for the source syntax):
                    /*final Model m = ModelFactory.createDefaultModel();
                    m.read(singleIn, null, "lang");
                    m.write(concatOut, "N-TRIPLES");*/
                }
                filesLoaded++;
            }
        }
        // Closing the pipe signals end-of-stream to the loader thread
        concatOut.close();
        return filesLoaded;
    }});

final Future<Void> committer = workers.submit(new Callable<Void>(){
    @Override
    public Void call() throws Exception {
        TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , concatIn, true);
        return null;
    }});

workers.shutdown();
System.out.println("submitted "+submitter.get()+" input files for processing");
committer.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
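
If you would rather avoid the two threads and the pipe, a possible alternative is `java.io.SequenceInputStream`, which presents several streams as one. This is a different technique from the piped approach above, shown only as a minimal sketch: `ZipConcat` and `concatenatedEntries` are hypothetical names, directory entries are skipped, and the entries are assumed to be in a format that can be concatenated (such as N-TRIPLES).

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipConcat {
    /** Returns one stream that yields every non-directory entry back to back. */
    static InputStream concatenatedEntries(final ZipFile zipFile) {
        List<ZipEntry> files = new ArrayList<>();
        for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements();) {
            ZipEntry entry = e.nextElement();
            if (!entry.isDirectory()) {
                files.add(entry);
            }
        }
        final Iterator<ZipEntry> it = files.iterator();
        // Each entry is opened only when SequenceInputStream asks for the next stream
        return new SequenceInputStream(new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return zipFile.getInputStream(it.next());
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway archive so the sketch runs on its own
        File zip = File.createTempFile("triples", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("a.nt"));
            out.write("<a> <b> <c> .\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.nt"));
            out.write("<d> <e> <f> .\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        try (ZipFile zipFile = new ZipFile(zip);
             InputStream all = concatenatedEntries(zipFile)) {
            ByteArrayOutputStream copy = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = all.read(buf)) != -1) {
                copy.write(buf, 0, n);
            }
            // Prints the two triples, one per line, in archive order
            System.out.print(new String(copy.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

The returned stream could then be passed to the loader in place of `concatIn`, as long as the `ZipFile` stays open until loading has finished. The piped version is still the better fit when each entry has to be converted through a `Model` before loading.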


Tags: jena tdb