How to retrain the model for a sequence of files in

Posted 2019-09-02 19:53

Question:

I am trying to run Vowpal Wabbit (VW) on a set of files (approximately 10 as of now). My experiment is as follows:

  1. Convert the first training file to VW format.

  2. Train the VW model on this first training file and store the model.

  3. Validate the accuracy on the test file with the stored model.

  4. Take the second file, convert it to VW format, retrain the model stored in step 2 on this second file, and store the updated model.

  5. Validate the retrained model on the test file and report the accuracy.

  6. Repeat steps 4-5 for the remaining files using a for loop (the test file is the same in each iteration).

When I ran this experiment I got an error. The train, retrain, and validation commands are pasted below, along with the error.

Can anyone help me run this scenario without getting the error?

Commands:

Here 'i' ranges from 1 to 10.

$idec = i - 1 (index of the previous model)

vw -d ${i}_processed_binary_compressed.vw --loss_function logistic -i ${idec}_processed_binary_compressed.model.vw --quiet --save_resume -f ${i}_processed_binary_compressed.model.vw

echo
echo "Model training completed for day_$i"

echo "${i}_day model validation is under progress..."
echo

vw 10_processed_binary_compressed_test.vw -t -i ${i}_processed_binary_compressed.model.vw --quiet --hash strings -p 10_processed_binary_compressed_test_${i}_day_result.csv -r 10_processed_binary_compressed_test_${i}_day_raw.txt

error:

vw: option '--data' cannot be specified more than once

Answer 1:

I cannot replicate the problem (but TOC_cmi asked me to paste the commands I used):

git clone https://github.com/JohnLangford/vowpal_wabbit.git
cd vowpal_wabbit
make
cd test/train-sets

vw -d rcv1_smaller.dat --loss_function=logistic --save_resume -f day1.model
vw -d rcv1_small.dat --loss_function=logistic --save_resume -i day1.model -f day2.model
vw -t -d rcv1_smaller.dat --loss_function=logistic -i day2.model -p day2.predictions -r day2.raw
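
The same three commands can be generalized to the loop described in the question. What follows is only a minimal sketch, not a verified fix for the reported error: the function name retrain_sequence, its arguments, and the output file names are hypothetical, while the training and model file names follow the question's naming scheme. It assumes the first file is trained from scratch (no -i, since no earlier model exists) and that the test file is passed once via -d, as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: retrain_sequence and its variable names are hypothetical;
# the *_processed_binary_compressed.* names follow the question.

retrain_sequence() {
    local n_files=$1      # number of training files (10 in the question)
    local test_file=$2    # fixed test set, already in VW format
    local prev_model=""   # no model exists before the first file
    local i train_file model

    for i in $(seq 1 "$n_files"); do
        train_file="${i}_processed_binary_compressed.vw"
        model="${i}_processed_binary_compressed.model.vw"

        if [ -z "$prev_model" ]; then
            # First file: train from scratch, so there is no -i option.
            vw -d "$train_file" --loss_function logistic --quiet \
               --save_resume -f "$model" || return 1
        else
            # Later files: resume from the previous model with -i.
            vw -d "$train_file" --loss_function logistic --quiet \
               --save_resume -i "$prev_model" -f "$model" || return 1
        fi

        # Test-only pass (-t); the data file is given exactly once, via -d.
        vw -t -d "$test_file" -i "$model" --quiet \
           -p "test_${i}_day_result.csv" -r "test_${i}_day_raw.txt" || return 1

        prev_model=$model
    done
}
```

With the question's files this would be called as: retrain_sequence 10 10_processed_binary_compressed_test.vw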