Create a random txt file of approximate size in Java

Posted 2019-09-19 16:38

Question:

OK, I found the following code somewhere that generates a random txt file. Basically, I want random words separated by whitespace so that I can run MapReduce word-count simulations.

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class MainClass {

    public static void main(String[] args) {
        // try-with-resources closes (and flushes) the writer even if an exception is thrown
        try (PrintWriter writer = new PrintWriter("bigfile.txt", "UTF-8")) {
            Random random = new Random();
            for (int i = 0; i < 23695522; i++) {
                // words of length 3 through 10 (1- and 2-letter words are boring)
                char[] word = new char[random.nextInt(8) + 3];
                for (int j = 0; j < word.length; j++) {
                    word[j] = (char) ('a' + random.nextInt(26));
                }
                writer.print(new String(word) + ' ');

                // start a new line every 10 words (also fires at i == 0)
                if (i % 10 == 0) {
                    writer.println();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Now I want to change this code so that it runs as many iterations as needed for the file to reach approximately a predefined size. Every iteration produces about 6.5 characters on average (word lengths are chosen uniformly), each of 2 bytes. So I divide the desired file size in bytes by (6.5*2), use the result as the number of loop iterations, and get a file much smaller than I expected.
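The 13-bytes-per-iteration estimate can be checked with a little arithmetic. With "UTF-8" output, each letter 'a' to 'z' and the space is encoded as 1 byte (the 2 bytes per char is Java's in-memory UTF-16 size, not the file size), and every iteration also writes one space, plus a line separator on roughly every 10th iteration. The sketch below computes the expected bytes per iteration under those assumptions; the class name `ExpectedSize` and the 100 MiB target are hypothetical.

```java
public class ExpectedSize {
    // Average bytes written per loop iteration by the question's generator,
    // assuming UTF-8 output (1 byte per 'a'..'z' and per space).
    static double bytesPerIteration(int lineSeparatorBytes) {
        double meanWordLength = (3 + 10) / 2.0;      // nextInt(8) + 3 is uniform on 3..10
        double space = 1.0;                          // the trailing ' '
        double newline = lineSeparatorBytes / 10.0;  // println() on every 10th iteration
        return meanWordLength + space + newline;
    }

    public static void main(String[] args) {
        long targetBytes = 100L * 1024 * 1024;       // hypothetical 100 MiB target
        int sep = System.lineSeparator().length();   // 1 ("\n") or 2 ("\r\n")
        double perIter = bytesPerIteration(sep);
        long iterations = (long) Math.ceil(targetBytes / perIter);
        System.out.printf("%.1f bytes/iteration -> %d iterations%n", perIter, iterations);
        // Dividing by 6.5 * 2 = 13 instead produces only perIter / 13 of the target size.
        System.out.printf("13-byte estimate gives about %.0f%% of the target%n", 100 * perIter / 13);
    }
}
```

With 7.6 to 7.7 real bytes per iteration, dividing by 13 yields a file of only about 58 to 59 percent of the intended size, which matches the "much smaller" symptom.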

Answer 1:

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class MainClass {

    public static void main(String[] args) {
        long count = 0; // bytes written so far (UTF-8: 1 byte per ASCII char)
        int newlineBytes = System.lineSeparator().length(); // 2 ("\r\n") on Windows, 1 ("\n") elsewhere
        File file = new File("bigfile.txt");
        try (PrintWriter writer = new PrintWriter(file, "UTF-8")) {
            Random random = new Random();
            for (int i = 0; i < 23695522; i++) {
                // words of length 3 through 10 (1- and 2-letter words are boring)
                char[] word = new char[random.nextInt(8) + 3];
                count += word.length;
                for (int j = 0; j < word.length; j++) {
                    word[j] = (char) ('a' + random.nextInt(26));
                }
                writer.print(new String(word) + ' ');
                count += 1; // the trailing space

                if (i % 10 == 0) {
                    writer.println();
                    count += newlineBytes;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.println(count);
    }
}

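Try this one. It counts the bytes as it writes them: in UTF-8 every lowercase letter and space is 1 byte, while the line separator written by println() is 2 bytes ("\r\n") on Windows and 1 byte ("\n") on Linux and macOS.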
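One way to confirm that such a hand-kept byte count is right is to compare it with the size the file system reports. The sketch below is an assumption-labeled test harness, not part of the answer: it writes words in the same format to a temporary file, counts bytes the same way (using `System.lineSeparator()` for the newline size), and returns both numbers. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class CountCheck {
    // Writes `words` random words in the same format as the answer's loop, counting
    // bytes by hand, then returns {counted bytes, actual file size on disk}.
    static long[] writeAndCompare(int words, long seed) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        int newlineBytes = System.lineSeparator().length(); // 2 on Windows, 1 elsewhere
        Random random = new Random(seed);
        long count = 0;
        try (PrintWriter writer = new PrintWriter(file.toFile(), "UTF-8")) {
            for (int i = 0; i < words; i++) {
                char[] word = new char[random.nextInt(8) + 3];
                for (int j = 0; j < word.length; j++) {
                    word[j] = (char) ('a' + random.nextInt(26));
                }
                writer.print(new String(word) + ' ');
                count += word.length + 1; // letters plus trailing space, 1 byte each in UTF-8
                if (i % 10 == 0) {
                    writer.println();
                    count += newlineBytes;
                }
            }
        } // closing the writer flushes everything before we measure the file
        long actual = Files.size(file);
        Files.delete(file);
        return new long[] {count, actual};
    }

    public static void main(String[] args) throws IOException {
        long[] r = writeAndCompare(100_000, 42L);
        System.out.println("counted=" + r[0] + " actual=" + r[1]);
    }
}
```

If the two numbers differ, the newline size (or the encoding) is the usual suspect.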



Answer 2:

How about counting the bytes as you write them and looping until you reach the desired number of bytes?

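long writtenBytes = 0; // long, so targets above 2 GB don't overflow
do {
    String randomWords = ....; // some random words, e.g. generated as in the question
    writtenBytes += randomWords.getBytes(StandardCharsets.UTF_8).length; // actual encoded size
    writer.print(randomWords);
} while (writtenBytes < 123456);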
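A complete version of this byte-counting loop might look like the sketch below. It assumes the question's word generator (lowercase words of length 3 to 10, each followed by a space) and, as a hypothetical choice, breaks lines every 10 words, roughly like the question's `i % 10 == 0` check. The names `SizedRandomFile`, `randomWord`, and `writeRandomWords` are illustrative only.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class SizedRandomFile {
    // One random lowercase word of length 3..10 followed by a space,
    // mirroring the question's generator.
    static String randomWord(Random random) {
        char[] word = new char[random.nextInt(8) + 3];
        for (int j = 0; j < word.length; j++) {
            word[j] = (char) ('a' + random.nextInt(26));
        }
        return new String(word) + ' ';
    }

    // Writes random words to fileName until at least targetBytes bytes (UTF-8)
    // have been written; returns the number of bytes actually written.
    static long writeRandomWords(String fileName, long targetBytes, long seed) throws IOException {
        Random random = new Random(seed);
        long newlineBytes = System.lineSeparator().getBytes(StandardCharsets.UTF_8).length;
        long writtenBytes = 0;
        int wordsOnLine = 0;
        try (PrintWriter writer = new PrintWriter(fileName, "UTF-8")) {
            while (writtenBytes < targetBytes) {
                String w = randomWord(random);
                writer.print(w);
                writtenBytes += w.getBytes(StandardCharsets.UTF_8).length;
                if (++wordsOnLine == 10) { // hypothetical: 10 words per line
                    writer.println();
                    writtenBytes += newlineBytes;
                    wordsOnLine = 0;
                }
            }
        }
        return writtenBytes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeRandomWords("bigfile.txt", 123456, 1L));
    }
}
```

Because the check happens before each word, the file overshoots the target by less than one word plus one line separator (under 13 bytes), which is usually close enough for a word-count benchmark.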


Tags: java text char