I would like to use the Python Faker library to generate 500 lines of data, but the code I came up with below produces repeated data. Can you please point out where I'm going wrong? I believe it has something to do with the for loop. Thanks in advance:
from faker import Factory
import pandas as pd
import random

def create_fake_stuff(fake):
    df = pd.DataFrame(columns=('name',
                               'email',
                               'bs',
                               'address',
                               'city',
                               'state',
                               'date_time',
                               'paragraph',
                               'Conrad',
                               'randomdata'))
    stuff = [fake.name(),
             fake.email(),
             fake.bs(),
             fake.address(),
             fake.city(),
             fake.state(),
             fake.date_time(),
             fake.paragraph(),
             fake.catch_phrase(),
             random.randint(1000, 2000)]
    for i in range(10):
        df.loc[i] = [item for item in stuff]
    print(df)

if __name__ == '__main__':
    fake = Factory.create()
    create_fake_stuff(fake)
The following scripts can markedly improve pandas performance.
It takes 5.55s.
Disclaimer: this answer was added long after the question and adds some new information that does not directly answer it.
There is now a fast new library, Mimesis - Fake Data Generator. It appears to be faster than faker (see my test below, on data similar to that in the question).
pip install mimesis
The same for the older faker library:
pip install faker
Below is my recent timing of Mimesis vs. Faker, based on the code provided in the answer from forzer0eight:
CPU times: user 3.51 s, sys: 2.86 ms, total: 3.51 s Wall time: 3.51 s
CPU times: user 178 ms, sys: 1.7 ms, total: 180 ms Wall time: 179 ms
Below is the resulting data for comparison:
I placed the fake stuff array inside my for loop to achieve the desired result: