I would like to play with a larger database to test my knowledge on SQL.
Is there a huge .sql out there that I can use to play with SQL queries?
You could try the classic MySQL world database.
The world.sql file is available for download here:
http://dev.mysql.com/doc/index-other.html
Just scroll down to Example Databases and you will find it.
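Once it's loaded, a quick sanity check could look something like this (a minimal sketch, assuming the stock world schema with its country and city tables):
-- ten most populous cities, with their country names
select ci.Name as city, co.Name as country, ci.Population
from city as ci
join country as co on co.Code = ci.CountryCode
order by ci.Population desc
limit 10;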
This one is an online database, but you can try the Stack Overflow database: http://data.stackexchange.com/stackoverflow/query/new
You can also download its dumps here:
https://archive.org/download/stackexchange
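On the Data Explorer you write T-SQL against the public schema; a starter query might look like this (a sketch, assuming the Posts and Users tables it exposes):
-- ten highest-scoring questions and who asked them
select top 10 p.Title, p.Score, u.DisplayName
from Posts p
join Users u on u.Id = p.OwnerUserId
where p.PostTypeId = 1 -- 1 = question
order by p.Score desc;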
Check out CodePlex for Microsoft SQL Server Community Projects & Samples
Why not download the English Wikipedia? There are compressed SQL files of various sizes, and it should certainly be large enough for you.
The main articles are XML, so inserting them into the db takes a bit more work, but you might find there are other files there that suit you. For example, the inter-page links SQL file is 2.3 GB compressed. Have a look at https://en.wikipedia.org/wiki/Wikipedia:Database_download for more info.
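If you go with one of the SQL dumps, loading it from the mysql client could look like this (a sketch; the dump filename here is an assumption, and you would decompress the .gz first):
create database wiki;
use wiki;
-- hypothetical filename for the inter-page links dump
source enwiki-latest-pagelinks.sql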
Oskar
This is what I am using for learning SQL: employees-db.
This is a sample database with an integrated test suite, used to test your applications and database servers.
According to launchpad.net, the database has moved to GitHub.
The database contains about 300,000 employee records with 2.8 million salary entries. The export data is 167 MB, which is not huge, but heavy enough to be non-trivial for testing.
The data was generated, and as such there are inconsistencies and subtle problems. Rather than removing them, we decided to leave the contents untouched, and use these issues as data cleaning exercises.
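Once loaded, a query like this exercises the joins (a sketch, assuming the stock schema; if I recall correctly, the dataset marks current rows with a to_date of '9999-01-01'):
-- average current salary per department
select d.dept_name, avg(s.salary) as avg_salary
from employees e
join salaries s on s.emp_no = e.emp_no and s.to_date = '9999-01-01'
join dept_emp de on de.emp_no = e.emp_no and de.to_date = '9999-01-01'
join departments d on d.dept_no = de.dept_no
group by d.dept_name
order by avg_salary desc;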
If you want a big database of real data to play with, you could sign up for the Netflix Prize contest and get access to their data, which is pretty large (a few gigs of entries).
The URL above no longer contains the dataset (October 2016). The Wikipedia page about the Netflix Prize reports that a lawsuit regarding privacy concerns was settled.
You want huge?
Here's a small table:
create table foo (id int not null auto_increment primary key, crap char(2000));
insert into foo (crap) values ('');
-- each time you run the next line, the number of rows in foo doubles
insert into foo (crap) select crap from foo;
Run it twenty times and you have over a million rows to play with.
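If you then want the data to be non-uniform, a line like this fills the column with random-looking values (a sketch using MySQL's md5() and rand(); adapt to your server):
-- overwrite the padding with pseudo-random 64-character strings
update foo set crap = concat(md5(rand()), md5(rand()));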
Yes, if he's looking for lots of relations to navigate, this is not the answer. But if by huge he means to test performance and his ability to optimize, this will do it. I did exactly this (and then updated with random values) to test a potential answer I had for another question. (And didn't answer it, because I couldn't come up with better performance than what that asker had.)
Had he asked for "complex", I'd have given a different answer. To me, "huge" implies "lots of rows".
Because you don't need huge to play with tables and relations. Consider a table by itself, with no nullable columns. How many different kinds of rows can there be? Only one, since every column must have some value; none can be null.
Every nullable column multiplies the number of different kinds of rows possible by two: a row where that column is null, and a row where it isn't.
Now consider the table not in isolation. Consider a table that is a child table: for every child that has an FK to the parent (that is, a many-to-one), there can be 0, 1, or many children. So we multiply the count we got in the previous step by three (no rows for zero, one row for exactly one, two rows for many). For any grandparent to which the parent is a many, another three.
For many-to-many relations, we can have no relation, a one-to-one, a one-to-many, a many-to-one, or a many-to-many. So for each many-to-many we can reach in a graph from the table, we multiply the rows by nine, or just like two one-to-manys. If the many-to-many also has data, we multiply by the nullability number.
Tables that we can't reach in our graph (those we have no direct or indirect FK to) don't multiply the rows in our table.
By recursively multiplying across each table we can reach, we can come up with the number of rows needed to provide one of each "kind", and we need no more than those to test every possible relation in our schema. And we're nowhere near huge.
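As a worked example of that arithmetic (a sketch with hypothetical tables): a parent with one nullable column, a child with one nullable column, and an FK from child to parent gives 2 kinds of parent rows, times 2 kinds of child rows, times 3 multiplicities (zero, one, many children), so about a dozen carefully chosen rows cover every case:
-- hypothetical two-table schema for the counting above
create table parent (
  id int not null auto_increment primary key,
  note varchar(50) null            -- one nullable column: x2 kinds of parent rows
);
create table child (
  id int not null auto_increment primary key,
  parent_id int not null,
  note varchar(50) null,           -- one nullable column: x2 kinds of child rows
  foreign key (parent_id) references parent(id)
);
-- per parent: 0, 1, or many children: x3
-- total kinds to cover: 2 * 2 * 3 = 12, nowhere near huge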