How to write a simple database engine [closed]

2019-01-20 21:14发布

问题:

I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database engine. I have searched for tutorials on the subject and I couldn't find any, so I am hoping someone else can point me in the right direction. Basically, I would like information on the following:

  • How the data is stored internally (i.e. how tables are represented, etc.)
  • How the engine finds data that it needs (e.g. run a SELECT query)
  • How data is inserted in a way that is fast and efficient

And any other topics that may be relevant to this. It doesn't have to be an on-disk database - even an in-memory database is fine (if it is easier) because I just want to learn the principals behind it.

Many thanks for your help.

回答1:

If you're good at reading code, studying SQLite will teach you a whole boatload about database design. It's small, so it's easier to wrap your head around. But it's also professionally written.

http://sqlite.org/



回答2:

The answer to this question is a huge one. expect a PHD thesis to have it answered 100% ;) but we can think of the problems one by one:

  • How to store the data internally: you should have a data file containing your database objects and a caching mechanism to load the data in focus and some data around it into RAM assume you have a table, with some data, we would create a data format to convert this table into a binary file, by agreeing on the definition of a column delimiter and a row delimiter and make sure such pattern of delimiter is never used in your data itself. i.e. if you have selected <*> for example to separate columns, you should validate the data you are placing in this table not to contain this pattern. you could also use a row header and a column header by specifying size of row and some internal indexing number to speed up your search, and at the start of each column to have the length of this column like "Adam", 1, 11.1, "123 ABC Street POBox 456" you can have it like <&RowHeader, 1><&Col1,CHR, 4>Adam<&Col2, num,1,0>1<&Col3, Num,2,1>111<&Col4, CHR, 24>123 ABC Street POBox 456<&RowTrailer>

  • How to find items quickly try using hashing and indexing to point at data stored and cached based on different criteria taking same example above, you could sort the value of the first column and store it in a separate object pointing at row id of items sorted alphabetically, and so on

  • How to speed insert data I know from Oracle is that they insert data in a temporary place both in RAM and on disk and do housekeeping on periodic basis, the database engine is busy all the time optimizing its structure but in the same time we do not want to lose data in case of power failure of something like that. so try to keep data in this temporary place with no sorting, append your original storage, and later on when system is free resort your indexes and clear the temp area when done

good luck, great project.



回答3:

There are books on the topic a good place to start would be Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom



回答4:

SQLite was mentioned before, but I want to add some thing.

I personally learned a lot by studying SQlite. The interesting thing is, that I did not go to the source code (though I just had a short look). I learned much by reading the technical material and specially looking at the internal commands it generates. It has an own stack based interpreter inside and you can read the P-Code it generates internally just by using explain. Thus you can see how various constructs are translated to the low-level engine (that is surprisingly simple -- but that is also the secret of its stability and efficiency).



回答5:

I would suggest focusing on www.sqlite.org

It's recent, small (source code 1MB), open source (so you can figure it out for yourself)...

Books have been written about how it is implemented:

http://www.sqlite.org/books.html

It runs on a variety of operating systems for both desktop computers and mobile phones so experimenting is easy and learning about it will be useful right now and in the future.

It even has a decent community here: https://stackoverflow.com/questions/tagged/sqlite



回答6:

Okay, I have found a site which has some information on SQL and implementation - it is a bit hard to link to the page which lists all the tutorials, so I will link them one by one:

  • http://c2.com/cgi/wiki?CategoryPattern
  • http://c2.com/cgi/wiki?SliceResultVertically
  • http://c2.com/cgi/wiki?SqlMyopia
  • http://c2.com/cgi/wiki?SqlPattern
  • http://c2.com/cgi/wiki?StructuredQueryLanguage
  • http://c2.com/cgi/wiki?TemplateTables
  • http://c2.com/cgi/wiki?ThinkSqlAsConstraintSatisfaction


回答7:

may be you can learn from HSQLDB. I think they offers small and simple database for learning. you can look at the codes since it is open source.



回答8:

If MySQL interests you, I would also suggest this wiki page, which has got some information about how MySQL works. Also, you might want to take a look at Understanding MySQL Internals.

You might also consider looking at a non-SQL interface for your Database engine. Please take a look at Apache CouchDB. Its what you would call, a document oriented database system.

Good Luck!



回答9:

I am not sure whether it would fit to your requirements but I had implemented a simple file oriented database with support for simple (SELECT, INSERT , UPDATE ) using perl.
What I did was I stored each table as a file on disk and entries with a well defined pattern and manipulated the data using in built linux tools like awk and sed. for improving efficiency, frequently accessed data were cached.