What is the proper way to backup/restore a Mnesia database?

Posted 2019-03-15 06:23

Question:

WARNING: the background info is pretty long. Skip to the bottom if you think you need the question before the background info. Appreciate the time this is gonna take!

I've been all over the web (read: Google) and I have not found a good answer. YES, there are plenty of links and references to the Mnesia documentation on the erlang.org site, but even those links suffer from version-itis.

So in the simplest case where the node() you are currently connected to is the same as the owner of the table set then the backup/restore is going to work. For example:

$ erl -sname mydatabase

> mnesia:create_schema(...).
> mnesia:start().
> mnesia:create_table(...).
> mnesia:backup("/tmp/backup.bup").
> mnesia:restore("/tmp/backup.bup", [{default_op, recreate_tables}]).

Hey this works great!

However, if the database is actually running on a remote node(), or a remote node() on a remote machine, then you have to initiate the backup this way:

$ erl -sname mydbadmin

> rpc:call(mydatabase@host, mnesia, backup, ["/tmp/backup.bup"]).
> rpc:call(mydatabase@host, mnesia, restore, ["/tmp/backup.bup", [{default_op, recreate_tables}]]).

Of course this was simple too. Now here are the tricky things....

  • Let's say that you are taking daily backups, and your Mnesia database server dies and you are forced to replace the hardware. If you want to restore the DB as-is, then you need to give the NEW hardware the same name that the old hardware had, and you also need to name the nodes the same.
  • If you want to change the name of the hardware and/or the node(), or you want to restore on a different machine, then you need to go through the node_change process (described here and in the Mnesia docs).

But here is where things get complicated. Acquaintances of mine who are Erlang and Mnesia experts suggest that Mnesia's replication is severely flawed and that you should not use it. (There are currently no alternatives that I know of, and what are the chances that you are going to implement a better version? Not likely.)

So you have two nodes() that are replicating ram and disc based tables. You have been maintaining a policy of backing up the database regularly with the standard backup using the default BackupMod. And one day a manager asks you to verify the backups. Only when you attempt to restore the database you get:

{atomic,[]}

And according to the documentation this means that there were no errors... and yet no tables were restored.
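
As far as I can tell from the Mnesia reference manual, mnesia:restore/2 returns either {atomic, RestoredTabs} or {aborted, Reason}, so {atomic,[]} means the call succeeded but restored zero tables. A minimal sketch of a check that treats that case as a failure when verifying backups (the module name restore_result and the function check/1 are made up for illustration, they are not part of Mnesia):

```erlang
-module(restore_result).
-export([check/1]).

%% Classify the return value of mnesia:restore/2.
%% An empty table list is reported as an error, because a backup
%% that restores nothing is not a verified backup.
check({atomic, []}) ->
    {error, no_tables_restored};
check({atomic, Tabs}) when is_list(Tabs) ->
    {ok, Tabs};
check({aborted, Reason}) ->
    {error, Reason}.
```

It would be used by wrapping the restore call, e.g. restore_result:check(mnesia:restore("/tmp/backup.bup", [{default_op, recreate_tables}])).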

Not wanting to run the change_node procedure you remember that the node() and hostname must match so you change the hostname and the -sname param to match the machine where the data was backed up. This time however you get a strange error:

{aborted,{'EXIT',{aborted,{bad_commit,{missing_lock,mydatabase@otherhost}}}}}

Still not wanting to run the change_node procedure, I quickly clone my server so that I have two similar machines. I name them appropriately to match the production servers. And I begin the restore process. Eureka! I now have real working data on the restore servers.

I'd like to say that this was the end of the road... but I have not asked a question yet, and that's the point of SO... so here it is:

QUESTION: if I want to restore a backup which was taken from a cluster of replicated mnesia nodes, how do I modify the file (similar to the change_node procedure) so that the other nodes are either ignored or removed from the backup?

Asked slightly differently: How do I restore a replicated-multi-node() mnesia database on a single node()?

Answer 1:

I think that this problem falls in the broader category of Mnesia questions that are related to a simple one:

How do I rename a Mnesia node?

The first and simplest solution, if your db is not huge, is to use the mnesia:traverse_backup function (see Mnesia User guide). Following is an example from the Mnesia User guide:

change_node_name(Mod, From, To, Source, Target) ->
    Switch =
        fun(Node) when Node == From -> To;
           (Node) when Node == To -> throw({error, already_exists});
           (Node) -> Node
        end,
    Convert =
        fun({schema, db_nodes, Nodes}, Acc) ->
                {[{schema, db_nodes, lists:map(Switch,Nodes)}], Acc};
           ({schema, version, Version}, Acc) ->
                {[{schema, version, Version}], Acc};
           ({schema, cookie, Cookie}, Acc) ->
                {[{schema, cookie, Cookie}], Acc};
           ({schema, Tab, CreateList}, Acc) ->
                Keys = [ram_copies, disc_copies, disc_only_copies],
                OptSwitch =
                    fun({Key, Val}) ->
                            case lists:member(Key, Keys) of
                                true -> {Key, lists:map(Switch, Val)};
                                false-> {Key, Val}
                            end
                    end,
                {[{schema, Tab, lists:map(OptSwitch, CreateList)}], Acc};
           (Other, Acc) ->
                {[Other], Acc}
        end,
    mnesia:traverse_backup(Source, Mod, Target, Mod, Convert, switched).

view(Source, Mod) ->
    View = fun(Item, Acc) ->
                   io:format("~p.~n",[Item]),
                   {[Item], Acc + 1}
           end,
    mnesia:traverse_backup(Source, Mod, dummy, read_only, View, 0).

The most important part here is the manipulation of the {schema, db_nodes, Nodes} tuple, which lets you rename or replace the db nodes.
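
To see what that manipulation does in isolation, here is a small sketch that applies the same Switch fun to a plain list of nodes, without touching a backup file. The module name switch_demo and the node names in the usage line are made up for illustration:

```erlang
-module(switch_demo).
-export([rename/3]).

%% Rename From to To in a list of node names, using the same
%% Switch fun as change_node_name/5 above. Every other node is
%% kept as-is; if To is already in the list the call throws.
rename(From, To, Nodes) ->
    Switch =
        fun(Node) when Node == From -> To;
           (Node) when Node == To -> throw({error, already_exists});
           (Node) -> Node
        end,
    lists:map(Switch, Nodes).
```

For example, switch_demo:rename(a@old, a@new, [a@old, b@host]) returns [a@new, b@host]. Note that the other replica (b@host) stays in the list, so this variant renames a node but does not shrink the cluster.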

BTW, I've used that function in the past and one thing I noticed is that the backup terms format changes between mnesia versions, but maybe it was simply me writing bad code. Just print a backup log for a small mnesia database to check backup term format, if you wanna be sure.

Hope this helps!



Answer 2:

I had a pretty hard time getting this to work, so I'll share the steps I went through.

Start by backing up the node in the distributed system that you want to restore (to a single node):

> mnesia:backup("/path/to/backup").

Make sure the following adaptation of change_node_name is available on the node you want to restore to:

-module(move_backup).
-export([set_node_name/4]).

set_node_name(From, To, Source, Target) ->
    Switch =
        fun (Nodes) ->
                case lists:member(From, Nodes) of
                    true -> [To];
                    false -> []
                end
        end,
    Convert =
        fun({schema, db_nodes, Nodes}, Acc) ->
                {[{schema, db_nodes, Switch(Nodes)}], Acc};
           ({schema, version, Version}, Acc) ->
                {[{schema, version, Version}], Acc};
           ({schema, cookie, Cookie}, Acc) ->
                {[{schema, cookie, Cookie}], Acc};
           ({schema, Tab, CreateList}, Acc) ->
                Keys = [ram_copies, disc_copies, disc_only_copies],
                OptSwitch =
                    fun({Key, Val}) ->
                            case lists:member(Key, Keys) of
                                true -> {Key, Switch(Val)};
                                false-> {Key, Val}
                            end
                    end,
                {[{schema, Tab, lists:map(OptSwitch, CreateList)}], Acc};
           (Other, Acc) ->
                {[Other], Acc}
        end,
    mnesia:traverse_backup(Source, Target, Convert, switched).
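
The difference from the User guide version is that Switch now takes the whole node list and collapses it to a single node, so the other replicas disappear from the schema. A sketch of that rewriting on its own, applied to a made-up table option list (the module name collapse_demo, the table name and the node names are assumptions for illustration):

```erlang
-module(collapse_demo).
-export([collapse/3, rewrite_opts/3]).

%% Collapse a node list to [To] if From was one of the nodes,
%% otherwise to the empty list (same logic as Switch above).
collapse(From, To, Nodes) ->
    case lists:member(From, Nodes) of
        true -> [To];
        false -> []
    end.

%% Apply collapse/3 to the storage-type entries of a table's
%% option list and leave every other option untouched.
rewrite_opts(From, To, CreateList) ->
    Keys = [ram_copies, disc_copies, disc_only_copies],
    OptSwitch =
        fun({Key, Val}) ->
                case lists:member(Key, Keys) of
                    true -> {Key, collapse(From, To, Val)};
                    false -> {Key, Val}
                end
        end,
    lists:map(OptSwitch, CreateList).
```

For example, collapse_demo:rewrite_opts(a@h1, new@h, [{name, t}, {disc_copies, [a@h1, b@h2]}, {ram_copies, []}]) returns [{name, t}, {disc_copies, [new@h]}, {ram_copies, []}].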

Convert the backup:

> move_backup:set_node_name('before@host', 'after@host', "/path/to/backup", "/path/to/backup_converted").

I'm going to assume that the new node is completely empty (if this is not the case, you might want to change the default_op argument). There are two options, one for a live restore:

> mnesia:restore("/path/to/backup_converted", [{default_op, recreate_tables}]).

which is great but might use a lot of memory if you have a large database (mine was ~10GB so this caused an out of memory exception). The alternative is to install a fallback, and restart your shell:

> mnesia:install_fallback("/path/to/backup_converted").
> q().

then when you restart the shell (assuming you're using the right node name) it will import the full database.
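
Once Mnesia is running again, it is worth checking that the tables really came back, since a restore can report success without restoring anything. A minimal sketch, assuming Mnesia is already started on the node; the module name restore_check and the 60 second timeout are arbitrary choices for illustration:

```erlang
-module(restore_check).
-export([summary/0]).

%% Wait until all tables known to this node are loaded, then
%% return a list of {Table, NumberOfRecords} pairs.
%% Crashes with a badmatch if the tables do not load in time.
summary() ->
    Tables = mnesia:system_info(tables),
    ok = mnesia:wait_for_tables(Tables, 60000),
    [{Tab, mnesia:table_info(Tab, size)} || Tab <- Tables].
```

Calling restore_check:summary() together with mnesia:system_info(db_nodes) in the shell should show the single new node and a non-zero size for each table you expected to be restored.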