Importing Entities Into Local GCP Datastore Emulator

Posted 2020-02-15 05:35

Question:

I was able to export entities into a storage bucket without much difficulty with this command:

gcloud datastore export --kinds="KIND1,KIND2" --namespaces="NAMESPACE1,NAMESPACE2" gs://${BUCKET}

And according to the docs, importing can be done like this:

gcloud datastore import gs://${BUCKET}/[PATH]/[FILE].overall_export_metadata

or like this:

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}:import \
-d '{
"inputUrl": "gs://'${BUCKET}'/[PATH]/[FILE].overall_export_metadata",
}'

My Datastore emulator is running on localhost:8081. Is there any way I can use this curl command to import the data into the emulator? There's nothing in the docs about it, and I've tried guessing many URLs, but nothing works.

If this is impossible, is there another way I can populate my local emulator, or, better yet, connect the local App Engine app to the production Datastore?

Apparently there used to be a way to export and import using CSV files:

Google cloud datastore emulator init data

but that has since been deprecated.

Answer 1:

The Datastore Emulator now supports import and export:

Import:

curl -X POST localhost:8081/v1/projects/[PROJECT_ID]:import \
-H 'Content-Type: application/json' \
-d '{"input_url":"[ENTITY_EXPORT_FILES]"}'

Export:

curl -X POST localhost:8081/v1/projects/[PROJECT_ID]:export \
-H 'Content-Type: application/json' \
-d '{"output_url_prefix":"EXPORT_DIRECTORY"}'

https://cloud.google.com/datastore/docs/tools/emulator-export-import
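For example, after copying an export from your bucket to the local filesystem, you can call the same import endpoint from a small Go program. This is a minimal sketch: the project ID and the path to the .overall_export_metadata file below are hypothetical placeholders, and it assumes the emulator is listening on localhost:8081.

package main

import (
        "bytes"
        "fmt"
        "net/http"
        "os"
)

func main() {
        // Hypothetical values: substitute your own project ID and the local
        // path to the .overall_export_metadata file of your export.
        projectID := "my-project"
        exportFile := "/home/user/datastore-export/export.overall_export_metadata"

        url := fmt.Sprintf("http://localhost:8081/v1/projects/%s:import", projectID)
        body := fmt.Sprintf(`{"input_url":%q}`, exportFile)

        // POST the import request to the emulator's REST endpoint.
        resp, err := http.Post(url, "application/json", bytes.NewBufferString(body))
        if err != nil {
                fmt.Fprintf(os.Stderr, "import request failed: %v\n", err)
                os.Exit(1)
        }
        defer resp.Body.Close()
        fmt.Println("Emulator responded:", resp.Status)
}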



Answer 2:

Since there seems to be no import functionality for the Datastore emulator, you can build your own.

It's as simple as creating two clients within your script: one for the remote (Cloud) Datastore, and one for the local Datastore emulator. Since the Cloud Client Libraries support the emulator, you can dig into their code to see how to establish the connection properly.

I did exactly that for the Go Cloud Client Libraries, and came up with this script:

package main

import (
        "context"
        "fmt"
        "os"
        "time"

        "cloud.google.com/go/datastore"
        "google.golang.org/api/iterator"
        "google.golang.org/api/option"
        "google.golang.org/grpc"
)

const (
        projectId    = "<PROJECT_ID>"
        namespace    = "<NAMESPACE>"
        kind         = "<KIND>"
        emulatorHost = "<EMULATOR_HOST>:<EMULATOR_PORT>"
)

func main() {

        ctx := context.Background()

        // Create the Cloud Datastore client
        remoteClient, err := datastore.NewClient(ctx, projectId, option.WithGRPCConnectionPool(50))
        if err != nil {
                fmt.Fprintf(os.Stderr, "Could not create remote datastore client: %v \n", err)
                os.Exit(1) // Continuing with a nil client would panic below
        }

        // Create the local Datastore Emulator client
        o := []option.ClientOption{
                option.WithEndpoint(emulatorHost),
                option.WithoutAuthentication(),
                option.WithGRPCDialOption(grpc.WithInsecure()),
                option.WithGRPCConnectionPool(50),
        }
        localClient, err := datastore.NewClient(ctx, projectId, o...)
        if err != nil {
                fmt.Fprintf(os.Stderr, "Could not create local datastore client: %v \n", err)
                os.Exit(1) // Continuing with a nil client would panic below
        }

        // Create the query
        q := datastore.NewQuery(kind).Namespace(namespace)

        // Run the query and handle the received entities
        start := time.Now() // This is just to calculate the rate
        for it, i := remoteClient.Run(ctx, q), 1; ; i++ {
                x := &arbitraryEntity{}

                // Get the entity
                key, err := it.Next(x)
                if err == iterator.Done {
                        break
                }
                if err != nil {
                        fmt.Fprintf(os.Stderr, "Error retrieving entity: %v \n", err)
                        os.Exit(1) // The key is not valid after an iterator error
                }

                // Insert the entity into the emulator
                _, err = localClient.Put(ctx, key, x)
                if err != nil {
                        fmt.Fprintf(os.Stderr, "Error saving entity: %v \n", err)
                }

                // Print stats (guard against division by zero during the first second)
                if secs := int(time.Since(start).Seconds()); secs > 0 {
                        fmt.Fprintf(os.Stdout, "\rCopied %v entities. Rate: %v/s", i, i/secs)
                }
        }
        fmt.Fprintln(os.Stdout)
}


// Declare a struct capable of handling any type of entity.
// It implements the PropertyLoadSaver interface
type arbitraryEntity struct {
        properties []datastore.Property
}

func (e *arbitraryEntity) Load(ps []datastore.Property) error {
        e.properties = ps
        return nil
}

func (e *arbitraryEntity) Save() ([]datastore.Property, error) {
        return e.properties, nil
}

With this, I'm getting a rate of ~700 entities/s, but it could change a lot depending on the entities you have.

Do not set the DATASTORE_EMULATOR_HOST environment variable: the script creates the connection to the local emulator manually, and you want the library to connect automatically to the Cloud Datastore.

The script could be greatly improved: both the remote and the local clients use gRPC, so you could use some proto-magic to avoid encoding and decoding the messages. Batching the uploads would also help (see the sketch below), as would Go's concurrency trickery. You could even retrieve the namespaces and kinds programmatically, so you don't need to run the script once per kind and namespace.
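As a rough sketch of the batching idea — reusing the clients, query, and arbitraryEntity type from the script above; the function name and batch size are illustrative assumptions, not part of any API — the per-entity Put can be replaced with PutMulti calls:

// copyKindBatched buffers entities from the remote client and flushes them
// to the emulator with PutMulti, up to 500 at a time (the maximum number of
// mutations Datastore allows in a single commit).
func copyKindBatched(ctx context.Context, remoteClient, localClient *datastore.Client, q *datastore.Query) error {
        const batchSize = 500

        keys := make([]*datastore.Key, 0, batchSize)
        entities := make([]*arbitraryEntity, 0, batchSize)

        flush := func() error {
                if len(keys) == 0 {
                        return nil
                }
                // One PutMulti commit instead of one round trip per entity.
                if _, err := localClient.PutMulti(ctx, keys, entities); err != nil {
                        return err
                }
                keys, entities = keys[:0], entities[:0]
                return nil
        }

        for it := remoteClient.Run(ctx, q); ; {
                x := &arbitraryEntity{}
                key, err := it.Next(x)
                if err == iterator.Done {
                        break
                }
                if err != nil {
                        return err
                }
                keys = append(keys, key)
                entities = append(entities, x)
                if len(keys) == batchSize {
                        if err := flush(); err != nil {
                                return err
                        }
                }
        }
        // Flush the final partial batch.
        return flush()
}

Kinds and namespaces can similarly be listed programmatically with keys-only queries against the __kind__ and __namespace__ pseudo-kinds.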

However, I think this simple proof of concept can help you understand how to develop your own tool to run an import.