How to translate from SQL to NoSQL/MapReduce?

Posted 2019-03-09 04:16

Question:

I have a background working with relational databases but recently started to dabble in CouchDB, and I was surprised that some operations that would be simple in SQL are not first-class functions in CouchDB.

I would appreciate you taking a moment to map each SQL statement below to its MapReduce equivalent.

SELECT COUNT(*) FROM products WHERE price < 20.00;
SELECT category, SUM(price) FROM products GROUP BY category;
UPDATE products SET price = 19.99 WHERE price = 20.00;
DELETE FROM products WHERE expires_at <= NOW();

Answer 1:

The SELECT statements are fairly easy. Bulk writes are a bit more complicated: generally, you query a view to retrieve the documents that need to change, modify them in your application, and then use the _bulk_docs API to send all the changes in a single request.

Also, consult the CouchDB documentation on views for details on how to query them, including ordering, grouping, and so on.


SELECT COUNT(*) FROM products WHERE price < 20.00;

Map

function (doc) {
  if (doc.price < 20) {
    emit(doc.price);
  }
}

Reduce

_count

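If you need this to work with an arbitrary threshold rather than just 20, emit the price for every document that has one and use startkey and endkey at query time to narrow down the result set. Because endkey is inclusive by default, add inclusive_end=false to get a strict "less than".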
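A minimal sketch of what that generalized view computes, written as an in-memory simulation in plain JavaScript (Node.js) rather than CouchDB itself. In CouchDB, `emit` is a global inside the map function; here it is passed in as a parameter so the function can run outside the server. The sample documents, the `runView`/`countBelow`/`priceQuery` helpers, and the design document and view names (`products`, `by_price`) are hypothetical. Filtering with `key < endkey` mimics `endkey=20&inclusive_end=false`, and counting the remaining rows mimics the `_count` reduce.

```javascript
// In-memory simulation of a CouchDB view; NOT the CouchDB API.
// Hypothetical sample documents.
const docs = [
  { _id: "a", price: 5.5 },
  { _id: "b", price: 19.99 },
  { _id: "c", price: 20 },
  { _id: "d", price: 42 },
  { _id: "e", name: "no price" },
];

// Map function: emit the price of every document that has a numeric price,
// so documents without a price never appear in any key range.
function mapByPrice(doc, emit) {
  if (typeof doc.price === "number") {
    emit(doc.price, null);
  }
}

// Run a map function over all docs and return rows sorted by key
// (all keys are numbers here, so a numeric sort is enough).
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key);
}

// Emulate endkey + inclusive_end=false, then a _count reduce.
function countBelow(rows, endkey) {
  return rows.filter((row) => row.key < endkey).length;
}

// Build the query string a client might send to a real view.
// Key parameters must be JSON-encoded, hence JSON.stringify.
function priceQuery(endkey) {
  const params = new URLSearchParams({
    endkey: JSON.stringify(endkey),
    inclusive_end: "false",
  });
  return `/products/_design/products/_view/by_price?${params}`;
}

const rows = runView(docs, mapByPrice);
console.log(countBelow(rows, 20)); // 2 (5.5 and 19.99)
console.log(priceQuery(20));
```

Changing the threshold only changes the query, not the view, which is the point of emitting every price.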


SELECT category, SUM(price) FROM products GROUP BY category;

Map

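function (doc) {
  // _sum only accepts numeric values, so skip documents without a numeric price
  if (doc.category && typeof doc.price === "number") {
    emit(doc.category, doc.price);
  }
}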

Reduce

_sum

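This map function uses the category as the key and the price as the value of each key/value pair. When you query the view with group=true, the _sum reduce adds up the prices for each distinct key, giving one row per category; without group=true, you get a single row with the total across all categories.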
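The grouping behaviour can be illustrated with a similar in-memory simulation; the sample data and helper names are hypothetical and this is not the CouchDB API. With group=true, CouchDB returns one row per distinct key, and for `_sum` each row's value is the sum of the values emitted under that key.

```javascript
// In-memory simulation of a grouped _sum reduce; NOT the CouchDB API.
// Hypothetical sample documents.
const docs = [
  { _id: "1", category: "books", price: 10 },
  { _id: "2", category: "toys", price: 7.5 },
  { _id: "3", category: "books", price: 2.5 },
  { _id: "4", category: "games", price: 30 },
];

// Map function: key = category, value = price.
function mapByCategory(doc, emit) {
  if (doc.category && typeof doc.price === "number") {
    emit(doc.category, doc.price);
  }
}

// Emulate querying the view with group=true: one row per distinct key,
// with the values for that key added together (what _sum does).
function groupedSum(docs, mapFn) {
  const totals = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      totals.set(key, (totals.get(key) ?? 0) + value);
    });
  }
  // CouchDB returns rows in key order using ICU collation for strings;
  // localeCompare is only an approximation of that ordering.
  return [...totals.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([key, value]) => ({ key, value }));
}

const grouped = groupedSum(docs, mapByCategory);
console.log(grouped); // books: 12.5, games: 30, toys: 7.5
```

Querying without group=true would instead return a single row whose value is the total across all categories (50 for this sample).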


UPDATE products SET price = 19.99 WHERE price = 20.00;

Map

function (doc) {
  if (doc.price == 20) {
    emit(doc.price);
  }
}

Query this view with include_docs=true so that each row carries the full document, including its current _rev. Your application then performs the modifications in its own code and sends the results back to the database via the _bulk_docs API.
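A sketch of the application-side step, assuming the view above was queried with include_docs=true. The `viewResponse` object is a made-up sample in the shape CouchDB returns (`rows`, each with `id`, `key`, `value`, and `doc`), and `buildPriceUpdate` is a hypothetical helper. The resulting object is the JSON body for `POST /{db}/_bulk_docs`.

```javascript
// Sketch: turn a view response (queried with include_docs=true) into a
// _bulk_docs request body. The response below is a hypothetical sample.
const viewResponse = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "p1", key: 20, value: null, doc: { _id: "p1", _rev: "1-abc", name: "Mug", price: 20 } },
    { id: "p2", key: 20, value: null, doc: { _id: "p2", _rev: "3-def", name: "Cap", price: 20 } },
  ],
};

// Copy each document and set the new price. _id and _rev are kept so
// CouchDB can detect conflicting concurrent edits. The originals are not mutated.
function buildPriceUpdate(response, newPrice) {
  return {
    docs: response.rows.map((row) => ({ ...row.doc, price: newPrice })),
  };
}

const body = buildPriceUpdate(viewResponse, 19.99);
// A real client would POST JSON.stringify(body) to /{db}/_bulk_docs.
console.log(JSON.stringify(body));
```

Keeping each document's `_id` and `_rev` matters: if another client changed a document after you read it, CouchDB reports a conflict for that document in the `_bulk_docs` response instead of silently overwriting it, and you can re-read and retry just those documents.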


DELETE FROM products WHERE expires_at <= NOW();

Map

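function (doc) {
  // Skip documents without an expiry: their null keys would sort before
  // every timestamp and be matched by a query that only sets endkey.
  if (doc.expires_at != null) {
    emit(doc.expires_at);
  }
}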

Depending on how you store your date-time values, you may need to adjust the map function as well as your query to the view. A numeric timestamp is probably the simplest option, since it sorts correctly as a view key and can be compared directly with Date.now() (note that JavaScript timestamps are in milliseconds, not seconds). Query the view with endkey set to the current time and include_docs=true, then add a _deleted: true field to each returned document. Once you send this list back to the database (again with _bulk_docs), all the specified documents will be deleted.
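A similar sketch for the deletion, assuming `expires_at` holds a JavaScript millisecond timestamp and the map function above is saved as a view named `by_expiry` in a design document named `products` (both names hypothetical). `expiredQuery` and `buildDeletion` are illustrative helpers, and the sample response is made up. Since `endkey` is inclusive by default, the query matches `expires_at <= now`, like the SQL.

```javascript
// Sketch: find expired documents and build a _bulk_docs deletion body.
// Assumes expires_at is a JavaScript timestamp (milliseconds since the epoch).

// View query for everything that expired at or before `now`.
// Key parameters must be JSON-encoded; view names are hypothetical.
function expiredQuery(now) {
  const params = new URLSearchParams({
    endkey: JSON.stringify(now),
    include_docs: "true",
  });
  return `/products/_design/products/_view/by_expiry?${params}`;
}

// Mark each returned document as deleted. Sending _id, _rev and
// _deleted: true is enough for CouchDB to delete the document.
function buildDeletion(response) {
  return {
    docs: response.rows.map((row) => ({
      _id: row.doc._id,
      _rev: row.doc._rev,
      _deleted: true,
    })),
  };
}

// Hypothetical sample: one product that expired a second ago.
const now = Date.now();
const sample = {
  rows: [
    { id: "p7", key: now - 1000, value: null,
      doc: { _id: "p7", _rev: "2-xyz", expires_at: now - 1000 } },
  ],
};

console.log(expiredQuery(now));
console.log(JSON.stringify(buildDeletion(sample)));
```

Adding `_deleted: true` to the full document, as described above, also works; this sketch simply sends the minimum CouchDB needs.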