I'm setting up a simple Proof of Concept to learn some of the concepts in Google Cloud, specifically PubSub and Dataflow.
I have a PubSub topic named greeting.
I've created a simple cloud function that publishes a message to that topic:
const escapeHtml = require('escape-html');
const { Buffer } = require('safe-buffer');
const { PubSub } = require('@google-cloud/pubsub');
exports.publishGreetingHTTP = async (req, res) => {
  let name = 'no name provided';
  if (req.query && req.query.name) {
    name = escapeHtml(req.query.name);
  } else if (req.body && req.body.name) {
    name = escapeHtml(req.body.name);
  }
  const pubsub = new PubSub();
  const topicName = 'greeting';
  const data = JSON.stringify({ hello: name });
  const dataBuffer = Buffer.from(data);
  const messageId = await pubsub.topic(topicName).publish(dataBuffer);
  res.send(`Message ${messageId} published. name=${name}`);
};
I set up a different cloud function that is triggered by the topic:
const { Buffer } = require('safe-buffer');
exports.subscribeGreetingPubSub = (data) => {
  const pubSubMessage = data;
  const passedData = pubSubMessage.data
    ? JSON.parse(Buffer.from(pubSubMessage.data, 'base64').toString())
    : { error: 'no data' };
  console.log(passedData);
};
This works great and I see it registered as a subscription on the topic.
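As a sanity check, the encoding round trip between the two functions can be reproduced locally without Pub/Sub. This is only a sketch: the payload value is made up, and I'm assuming the triggered function receives the published bytes as a base64 string, which is what the decoding in my subscriber relies on.

```javascript
// Local sketch of the payload round trip; no Pub/Sub involved.
// The payload value is a made-up example.
const payload = { hello: 'world' };

// Publisher side: JSON -> bytes (what gets passed to publish()).
const dataBuffer = Buffer.from(JSON.stringify(payload));

// Assumption: the triggered function sees those bytes as a base64 string.
const wireData = dataBuffer.toString('base64');

// Subscriber side: base64 -> bytes -> JSON, as in subscribeGreetingPubSub.
const decoded = JSON.parse(Buffer.from(wireData, 'base64').toString());
console.log(decoded);
```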
Now I want to use Dataflow to send the data to BigQuery.
There appear to be two templates to accomplish this:
- Cloud Pub/Sub Subscription to BigQuery
- Cloud Pub/Sub Topic to BigQuery
I don't understand the difference between Topic and Subscription in this context.
https://medium.com/google-cloud/new-updates-to-pub-sub-to-bigquery-templates-7844444e6068 sheds a little light:
Note that a caveat of using subscriptions over topics is that subscriptions are only read once while topics can be read multiple times. Therefore a subscription template cannot support multiple concurrent pipelines reading the same subscription.
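My tentative reading of this, as a toy in-memory sketch rather than the real Pub/Sub client: ToyTopic and ToySubscription are names I made up, acknowledgement is assumed to always succeed, and I'm assuming (without having verified it) that a pipeline started from the topic template ends up with its own subscription behind the scenes.

```javascript
// Toy in-memory model; ToyTopic and ToySubscription are made-up names,
// not part of @google-cloud/pubsub.
class ToySubscription {
  constructor(name) {
    this.name = name;
    this.queue = [];
  }

  // Each message is handed to exactly one puller, then it is gone
  // (acknowledgement is assumed to always succeed here).
  pull() {
    return this.queue.shift();
  }
}

class ToyTopic {
  constructor() {
    this.subscriptions = [];
  }

  createSubscription(name) {
    const sub = new ToySubscription(name);
    this.subscriptions.push(sub);
    return sub;
  }

  // Publishing copies the message into every existing subscription.
  publish(message) {
    for (const sub of this.subscriptions) {
      sub.queue.push(message);
    }
  }
}

const topic = new ToyTopic();
const subA = topic.createSubscription('pipeline-a');
const subB = topic.createSubscription('pipeline-b');

// Two readers with their own subscriptions each get a copy.
topic.publish('hello');
const seenByA = subA.pull();
const seenByB = subB.pull();

// Two readers sharing one subscription split the messages instead:
// the second pull finds nothing left.
topic.publish('again');
const firstReader = subA.pull();
const secondReader = subA.pull();

console.log({ seenByA, seenByB, firstReader, secondReader });
```

If this model is right, two pipelines pointed at the same subscription would each see only part of the stream, while two pipelines pointed at the topic would each see all of it.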
But I must say I still don't understand the real implications of this.