How to set a future insert date in Google Cloud Bi

Posted 2019-08-27 13:37

I have a table with a single column family whose TTL is 172800 seconds (2 days). I need some values to be deleted before that deadline. If I want a value to expire in 5 minutes, I calculate the expiry time and back-date the insert timestamp so that the column family's TTL elapses at that expiry time.

I am using the HBase Client for Java to do this.

However, the value doesn't seem to expire. Any suggestions?

I used cbt to create the table:

cbt createtable my_table families=cf1:maxage=2d

HColumnDescriptor:

{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '2147483647', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '172800 SECONDS (2 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

Java Code:

import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.Calendar;
import java.util.Date;

public class BigTable {
    public static void main(String... args) {
        String projectId = "my-gcp-project-id";
        String instanceId = "my-bigtable-instance-id";
        String tableId = "my-table";    // my-bigtable-table-id

        try (Connection connection = BigtableConfiguration.connect(projectId, instanceId)) {
            try (Table table = connection.getTable(TableName.valueOf(tableId))) {

                HTableDescriptor hTableDescriptor = table.getTableDescriptor();
                hTableDescriptor.setCompactionEnabled(true);

                byte[] cf1 = Bytes.toBytes("cf1");
                byte[] rk1 = Bytes.toBytes("rowkey1");
                byte[] q1 = Bytes.toBytes("q1");

                HColumnDescriptor cfDescriptor1 = hTableDescriptor.getFamily(cf1);
                System.out.println("\n " + cfDescriptor1);

                Calendar now = Calendar.getInstance();
                Calendar now1 = Calendar.getInstance();
                now1.setTime(now.getTime());

                long nowMillis = now.getTimeInMillis(); // Current time

                now.add(Calendar.SECOND, cfDescriptor1.getTimeToLive()); // Adding 172800 SECONDS (2 DAYS) to current time
                long cfTTLMillis = now.getTimeInMillis(); // Time the values in the column family will expire at

                now1.add(Calendar.SECOND, 300); // Adding 300 secs (5mins)
                long expiry = now1.getTimeInMillis(); // Time the value should actually live

                long creationTime = nowMillis + cfTTLMillis - expiry;

                System.out.println("\n Date nowMillis:\t" + new Date(nowMillis) + "\n Date creationTime:\t" + new Date(creationTime) + "\n Date cfTTLMillis:\t" + new Date(cfTTLMillis));

                //Add Data
                Put p = new Put(rk1, creationTime);
                p.addColumn(cf1, q1, Bytes.toBytes("CFExpiry_2d_ExpTime_5mins"));
                //p.setTTL(creationtime); // What does this do?
                table.put(p);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Calculated dates:

 Date nowMillis:    Wed Oct 03 10:34:15 EDT 2018
 Date creationTime: Fri Oct 05 10:29:15 EDT 2018
 Date cfTTLMillis:  Fri Oct 05 10:34:15 EDT 2018

The value is inserted with the calculated timestamp, but it doesn't seem to expire. Please correct me if my understanding is wrong.

Edit:

After the following correction to the date calculation, the values do expire:

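long nowSeconds = System.currentTimeMillis() / 1000; // current time in seconds, not millis
long ttlStartSeconds = nowSeconds - cfDescriptor1.getTimeToLive(); // a cell stamped here would expire right now
long creationTime = (ttlStartSeconds + 300) * 1000; // shift by 300 secs (5 mins) and convert back to millis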
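
For reference, the original expression `nowMillis + cfTTLMillis - expiry` reduces to now + TTL - 5 minutes, a timestamp almost two days in the future (which matches the printed `creationTime`), so the cell would only reach the family's maximum age at roughly now + 2 x TTL - 5 minutes. The corrected calculation back-dates the timestamp instead. A minimal sketch of that arithmetic in plain Java with no Bigtable dependency; the class and method names are hypothetical, and in the real code the TTL would come from `cfDescriptor1.getTimeToLive()`:

```java
import java.util.Date;

final class BackdatedTimestamp {
    // Hypothetical helper: returns the timestamp (epoch millis) to write so that
    // a family TTL of familyTtlSeconds elapses desiredLifetimeSeconds after nowMillis.
    static long creationTimeMillis(long nowMillis, long familyTtlSeconds, long desiredLifetimeSeconds) {
        if (desiredLifetimeSeconds > familyTtlSeconds) {
            throw new IllegalArgumentException("lifetime cannot exceed the column family TTL");
        }
        // Back-date by the part of the TTL that should already count as used up.
        return nowMillis - (familyTtlSeconds - desiredLifetimeSeconds) * 1000L;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ts = creationTimeMillis(now, 172800L, 300L); // 2-day TTL, 5-minute lifetime
        System.out.println("now:          " + new Date(now));
        System.out.println("creationTime: " + new Date(ts));
    }
}
```

The resulting value would be passed as the timestamp argument of `new Put(rk1, creationTime)`, as in the code above.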

1 Answer
爱情/是我丢掉的垃圾
Reply #2 · 2019-08-27 14:30

Cloud Bigtable does not garbage collect rows until a compaction occurs. That may happen hours (or possibly a few days) after the expected expiration.

If you want to make sure to not read data that should have expired, please set a timestamp range filter on the data read so that values outside of the allowed range aren't returned in the query.
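
A minimal sketch of how the bounds for such a timestamp range could be computed, assuming cells are written with back-dated timestamps as in the question, so that a cell is still live only if its timestamp is no older than now minus the family TTL. The class and method names are hypothetical. With the HBase client the two bounds would be passed to `Scan.setTimeRange(minStamp, maxStamp)` (or the same method on `Get`), which treats the range as [minStamp, maxStamp); that call appears only as a comment so the sketch has no external dependencies.

```java
final class LiveTimeRange {
    // Hypothetical helper: returns {minStamp, maxStamp} in epoch millis for a
    // half-open range [minStamp, maxStamp) covering cells not yet past the TTL.
    // Assumes no cell carries a timestamp later than nowMillis.
    static long[] bounds(long nowMillis, long familyTtlSeconds) {
        // Oldest timestamp still within the TTL; clamped because a negative
        // lower bound is not a valid time range.
        long minStamp = Math.max(0L, nowMillis - familyTtlSeconds * 1000L);
        // Upper bound is exclusive, so add 1 to include cells stamped exactly "now".
        long maxStamp = nowMillis + 1;
        return new long[] {minStamp, maxStamp};
    }

    public static void main(String[] args) {
        long[] range = bounds(System.currentTimeMillis(), 172800L);
        // With the HBase client: scan.setTimeRange(range[0], range[1]);
        System.out.println("minStamp=" + range[0] + " maxStamp=" + range[1]);
    }
}
```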

Alternatively, you'll have to filter them out after the data is returned, but it's much more efficient to filter it out server-side so that the client does not have to download or process it.
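
For completeness, a sketch of that client-side fallback: dropping cells whose timestamp is already past the TTL after they have been returned. The `TimestampedValue` type is a hypothetical stand-in for whatever the client returns (for example an HBase `Cell`, whose `getTimestamp()` gives an epoch-millis value); it is not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

final class ClientSideExpiry {
    // Hypothetical stand-in for a returned cell: a timestamp plus its value.
    static final class TimestampedValue {
        final long timestampMillis;
        final String value;

        TimestampedValue(long timestampMillis, String value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // Keeps only the cells that are not older than the family TTL at nowMillis.
    static List<TimestampedValue> dropExpired(List<TimestampedValue> cells, long nowMillis, long familyTtlSeconds) {
        long oldestLive = nowMillis - familyTtlSeconds * 1000L;
        List<TimestampedValue> live = new ArrayList<>();
        for (TimestampedValue cell : cells) {
            if (cell.timestampMillis >= oldestLive) {
                live.add(cell);
            }
        }
        return live;
    }
}
```

Every expired cell still crosses the network before it is discarded here, which is why the server-side range filter is the cheaper option.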
