Range update and querying in a 2D matrix

2019-03-14 04:12发布

问题:

I don't have a scenario, but here goes the problem. This is one is just driving me crazy. There is a nxn boolean matrix initially all elements are 0, n <= 10^6 and given as input. Next there will be up to 10^5 queries. Each query can be either set all elements of column c to 0 or 1, or set all elements of row r to 0 or 1. There can be another type of query, printing the total number of 1's in column c or row r.

I have no idea how to solve this and any help would be appreciated. Obviously a O(n) solution per query is not feasible.

回答1:

The idea of using a number to order the modifications is taken from Dukeling's post.

We will need 2 maps and 4 binary indexed tree (BIT, a.k.a. Fenwick Tree): 1 map and 2 BITs for rows, and 1 map and 2 BITs for columns. Let us call them m_row, f_row[0], and f_row[1]; m_col, f_col[0] and f_col[1] respectively.

Map may be implemented with array, or tree like structure, or hashing. The 2 maps are used to store the last modification to a row/column. Since there can be at most 105 modification, you may use that fact to save space from simple array implementation.

BIT has 2 operations:

  • adjust(value, delta_freq), which adjusts the frequency of the value by delta_freq amount.
  • rsq(from_value, to_value), (rsq stands for range sum query) which finds the sum of the all the frequencies from from_value to to_value inclusive.

Let us declare global variable: version

Let us define numRow to be the number of rows in the 2D boolean matrix, and numCol to be the number of columns in the 2D boolean matrix.

The BITs should have size of at least MAX_QUERY + 1, since it is used to count the number of changes to the rows and columns, which can be as many as the number of queries.

Initialization:

version = 1
# Map should return <0, 0> for rows or cols not yet
# directly updated by query
m_row = m_col = empty map
f_row[0] = f_row[1] = f_col[0] = f_col[1] = empty BIT

Update algorithm:

update(isRow, value, idx):
    if (isRow):
        # Since setting a row/column to a new value will reset
        # everything done to it, we need to erase earlier
        # modification to it.
        # For example, turn on/off on a row a few times, then
        # query some column
        <prevValue, prevVersion> = m_row.get(idx)
        if ( prevVersion > 0 ):
            f_row[prevValue].adjust( prevVersion, -1 )

        m_row.map( idx, <value, version> )
        f_row[value].adjust( version, 1 )
    else:
        <prevValue, prevVersion> = m_col.get(idx)
        if ( prevVersion > 0 ):
            f_col[prevValue].adjust( prevVersion, -1 )

        m_col.map( idx, <value, version> )
        f_col[value].adjust( version, 1 )

    version = version + 1

Count algorithm:

count(isRow, idx):
    if (isRow):
        # If this is row, we want to find number of reverse modifications
        # done by updating the columns
        <value, row_version> = m_row.get(idx)
        count = f_col[1 - value].rsq(row_version + 1, version)
    else:
        # If this is column, we want to find number of reverse modifications
        # done by updating the rows
        <value, col_version> = m_col.get(idx)
        count = f_row[1 - value].rsq(col_version + 1, version)

    if (isRow):
       if (value == 1):
           return numRow - count
       else:
           return count
    else:
       if (value == 1):
           return numCol - count
       else:
           return count

The complexity is logarithmic in worst case for both update and count.



回答2:

Take version just to mean a value that gets auto-incremented for each update.

Store the last version and last update value at each row and column.

Store a list of (versions and counts of zeros and counts of ones) for the rows. The same for the columns. So that's only 2 lists for the entire grid.

When a row is updated, we set its version to the current version and insert into the list for rows the version and if (oldRowValue == 0) zeroCount = oldZeroCount else zeroCount = oldZeroCount + 1 (so it's not the number of zero's, rather the number of times a value was updated with a zero). Same for oneCount. Same for columns.

If you do a print for a row, we get the row's version and last value, we do a binary search for that version in the column list (first value greater than). Then:

if (rowValue == 1)
  target = n*rowValue
           - (latestColZeroCount - colZeroCount)
           + (latestColOneCount - colOneCount)
else
  target = (latestColOneCount - colOneCount)

Not too sure whether the above will work.

That's O(1) for update, O(log k) for print, where k is the number of updates.