Adding multiple rows of a table to another table

Posted 2019-07-25 08:09

Question:

I have a table like the one below (these are a few lines from my table):

T = table({'A';'A';'A';'B';'B';'B';'C';'C';'C';'C'}, {'x';'y';'z';'x';'w';'t';'z';'x';'t';'o'},[5;1;2;2;4;2;2;5;4;1], ...
      'VariableNames', {'memberId', 'productId','Rating'});

T:

A  x  5
A  y  1
A  z  2
B  x  2
B  w  4
B  t  2
C  z  2
C  x  5
C  t  4
C  o  1
C  u  3
D  r  1
D  t  2
D  w  5
...

I need to start with user A: create a new table with the same columns as the previous table (T) and copy into it all the rows that belong to user A. At this point the new table contains the following rows:

A  x  5
A  y  1
A  z  2
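In MATLAB this first step is a logical-index selection on T. A minimal sketch, assuming the 10-row T built by the code at the top of the question:

```matlab
% the 10-row example table from the question
T = table({'A';'A';'A';'B';'B';'B';'C';'C';'C';'C'}, ...
    {'x';'y';'z';'x';'w';'t';'z';'x';'t';'o'}, [5;1;2;2;4;2;2;5;4;1], ...
    'VariableNames', {'memberId', 'productId', 'Rating'});
% logical index of user A's rows; indexing T with it keeps all three columns
isUserA = strcmp(T.memberId, 'A');
newT = T(isUserA, :)
```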

Next, consider the products related to this user, i.e. x, y and z. All the rows that contain x, y or z are then added to the table. At this point the table contains the following rows:

A  x  5
A  y  1
A  z  2
B  x  2
C  z  2
C  x  5
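The second step can be sketched with ismember, again assuming the question's 10-row T. Taking the matching rows in table order (rather than all x rows first, then y, then z) is an assumption on my part; with this data it reproduces the six-row listing above.

```matlab
T = table({'A';'A';'A';'B';'B';'B';'C';'C';'C';'C'}, ...
    {'x';'y';'z';'x';'w';'t';'z';'x';'t';'o'}, [5;1;2;2;4;2;2;5;4;1], ...
    'VariableNames', {'memberId', 'productId', 'Rating'});
newT = T(strcmp(T.memberId, 'A'), :);
% A's products in first-appearance order: {'x';'y';'z'}
products = unique(newT.productId, 'stable');
% rows of other users that rate one of those products, kept in table order
related = ismember(T.productId, products) & ~strcmp(T.memberId, 'A');
newT = [newT; T(related, :)]
```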

Then the users who have just been added to the table, i.e. B and C, are considered, and the same procedure that was applied to the first user (A) is applied to each of them (first B, then C). This continues until the required number of rows has been added to the table. Here, for example, 8 rows are required, so the end result is:

A  x  5
A  y  1
A  z  2
B  x  2
C  z  2
C  x  5
B  w  4
B  t  2

That is, when the work is finished, the second table should contain the requested number of rows.

I would be grateful if anybody could help me with this.
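The procedure in the question amounts to a breadth-first traversal over users. Below is a minimal sketch of that reading (my own; the variable names are illustrative, not from the question), assuming the question's 10-row T, start user A and 8 required rows. Each user's related-product rows are taken in table order, which reproduces the 8-row result above. The question does not say what should happen when no connected user is left; the sketch assumes it should jump to the next user that still has rows.

```matlab
% the 10-row example table from the question
T = table({'A';'A';'A';'B';'B';'B';'C';'C';'C';'C'}, ...
    {'x';'y';'z';'x';'w';'t';'z';'x';'t';'o'}, [5;1;2;2;4;2;2;5;4;1], ...
    'VariableNames', {'memberId', 'productId', 'Rating'});
rowsLimit = 8;                % required number of rows
added = false(height(T), 1);  % rows of T already copied
order = zeros(0, 1);          % copied row indices, in insertion order
queue = {'A'};                % users waiting to be processed (first in, first out)
done = {};                    % users already processed
while numel(order) < rowsLimit && any(~added)
    if isempty(queue)
        % assumption: with no connected user left, restart from the first
        % user that still has rows (the question does not cover this case)
        left = T.memberId(~added);
        queue = left(1);
    end
    u = queue{1};
    queue(1, :) = [];
    if any(strcmp(done, u))
        continue              % user already processed
    end
    done{end+1} = u;
    % step 1: the user's own rows that are not in the new table yet
    idx = find(strcmp(T.memberId, u) & ~added);
    added(idx) = true;
    order = [order; idx];
    % step 2: every other row that rates one of the user's products
    products = unique(T.productId(strcmp(T.memberId, u)), 'stable');
    idx = find(ismember(T.productId, products) & ~added);
    added(idx) = true;
    order = [order; idx];
    % step 3: the users brought in by step 2 are processed next, in order
    queue = [queue; unique(T.memberId(idx), 'stable')];
end
% keep only the requested number of rows
newT = T(order(1:min(end, rowsLimit)), :)
```

The loop may overshoot rowsLimit inside one user's steps (here it collects 9 rows), so the truncation on the last line is what enforces the limit.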

Answer 1:

Here is one way to do what you ask (though some cases are not well defined in your question):

% I added user 'D' for the scenario of an unconnected node
T = table({'A';'A';'A';'B';'B';'B';'C';'C';'C';'C';'D';'D';'D';'D'},...
    {'x';'y';'z';'x';'w';'t';'z';'x';'t';'o';'q';'p';'f';'v'},...
    [5;1;2;2;4;2;2;5;4;1;4;5;2;1], ...
    'VariableNames', {'memberId', 'productId','Rating'});
% initial preparations:
rows_limit = 8;
first_user = 'B'; % this is just for readability
newT = table(cell(rows_limit,1),cell(rows_limit,1),zeros(rows_limit,1),...
    'VariableNames',{'memberId', 'productId','Rating'});
% We need an index vector so we won't add the same row twice:
added = false(height(T),1);
row_count = 1; % index of the next empty row in newT
users_list = {first_user};

% now we add rows to newT until it is full or every row of T has been added
% (without the any(~added) check this loop never ends when T is too short):
while row_count <= rows_limit && any(~added)
    while numel(users_list)>=1
        % default, so a user with no rows left does not reuse a stale value:
        last_user_row = row_count;
        % get all the user's rows
        next_to_add = strcmp(T.memberId,users_list{1}) & ~added;
        % if this user has any rows to be added:
        if sum(next_to_add)>0
            % if there are enough empty rows in newT, add them to it:
            if sum(next_to_add) <= rows_limit-row_count+1
                newT(row_count:row_count+sum(next_to_add)-1,:) = T(next_to_add,:);
                % and update the index vector:
                added = added | strcmp(T.memberId,users_list{1});
            else
                % otherwise - fill the empty rows and quit the loop:
                if row_count <= rows_limit
                    end_to_add = find(next_to_add,rows_limit-row_count+1);
                    newT(row_count:rows_limit,:) = T(end_to_add,:);
                end
                row_count = rows_limit+1; % to exit the outer loop
                break
            end
            row_count = row_count+sum(next_to_add);

            % Add related products:
            % ====================
            % save the first row that will be added by related products:
            last_user_row = row_count;
            % get all the products we already added to newT:
            products = unique(newT.productId(1:row_count-1),'stable');
            % we only need the last user's products, but the rows of earlier
            % products are already marked in 'added', so they are skipped
            for p = 1:numel(products)
                % get all the product's rows
                next_to_add = strcmp(T.productId,products{p}) & ~added;
                if sum(next_to_add)>0
                    % if there are enough empty rows in newT, add them to it:
                    if sum(next_to_add) <= rows_limit-row_count+1
                        newT(row_count:row_count+sum(next_to_add)-1,:) = T(next_to_add,:);
                        % and update the index vector:
                        added = added | strcmp(T.productId,products{p});
                    else
                        % otherwise - fill the empty rows and quit the loop:
                        if row_count <= rows_limit
                            end_to_add = find(next_to_add,rows_limit-row_count+1);
                            newT(row_count:rows_limit,:) = T(end_to_add,:);
                        end
                        row_count = rows_limit+1; % newT is full
                        break
                    end
                end
                row_count = row_count+sum(next_to_add);
            end
        end
        % stop as soon as newT is full:
        if row_count > rows_limit
            break
        end
        % get the list of new users we just added, and concat to the users
        % who still have rows left in T:
        users_list = [unique(newT.memberId(last_user_row:row_count-1),'stable');
            unique(T.memberId(~added),'stable')];
    end
end
% drop unused preallocated rows (when T has fewer rows than rows_limit):
newT = newT(1:row_count-1,:);

Which gives newT:

memberId    productId    Rating
________    _________    ______
'B'         'x'          2     
'B'         'w'          4     
'B'         't'          2     
'A'         'x'          5     
'C'         'x'          5     
'C'         't'          4     
'A'         'y'          1     
'A'         'z'          2     

In this implementation, rows are added user by user and product by product. If the next user or product to be added has more rows than there are empty rows left in newT, we add as many rows as we can until we reach rows_limit, and then the loop quits.

So with rows_limit = 4, you will get newT as:

memberId    productId    Rating
________    _________    ______
'B'         'x'          2     
'B'         'w'          4     
'B'         't'          2     
'A'         'x'          5     
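The partial fill in this case relies on the two-argument form of find, which returns at most k indices of nonzero elements. A minimal sketch replaying the moment the fourth row is chosen; the snapshot values below are hand-derived from tracing the code above with the extended T, not output of it: B's three rows already fill newT(1:3), and product 'x' still matches rows 1 ('A x 5') and 8 ('C x 5') of T, but only one empty row is left.

```matlab
% hand-derived snapshot of the loop state in the rows_limit = 4 case
next_to_add = false(14,1);        % one flag per row of the extended T
next_to_add([1 8]) = true;        % unadded rows with productId 'x'
rows_limit = 4;
row_count = 4;                    % newT(1:3) already holds B's rows
free_rows = rows_limit - row_count + 1;
% find(X, k) returns the indices of the first k nonzero elements of X
end_to_add = find(next_to_add, free_rows)
```

Only row 1 is returned, which is why 'A x 5' is the last row shown above and 'C x 5' is left out.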

As long as the users are connected, i.e. each user's related products bring new users into the list, the loop continues with the new users in newT. However, we might start from a node whose network does not include all the other nodes. For instance, have a look at the following graph figure, which illustrates the connections in the extended example I used in the code above:

[Figure: graph of the connections between users and products in the extended example]

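Node D is not connected to any of the other users (none of its products q, p, f and v is rated by A, B or C), so unless we actively look for new, unrelated users in T, we will never reach it. The implementation above does look for such users: every time users_list is rebuilt, it also receives each user that still has rows in T that have not been added (unique(T.memberId(~added),'stable')).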
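Such unconnected users can also be found up front by treating T as a bipartite graph, with one node per user, one node per product and an edge for every rating, and splitting it into connected components. This is a sketch, assuming MATLAB R2015b or later for graph and conncomp; the 'p:' prefix on product node names is my own device to keep user and product names apart.

```matlab
% the extended example table from the answer
T = table({'A';'A';'A';'B';'B';'B';'C';'C';'C';'C';'D';'D';'D';'D'},...
    {'x';'y';'z';'x';'w';'t';'z';'x';'t';'o';'q';'p';'f';'v'},...
    [5;1;2;2;4;2;2;5;4;1;4;5;2;1], ...
    'VariableNames', {'memberId', 'productId','Rating'});
% one edge per rating between a user node and a (prefixed) product node
G = graph(T.memberId, strcat('p:', T.productId));
bins = conncomp(G);                       % component number of every node
users = unique(T.memberId);
[~, loc] = ismember(users, G.Nodes.Name); % position of each user node
userBins = bins(loc);
disp(table(users, userBins(:), 'VariableNames', {'memberId', 'component'}))
% plot(G) draws the user-product graph itself
```

A, B and C share one component and D sits alone in a second one, so starting from any of A, B or C the loop can only reach D through the explicit search for users with rows left.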