matlab - How can I merge together two co-occurrence matrices with overlapping but not identical vocabularies?
I'm looking at word co-occurrence in a number of documents. For each set of documents, I find a vocabulary of the n most frequent words. I then make an nxn matrix for each document, representing whether words occur in the same context window (a sequence of k words). This is a sparse matrix, so if I have m documents, I have an nxnxm sparse matrix. Because MATLAB cannot store sparse matrices with more than 2 dimensions, I flatten this matrix into an (nxn)xm sparse matrix.
I now face the problem that I have generated 2 of these co-occurrence matrices for different sets of documents. Because the sets were different, the vocabularies are different. Instead of merging the sets of documents and recalculating the co-occurrence matrix, I'd like to merge the 2 existing matrices together.
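To make the flattened layout concrete, here is a minimal sketch of how one such (nxn)xm matrix could be built. The toy data and the names `vocab`, `docs`, `k` and `cooc` are assumptions for illustration, not the asker's actual pipeline; the sketch records a binary indicator per document and, as a simplification, also marks each word as co-occurring with itself.

```matlab
% Hypothetical toy data: documents already mapped to vocabulary indices.
vocab = {'a', 'b', 'c'};
docs  = {[1 2 3 1], [2 2 3]};   % word indices into vocab
k = 2;                           % context window length (assumed)

n = numel(vocab);
m = numel(docs);
cooc = sparse(n*n, m);           % one flattened n-by-n matrix per column

for d = 1:m
    w = docs{d};
    c = zeros(n, n);
    for s = 1:numel(w) - k + 1
        win = unique(w(s:s+k-1));   % distinct words in this window
        c(win, win) = 1;            % mark every pair sharing the window
    end
    cooc(:, d) = c(:);              % column-major flattening
end

% Recover the n-by-n matrix of the first document.
slice = reshape(full(cooc(:, 1)), n, n);
```

Because `c(:)` and `reshape` both use column-major order, entry (i, j) of a document's matrix sits at row i + (j-1)*n of its column.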
For example,

    n = 5;                                % size of vocabulary
    m = 5;                                % number of documents
    a = ones(n*n, m);                     % a is a flattened (n, n, m) matrix
    b = 2*ones(n*n, m);                   % b is a flattened (n, n, m) matrix
    a_ind = {'a', 'b', 'c', 'd', 'e'};    % vocabulary labels for a
    b_ind = {'a', 'f', 'b', 'c', 'g'};    % vocabulary labels for b
These should merge to produce a (49, 5) matrix, where each (49, 1) slice can be reshaped into a (7, 7) matrix with the following structure.
      a  b  c  d  e  f  g
     _____________________
    a| 3  3  3  1  1  2  2
    b| 3  3  3  1  1  2  2
    c| 3  3  3  1  1  2  2
    d| 1  1  1  1  1  0  0
    e| 1  1  1  1  1  0  0
    f| 2  2  2  0  0  2  2
    g| 2  2  2  0  0  2  2
Where a and b overlap, the co-occurrence counts should be added together. Otherwise, the elements should be the counts from a or the counts from b. There will be some elements (the 0's in the example) that don't have count statistics, because some of the vocabulary is exclusively in a and some is exclusively in b.
The key is to use the ability of logical indices to be flattened.
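    a = ones(25, 5);
    b = 2*ones(25, 5);
    a_ind = {'a', 'b', 'c', 'd', 'e'};
    b_ind = {'a', 'f', 'b', 'c', 'g'};

    % merged vocabulary: a's words followed by the words found only in b
    new_ind = [a_ind, b_ind(~ismember(b_ind, a_ind))];
    new_size = length(new_ind)^2;
    new_array = zeros(new_size, 5);

    % find the indices that correspond to elements of a
    a_overlap = double(ismember(new_ind, a_ind));
    a_mask = (a_overlap'*a_overlap) == 1;

    % find the indices that correspond to elements of b
    b_overlap = double(ismember(new_ind, b_ind));
    b_mask = (b_overlap'*b_overlap) == 1;

    % flatten the logical indices to assign the elements to the new array
    % (note: a mask fills positions in new_ind order, so each input's words
    % must appear in new_ind in the same relative order as in its own label
    % list; b_ind does not satisfy this, which the constant b here hides)
    new_array(a_mask(:), :) = a;
    new_array(b_mask(:), :) = new_array(b_mask(:), :) + b;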
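Logical indexing selects elements in linear-index order. With real counts the b assignment above would therefore pair rows labelled a, f, b, c, g with slots ordered a, b, c, f, g. What follows is a minimal sketch of an order-aware variant, not the answer's own method: it swaps the masks for explicit linear indices built from the second output of `ismember`. The helper name `flat_idx` and the variable `merged` are hypothetical, and the sketch assumes neither vocabulary contains duplicate words.

```matlab
% Example data from the question; b's vocabulary is in a different order.
a = ones(25, 5);
b = 2*ones(25, 5);
a_ind = {'a', 'b', 'c', 'd', 'e'};
b_ind = {'a', 'f', 'b', 'c', 'g'};

% Merged vocabulary: all of a's words, then b's words not already present.
new_ind = [a_ind, b_ind(~ismember(b_ind, a_ind))];
n_new = numel(new_ind);
new_array = zeros(n_new^2, size(a, 2));

% Hypothetical helper: for old positions loc, return the linear indices of
% every (loc(i), loc(j)) pair in the merged n_new-by-n_new matrix, listed in
% the old matrix's column-major order (i varies fastest).
flat_idx = @(loc) reshape(bsxfun(@plus, loc(:), (loc(:)' - 1) * n_new), [], 1);

[~, a_loc] = ismember(a_ind, new_ind);   % position of each a word in new_ind
[~, b_loc] = ismember(b_ind, new_ind);   % position of each b word in new_ind
ia = flat_idx(a_loc);
ib = flat_idx(b_loc);

% Add each input's counts into its own slots; overlapping slots accumulate.
new_array(ia, :) = new_array(ia, :) + a;
new_array(ib, :) = new_array(ib, :) + b;

% Reshape one document's slice back to a square matrix for inspection.
merged = reshape(new_array(:, 1), n_new, n_new);
```

For the constant example matrices, `merged` has the same 7-by-7 structure as the table in the question; the difference from the mask version only shows up once b holds counts that vary by word pair.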