I have X, a csr_matrix that I obtained from scikit-learn's TfidfVectorizer, and y, which is an array.
My plan is to create features using LDA. However, I could not find out how to initialize gensim's corpus variable from X while keeping it as a csr_matrix. In other words, I don't want to download a corpus as shown in gensim's documentation, nor convert X to a dense matrix, since that would consume a lot of memory and the computer could hang.
In short, my questions are:
- How do I initialize a gensim corpus from a (sparse) csr_matrix that represents the whole corpus?
- How do I use LDA to extract features from it?
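For reference, as far as I understand it, gensim treats a corpus as any iterable of documents, where each document is a list of (term_id, weight) pairs. Below is a minimal sketch of the kind of conversion I am after: it streams rows out of the three CSR arrays one document at a time, so nothing is densified. The function name csr_rows_as_bow is hypothetical, and the toy lists stand in for X.data, X.indices and X.indptr. I believe gensim.matutils.Sparse2Corpus (with documents_columns=False when rows are documents) is meant for this, but I have not confirmed that it is the recommended way.

```python
def csr_rows_as_bow(data, indices, indptr):
    """Yield each CSR row as a list of (term_id, weight) pairs.

    Hypothetical helper: data, indices and indptr are the three arrays
    of a CSR matrix whose rows are documents and columns are terms.
    """
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        # Only the stored (non-zero) entries of this row are visited.
        yield [(int(indices[k]), float(data[k])) for k in range(start, end)]


# Toy 3x4 matrix in CSR form (stand-in for X.data, X.indices, X.indptr):
# [[0.5, 0.0, 0.2, 0.0],
#  [0.0, 0.0, 0.0, 0.0],
#  [0.0, 0.7, 0.0, 0.1]]
data = [0.5, 0.2, 0.7, 0.1]
indices = [0, 2, 1, 3]
indptr = [0, 2, 2, 4]

corpus = list(csr_rows_as_bow(data, indices, indptr))
```

With a real matrix I would pass X.data, X.indices and X.indptr and keep the result as a generator rather than a list, so that only one document is held in memory at a time.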