Sparse Principal Component Analysis using sklearn

Posted 2019-08-19 04:15

I'm trying to replicate an application from this paper, where the authors download the 20 newsgroups data and use SPCA to extract the principal components that in some sense best describe the text corpus [see section 4.1]. This is for a class project on high-dimensional data, where we were asked to pick a topic and replicate/apply it.

The output should be K principal components, each of which has a short list of words that intuitively correspond to a common theme (for example, the paper finds that the first PC is mostly about politics and religion).

From my research, it seems the best way to reproduce the application in this paper is to use sklearn.decomposition.MiniBatchSparsePCA.

I have found only one example of how this algorithm works, here.

So my question is this: Is it, in principle, possible to follow the steps in the example linked above, using text data, to reproduce the application from section 4.1 of the paper linked in the first paragraph?

If it is, I would then be able to ask more concrete questions regarding the code.
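To make the question more concrete, here is a rough sketch of the pipeline I have in mind, assuming a recent version of scikit-learn. The specific parameter values (vocabulary size, number of components K, alpha, batch size) are placeholders I chose arbitrarily, not values from the paper, and I'm not sure this is the right way to feed a document-term matrix into the algorithm:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import MiniBatchSparsePCA

# Load the 20 newsgroups corpus, stripping headers/footers/quotes
newsgroups = fetch_20newsgroups(subset='train',
                                remove=('headers', 'footers', 'quotes'))

# Build a document-term matrix; the vocabulary is capped so the dense
# conversion below stays manageable (2000 is an arbitrary choice)
vectorizer = TfidfVectorizer(stop_words='english', max_features=2000)
X = vectorizer.fit_transform(newsgroups.data)

# MiniBatchSparsePCA expects a dense array, not a scipy sparse matrix
X_dense = X.toarray()

# Fit K sparse components; alpha controls how sparse the loadings are
K = 5
spca = MiniBatchSparsePCA(n_components=K, alpha=1, batch_size=50,
                          random_state=0)
spca.fit(X_dense)

# For each component, print the words with the largest non-zero loadings
terms = vectorizer.get_feature_names_out()
for k, component in enumerate(spca.components_):
    top = component.argsort()[::-1][:10]
    words = [terms[i] for i in top if component[i] != 0]
    print(f"PC {k + 1}: {words}")
```

My hope is that, if this general approach is sound, each printed word list would correspond to a recognizable theme, like the politics/religion component reported in the paper.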
