Fuzzy c-means clustering of TCP dump data in MATLAB

Posted 2019-06-01 04:41

Question:

Hi, I have some data that's represented like this:

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.

It's from the KDD Cup 1999 dataset, which was based on the DARPA set.

The text file I have has rows and rows of data like this. In MATLAB there is a generic clustering tool you can open by typing findcluster, but it only accepts .dat files.

I'm also not sure whether it will accept data in this format, or why there are so many trailing zeros in the dump files.

Can anyone help me use the text file and run it through an FCM clustering method in MATLAB? Code help is really needed.

Answer 1:

FINDCLUSTER is simply a GUI for two clustering algorithms: FCM and SUBCLUST.

You first need to read the data from the file; look into the TEXTSCAN function for that.
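As a rough sketch of the reading step, the snippet below parses the sample record from the question with TEXTSCAN. The column layout (1 numeric, 3 text, 37 numeric, 1 text label) is an assumption inferred from that single row, so check it against the dataset description. To keep the example self-contained it parses a string; for the real file you would pass a file identifier from fopen instead.

```matlab
% Sample record copied from the question. For a real file, use
% fid = fopen(...); C = textscan(fid, fmt, 'Delimiter', ','); fclose(fid);
sampleRow = ['0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,' ...
             '0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,' ...
             '0.00,0.00,0.00,0.00,0.00,normal.'];

% Assumed layout: 1 numeric, 3 text, 37 numeric, 1 text label (42 fields).
fmt = ['%f %s %s %s ' repmat('%f ', 1, 37) '%s'];

% textscan returns a 1-by-42 cell array, one cell per column.
C = textscan(sampleRow, fmt, 'Delimiter', ',');

protocol = C{2};    % cell array of strings, e.g. {'tcp'}
srcBytes = C{5};    % numeric column vector
label    = C{42};   % cell array of strings, e.g. {'normal.'}
```

Numeric columns come back as double vectors and text columns as cell arrays of strings, so the text columns still need handling before clustering.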

Then you need to deal with non-numeric attributes; either remove them or convert them somehow. As far as I can tell, the two algorithms mentioned only support numeric data.
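A minimal sketch of both options follows, using a small made-up table in place of the parsed file. The variable names, the toy values, the min-max rescaling and the choice of two clusters are all illustrative assumptions, and FCM requires the Fuzzy Logic Toolbox. Note that integer codes impose an artificial ordering on the categories, so one indicator column per category may be a better fit for distance-based clustering.

```matlab
% Toy stand-in for parsed columns (values are made up for illustration).
protocol = {'tcp'; 'udp'; 'tcp'; 'icmp'; 'tcp'; 'udp'};
srcBytes = [239; 105; 235; 1032; 219; 146];
dstBytes = [486; 146; 1337; 0; 1337; 105];

% Option 1: remove the text attribute and keep only numeric columns.
numericOnly = [srcBytes dstBytes];

% Option 2: convert each distinct string to an integer code.
% unique sorts the names, so here icmp -> 1, tcp -> 2, udp -> 3.
[names, ~, protocolCode] = unique(protocol);
withCodes = [protocolCode srcBytes dstBytes];

% Rescale every column to [0, 1] so large byte counts do not dominate
% the distance computation (an assumed preprocessing choice).
colMin   = min(withCodes, [], 1);
colRange = max(withCodes, [], 1) - colMin;
colRange(colRange == 0) = 1;   % guard against constant columns
scaled = bsxfun(@rdivide, bsxfun(@minus, withCodes, colMin), colRange);

% Run fuzzy c-means; rows are observations, columns are attributes.
numClusters = 2;               % arbitrary choice for this sketch
[centers, U] = fcm(scaled, numClusters);

% U holds one membership row per cluster; take the strongest membership
% as a hard cluster assignment for each observation.
[~, hardLabel] = max(U, [], 1);
```

The same call works on numericOnly if you prefer to drop the text attributes entirely.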

Visit the original website of the KDD Cup dataset to find a description of each attribute.