I want to record all ports used within huge pcaps. There are 65535 ports available, and each port can talk to every other port: 65535 x 65535 links in total.
The matrix will be very sparse (many 0 entries). Additionally, I think the edges don't have to be directed, so Port1->Port2 may be added to Port2->Port1 (which reduces the number of values to 65535 * 65536 / 2). How would you store this using Python? In NumPy? What would the estimated memory consumption be?
Afterwards, I want to find the port with the greatest sum and pop() it (the whole row and column). That is, I want to find e.g. that Port1 was used 500 times (100 times from Port2 to Port1, 300 times from Port3 to Port1, 100 times from Port4 to Port1)...
In graph terms, I want to have 65535 nodes that could all be connected with each other. Then I want to find the node that has the highest sum of values on its connected edges. Afterwards, I want to pop that node (and delete the corresponding edges, which will decrease the sums of other nodes).
Thanks!
In Python, and depending on how sparse is sparse, a dict-of-dicts will handle this quite well.
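A minimal sketch of what such a dict-of-dicts could look like, assuming the (source port, destination port) pairs have already been extracted from the pcap. The name `build_port_graph` and the sample data are made up for illustration, and both directions are stored so that each port's row is complete:

```python
from collections import defaultdict

def build_port_graph(pairs):
    """Count undirected port-to-port links in a dict-of-dicts.

    `pairs` is any iterable of (src_port, dst_port) tuples; how they are
    read from the pcap is outside the scope of this sketch.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in pairs:
        graph[src][dst] += 1
        if src != dst:
            # Mirror the entry so graph[dst] also knows about src.
            graph[dst][src] += 1
    return graph

# Hypothetical sample data: five packets between four ports.
sample = [(2, 1), (1, 2), (3, 1), (4, 1), (3, 1)]
graph = build_port_graph(sample)
```

Only pairs that actually occur take up memory, so the structure grows with the number of distinct port pairs seen rather than with 65535 x 65535.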
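If I have understood what you are doing correctly, then with both directions stored, the count of connections to port p is the sum of the values in p's inner dict, and removing port p means deleting that inner dict together with p's entry in each neighbour's dict.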
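Assuming a dict-of-dicts that stores both directions, the two operations, plus the "pop the busiest port" step from the question, might be sketched as follows. The function names `port_total`, `remove_port` and `pop_busiest` are illustrative and the numbers are taken from the example in the question:

```python
def port_total(graph, p):
    """Sum of all link counts touching port p."""
    return sum(graph[p].values())

def remove_port(graph, p):
    """Delete port p and every edge that touches it."""
    neighbours = graph.pop(p)
    for q in neighbours:
        if q != p:
            del graph[q][p]
            if not graph[q]:
                # q has no links left, so drop its empty row as well.
                del graph[q]
    return neighbours

def pop_busiest(graph):
    """Remove and return (port, total) for the port with the largest sum.

    Raises ValueError if the graph is empty.
    """
    p = max(graph, key=lambda port: port_total(graph, port))
    total = port_total(graph, p)
    remove_port(graph, p)
    return p, total

# Port1: 100 from Port2, 300 from Port3, 100 from Port4.
graph = {
    1: {2: 100, 3: 300, 4: 100},
    2: {1: 100, 3: 50},
    3: {1: 300, 2: 50},
    4: {1: 100},
}
first = pop_busiest(graph)
```

Here `first` is `(1, 500)`, and afterwards only the link between ports 2 and 3 remains. Each call to `pop_busiest` scans every port, which is usually acceptable with at most 65535 rows.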
If you want to try to save memory by storing only half the pairs, then simplicity suffers greatly.
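To illustrate the trade-off, here is one hypothetical way to store only half the pairs: a `Counter` keyed on the sorted port pair. The names `record` and `port_total_half` are made up for this sketch:

```python
from collections import Counter

def record(edges, a, b):
    """Count one link, storing each unordered pair under a single key."""
    edges[(min(a, b), max(a, b))] += 1

def port_total_half(edges, p):
    """Sum of link counts touching port p.

    With only half the pairs stored there is no per-port row to sum,
    so every stored pair has to be inspected.
    """
    return sum(count for (a, b), count in edges.items() if p in (a, b))

edges = Counter()
record(edges, 2, 1)
record(edges, 1, 2)
record(edges, 3, 1)
```

Recording stays simple, but the per-port sum and the removal of a port both turn into scans over all stored pairs unless an additional index is kept, which costs back some of the memory saved.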
Look into the adjacency list representation of a graph; it will most probably suit your needs.
However, a graph containing 65535 vertices is not that big, even if you cannot represent it with a simple dense matrix.
The memory consumption is O(E+V), where V is the number of vertices (65535) and E is the number of edges (in a sparse graph, E is of the same order of magnitude as V).
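For comparison, a back-of-the-envelope estimate of what a dense matrix would cost. This sketch assumes 65535 ports as in the question and 4-byte counters (e.g. uint32); both are parameters, and the function names are illustrative:

```python
GIB = 2 ** 30

def dense_matrix_bytes(n_ports=65535, itemsize=4):
    """Bytes for a full n x n matrix of fixed-size counters."""
    return n_ports * n_ports * itemsize

def triangular_bytes(n_ports=65535, itemsize=4):
    """Bytes when only one triangle (including the diagonal) is stored."""
    return n_ports * (n_ports + 1) // 2 * itemsize

print(f"dense:      {dense_matrix_bytes() / GIB:.2f} GiB")
print(f"triangular: {triangular_bytes() / GIB:.2f} GiB")
```

That comes to roughly 16 GiB for the full matrix and roughly 8 GiB for one triangle, whereas an adjacency list only pays for the vertices and edges that actually occur.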