I have a dataframe with 2 columns. I want to filter this dataframe based on the IP ranges present in a JSON file.
ip_ranges.json
[
{"start": "45.43.144.0", "end": "45.43.161.255"},
{"start": "104.222.130.0", "end": "104.222.191.255"},
...
]
Dataframe:
ip,p_value
97.98.173.96,3.7
73.83.192.21,6.9
...
Note: ip_ranges.json contains 100k elements and my dataframe has 300k rows.
Currently, I have implemented it like this:
- Created a Python list containing every individual IP in each range, e.g. ["45.43.144.0", "45.43.144.1", "45.43.144.2", ..., "45.43.161.255"], and did the same for all other ranges.
- Removed duplicate elements from this list.
- Constructed a dataframe from this list.
- Merged the two dataframes on 'ip' (a minimal sketch of these steps is shown below).
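Roughly, this is what the current implementation does. This is only a sketch; it assumes the dataframe above is already loaded as df:

import json
import ipaddress

import pandas as pd

# Load the ranges from the JSON file
with open("ip_ranges.json") as f:
    ip_ranges = json.load(f)

# Expand every range into individual address strings; a set removes duplicates
all_ips = set()
for rng in ip_ranges:
    start = int(ipaddress.IPv4Address(rng["start"]))
    end = int(ipaddress.IPv4Address(rng["end"]))
    for n in range(start, end + 1):
        all_ips.add(str(ipaddress.IPv4Address(n)))

# Build a one-column dataframe and merge it with df on 'ip'
ips_df = pd.DataFrame({"ip": list(all_ips)})
filtered = df.merge(ips_df, on="ip")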
This process works fine for a small set of IP ranges, but for a large set it takes a long time to complete.
Is there a better approach to perform this more efficiently?
Just an idea: put your ranges into a dataframe ip_range with columns From and To. Convert all IP addresses (the ones in df, too) to decimal numbers with the fast code provided, for example, here.
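The linked code is not reproduced here, but a minimal vectorized sketch of such a conversion could look like this (the helper name ip_to_int and the columns start, end, and ip_int are assumptions based on the question):

import numpy as np
import pandas as pd

def ip_to_int(s: pd.Series) -> pd.Series:
    # Split dotted-quad strings into four integer octet columns,
    # then combine them into one 32-bit integer per address
    parts = s.str.split('.', expand=True).astype(np.int64)
    return parts[0] * 2**24 + parts[1] * 2**16 + parts[2] * 2**8 + parts[3]

ip_range['From'] = ip_to_int(ip_range['start'])
ip_range['To'] = ip_to_int(ip_range['end'])
df['ip_int'] = ip_to_int(df['ip'])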
Now generating the ranges can be done fast:

# One numpy array of consecutive integers per range
ip_range['Rng'] = ip_range.apply(lambda x: np.arange(x.From, x.To + 1), axis=1)
These ranges can be converted into a DataFrame:
import itertools

ips = pd.DataFrame(itertools.chain(*ip_range['Rng']))
This DataFrame can easily be merged with df.
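For instance, assuming the ip_int column from the conversion sketch above:

# Name the single column, drop duplicate addresses, and merge
ips.columns = ['ip_int']
filtered = df.merge(ips.drop_duplicates(), on='ip_int')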