Why does ARP request a non-local address?

Posted 2020-07-13 09:07

Question:

I have a Linux virtual server with 2 NICs.

eth0 <IP1>
eth1 <IP2>

arp_filter is turned on and rp_filter is set to 2 (loose mode).
Policy routing is configured as follows:

table T1
default via <GW> dev eth0 src <IP1>
127.0.0.0/8 dev lo
<LAN> dev eth0 src <IP1>

table T2
default via <GW> dev eth1 src <IP2>
127.0.0.0/8 dev lo
<LAN> dev eth1 src <IP2>

ip rule add from <IP1> table T1
ip rule add from <IP2> table T2

After that, I can ping the floating IPs bound to both <IP1> and <IP2> from outside. However, ping -I eth1 <some_domain> doesn't work. tcpdump shows that when I ping from eth1 to the outside, Linux directly asks for the MAC of the outside address, which is incorrect because they are not in the same LAN.

Here is tcpdump data:

root@rm-2:~# tcpdump -i eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
17:53:08.696191 ARP, Request who-has 172.30.250.119 tell 172.30.248.2, length 46
17:53:08.728482 ARP, Request who-has 172.30.251.144 tell 172.30.251.138, length 46
17:53:09.447252 ARP, Request who-has 61.135.169.125 tell 172.30.251.43, length 28
17:53:09.551514 ARP, Request who-has 172.30.250.127 tell 172.30.248.2, length 46
17:53:09.698076 ARP, Request who-has 172.30.250.119 tell 172.30.248.2, length 46
17:53:09.859046 ARP, Request who-has 172.30.248.246 tell 172.30.248.245, length 46
17:53:10.446009 ARP, Request who-has 61.135.169.125 tell 172.30.251.43, length 28
17:53:10.477104 ARP, Request who-has 172.30.250.128 tell 172.30.248.2, length 46

As you can see, 61.135.169.125 is a foreign address, yet eth1 sends an ARP request for it. Is this a bug or something?

EDIT
Output of route: // 172.30.248.1 is the GW

Destination     Gateway         Genmask         Flags Metric Ref    Use      Iface  
default         172.30.248.1    0.0.0.0         UG    0      0        0      eth0

Answer 1:

Answer:

You need to add an output interface rule (ip rule add oif ...) in addition to ip rule add from ..., because ping is binding to the interface, not the IP.

Example:

ip rule add from <IP1> table T1
ip rule add oif <eth0> table T1
ip rule add from <IP2> table T2
ip rule add oif <eth1> table T2

Explanation:

The ping example in your question uses the interface as the source (ping -I eth1 <some domain>), and no policy rule matches a lookup that is bound to an interface. So ping behaves exactly as it would if there were no routes defined for the interface at all.
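
To make the matching step concrete, here is a minimal Python sketch of how rule selectors might be evaluated. It is a simplified model, not kernel code: the Rule and Flow classes and select_table are hypothetical names, the addresses are the same <IP1>/<IP2> placeholders as in the question, and it assumes no source address has been chosen yet when a device-bound lookup runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    table: str
    from_addr: Optional[str] = None  # models "ip rule add from <addr>"
    oif: Optional[str] = None        # models "ip rule add oif <dev>"


@dataclass(frozen=True)
class Flow:
    src: Optional[str]  # None: no source address chosen yet
    oif: Optional[str]  # None: socket is not bound to a device


def select_table(rules, flow, fallback="main"):
    """Return the table of the first rule whose selectors all match the flow."""
    for rule in rules:
        if rule.from_addr is not None and rule.from_addr != flow.src:
            continue
        if rule.oif is not None and rule.oif != flow.oif:
            continue
        return rule.table
    return fallback


from_only = [Rule("T1", from_addr="<IP1>"), Rule("T2", from_addr="<IP2>")]
with_oif = from_only + [Rule("T1", oif="eth0"), Rule("T2", oif="eth1")]

ping_by_address = Flow(src="<IP2>", oif=None)  # like: ping -I <IP2>
ping_by_device = Flow(src=None, oif="eth1")    # like: ping -I eth1

print(select_table(from_only, ping_by_address))  # T2
print(select_table(from_only, ping_by_device))   # main
print(select_table(with_oif, ping_by_device))    # T2
```

With only the "from" rules, the device-bound flow never reaches T2, which is why the gateway configured there is ignored.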

Example to test/prove (starting without policy routing):

Using my phone USB tethered as an alternate route, I have the following base configuration.

Linux desktop:
$ ip addr show usb0
...
inet 192.168.42.1/32 ...
..

Android phone:
$ ip addr show rndis0
...
inet 192.168.42.129/24 ...
...

Because the desktop usb0 interface is assigned a /32 address, ping 192.168.42.129 -I 192.168.42.1 will fail: there is no route defined for that address, and it is not within the broadcast domain of the usb0 address. With ping 192.168.42.129 -I usb0, however, I am telling ping to use the interface itself. No routes match the interface (thus, no concept of a broadcast domain), so it will blindly trigger an ARP request for any IP that is not its own.
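
The effect of the /32 prefix can be checked with Python's standard ipaddress module. This is only an illustration of the subnet arithmetic; the helper name on_link is made up for this example.

```python
import ipaddress


def on_link(local_cidr: str, peer: str) -> bool:
    """True if peer falls inside the subnet configured on the local interface."""
    network = ipaddress.ip_interface(local_cidr).network
    return ipaddress.ip_address(peer) in network


# Desktop side: a /32 covers exactly one address, so the phone is not on-link.
print(on_link("192.168.42.1/32", "192.168.42.129"))   # False
print(ipaddress.ip_interface("192.168.42.1/32").network.num_addresses)  # 1

# Phone side: the /24 does cover the desktop's address.
print(on_link("192.168.42.129/24", "192.168.42.1"))   # True
```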

Let's attempt to ping using the interface (no routes). This will cause an ARP request to occur even though the destination is not within the same broadcast domain:

desktop$ ping 192.168.42.129 -I usb0
phone# tcpdump -i rndis0 -n icmp or arp
ARP, Request who-has 192.168.42.129 tell 192.168.42.1, length 28
ARP, Reply 192.168.42.129 is-at 3e:04:37:23:05:0e, length 28
IP 192.168.42.1 > 192.168.42.129: ICMP echo request, id 24641, seq 1, length 64
IP 192.168.42.129 > 192.168.42.1: ICMP echo reply, id 24641, seq 1, length 64

Using the source IP of the interface (no routes), no ARP request is made, because the destination is not within the broadcast domain of the source address:

desktop$ ping 192.168.42.129 -I 192.168.42.1
phone# tcpdump -i rndis0 -n icmp or arp
... nothing comes over the wire, as expected ...

Now if I add a route to the host over the interface, ping knows it can make an ARP request for the 192.168.42.129 address:

desktop$ ip route add 192.168.42.129/32 dev usb0
desktop$ ping 192.168.42.129 -I 192.168.42.1
phone# tcpdump -i rndis0 -n icmp or arp
ARP, Request who-has 192.168.42.129 tell 192.168.42.1, length 28
ARP, Reply 192.168.42.129 is-at 3e:04:37:23:05:0e, length 28
IP 192.168.42.1 > 192.168.42.129: ICMP echo request, id 24667, seq 1, length 64
IP 192.168.42.129 > 192.168.42.1: ICMP echo reply, id 24667, seq 1, length 64

So the same concept applies when I try to ping something off-network; if I ping 8.8.8.8 using the interface as the source, it will blindly make an ARP request sans any matching routes:

desktop$ ping 8.8.8.8 -I usb0
phone# tcpdump -i rndis0 -n icmp or arp
ARP, Request who-has 8.8.8.8 tell 192.168.42.1, length 28

When using the interface address, the lack of any kind of next-hop route in the routing table will cause it to fail and not make an ARP request:

desktop$ ping 8.8.8.8 -I 192.168.42.1
phone# tcpdump -i rndis0 -n icmp or arp
... nothing ...
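
The four cases above (before any policy routing) can be summarised in a small Python model of next-hop resolution. It is a sketch under assumptions: Route and arp_target are hypothetical names, only routes through usb0 are modelled (any other interface on the desktop is ignored), and the "treat the destination as on-link when bound to a device with no usable route" branch is my reading of the behaviour seen in the captures, not an authoritative description of the kernel.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str                # e.g. "192.168.42.129/32" or "0.0.0.0/0"
    dev: str
    via: Optional[str] = None  # None means the prefix is directly connected


def arp_target(routes, dst, oif=None):
    """Return the address that would be ARPed for, or None if unreachable."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        r for r in routes
        if addr in ipaddress.ip_network(r.prefix) and oif in (None, r.dev)
    ]
    if candidates:
        # Longest-prefix match; a gateway route resolves the gateway instead.
        best = max(candidates, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)
        return best.via or dst
    # Modelled fallback: device-bound with no usable route, assume dst is on-link.
    return dst if oif is not None else None


no_routes = []
host_route = [Route("192.168.42.129/32", "usb0")]

print(arp_target(no_routes, "192.168.42.129", oif="usb0"))  # 192.168.42.129
print(arp_target(no_routes, "192.168.42.129"))              # None
print(arp_target(host_route, "192.168.42.129"))             # 192.168.42.129
print(arp_target(no_routes, "8.8.8.8", oif="usb0"))         # 8.8.8.8
print(arp_target(no_routes, "8.8.8.8"))                     # None
```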

So let's add policy routing for the 192.168.42.1 address (using "ip rule add from ...") to use 192.168.42.129 as the next-hop default, in the same manner as your question example:

desktop$ sudo ip rule add from 192.168.42.1 lookup T1
desktop$ sudo ip route add default via 192.168.42.129 dev usb0 table T1
desktop$ ping 8.8.8.8 -I 192.168.42.1
PING 8.8.8.8 (8.8.8.8) from 192.168.42.1 : 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=28.6 ms
...
phone# tcpdump -i rndis0 -n icmp or arp
IP 192.168.42.1 > 8.8.8.8: ICMP echo request, id 24969, seq 1, length 64
IP 8.8.8.8 > 192.168.42.1: ICMP echo reply, id 24969, seq 1, length 64

It works: because we are using the address, the lookup correctly matches the ip rule.

Now we try the same ping again using the interface:

desktop$ ping 8.8.8.8 -I usb0
PING 8.8.8.8 (8.8.8.8) from 192.168.42.1 usb0: 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
...
phone# tcpdump -i rndis0 -n icmp or arp
ARP, Request who-has 8.8.8.8 tell 192.168.42.1, length 28

It fails; there is no rule for the interface that leads to the next-hop, so it will again make an ARP request for 8.8.8.8 itself, which will never get a reply. So we need to add an ip rule for the interface to use 192.168.42.129 as the next-hop as well:

desktop$ sudo ip rule add oif usb0 lookup T1
desktop$ ping 8.8.8.8 -I usb0
PING 8.8.8.8 (8.8.8.8) from 192.168.42.1 usb0: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=10.7 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 10.791/10.791/10.791/0.000 ms
...
phone# tcpdump -i rndis0 -n icmp or arp
IP 192.168.42.1 > 8.8.8.8: ICMP echo request, id 24979, seq 1, length 64
IP 8.8.8.8 > 192.168.42.1: ICMP echo reply, id 24979, seq 1, length 64
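
Putting rules and tables together, the before/after behaviour of this test can be sketched end to end in Python. Again this is a toy model with hypothetical names (lookup, resolve) and tuple layouts chosen for brevity. It assumes the rule walk simply continues to the next rule when a table has no usable route, and that a device-bound flow with no result is treated as on-link.

```python
import ipaddress


def lookup(table, dst, oif):
    """Longest-prefix match within one table; returns (via, dev) or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix).prefixlen, via, dev)
        for prefix, via, dev in table
        if addr in ipaddress.ip_network(prefix) and oif in (None, dev)
    ]
    if not matches:
        return None
    _, via, dev = max(matches, key=lambda m: m[0])
    return via, dev


def resolve(rules, tables, dst, src=None, oif=None):
    """Walk the rules in order and return the address that gets ARPed for."""
    for selector, value, table_name in rules:
        flow_value = src if selector == "from" else oif
        if flow_value != value:
            continue
        hit = lookup(tables.get(table_name, []), dst, oif)
        if hit is not None:
            via, _dev = hit
            return via or dst
    # No table produced a route: a device-bound flow falls back to on-link.
    return dst if oif is not None else None


# Table T1: default via 192.168.42.129 dev usb0
tables = {"T1": [("0.0.0.0/0", "192.168.42.129", "usb0")]}
from_rule = [("from", "192.168.42.1", "T1")]
both_rules = from_rule + [("oif", "usb0", "T1")]

print(resolve(from_rule, tables, "8.8.8.8", src="192.168.42.1"))  # 192.168.42.129
print(resolve(from_rule, tables, "8.8.8.8", oif="usb0"))          # 8.8.8.8
print(resolve(both_rules, tables, "8.8.8.8", oif="usb0"))         # 192.168.42.129
```

The second line corresponds to the failing capture (ARP for 8.8.8.8), the third to the working one after the oif rule is added.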

I believe that, in general, the lack of an interface rule would not have had a negative impact on your implementation for normal, non-interface-bound outgoing connections. Most (not all) applications bind to the address for outbound TCP/UDP connections, and only bind to the interface for incoming connections (listening). The ping utility is a special case.

To prove this, if I remove the interface rule from the routing policy, I am still able to use normal outbound sockets when specifying the bind address. In the example below I use netcat and telnet, in both cases specifying the bind address (-s 192.168.42.1 for nc, -b 192.168.42.1 for telnet); the lookup properly matches the T1 table, and thus uses the gateway.

# remove the interface rule, keep the address rule
desktop$ sudo ip rule del from all oif usb0 lookup T1
desktop$ nc -zv 8.8.8.8 443 -s 192.168.42.1
google-public-dns-a.google.com [8.8.8.8] 443 (https) open

phone# tcpdump -i rndis0 -n host 8.8.8.8
IP 192.168.42.1.40785 > 8.8.8.8.443: Flags [S], seq 1678217252, win 29200, options [mss 1460,sackOK,TS val 20223895 ecr 0,nop,wscale 6], length 0
IP 8.8.8.8.443 > 192.168.42.1.40785: Flags [S.], seq 86178051, ack 1678217253, win 28400, options [mss 1432,sackOK,TS val 1937335284 ecr 20223895,nop,wscale 8], length 0
....
desktop$ telnet 8.8.8.8 53 -b 192.168.42.1
...
phone# tcpdump -i rndis0 -n host 8.8.8.8
IP 192.168.42.1.57109 > 8.8.8.8.53: Flags [.], ack 1, win 457, options [nop,nop,TS val 20288983 ecr 4154032957], length 0
IP 8.8.8.8.53 > 192.168.42.1.57109: Flags [F.], seq 1, ack 1, win 111, options [nop,nop,TS val 4154033968 ecr 20288983], length 0
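
For reference, the two binding styles look roughly like this at the socket level. This is a hedged Python sketch: bind_to_address and bind_to_device are names invented here, SO_BINDTODEVICE is Linux-specific, and setting it may require root or CAP_NET_RAW depending on the kernel version. The demo at the end uses the loopback address only as a stand-in so that it runs anywhere.

```python
import socket


def bind_to_address(addr: str) -> socket.socket:
    """Bind by source address, like nc -s / telnet -b (matches 'ip rule ... from')."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, 0))  # port 0 lets the kernel pick an ephemeral port
    return s


def bind_to_device(dev: str) -> socket.socket:
    """Bind by interface, like ping -I usb0 (matches 'ip rule ... oif')."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Linux-only option; may raise PermissionError without sufficient privileges.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, dev.encode() + b"\0")
    return s


# Stand-in demo: on the desktop above this would be bind_to_address("192.168.42.1").
demo = bind_to_address("127.0.0.1")
print(demo.getsockname()[0])  # 127.0.0.1
demo.close()
```

A socket created with bind_to_address carries a source address into the route lookup, so the "from" rule is enough; one created with bind_to_device carries only the interface, which is the case that needs the "oif" rule.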

I ran into this same issue while testing a policy-route implementation, and it took me a while to wrap my head around why my interface pings went unanswered. Hopefully this clears it up.



Tags: linux tcp-ip arp