I am trying to run a second node on a different processor, either an ARM or a second x86_64. I have a DomMgr running on one x86_64 and am attempting to start a node on either another x86_64 or an ARM using nodeBooter. The DevMgr starts and registers with the DomMgr, but when it starts the GPP device it prints "Requesting IDM CHANNEL IDM CHANNEL IDM_CHANNEL" and then immediately "terminate called after throwing an instance of 'CORBA::OBJECT_NOT_EXIST'". The DomMgr printed to the console that "Domain Channel: IDM_Channel created". Is the DomMgr supposed to register that channel in the NameService? If not, why does the remote DevMgr get an invalid object reference when it tries to get it?
I did not realize I could clarify my question by editing it to add new findings. I'll do that from now on.
By using ORBtraceLevel on the remote DevMgr I found that I had different problems on my remote x86-based DevMgr and my ARM-based one, even though the normal error messages were the same. In the x86 case, my exported DevMgr DCD simply used the same name and id as one running locally on the Domain. Once I fixed that, the x86-based remote DevMgr had no problem starting its GPP device and registering.
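A duplicate like that can be spotted before starting the node by comparing the two DCD files. The following is only a sketch in Python 3: it assumes the name and id in question are the attributes on the root deviceconfiguration element of each DCD, and the sample ids and names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def dcd_identity(dcd_xml):
    """Return the (id, name) attributes of a DCD document's root element."""
    root = ET.fromstring(dcd_xml)
    return root.get("id"), root.get("name")


def dcd_conflicts(dcd_a, dcd_b):
    """List which identity attributes ("id", "name") two DCD documents share."""
    id_a, name_a = dcd_identity(dcd_a)
    id_b, name_b = dcd_identity(dcd_b)
    shared = []
    if id_a == id_b:
        shared.append("id")
    if name_a == name_b:
        shared.append("name")
    return shared


# Hypothetical DCD contents, reduced to the root element only.
LOCAL_DCD = '<deviceconfiguration id="DCE:0000-local" name="DevMgr_node1"/>'
EXPORTED_DCD = '<deviceconfiguration id="DCE:0000-local" name="DevMgr_node1"/>'
FIXED_DCD = '<deviceconfiguration id="DCE:0000-remote" name="DevMgr_node2"/>'

print(dcd_conflicts(LOCAL_DCD, EXPORTED_DCD))  # both attributes collide
print(dcd_conflicts(LOCAL_DCD, FIXED_DCD))     # no collision after the fix
```

In practice the two strings would be read from the local and the exported DCD files rather than defined inline.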
But this is NOT the problem for the ARM-based case. With traceLevel=10 I started the DevMgr on both my x86 (successfully) and my ARM and compared the outputs. First I should mention that my ARM is a Raspberry Pi 3 running Ubuntu 16.04. The CPU is 64-bit, but no 64-bit distro of either Ubuntu or CentOS is available for it, so the OS is 32-bit Ubuntu for now. I know that REDHAWK 2.0 says it now supports only 64-bit CentOS, so perhaps that is the problem, although I was able to build REDHAWK with no trouble and most of it works fine. But the trace does show two warnings
WARN Device_impl:172 - Cannot set allocation implementation: Property ### is of type 'CORBA::Long' (not 'int')
which do not show in the x86 case and which I believe are due to the different sizes of int. If I do not start an Event Service on the domain, these same warnings appear but I am able to start the GPP fine and run waveforms. So I do not know whether this is related to my OBJECT_NOT_EXIST error in the GPP, but I thought I should mention it.
Trace shows one successful
Creating ref to remote: REDHAWK.DEV.IDM.Channel
target id :IDL:omg.org/CosEventChannelAdmin/EventChannel:1.0
most derived id:
Adding root/Files<3> (activating) to object table.
but in the second case it immediately shows
Adding root<3> (activating) to object table.
followed by
throw OBJECT_NOT_EXIST from GIOP_C.cc:281 (NO,OBJECT_NOT_EXIST_NoMatch)
throw OBJECT_NOT_EXIST from omniOrbRef.cc:829 (NO,OBJECT_NOT_EXIST_NoMatch)
and then GPP terminates with signal 6.
The successful x86 trace shows the same Creating ref and Adding root<3> but then has
Creating ref to remote: root/REDHAWK_DEV.IDM_Channel <...>
Could this be related to 32-bit vs. 64-bit? If not, why would this happen only on the ARM-based GPP?
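The side-by-side comparison of the two traces described above can be automated. This is a rough sketch in Python 3, not a complete omniORB log parser: the two regular expressions are assumptions based only on the trace lines quoted in this question, and the sample traces are those excerpts rather than full logs.

```python
import re

# Patterns inferred from the trace excerpts quoted above (an assumption).
REF_RE = re.compile(r"Creating ref to remote:\s*(\S+)")
THROW_RE = re.compile(r"throw (\w+) from (\S+)")


def summarize_trace(text):
    """Return (remote refs created, exceptions thrown) found in a trace."""
    return REF_RE.findall(text), THROW_RE.findall(text)


def first_missing_ref(good_trace, bad_trace):
    """Return the first remote ref in the good trace that the bad trace lacks."""
    good_refs, _ = summarize_trace(good_trace)
    bad_refs, _ = summarize_trace(bad_trace)
    for index, ref in enumerate(good_refs):
        if index >= len(bad_refs) or bad_refs[index] != ref:
            return ref
    return None


X86_TRACE = """\
Creating ref to remote: REDHAWK.DEV.IDM.Channel
Adding root<3> (activating) to object table.
Creating ref to remote: root/REDHAWK_DEV.IDM_Channel <...>
"""

ARM_TRACE = """\
Creating ref to remote: REDHAWK.DEV.IDM.Channel
Adding root<3> (activating) to object table.
throw OBJECT_NOT_EXIST from GIOP_C.cc:281 (NO,OBJECT_NOT_EXIST_NoMatch)
throw OBJECT_NOT_EXIST from omniOrbRef.cc:829 (NO,OBJECT_NOT_EXIST_NoMatch)
"""

print(first_missing_ref(X86_TRACE, ARM_TRACE))
print(summarize_trace(ARM_TRACE)[1])
```

Run against the full logs, the first missing reference marks the point where the ARM node stops following the x86 node's sequence.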
Note that I have iptables accepting any traffic from my subdomain on the x86s, and it is not running at all on the ARM. There are a lot of successful connections, including queries with nameclt, so this is not (as far as I can tell) a network connection issue.
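Basic reachability of the naming and event services can also be checked directly from the remote node. The sketch below (Python 3) assumes omniNames and omniEvents listen on their default ports, 2809 and 11169; the host passed on the command line is whatever machine runs the domain, and the function names are made up for illustration. An open port only shows that a TCP connection is possible, not that the object being requested exists.

```python
import socket
import sys

# Default ports (assumed unchanged in the local configuration).
SERVICE_PORTS = {"omniNames": 2809, "omniEvents": 11169}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_domain_services(host, timeout=2.0):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port, timeout)
            for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    # Usage: python check_ports.py <domain-host>
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for service, reachable in check_domain_services(target).items():
        print(service, "reachable" if reachable else "NOT reachable")
```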
It sounds like something is misconfigured on your system, perhaps iptables or SELinux? Let's walk through a quick example showing the minimum configuration and running processes needed for a multi-node system. If this does not clear things up, I'd suggest rerunning the domain and device manager with TRACE-level debugging enabled and examining the output for any anomalies, or temporarily disabling SELinux and iptables to rule them out as issues.
I'll use a REDHAWK 2.0.1 docker image as a tool to walk through the example. The installation steps used to build this image can be found here.
Note it only has a single interface so we do not need to specify an endPoint. However specifying the unix socket endpoint would provide a performance boost for any locally running components.
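For reference, the omniORB configuration for a node typically comes down to two InitRef lines plus optional endPoint lines. The helper below is a hypothetical sketch in Python that just renders such a file as text; the function name and the 172.17.0.2 address are made up, and the line formats are my understanding of the usual REDHAWK setup, so check them against your own /etc/omniORB.cfg.

```python
def render_omniorb_cfg(domain_host, local_ip=None, unix_socket=False):
    """Build omniORB.cfg text pointing at the naming and event services."""
    lines = [
        # Where to find omniNames and omniEvents (default ports assumed).
        "InitRef = NameService=corbaname::%s:2809" % domain_host,
        "InitRef = EventService=corbaloc::%s:11169/omniEvents" % domain_host,
    ]
    if local_ip:
        # Only needed when the machine has more than one interface.
        lines.append("endPoint = giop:tcp:%s:" % local_ip)
    if unix_socket:
        # Optional unix socket endpoint for locally running components.
        lines.append("endPoint = giop:unix:")
    return "\n".join(lines) + "\n"


# Hypothetical address of the container running the domain.
print(render_omniorb_cfg("172.17.0.2", unix_socket=True))
```

On the device manager node, the same two InitRef lines would point at the domain manager's address rather than at the local machine.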
We can now start up omniNames, omniEvents, and the domain manager, and after each step see what is running. The "extra operand" output from omniNames is expected on newer versions of CentOS 6 and is an issue with the omniNames init script.
So we can see that we have omniNames, omniEvents, and the DomainManager binaries running. Time to move on to a new node for the device manager.
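The same check can be scripted. This is a small sketch in Python 3 that assumes a Linux ps supporting "-e -o comm=" and that the three processes appear under exactly the names used above; the helper names are made up.

```python
import subprocess

# Process names taken from the walkthrough above.
EXPECTED = ("omniNames", "omniEvents", "DomainManager")


def missing_processes(process_names, expected=EXPECTED):
    """Return the expected process names absent from process_names."""
    running = set(process_names)
    return [name for name in expected if name not in running]


def running_process_names():
    """List the command names of all running processes using ps."""
    out = subprocess.check_output(["ps", "-e", "-o", "comm="])
    return [line.strip() for line in out.decode().splitlines()]


# Example with a hand-written process list; on a real node, call
# missing_processes(running_process_names()) instead.
print(missing_processes(["bash", "omniNames", "omniEvents"]))
```

An empty list means all three are running.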
In a new terminal I create a new container and call it deviceManager
So we've successfully spun up two machines on the same network with unique IP addresses, designating one as the domain manager, omniNames, and omniEvents server and the other as a Device Manager / GPP node. At this point, we could connect to the domain manager either via the IDE or through the Python interface and launch waveforms; we would expect these waveforms to launch on the sole device manager node.
What version of REDHAWK are you running? What OS? Can you provide a list of all the omni rpms you have installed on your machine?