Dear RTI Community,
For some reason, when using the HelloWorld example, my subscriber becomes unmatched from its publisher. I'm struggling to debug this problem, and I would sincerely appreciate any insights you could give me.
My goal at the company I currently work for is to embed the RTI Micro edition on an RTX/TCPNet-enabled embedded system. I've ported the OSAPI and Netio components, and I've ported the Helloworld_static_udp publisher component to the embedded system. The subscriber runs on a Windows 7 PC.
When I run a similar configuration between two PCs, I get:
Sample received
msg: Hello World! (1316)
.
.
.
Forever and ever.
Now, when I use the embedded system as the publisher, it starts out the same for about 100 updates to the HelloWorld field. Then I get:
Sample received
msg: Hello World! (1318)
Sample received
INVALID DATA
Unmatched a publisher
Some time later, the subscriber reconnects to the publisher and the transfer begins again, followed by the same error message. What could the difference be? I'm a novice with RTPS, so I don't know under what circumstances a subscriber gets unmatched from a publisher.
I looked at the Wireshark traces of both setups and nothing remarkably different jumps out at me. So I turned up the trace level on the subscriber. Here's what I get when the subscriber unmatches from the publisher:
[1390516736.134498000] TID[7600][DATAREADER] Received topic Example HelloWorld
[1390516736.134498000] TID[7600]DRI: committing entry
Sample received
msg: Hello World! (101)
[1390516736.134498000] TID[7600]UDP: processed message
[1390516737.69552000] TID[7600][DPDE] PUBLICATION DATA RECEIVED
[1390516737.69552000] TID[7600]remove remote publication
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7030000] for topic DCPSPublication to writer 0.4bc6a7e.1000da84.30a0000
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7030000] for topic DCPSPublication to writer 0.4bc6a7e.1000da84.30a0000: bind does not exist
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7040000] for topic DCPSSubscription to writer 0.4bc6a7e.1000da84.30a0000
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7040000] for topic DCPSSubscription to writer 0.4bc6a7e.1000da84.30a0000: bind does not exist
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.40a0000] for topic Example HelloWorld to writer 0.4bc6a7e.1000da84.30a0000
Sample received
INVALID DATA
[1390516737.69552000] TID[7600][DATAREADER] Unbound datareader[52e1999c.ed02cd2f.deadc0de.40a0000] for topic Example HelloWorld to writer 0.4bc6a7e.1000da84.30a0000
[1390516737.69552000] TID[7600][BIND] LOOKUP 2/_udp/1/7661/c0a8645c
[1390516737.69552000] TID[7600][BIND] UNBIND 2 1/7661/c0a8645c.0.0.0 => 4/0/52e1999c.ed02cd2f.deadc0de.0
[1390516737.69552000] TID[7600]UDP: unbind_external: 7661/c0a8645c.0.0.0 ==> dst 0/52e1999c.ed02cd2f.deadc0de.0
[1390516737.69552000] TID[7600]UDP: unbind_external: 7661/c0a8645c.0.0.0 ==> dst 0/52e1999c.ed02cd2f.deadc0de.0 ref_count = 0
[1390516737.69552000] TID[7600]UDP: unbind_external deleted record: 7661/c0a8645c.0.0.0 ==> dst 0/52e1999c.ed02cd2f.deadc0de.0
[1390516737.69552000] TID[7600]port 1/7661/c0a8645c unbound, 0 left
[1390516737.69552000] TID[7600][BIND] LOOKUP 2/_intra/1/7661/c0a8645c
[1390516737.69552000] TID[7600][BIND] LOOKUP FAILED 2/_intra/1/7661/c0a8645c
[1390516737.69552000] TID[7600][DATAREADER] DELETED ROUTE TOPIC=[Example HelloWorld] DR=[9c99e152.2fcd02ed.dec0adde.a04] DW=[0.7e6abc04.84da0010.a03]
Unmatched a publisher
.
Please correct me if I'm wrong, but from this trace it looks like the far end of the link (the publisher on my embedded target) is deliberately telling the subscriber to shut the link down. For what reasons would this happen? Perhaps a liveliness setting is enabled by default on my PC-based subscriber but not on the embedded RTI Micro target?
I've been drilling into this problem for several days and I'm not making any headway. I've implemented the on_liveliness_changed, deadline_missed, sample_lost, and sample_rejected callbacks on the subscriber (PC) side, but none of them are getting hit.
I studied the Wireshark trace from my PC-based HelloWorld example and compared it with my newly ported publisher on the embedded system. One difference I noticed: when I run the example application, the "hostId" field in the publisher's outbound RTPS messages is non-zero, but on the newly ported embedded system the hostId is always 0. I thought this might be the problem, so I dug into the Micro code and found that the hostId is based on the system time (in seconds), which was being read very early in initialization, so it was 0. Adding 1 set the hostId to 0x1 0 0 0 (little endian), but that still didn't solve my problem. After approximately 100 seconds of runtime, the subscriber always unmatches the publisher.
So far, nothing else jumps out at me from the Wireshark trace. My ported version of the HelloWorld example behaves very much like the desktop version in terms of liveliness, packet sizes, etc.
More digging...
Hi,
Unmatching after 100 seconds is a clue that perhaps the application's participant liveliness lease duration is not being satisfied, as that has a default value of 100 seconds. This would mean discovery messages for DomainParticipants are not being sent or received within that duration. Specifically how that is happening is unclear; I can review your Wireshark trace if you can provide it. Also, to verify further, in your example, change discovery_plugin_properties.participant_liveliness_lease_duration to a different value (e.g. 10 seconds) and observe whether the timing of the unmatching changes accordingly.
Regards,
Edward
edward@rti.com
Thank you very much. In fact, I had misconfigured the OSAPI_get_time_base function, which returns the number of nanoseconds per OS tick, so my embedded participant was only sending one liveliness message at the beginning of the communication.