Unmatched a publisher

Offline
Last seen: 8 years 11 months ago
Joined: 12/30/2013
Posts: 6
Dear RTI Community

For some reason, when I run the HelloWorld example, my subscriber unmatches from its publisher.  I'm struggling to debug this problem and would sincerely appreciate any insights you could give me.

My goal, for the company I currently work for, is to embed the RTI Micro edition on an RTX/TCPNet-enabled embedded system.  I've ported the OSAPI and Netio components, and I've ported the Helloworld_static_udp publisher to the embedded system.  I'm running the subscriber on a PC running Windows 7.


When I run in a similar configuration between two PCs, I get

 

Sample received
        msg: Hello World! (1316)
.

.

.

Forever and ever. 

 

Now when I use the embedded system as the publisher, it starts out the same for about 100 updates to the HelloWorld field.  Then I get...

Sample received
        msg: Hello World! (1318)


Sample received
        INVALID DATA
Unmatched a publisher

Some time later the subscriber reconnects to the publisher and the transfer begins again, followed by the same error message.  What could the difference be?  I'm a novice with RTPS, so I don't know under what circumstances a subscriber is unmatched from a publisher.

I looked at the Wireshark traces of both.  I don't see anything remarkably different that jumps out at me, so I turned up the trace on the subscriber.  Here's what I get in the case where the subscriber unmatches from the publisher:

[1390516736.134498000] TID[7600][DATAREADER] Received topic Example HelloWorld

[1390516736.134498000] TID[7600]DRI: committing entry

Sample received
    msg: Hello World! (101)


[1390516736.134498000] TID[7600]UDP: processed message
[1390516737.69552000] TID[7600][DPDE] PUBLICATION DATA RECEIVED
[1390516737.69552000] TID[7600]remove remote publication
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7030000] for topic DCPSPublication to writer 0.4bc6a7e.1000da84.30a0000
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7030000] for topic DCPSPublication to writer 0.4bc6a7e.1000da84.30a0000: bind does not exist
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7040000] for topic DCPSSubscription to writer 0.4bc6a7e.1000da84.30a0000
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.c7040000] for topic DCPSSubscription to writer 0.4bc6a7e.1000da84.30a0000: bind does not exist
[1390516737.69552000] TID[7600][DATAREADER] Unbind datareader[52e1999c.ed02cd2f.deadc0de.40a0000] for topic Example HelloWorld to writer 0.4bc6a7e.1000da84.30a0000

Sample received
    INVALID DATA
[1390516737.69552000] TID[7600][DATAREADER] Unbound datareader[52e1999c.ed02cd2f.deadc0de.40a0000] for topic Example HelloWorld to writer 0.4bc6a7e.1000da84.30a0000
[1390516737.69552000] TID[7600][BIND] LOOKUP 2/_udp/1/7661/c0a8645c
[1390516737.69552000] TID[7600][BIND] UNBIND 2 1/7661/c0a8645c.0.0.0 => 4/0/52e1999c.ed02cd2f.deadc0de.0
[1390516737.69552000] TID[7600]UDP: unbind_external: 7661/c0a8645c.0.0.0 ==> dst 0/52e1999c.ed02cd2f.deadc0de.0

[1390516737.69552000] TID[7600]UDP: unbind_external: 7661/c0a8645c.0.0.0 ==> dst 0/52e1999c.ed02cd2f.deadc0de.0 ref_count = 0

[1390516737.69552000] TID[7600]UDP: unbind_external deleted record: 7661/c0a8645c.0.0.0 ==> dst 0/52e1999c.ed02cd2f.deadc0de.0

[1390516737.69552000] TID[7600]port 1/7661/c0a8645c unbound, 0 left

[1390516737.69552000] TID[7600][BIND] LOOKUP 2/_intra/1/7661/c0a8645c
[1390516737.69552000] TID[7600][BIND] LOOKUP FAILED 2/_intra/1/7661/c0a8645c
[1390516737.69552000] TID[7600][DATAREADER] DELETED ROUTE TOPIC=[Example HelloWorld] DR=[9c99e152.2fcd02ed.dec0adde.a04] DW=[0.7e6abc04.84da0010.a03]
Unmatched a publisher


Offline
Last seen: 8 years 11 months ago
Joined: 12/30/2013
Posts: 6

Please correct me if I'm wrong, but from this trace it looks like the far end of the link (the publisher on my embedded target) is deliberately telling the subscriber to shut the link down.  For what reasons would this happen?  Perhaps there is a liveliness setting that defaults to on for my PC-based subscriber but isn't turned on for the embedded RTI Micro target?

Offline
Last seen: 8 years 11 months ago
Joined: 12/30/2013
Posts: 6

I've been drilling into this problem for several days, and I'm not making any headway.  I've implemented the callbacks on the subscriber (PC) side for on_liveliness_changed, deadline_missed, sample_lost, and sample_rejected.  They are not getting hit.  I compared the Wireshark trace from my PC-based "HelloWorld" example with the one from my newly ported publisher on the embedded system.  One thing I notice that's different: when I run the example application, the "hostId" field is filled in with a non-zero value on the outbound RTPS messages from the publisher.  On the newly ported embedded system, the hostId is always 0.  I thought maybe this was the problem, so I dug into the Micro code and found that the hostId is based on system time (seconds), which was being read very early in init, so it was 0.  Adding 1 fixed the hostId at 0x1 0 0 0 (little endian).  That still didn't solve my problem.  After approximately 100 seconds of runtime the subscriber always unmatches the publisher.

So far, nothing else jumps out at me from the Wireshark trace.  My ported version of the HelloWorld example behaves very much like the desktop version in terms of liveliness, packet sizes, etc.

More digging...

Offline
Last seen: 4 years 10 months ago
Joined: 01/17/2013
Posts: 22

Hi,

Unmatching after 100 seconds is a clue that perhaps the application's participant liveliness lease duration is not being satisfied, as that has a default value of 100 seconds.  This would mean discovery messages for DomainParticipants are not being sent or received within that duration.  Specifically how that is happening is unclear; I can review your Wireshark trace if you can provide it.  Also, to verify further, in your example, change discovery_plugin_properties.participant_liveliness_lease_duration to a different value (e.g. 10 seconds) and observe whether the timing of the unmatching changes accordingly.

Regards,

Edward

edward@rti.com
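
For readers new to RTPS discovery, the lease mechanism described in the reply above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not RTI Micro code: the struct and function names (remote_participant, on_announcement, lease_expired) are hypothetical, and the only value taken from the thread is the 100-second default lease.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for what a subscriber tracks about a remote
 * participant. These are NOT RTI Micro types. */
struct remote_participant {
    int64_t lease_duration_sec; /* lease announced by the remote side, e.g. 100 */
    int64_t last_assert_sec;    /* time the last participant announcement arrived */
};

/* Record that a participant announcement arrived at now_sec. */
void on_announcement(struct remote_participant *p, int64_t now_sec)
{
    assert(p != NULL);
    p->last_assert_sec = now_sec;
}

/* True once no announcement has arrived for a full lease duration.
 * In this sketch, an expired lease is the point at which the remote
 * participant, and every writer it owns, would be unmatched. */
bool lease_expired(const struct remote_participant *p, int64_t now_sec)
{
    assert(p != NULL);
    return (now_sec - p->last_assert_sec) >= p->lease_duration_sec;
}
```

With a 100-second lease and a single announcement at time 0, lease_expired stays false through second 99 and turns true at second 100, which matches the timing of the unmatch seen in this thread. Shortening the lease to 10 seconds should move the unmatch accordingly, which is the experiment suggested above.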


Offline
Last seen: 8 years 11 months ago
Joined: 12/30/2013
Posts: 6

Thank you very much.  In fact, I had misconfigured the OSAPI_get_time_base function, which returns the number of nanoseconds per OS tick.  As a result, my embedded participant was sending only one liveliness message, at the beginning of the communication.
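
To illustrate why a wrong time base can silence the periodic liveliness messages, here is a minimal C sketch. Everything in it is an assumption made for illustration: the helper names (period_to_ticks, real_delay_sec) are hypothetical, and the 30-second period, 1 ms tick, and factor-of-1000 error are example numbers, not values reported in this thread or taken from the RTI Micro sources.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC UINT64_C(1000000000)

/* Hypothetical helper: convert a period in seconds into OS ticks,
 * trusting the nanoseconds-per-tick value the port reports. */
uint64_t period_to_ticks(uint64_t period_sec, uint64_t reported_ns_per_tick)
{
    assert(reported_ns_per_tick > 0);
    return (period_sec * NS_PER_SEC) / reported_ns_per_tick;
}

/* Hypothetical helper: wall-clock seconds that actually pass before
 * a timer of `ticks` ticks fires, given the true tick length. */
uint64_t real_delay_sec(uint64_t ticks, uint64_t actual_ns_per_tick)
{
    return (ticks * actual_ns_per_tick) / NS_PER_SEC;
}
```

Under these assumed numbers, a correct time base of 1,000,000 ns per tick turns a 30-second period into 30,000 ticks, which really is 30 seconds. If the port instead reported 1,000 ns per tick, the same period would become 30,000,000 ticks, or 30,000 seconds of real time, far longer than a 100-second lease. The first message would go out at startup and the next would effectively never arrive.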