Hi,
I have few questions regarding RTI DDS logs:
I’ve run the attached example, publishing 3 samples and then shutting down (gracefully) the subscriber and publisher.
In addition, attached are the raw logs + parsed ones.
1. In the beginning of the log file there is the following block which return few times (each time with different memory allocation).
RTIOsapi_envVarOrFileGet:file NDDS_DISCOVERY_PEERS not found
RTIOsapi_envVarOrFileGet:environment variable NDDS_DISCOVERY_PEERS not found
DDS_DiscoveryQosPolicy_get_default:no environment variable or file NDDS_DISCOVERY_PEERS
DDS_StringSeq_ensure_length:memory allocation: original 0, new 3
DDS_StringSeq_ensure_length:memory allocation: original 0, new 1
DDS_DiscoveryQosPolicy_get_default:value of: initial_peers="builtin.udpv4://239.255.0.1,builtin.udpv4://127.0.0.1,builtin.shmem://"
DDS_DiscoveryQosPolicy_get_default:value of: multicast_receive_addresses="builtin.udpv4://239.255.0.1"
DDS_PropertySeq_ensure_length:memory allocation: original 0, new 10
DDS_PropertySeq_ensure_length:memory allocation: original 10, new 20
a. Why does this repeats few times?
2. Regarding the following lines:
[1475504596.699991] [D0000|ENABLE]RTINetioSender_addDestination:NetioSender_Destination reused
[1475504596.700991] NDDS_Transport_UDPv4_receive_rEA:rR0000018b801 received 884 bytes from 0X201A8C0:60544
[1475504596.700991] RTINetioReceiver_receiveFast:rR0000018b801 received 884 bytes
a. What does the RTINetioSender stands for? (it is mentioned a lot in the publog.txt log file).
b. What the 884 bytes packet include (this also repeats many times in the log)? [line 239 in publog.txt].
3. Regarding the following block:
00000260 send failed:
00000261 locator:
00000262 transport: 16777216
00000263 address: 0000:0004:0002:0000:0000:0000:0000:0000
00000264 port: 7414
00000265 encapsulation:
00000266 transport_priority: 0
00000267 aliasList: ""
00000268 send failed:
00000269 locator:
00000270 transport: 16777216
00000271 address: 0000:0004:0002:0000:0000:0000:0000:0000
00000272 port: 7416
00000273 encapsulation:
00000274 transport_priority: 0
00000275 aliasList: ""
00000276 send failed:
00000277 locator:
00000278 transport: 16777216
00000279 address: 0000:0004:0002:0000:0000:0000:0000:0000
00000280 port: 7418
00000281 encapsulation:
00000282 transport_priority: 0
00000283 aliasList: ""
a. Why there an unreachable locator? There weren’t any defined DDS_DISCOVERY_PEERS nor initial_peers.
b. The above also repeats few times in the log, why is that? Does the participants tries cyclically to connect with those?
4. Regarding the following:
00000433 [1475504596.705991] RTIEventActiveDatabaseThread_loop:rDtb00018b801 sleeping {0000003d,00000000}
a. Why does the DDS event thread initiates multiple sleeps?
b. What does the information in the {} represents?
c. Is there a way to increase the even thread waking frequency? (in order to discover if there were any new received samples using the On_Data_available() method)
5. Regarding the following:
00000474 PRESPsService_assertRemoteEndpoint:TypeObject succesfully stored (topic: 'N/A', type: 'N/A')
a. Why does the topic name and type are NA? I didn’t disable the typeobject propagation in the QOS.
6. Regarding the following:
00000567 COMMENDSrWriterService_assertRemoteReader: writer oid 0x4c2 sends preemptive HB for sn (0000000000,00000001)-(0000000000,00000000)
a. What is the meaning of “sn (0000000000,00000001)-(0000000000,00000000)”. I don’t fully understand this meaning after I parsed it using rtiLogParser.
b. Is this a HB for the first sent sample? If yes, why the write operation is not in the log?
7. What does the SUB_READER and PUB_READER (+WRITER) represents? Are those related to the built-in topics as part of the discovery mechanism?
8. Regarding the following:
00956/0255 | 2016-10-03T14:23:18.982987 | ---> | 192.168.1.2 07980 1 | PARTIC_READER | Received DATA (1) from writer PARTIC_WRITER (best-effort)
00964/0259 | 2016-10-03T14:23:18.982987 | ---> | 192.168.1.2 07980 1 | PARTIC_READER | Received DATA (1) from writer PARTIC_WRITER (best-effort)
00972/0263 | 2016-10-03T14:23:18.982987 | ---> | 192.168.1.2 07980 1 | PARTIC_READER | Received DATA (1) from writer PARTIC_WRITER (best-effort)
00980/0267 | 2016-10-03T14:23:18.982987 | ---> | 192.168.1.2 07980 1 | PARTIC_READER | Received DATA (1) from writer PARTIC_WRITER (best-effort)
00990/0272 | 2016-10-03T14:23:18.982987 | ---> | 192.168.1.2 07980 1 | PARTIC_READER | Received DATA (1) from writer PARTIC_WRITER (best-effort)
a. What does the PARTIC_WRITER represents?
b. What kind of data it receives on the publisher side?
9. Regarding the following:
00001559 RTIOsapiThread_sleep: Sleep(1000 ms)
00001560 RTIOsapiThread_sleep: Sleep(1000 ms)
a. What does the RTIOSAPI_Thread stands for? for which of Connext threads does it map to?
b. Why there are so many sleeps before shutting down the participant?
10. Misc:
a. Is there more documentation regarding the logs, besides the 14.4 Debugging Discovery in the manual?
b. Regarding other example I have run:
i. Is there a way to get a more comprehensive log of the packets writing / reading stage (NDDS_Transport_UDPv4_receive_rEA) in order to investigate the actions on the OS level (that are DDS initiated) and in order to investigate the asyn_publisher actions taken place by the async_DDS_thread?
Best regards,
Gal
Attachment | Size |
---|---|
log_repro.zip | 31.91 KB |
Hi Gal,
1.a. The middleware checks the environment variable or file with name NDDS_DISCOVERY_PEERS to create the default Discovery QoS. In the start-up, there are different methods that gets the default Participant QoS and for this reason it appears several times in the logs. The memory allocation happens for an internal StringSeq to store the initial peers.
2.a. RTINetioSender is an internal module name for functions related to network operations.
2.b. The content of the 884 bytes packets are in the log lines 241 and 242. You can get this information by checking the RTI LogParser output too:
---> | H1:60544 | | Received 884 bytes
---> | H1.A1 | | Received INFO_TS packet
---> | H1.A1 | | Received DATA packet
As you can see, it has an INFO_TS and DATA packets.
3.a. Note the error is not about unreachable locator but a send failed. The transport ID (16777216) indicates that it happened for SHMEM. We can confirm this by looking at the address too. It has the format of a ShareMemory address: 0000:0004:0002:0000:0000:0000:0000:0000. This is an expected message for the ShareMemory transport and it usually happens when there aren't more applications attached to the same memory segment.
3.b. The message happens periodically probably because of the periodic participant announcements. As said previously, it is an expected behavior.
4. The event thread will sleep until the next scheduled event. This thread is not related to the transport threads so it doesn't affect to new received samples or on_data_available listener.
5. This is an issue in our logging system, we will fix it in future releases.
6.a. The first number is the lowest sample sequence number missing an ACK. The second number is the highest sample sequece already published. If a DataWriter hasn't published any sample yet, it will show (1, 0) that means samples up to 1 (but not included) have been ACKed and it published samples up to 0 (included, which is none). After sending one sample it should show (1, 1) and after receiving the ACK it will show (2, 1).
6.b. This HeartBeat tells to the DataReader that there isn't any sample written.
7. and 8. Yes, these are the internal Writers and Readers. Depending on the writer ID the DATA packets will be:
* DATA(p) for PARTIC_WRITER
* DATA(w) for PUB_WRITER
* DATA(r) for SUB_WRITER
9.a.This is the name for the internal module of threads in general.
9.b. There is a default wait period to wake up the thread that checks records for deletion. You can decrease the default value of 1 second with the QoS participant_qos/database/shutdown_cleanup_period. Note that decreasing its value, will cause the thread to wake up more often so the shutdown time is smaller but it will consume more CPU.
10.a. Apart form the Knowledge Base, we don't have documentation about the internal logs of the middleware. The goal of RTI LogParser is to make them more human-readable, so they can be used to debug issues.
10.b RTI LogParser shows two type of logs regarding Asynchronous DataWriters. The first one is "Scheduled DATA (1)" and it happens when the application calls write. In that moment the sample is scheduled in the Asynchronous Thread to be sent. The second one is "Sent DATA (1)" and it happens when the sample is actually published. In the reading side you will see messages like "Received DATA (1) from writer 00_W-K (reliable)".
Regards,
Benito
Hi Benito,
Thank you for your comprehansive explanation.
I have one question regarding 3.a:
What do you mean by "there aren't more applications attached to the same memory segment" ? Why DDS tries to send information on this addresses? and why does the attached applications has to do with it?
Best regards,
Gal
Hi Gal,
The default initial_peers are:
builtin.udpv4://H6, builtin.udpv4://H1, builtin.shmem://
. It includes the one for ShareMemory and since it doesn't specify the participant ID limit, this is 4 (more information in the Peer Descriptor format). This means, that your domain participant will try to contact with participants with index 0, 1, 2, 3 and 4 in the ShareMemory address. Since you only run a publisher and subscriber, you had participants 0 and 1. So the they couldn't send data to parcipants 2, 3 and 4 over ShareMemory. We can confirm this from the port numbers. The error specified that it couldn't contact with port numbers 7414, 7416 and 7418 and these ports correspond to participants in domain ID 0 and index 2, 3 and 4 for sending metadata (discovery) over an unicast locator (ShareMemory). To sum up, the default initial peers specifies that your participants should try to communicate with up to 5 participants in the ShareMemory transport, since you had only 2, they couldn't send discovery data to the other 3 participants. For this reason this is an expected message.I hope this clarifies the log. Let me know if you have more questions.
Regards,
Benito
Thank you for the thorough explanation Benito.
Best regards,
Gal