Instance Resources, Dispose, and Unregister

rose's picture
Offline
Last seen: 3 years 4 months ago
Joined: 08/22/2011
Posts: 148
Instance Resources, Dispose, and Unregister

Hello All, 

I think it is a good idea to have an understanding of how instances work in RTI DDS.  So a few questions:

 

  1. What is the difference in meaning between a dispose and an unregister?
  2. When are instance resources cleaned up (or released to be reused) – during a dispose or an unregister, and why?
  3. What does this mean, given that with default QoS a DataWriter automatically disposes upon unregister?
  4. How can I avoid hitting a max_instances resource limit?
Thanks,
Rose

 

Gerardo Pardo's picture
Offline
Last seen: 3 weeks 1 day ago
Joined: 06/02/2010
Posts: 602

These are good questions that keep coming back so we are going to try to provide a comprehensive answer. Be warned that the issue is a bit involved and requires some background so it will not be short...

Question: What is the difference in meaning between a dispose and an unregister?

The basic concept is that "unregister" indicates that the DDS DataWriter has no further information on the data-object, whereas a "dispose" is a statement by the DDS DataWriter that the data-object no longer exists. As I will explain later, what this really means depends on the application and the meaning the developers attach to the data-objects of the Topic itself.

The DDS DataWriter "dispose" operation on an instance is directly associated with the DDS DataReader observing the instance transition to the NOT_ALIVE_DISPOSED instance state.  The exception is when you have a Topic with OWNERSHIP QoS set to EXCLUSIVE and the DataWriter doing the "dispose" is not the owner for that instance (based on OWNERSHIP_STRENGTH).  

The DDS DataWriter "unregister" operation on an instance is associated with the DDS DataReader seeing the instance transition to the NOT_ALIVE_NO_WRITERS instance state. But the association is less direct.  The transition to NOT_ALIVE_NO_WRITERS will only occur if that DataWriter was the only one known (by the DataReader) to be writing the instance. Moreover a DataReader can also see an instance transition to NOT_ALIVE_NO_WRITERS if it loses connectivity to all the known writers of the instance.
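
To make the distinction concrete, here is a minimal sketch using the classic RTI Connext C++ API for a hypothetical keyed type "Foo" (the type name, the field names, and the already-created writer are illustrative assumptions, not taken from the original question):

    // Assumes a keyed type 'Foo' generated by rtiddsgen and an already-created
    // FooDataWriter* writer. Error handling is omitted for brevity.
    Foo sample;
    sample.object_id = 42;     // hypothetical key field
    sample.x = 10.0;           // hypothetical non-key field

    // Optionally pre-register the instance and keep the handle for later calls.
    DDS_InstanceHandle_t handle = writer->register_instance(sample);

    // Publish updates for the instance.
    writer->write(sample, handle);

    // Case 1: this writer no longer knows anything about the object; the object
    // may still exist (readers see NOT_ALIVE_NO_WRITERS only if no other
    // DataWriter is known to be writing the instance).
    writer->unregister_instance(sample, handle);

    // Case 2 (alternative): the object itself no longer exists; readers see
    // NOT_ALIVE_DISPOSED, subject to the OWNERSHIP rules described below.
    // writer->dispose(sample, handle);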

Section 7.1.3.23.3 in the DDS specification (version 2.2) explains this in a more formal way. I quote below:

The DataWriter operation dispose is semantically different from unregister_instance. The dispose operation indicates that the data-instance no longer exists (e.g., a track that has disappeared, a simulation entity that has been destroyed, a record entry that has been deleted, etc.) whereas the unregister_instance operation indicates that the writer is no longer taking responsibility for updating the value of the instance.

Deleting a DataWriter is equivalent to unregistering all the instances it was writing, but is not the same as “disposing” all the instances.

For a Topic with EXCLUSIVE OWNERSHIP if the current owner of an instance disposes it, the readers accessing the instance will see the instance_state as being “DISPOSED” and not see the values being written by the weaker writer (even after the stronger one has disposed the instance). This is because the DataWriter that owns the instance is saying that the instance no longer exists (e.g., the master of the database is saying that a record has been deleted) and thus the readers should see it as such.

For a Topic with EXCLUSIVE OWNERSHIP if the current owner of an instance unregisters it, then it will relinquish ownership of the instance and thus the readers may see the value updated by another writer (which will then become the owner). This is because the owner said that it no longer will be providing values for the instance and thus another writer can take ownership and provide those values.

Because of this semantic difference the DataWriter and DataReader are generally able to reclaim all resources associated with "unregistered" instances. However, reclaiming all resources (e.g. the memory needed to remember the Key) of disposed instances is only possible under a more limited set of circumstances. More on this later.

How these events are used and reacted to is to some extent application/domain dependent, in that it depends on what the "data object" really represents in the real world.

It is easier to explain the practical use in the context of an example. Take a surveillance application where a set of systems (radars, cameras, etc.) is used to monitor and keep track of the location of objects. Assume the locations of these objects are published on the Topic "ObjectLocation" and that the associated data-type contains an "object identifier" assigned by the system to differentiate each object, which is used as the DDS Topic Key, as well as additional attributes such as the location, description, etc.

In this situation a DataWriter that was tracking an object but can no longer observe its location will simply "unregister" the instance. With this action the DataWriter is telling the rest of the system "I no longer know what happened to that object." It might have moved beyond the range the DataWriter can observe. It might have merged with another object and no longer be observable (as a person entering a car), split into multiple objects (as a train separating into multiple wagons), etc. The object may still be observable by other DataWriters, in which case the DataReader will not get any notification. However, if this was the only DataWriter that was observing the object, the DataReader will be notified of the "unregistration" by seeing the instance transition to NOT_ALIVE_NO_WRITERS. Upon observing this transition, the DataReader should assume the object still exists but its location is unknown to the DataWriters. For this reason the DataReader should be prepared to have the object reappear at a later point in time.

The same situation occurs if the DataReader loses connectivity with the DataWriter that was tracking the object. In terms of instance management, as far as the DataReader is concerned this is as if the DataWriter had "unregistered" all the instances it was tracking, so the same observations as before apply.

On the other hand, if the DataWriter detects that the object has ceased to exist, it would use the "dispose" operation. For example, what was thought to be an object was really a false reading, or it was an aggregation of objects that, when split, created separate objects so the original entity no longer exists, etc. The DataReader being notified of this will see the instance transition to the NOT_ALIVE_DISPOSED instance state, and it should assume the object has ceased to exist and will not be seen again.

As mentioned earlier, whether these two situations should be treated differently by DataWriters and DataReaders depends on the application itself and the Topic. The middleware just provides these mechanisms to distinguish the scenarios, and it is up to the application to take advantage of them for its own purposes.
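
On the DataReader side the two transitions show up in the SampleInfo of the received samples. A minimal sketch of how an application might distinguish them, again using the classic C++ API and a hypothetical "Foo" type (error handling omitted):

    // Assumes an already-created FooDataReader* reader for the "ObjectLocation" Topic.
    FooSeq data_seq;
    DDS_SampleInfoSeq info_seq;

    DDS_ReturnCode_t retcode = reader->take(
        data_seq, info_seq, DDS_LENGTH_UNLIMITED,
        DDS_ANY_SAMPLE_STATE, DDS_ANY_VIEW_STATE, DDS_ANY_INSTANCE_STATE);

    if (retcode == DDS_RETCODE_OK) {
        for (int i = 0; i < info_seq.length(); ++i) {
            const DDS_SampleInfo& info = info_seq[i];
            if (info.valid_data) {
                // Normal update: process data_seq[i].
            } else if (info.instance_state == DDS_NOT_ALIVE_DISPOSED_INSTANCE_STATE) {
                // The object no longer exists: remove it from the application's view.
            } else if (info.instance_state == DDS_NOT_ALIVE_NO_WRITERS_INSTANCE_STATE) {
                // Nobody is tracking the object anymore: keep it, it may reappear.
            }
        }
        // Taking the samples (and returning the loan) also removes them from the
        // DataReader cache, which matters for resource reclamation as discussed below.
        reader->return_loan(data_seq, info_seq);
    }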

 

Question: When are instance resources cleaned up (or released to be reused) – during a dispose or an unregister, and why?

The management of middleware resources depends in part on the middleware implementation, so I will be describing here what RTI Connext DDS does. However, this implementation is very much directed by the observable behavior specified in the DDS specification, so I would not be too surprised if other middleware follows similar patterns. The description here applies to RTI Connext DDS versions 4.4, 4.5, and 5.0.

Given that both the DataWriter and DataReader keep their own state on instances, I will discuss each of them separately.

1. Reclaiming of the instance state kept by the DataWriter

By default, the DataWriter resources associated with an instance (e.g. the space needed to remember the Instance Key or KeyHash) are released lazily. This means the resources are only reclaimed when the space is needed for another instance because max_instances is exceeded.  

With the default configuration RTI Connext DDS only releases resources associated with instances that meet the following two conditions:

  • The instance has been unregistered by the DataWriter
  • All the samples associated with the instance have been Acknowledged by the known active reliable DataReaders

Note that this means that with the default QoS settings RTI Connext DDS DataWriters do not release resources of instances that have been "disposed" but are still registered. The reason is that there are various scenarios under which "forgetting disposed instances" could lead to inconsistent or erroneous outcomes. For example:

Scenario 1: With OWNERSHIP Qos Policy EXCLUSIVE and DURABILITY Qos Policy TRANSIENT_LOCAL, removing all the DataWriter state associated with disposed (but still registered) instances would prevent the DataWriter from maintaining Ownership of the instance in the presence of late-joining DataReaders. For example, existing DataReaders would see the instance as "disposed" (NOT_ALIVE_DISPOSED), but late-joining DataReaders could potentially receive prior updates on the instance from other, lower-strength DataWriters and not the "disposed" message from the DataWriter (as it has purged the state required to send this message). Consequently different DataReaders (the ones that were present when the instance was disposed and the ones that appeared later) would see the instance in a different state.

Scenario 2: With OWNERSHIP Qos Policy SHARED, DURABILITY Qos Policy TRANSIENT_LOCAL, and DESTINATION_ORDER Qos Policy BY_SOURCE_TIMESTAMP, removing all the DataWriter state associated with disposed (but still registered) instances could lead to situations in which a late-joiner DataReader does not get notified about the most recent state of an existing instance. For example, the DataWriter could have updated and then disposed the instance. The late-joiner DataReader would not get any information on the instance from the DataWriter that disposed it and might instead get information with an earlier source timestamp from other DataWriters that did not dispose the instance. This compromises the eventual consistency provided by the destination order by SOURCE timestamp. For example, in a database replication scenario where DDS notifications of NOT_ALIVE_DISPOSED instances are mapped into DELETE operations on a database table, a late-joiner DataReader might not receive the instance dispose notification from a DataWriter and would never remove that record.

Maintaining state on instances that are disposed but not unregistered consumes resources, which, depending on the system, could become an issue. For example, a system that continually creates and disposes instances and never unregisters them could eventually exceed configured resource limits, such as the maximum number of instances on the DataWriter. To enable robust operation in these scenarios RTI Connext DDS supports additional QoS policy settings on the DataWriter's DATA_WRITER_RESOURCE_LIMITS Qos Policy and WRITER_DATA_LIFECYCLE Qos Policy. These settings ensure robust operation under the above circumstances and also avoid unnecessary use of resources for DataWriters with DURABILITY Qos Policy VOLATILE.

I will describe each of the relevant settings separately.

1.1 Setting of the WriterResourceLimitsQoSPolicy "instance_replacement"

The "instance_replacement" setting controls which instances can have their associated resources reclaimed. It can take five values: UNREGISTER_INSTANCE_REPLACEMENT (this is the default),  ALIVE_INSTANCE_REPLACEMENT, DISPOSE_INSTANCE_REPLACEMENT,  ALIVE_THEN_DISPOSE_INSTANCE_REPLACEMENT, DISPOSE_THEN_ALIVE_INSTANCE_REPLACEMENT.  

In the explanation I will refer to some instances as being "fully acknowledged": An instance is said to be "fully acknowledged" if all samples for that instance have been Acknowledged by all the matched active reliable DataReaders.

The settings of the "instance_replacement"  are as follows:

  • UNREGISTER_INSTANCE_REPLACEMENT (the default): The DataWriter will only reclaim instances that are both unregistered and "fully acknowledged".
  • DISPOSE_INSTANCE_REPLACEMENT: If the DataWriter does not have any unregistered instances that are "fully acknowledged", then it can also reclaim "fully acknowledged" instances that have been disposed.
  • ALIVE_INSTANCE_REPLACEMENT: If the DataWriter does not have any "fully acknowledged" instances that have been unregistered, then it can also reclaim "fully acknowledged" instances that are still alive (i.e. not disposed and not unregistered).
  • DISPOSE_THEN_ALIVE_INSTANCE_REPLACEMENT: If the DataWriter does not have any "fully acknowledged" instances that have been unregistered, then it can also reclaim "fully acknowledged" instances that have been disposed. If no such instances exist, it can additionally reclaim "fully acknowledged" instances that are alive.
  • ALIVE_THEN_DISPOSE_INSTANCE_REPLACEMENT: If the DataWriter does not have any "fully acknowledged" instances that have been unregistered, then it can also reclaim "fully acknowledged" instances that are alive (i.e. have not been disposed or unregistered). If no such instances exist, it can additionally reclaim "fully acknowledged" instances that have been disposed.

 

1.2 Setting of the WriterResourceLimitsQoSPolicy "replace_empty_instances"

The "replace_empty_instances" setting can be used to prioritize the removal of instances that have no samples.

  • If  "replace_empty_instances" is set to FALSE (the default setting), then the selection of an instance to replace proceeds as specified by the "instance_replacement" setting.
  • If  "replace_empty_instances"  is set to TRUE, then the DataWriter will first try to remove instances that have no samples, and only if it cannot find such instances will it then select instances using the criteria selected by the  "instance_replacement" setting. 

Note that there are several ways in which a DataWriter could end up having instances with no samples. Here I provide a couple of examples (see the sketch after this list); this will be explained in more detail in a separate note.

  • Lifespan: If the DataWriter is configured with a LIFESPAN QoS Policy that has a finite duration, samples will be removed after their lifespan expires, and instances that are not written again within that duration will end up with no samples.
  • Max Samples: If the DataWriter is configured with RESOURCE_LIMITS where max_samples < max_instances, the max_samples resource limit can be reached before the max_instances limit, forcing some samples to be removed and leaving instances without samples.
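
As an illustration, the following sketch (classic C++ API; the numeric values are arbitrary, and the member paths follow the C/C++ naming of the policies above, so double-check them against your version) shows a configuration under which empty instances can appear, with "replace_empty_instances" enabled so that those instances are reclaimed first:

    // Assumes an already-created DDSPublisher* publisher.
    DDS_DataWriterQos writer_qos;
    publisher->get_default_datawriter_qos(writer_qos);

    // Finite LIFESPAN: samples are removed after 10 seconds, so instances that
    // are not written again within that time end up with no samples.
    writer_qos.lifespan.duration.sec = 10;
    writer_qos.lifespan.duration.nanosec = 0;

    // max_samples < max_instances: the sample limit can be reached before the
    // instance limit, leaving some instances without samples.
    writer_qos.resource_limits.max_instances = 1000;
    writer_qos.resource_limits.max_samples = 500;

    // Prefer reclaiming instances that currently have no samples before applying
    // the "instance_replacement" criteria described in section 1.1.
    writer_qos.writer_resource_limits.replace_empty_instances = DDS_BOOLEAN_TRUE;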

 

1.3 Setting of the WriterDataLifecycleQoSPolicy "autopurge_unregistered_instances_delay"

The previous settings controlled which instances would have their resources reclaimed in the event that the max_instances resource limit is reached.  As long as this limit is not reached no instance resources would be reclaimed.

The problem with this approach is that if the max_instances is set to UNLIMITED then instance resources would never be reclaimed and in applications that continually create, dispose, and unregister instances this could lead to unbounded resource consumption.  Moreover, even in situations where the DataWriter is configured with a limited value for max_instances, this limit can be very large and it might be desirable to reclaim resources proactively before the resource limit is reached.

These are the scenarios addressed by the "autopurge_unregistered_instances_delay" setting.

If "autopurge_unregistered_instances_delay" is set to INFINITE (the default), the instance resources are only purged when the resource limit max_instances is exceeded. As mentioned, this can be problematic for systems with UNLIMITED max_instances.

If the  "autopurge_unregistered_instances_delay" is set to a finite time duration, RTI Connext DDS will reclaim resources of instances that are both "fully acknowledged" and have been unregistered, once the specified delay has elapsed since the instance was  unregistered.
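
A sketch of configuring this proactive reclamation in code (classic C++ API; the 60-second delay is just an example, and the member path matches the writer_qos.writer_data_lifecycle naming used later in this thread):

    // Assumes an already-created DDSPublisher* publisher and DDSTopic* topic.
    DDS_DataWriterQos writer_qos;
    publisher->get_default_datawriter_qos(writer_qos);

    // Reclaim the resources of an unregistered, fully acknowledged instance
    // 60 seconds after it was unregistered (0 would reclaim as soon as the
    // unregistered instance is fully acknowledged).
    writer_qos.writer_data_lifecycle.autopurge_unregistered_instances_delay.sec = 60;
    writer_qos.writer_data_lifecycle.autopurge_unregistered_instances_delay.nanosec = 0;

    DDSDataWriter* untyped_writer = publisher->create_datawriter(
        topic, writer_qos, NULL /* listener */, DDS_STATUS_MASK_NONE);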

2. Reclaiming the instance state kept by the DataReader:

By default, the resources a DataReader maintains on a particular instance are released when the following two conditions are met:

  • The instance has no known DataWriters that are writing it.  This occurs when all the DataWriters that were known (by the DataReader) to write the instance have either unregistered the instance or have left the system (so they are no longer matched with the DataReader). Note that the instance could be on a NOT_ALIVE_NO_WRITERS instance_state or a NOT_ALIVE_DISPOSED, depending on whether the instance was disposed prior to losing all the DataWriters.
  • There are no samples on the DataReader for the instance. This situation occurs when the DataReader calls "take" on all the samples for that instance and then calls "return_loan", or when the samples expire due to a finite LIFESPAN.

Note that samples that have been accessed via the DataReader 'read' operation remain in the DataReader cache so the associated instances will not have their resources reclaimed.

RTI DDS DataReaders do not release resources of instances in the NOT_ALIVE_DISPOSED instance_state. Recall that a DDS DataReader's instance will be in the NOT_ALIVE_DISPOSED instance_state when the DataWriter that owns the instance disposes it (in the case the DataReader has OWNERSHIP QoS set to EXCLUSIVE), or else when any DataWriter disposes the instance (in the case the DataReader has OWNERSHIP QoS set to SHARED).

The reason RTI DDS DataReaders do not release resources of instances in the NOT_ALIVE_DISPOSED state is that there are various scenarios under which this could lead to inconsistent or erroneous outcomes. For example:

Scenario 1: With EXCLUSIVE ownership, removing the resources associated with the instance would cause the DataReader to forget which DataWriter "owns" the instance, so if a new DataWriter with lower strength wrote the instance the update would be incorrectly accepted.

Scenario 2: With SHARED ownership and destination order by SOURCE timestamp, removing the resources associated with the instance would cause the DataReader to forget the source timestamp of the deletion, so if a different DataWriter were to write the instance with an earlier timestamp the update would be incorrectly accepted.

Note that, absent additional mechanisms, reclaiming all resources associated with an instance could lead to erroneous behavior even if the instance is in the NOT_ALIVE_NO_WRITERS state. For example:

Scenario 3: With SHARED ownership and destination order by SOURCE timestamp, some of the DataReaders (but not all) may lose connectivity with the DataWriters publishing an instance 'I'. These DataReaders would conclude that instance 'I' is in the NOT_ALIVE_NO_WRITERS state and could therefore remove the state associated with it; when this happens the DataReader forgets the source timestamp of the last update, 'I_SourceTimeStamp'. However, other DataReaders that did not lose connectivity will keep the state associated with the instance and will remember the source timestamp of the last update. If a DataWriter then updates the instance with a source timestamp earlier than 'I_SourceTimeStamp', the DataReaders that had purged the state will incorrectly accept that update for the instance.

Scenario 4: If a DataReader receiving samples from an original DataWriter and a Persistence Service loses connectivity to both, the instances received by the DataReader will move to the NOT_ALIVE_NO_WRITERS state. If the DataReader purges all the state associated with these instances, then after connectivity is recovered the DataReader may receive duplicate samples. For example:

  1. DataReader receives Sample 'n' for Instance 'I' coming from the original DataWriter
  2. DataReader loses connectivity with original DataWriter and  the Persistence Service
  3. DataReader reclaims all resources associated with Instance 'I'
  4. DataReader recovers connectivity with original DataWriter and the Persistence Service
  5. DataReader receives Sample 'n' for Instance 'I' coming from the Persistence Service
  6. DataReader provides sample to the application since it does not remember that it already received Sample 'n' from the original DataWriter

To overcome these potential scenarios RTI Connext DDS introduced in version 4.4 an additional mechanism based on the concept of Virtual Sample Identity (Virtual Source Global Unique Identifier [GUID] and Virtual Sequence Number). With this enhancement the DataReader can preserve a minimum amount of state per instance, consisting of the last source timestamp and the last Virtual Sequence Number received from a virtual GUID. This state is maintained even if the state associated with the instance is removed. This behavior is configurable using the "max_total_instances" setting in the DataReader DATA_READER_RESOURCE_LIMITS QoS Policy.

Note that the DATA_READER_RESOURCE_LIMITS QoS Policy "max_total_instances" must be explicitly enabled. The default out-of-the-box setting of the DATA_READER_RESOURCE_LIMITS policy does not keep that minimum state.
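
A sketch of enabling this on the DataReader side (classic C++ API). Note this is an assumption-laden example: I am assuming the member path reader_qos.reader_resource_limits.max_total_instances based on the policy names above, and the values are arbitrary; check the documentation of your version for the exact member and its default:

    // Assumes an already-created DDSSubscriber* subscriber.
    DDS_DataReaderQos reader_qos;
    subscriber->get_default_datareader_qos(reader_qos);

    // Regular limit on the number of instances for which full state is kept.
    reader_qos.resource_limits.max_instances = 1000;

    // Assumed semantics (see the description above): also keep the minimal
    // per-instance state (last source timestamp, last virtual sequence number)
    // so that up to this total number of instances can be handled consistently
    // even after their full state has been reclaimed.
    reader_qos.reader_resource_limits.max_total_instances = 2000;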

Question:  What does this mean, given that with default QoS a DataWriter automatically disposes upon unregister?

This default is specified by the DDS specification for convenience, because in the common case an instance is written by a single DataWriter and the lifetime of the instance is tied to the presence of the DataWriter that writes it.

The rationale is explained in section 2.1.3.21 (WRITER_DATA_LIFECYCLE) of the DDS Specification (version 2.2), quoted below:

Note that the deletion of a DataWriter automatically unregisters all data-instances it manages (Section 2.1.2.4.1.6, ”delete_datawriter” ). Therefore the setting of the autodispose_unregistered_instances flag will determine whether instances are ultimately disposed when the DataWriter is deleted either directly by means of the Publisher::delete_datawriter operation or indirectly as a consequence of calling delete_contained_entities on the Publisher or the DomainParticipant that contains the DataWriter.
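
If an application does not want "unregister" (including the implicit unregister when the DataWriter is deleted) to imply "dispose", it can turn this default off. A minimal sketch using the classic C++ API:

    // Assumes an already-created DDSPublisher* publisher.
    DDS_DataWriterQos writer_qos;
    publisher->get_default_datawriter_qos(writer_qos);

    // Default is TRUE: unregistering (or deleting the DataWriter) also disposes.
    // With FALSE, readers see NOT_ALIVE_NO_WRITERS on unregister (provided no
    // other DataWriter is still writing the instance) rather than NOT_ALIVE_DISPOSED.
    writer_qos.writer_data_lifecycle.autodispose_unregistered_instances = DDS_BOOLEAN_FALSE;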

Question:  How can I avoid hitting a max_instances resource limit?

On the DataWriter, you can avoid hitting the resource limits by unregistering instances before "max_instances" is reached and setting a finite value for the WRITER_DATA_LIFECYCLE "autopurge_unregistered_instances_delay". In the extreme case, setting autopurge_unregistered_instances_delay to zero will cause the instance resources to be reclaimed as soon as the instance is unregistered and all its samples have been acknowledged by all the known active reliable DataReaders.

We are also considering enhancements that could provide additional proactive mechanisms, for example adding an API to the DataWriter that allows an application to remove an instance (given its handle) from the DataWriter queue.

On the DataReader, instances are removed when there are no remote DataWriters publishing the instance and there are no outstanding samples for the instance. So, in order to not hit the resource limits, the DataWriters in the system must be unregistering instances before the limit is reached. There is no way from the DataReader to ensure or enforce this.

We are also considering enhancements that could provide additional proactive mechanisms on the DataReader, for example adding a configuration setting that allows purging NOT_ALIVE_DISPOSED instances that have no outstanding samples, or adding an API to the DataReader that allows removing an instance (given its handle) from the DataReader cache. Notice that removing instances that are not in the NOT_ALIVE_NO_WRITERS state may impact application correctness, as described above.

Gerardo and Fernando

Offline
Last seen: 10 years 3 weeks ago
Joined: 01/29/2013
Posts: 4

Hi Gerardo and Fernando,

That is a long story, thanks for the explanation.

Would it be possible to summarize what a user needs to do to avoid memory growth in the following situation: a DataWriter writes a new instance (let's say using a sequence number as its key, and let's say a new instance every second), and disposes every instance after a certain period of time (let's say one minute). From your article, I understand now that invoking dispose() only is not good enough and will result in leaking memory -- the DataWriter needs to do unregister_instance() as well. Is that sufficient? How about on the DataReader side?

Thank you,

Reinier

fercs77's picture
Offline
Last seen: 2 months 2 weeks ago
Joined: 01/15/2011
Posts: 30

 Current Behavior

DataWriter

The user will have to call unregister and set the QoS value writer_qos.writer_data_lifecycle.autopurge_unregistered_instances_delay to a finite value. If the user wants to purge fully acknowledged unregistered instances immediately he should set autopurge_unregistered_instances_delay to 0.
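
For the scenario described above, a sketch of what the writer side could look like (classic C++ API, hypothetical "Foo" type keyed by a sequence number; the publisher, writer creation, and timing logic are assumed and error handling is omitted):

    // QoS: purge unregistered, fully acknowledged instances immediately.
    DDS_DataWriterQos writer_qos;
    publisher->get_default_datawriter_qos(writer_qos);
    writer_qos.writer_data_lifecycle.autopurge_unregistered_instances_delay.sec = 0;
    writer_qos.writer_data_lifecycle.autopurge_unregistered_instances_delay.nanosec = 0;
    // ... create the FooDataWriter* writer with writer_qos ...

    // Every second: publish a new instance.
    Foo sample;
    sample.sequence_number = next_sequence_number;   // hypothetical key field
    DDS_InstanceHandle_t handle = writer->register_instance(sample);
    writer->write(sample, handle);

    // One minute later: the instance no longer exists, and this writer is done
    // with it, so dispose it and then unregister it to allow resource reclamation.
    writer->dispose(sample, handle);
    writer->unregister_instance(sample, handle);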

DataReader

On the DataReader, instances get automatically removed when the following two conditions are met:

* The instance has no known DataWriters that are writing it
* There are no samples on the DataReader for the instance

(see detailed description on previous post from Gerardo)

In our existing implementation there is no other way to purge instances from the DataReader queue

RFE

There is already an RFE filed to provide additional ways to purge instances from the DataWriter and DataReader queue (CORE-5450). This is a summary of the proposal described in the RFE:

DataWriter

Add a new Qos Policy value into DDS_WriterDataLifecycleQosPolicy called autopurge_dispose_instances_delay. This parameter will configure the maximum duration for which a DataWriter will maintain information regarding one instance once it has been disposed and all its samples have been ACKed by all the live DataReaders.

DataReader

Add two new Qos policy values into DDS_ReaderDataLifecycleQosPolicy called autopurge_dispose_instances_delay and autopurge_nowriter_instances_delay. These parameters are equivalent to the ones defined in the DataWriter.

In addition, we should add a new operation to the DataReader API that allows removing an instance independently of its state.

Regards,

    Fernando

Offline
Last seen: 11 years 2 months ago
Joined: 08/29/2013
Posts: 2

Hello all!


What is the proper way of configuring WriterResourceLimitsQoSPolicy "instance_replacement" using XML?

 

thanks, Peter

Gerardo Pardo's picture
Offline
Last seen: 3 weeks 1 day ago
Joined: 06/02/2010
Posts: 602

Hi Peter,

Are you asking what the syntax for configuring the DataWriterResourceLimitsQoSPolicy is when using the XML file to configure QoS? Or are you asking what is the proper value to set it to?

The syntax in the XML file is the same as for any other QoS: it follows the same member names as the C/C++/Java/C# types. For example, this snippet placed within a <qos_profile> tag will set the DataWriter QoS, specifying the DataWriterResourceLimits:

 
    <datawriter_qos>
        <writer_resource_limits>
            <instance_replacement>DISPOSED_THEN_ALIVE_INSTANCE_REPLACEMENT</instance_replacement>
        </writer_resource_limits>
    </datawriter_qos>

The best way to do this is to use an XSD-aware XML editor, such as Visual Studio on Windows or Eclipse with the Web Toolkit, and also make sure the xsi:noNamespaceSchemaLocation that appears within the <dds> tag points to the absolute path of the ${NDDSHOME}/resource/qos_profiles_5.0.0/schema/rti_dds_qos_profiles.xsd file (replacing ${NDDSHOME} with the correct path on your system). With this setup the editor will give you assistance and auto-complete the tags and constant values.

If your question was about the correct value to set it to, then there is no "one answer fits all"; it really depends on your application, the semantics it attaches to the dispose and no-writers events on the data-instances, and what you are willing to give up when you run out of resources.

Gerardo

 

Offline
Last seen: 11 years 2 months ago
Joined: 08/29/2013
Posts: 2

Many thanks Gerardo! That's exactly what I needed!

Also good to know about auto-complete schema, I will configure this for sure!

 

thanks again,

Peter

Offline
Last seen: 7 years 2 months ago
Joined: 02/06/2017
Posts: 8

Hi everybody

My use case is: I already have a legacy system with which I talk over UDP, so I receive a separate delete message for each instance of streaming data. In my DDS system I have to unregister that instance as soon as I receive the delete, but I am not able to unregister an instance with a particular key. I am aware of the number of instances that I am going to receive from the legacy system; do I need to create that many instances and instance handles for that?