Confused by binary format of data in rtirecord output file

Offline
Last seen: 4 years 5 months ago
Joined: 03/12/2018
Posts: 32

Hi all,

I have a test scenario where I sent some DDS data from an application and recorded it with the rtirecord application, version 5.3.0, on 64-bit Linux.

Now, on a Windows PC, I'm trying to analyze the data in the output file from rtirecord to check the contents of the DDS messages that I sent.  I'd like to put the data into MATLAB to plot some features. 

I need a little help understanding what the data format is like under the hood when I pull it out of the database, or an alternative way of getting the data into MATLAB.

I'm not a database expert, and I'm only lightly familiar with Python.  I don't have a database server installed.

I have Python 3.6.  Using the sqlite3 module, I was able to connect to the database and examine the rows of data.  I focused on the column 'rti_serialized_sample', since that is where the sample data is serialized as a byte stream.  I tried pulling it into Python variables by parsing the binary data.
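For reference, this is roughly how I'm pulling the blobs out (the file and table names are from my setup, so adjust for yours):

import sqlite3

# The rtirecord output file is just an SQLite database, so open it directly.
conn = sqlite3.connect("my_recording.dat")   # path to the recorded file
cur = conn.cursor()

# Table name taken from my recording (topic$group$domain); yours will differ.
cur.execute('SELECT rti_serialized_sample FROM "RECEIVER_STATUS_0$RecordAll$domain0"')

for (blob,) in cur:
    data = bytes(blob)   # one serialized sample as a raw byte string
    # ... parse 'data' by hand (my guesses about its layout are below)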

My data contains sequences (with maximum lengths), as well as longs and 'long long's scattered among them.

If I pull the binary data out and stare at it for a while, I can see a few features of the data:

  • A sequence has an integer at the beginning containing its length.
  • There are four bytes at the very beginning that are not part of my payload, and therefore must be something RTI has put there.
  • 'long's and 'long long's both seem to be saved as 8 bytes, which I did not expect. (A sketch of the parsing I am attempting follows this list.)
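To make my guessing concrete, here is the kind of parsing I'm attempting; the 4-byte skip, the little-endian byte order, and the 8-byte element size are all guesses on my part:

import struct

def parse_sample(data):
    # 'data' is one rti_serialized_sample blob pulled from the database.
    offset = 4   # skip the four leading bytes that are not part of my payload

    # A sequence appears to start with a 4-byte length, followed by its elements.
    (seq_len,) = struct.unpack_from("<I", data, offset)
    offset += 4

    # I'm treating the elements as 8-byte integers ('long long'); this is
    # where I suspect I'm mishandling alignment/padding between fields.
    values = struct.unpack_from("<%dq" % seq_len, data, offset)
    offset += 8 * seq_len

    return values, offset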

My point is that since I'm guessing at all of this, I must be missing something... either a document describing these formats, or a different, supported workflow for extracting data from the database file.

As it turns out, I can parse a sequence and assign the fields to Python variables as expected for a while, but at the end of the first sequence my data becomes confused and corrupted.  I must have misinterpreted the length of some field (for example, I think I am mixing the length field of the next sequence in with the bytes of the last field of the previous sequence, or something like that).

The data rate is about 50 MB/s, and the file I have from a few seconds of data is 400 MB, so I don't relish the idea of using rtirecconv to convert it to XML, if that's a suggestion.  However, I will do what is necessary.

Offline
Last seen: 11 months 1 week ago
Joined: 02/11/2016
Posts: 144

Hello Aaron,

The expected way to get the data out is to use the record converter utility (rtirecconv).

In RTI Launcher you can find it under Utilities.

For MATLAB I would recommend the CSV format, although other formats may be useful as well.

 

Good luck,

Roy.

Offline
Last seen: 4 years 5 months ago
Joined: 03/12/2018
Posts: 32

Thank you Roy.  This was helpful to know.  I gave it a try, as you suggested.

Unfortunately,  the converter's output appears to be broken.   Could I get a patch, or tips on how to configure the converter to produce a usable CSV?

  • I think the intent was for each line to contain a timestamp followed by the comma-separated sample values.  However, the timestamp of each sample is followed by a newline, and the next line is a comma followed by the data.  So every other line is a timestamp without a comma, and the alternating lines are commas followed by the rest of the data.  How do I pull this into any CSV reader?  Anything that parses CSV uses newlines to delimit records (I'm currently fighting with xlsread in MATLAB).
  • The file is too big to be edited by any editor I know of (Notepad, Notepad++, Microsoft Excel), so I can't fix this with a search-and-replace (I would replace each "newline-comma" sequence with just a comma; see the sketch after this list).  FYI, I'm waiting for my sysadmin people to approve UltraEdit, which handles large files better.
  • The first thing in the first row of the file is the table name.  This messes up the alignment of column names with column data.  However, this is aesthetic only, so I don't consider it a major issue... just inconsistent.
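For what it's worth, the fix I have in mind doesn't have to happen in an editor; a streaming script like the following (untested sketch) would glue each timestamp line to the data line that follows it, so file size stops being an issue:

# Join each timestamp line with the ",data..." line that follows it, so every
# record ends up on a single line. Reads and writes one line at a time.
in_name = "convertedTopic-_RECEIVER_STATUS_0_RecordAll_domain0.csv"
out_name = "repaired.csv"

with open(in_name) as src, open(out_name, "w") as dst:
    dst.write(src.readline())          # pass the header row through unchanged
    pending = ""                       # a timestamp waiting for its data line
    for line in src:
        if line.startswith(","):
            dst.write(pending.rstrip("\n") + line)   # timestamp + ",values..."
            pending = ""
        else:
            if pending:                # a timestamp with no data line after it
                dst.write(pending)
            pending = line
    if pending:
        dst.write(pending)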

Could I get a fixed version of the conversion tool that omits the newline after the timestamp so the tool will produce usable output?

Here is an excerpt of what I see.  I invite you to try reading this into any tool that processes CSVs.  Yes, Excel will "display" it, but you will be hard-pressed to plot or compute with any of the columns (yes, the timestamps are off; please ignore).

$ head convertedTopic-_RECEIVER_STATUS_0_RecordAll_domain0.csv | cat -n
 1 Table Name: RECEIVER_STATUS_0$RecordAll$domain0, TTimestamp,timeOfValidity.seconds,timeOfValidity.nanoseconds,state,obitStatus,gainState,decimationFactor,centerFrequency,generatorState,lastCommandTime.seconds,lastCommandTime.nanoseconds,lastCommandedState
 2 Mon Feb 26 00:13:55 2018
 3 ,1523960649,290905600,1,1,0,0,0.0000000000,2,0,0,1
 4 Mon Feb 26 00:13:56 2018
 5 ,1523960650,291604736,1,1,0,0,0.0000000000,2,0,0,1
 6 Mon Feb 26 00:13:57 2018
 7 ,1523960651,292283648,1,1,0,0,0.0000000000,2,0,0,1
 8 Mon Feb 26 00:13:58 2018
 9 ,1523960652,292964608,1,1,0,0,0.0000000000,2,0,0,1
 10 Mon Feb 26 00:13:59 2018

 

 

Offline
Last seen: 4 years 5 months ago
Joined: 03/12/2018
Posts: 32

As a follow-up, I was able to get MATLAB to erase the blank lines by searching for the nulls and removing them from MATLAB's data (on the small data set...).

However, on the large dataset, MATLAB's xlsread and csvread fail.  This seems logical, since I think they use the same algorithm that Excel uses, and Excel prints a warning when I try to read the files, saying it could not read the whole file.

Therefore, the only way I can do this is by parsing the internal format of the database file.

I understand that you may have support issues... no one likes to release internal documentation subject to change.  However, I really need to verify my application.

 

Offline
Last seen: 11 months 1 week ago
Joined: 02/11/2016
Posts: 144

Hey,

I'll start with a question: "However, I really need to verify my application" - What do you mean exactly by this statement?

Recorder allows you to analyze, after the fact, the behavior of the distributed system (assuming you view your various applications as part of one big distributed system).

It is limited in some ways you've mentioned (less than ideal behavior around null values, outputs that are too large to process, possible crashes for large inputs).

What I can recommend as workarounds:

1. Avoid null values.

2. Split your conversion into smaller timeframes and then process the partial CSVs in MATLAB (or whatever you wish to use).

 

This is assuming you want to be able to do "anything" with the data after the fact (and less so when you know exactly what you would like to test before the fact).

If you do know what you'd like to test for, I would recommend other methodologies, specifically log and event monitoring (or in more simple scenarios, unit tests / system tests / integration tests).

Personally, my project went through the process of implementing our own "recorder", which identifies the writers in the environment, creates matching readers, and stores the data in MongoDB.

We can then export the data from mongo.
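To give a rough feel for the idea (this is a sketch, not our actual code, and it assumes the rticonnextdds-connector Python package, pymongo, and an XML application definition called recorder_app.xml containing the participant/reader names used below):

import rticonnextdds_connector as rti
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["recordings"]["receiver_status"]

connector = rti.Connector("MyParticipantLibrary::RecorderParticipant", "recorder_app.xml")
reader = connector.get_input("MySubscriber::ReceiverStatusReader")

try:
    while True:
        reader.wait()        # block until new samples arrive
        reader.take()
        for sample in reader.samples.valid_data_iter:
            # Each DDS sample becomes one MongoDB document; querying and
            # exporting it later doesn't require touching any binary format.
            collection.insert_one(sample.get_dictionary())
finally:
    connector.close()

The nice part is that the stored documents are plain dictionaries, so exporting them for MATLAB (CSV, JSON, whatever) is ordinary scripting rather than reverse-engineering a file format.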

The issues you've mentioned didn't disappear just because we implemented our own recorder, but we were able to work around them in code (and we were also able to prevent some strange behaviors that can show up in an environment with potentially very old applications).

It could take a bit of time to implement your own "solution", so whether or not I recommend doing so depends on your specific use case.

 

In fact, looking back at your original post (50 MB/s), if you are interested in verifying all of that data all the time, I would definitely not recommend using the recorder (or any "recorder" per se).

In this case I would need to know a bit more about the architecture before giving a concrete suggestion, but I'm thinking of something along the lines of services dedicated to analyzing your data flows in real time and notifying you when something happens.

 

Good luck,

Roy.

Offline
Last seen: 4 years 5 months ago
Joined: 03/12/2018
Posts: 32

This is a signal processing application, but I am running test data through it to verify my application and/or DDS's role in it.  The test data contains known/predicted values for verification purposes (imagine a ramp function), along with my own timestamps as my data was processed at various points in the application. 

What I want to do is get my data into MATLAB so I can plot the ramp values (or rather, the deltas) to identify places where my application might have dropped data (before DDS even enters the picture, I do a lot of data shuffling, and my data source waits for no one).  I would also like to plot the differences in my timestamps to help identify timing bottlenecks; a rough sketch of that analysis follows.
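Whether I end up doing this in MATLAB or Python, the analysis itself boils down to differencing a couple of columns; roughly this (the file and the 'ramp'/'timestamp' column names are made up for illustration):

import pandas as pd

df = pd.read_csv("receiver_status.csv")

# Dropped data shows up as a ramp increment larger than the nominal step.
ramp_step = df["ramp"].diff()
drops = df.index[ramp_step > ramp_step.median()]

# Timing bottlenecks show up as unusually large gaps between my timestamps.
gaps = df["timestamp"].diff()

print("possible drops at rows:", list(drops))
print(gaps.describe())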

The good news is that I was finally able to decipher the internal structure of the SQL database.  There are some shenanigans in there, but I played along until the data made sense, and I am now able to convert my binary data to something I can work with in MATLAB (until the recorder format changes underneath me, I guess).

Thanks for your help.

Offline
Last seen: 11 months 1 week ago
Joined: 02/11/2016
Posts: 144

Hey Aaron,

I'm glad to hear you've worked out a solution that works for you.

I think, however, that what you want (to identify timing bottlenecks) would work better if you took a more real-time approach (that is, monitoring timing bottlenecks, rather than simply running a test and then analyzing it).

Unless you have some way of guaranteeing that your test covers all of the behaviours that your system may exhibit, in which case by all means use this approach.

Good luck,

Roy.

Offline
Last seen: 3 weeks 2 days ago
Joined: 01/16/2013
Posts: 128

Hi Aaron,

I know you have this solved by now, but you could capture the information from DDS directly in MATLAB by using the support package MathWorks created: https://es.mathworks.com/hardware-support/rti-dds.html

I hope this also helps.
Sara