NX-Series Processors - Frequent Crashes During IP Client Processing

Reese Jacobs · December 2016

I am in the process of upgrading some systems from NI to NX processors. I have run into multiple instances where an NX-1200 processor (running latest firmware - 1.5.68) will reboot during the middle of performing IP client operations. The latest reboot occurs when an IP connection has been made to an RSS weather server and the HTTP response from the server is being processed. It occurs at the same point in time and is very repeatable. This is not the first time I have seen the NX reboot during network operations - I initially did some testing with the old i!-EquipmentMonitor module sending email via SMTP and the processor would reboot when sending an email message. I have since converted to the internal SMTP functions so that problem went away.

I am curious if anyone else has run into similar problems with NX reboots. I do not have *any* devices connected to the processor - no EXB devices, no touch panels, no AXLINK, no ICSLan, etc. This is very disconcerting and does not give me a lot of confidence in the NX upgrade. The same code runs on an NI-3100 without any problems. I have several NX-4200 processors that I have yet to test so I guess that is the next step unless someone has other recommendations.

ericmedley · December 2016

I have been doing quite a bit of IP control/communication on NX series processors of all shapes and sizes and have encountered no problems. I suppose it might be wise to have a look at the code. The usual warnings apply about porting NI code to NX. You need to get rid of any DEFINE_PROGRAM code and move it up to DEFINE_EVENT. This can be a challenge if you're using any modules that have DEFINE_PROGRAM code in them. Also, I have seen code written a while back by someone else that tried to manage the asynchronous nature of IP comms by using waits. On an old NI it kinda worked, but on the much faster NX the same code failed. If you want to post some of the code we can have a look at it.
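
As a rough illustration of handling the connection from events instead of waits, here is a minimal sketch. The device number 0:4:0, the host name and the function name OpenClient are placeholders invented for this example, not taken from anyone's actual program.

```netlinx
PROGRAM_NAME='ip_client_event_sketch'

DEFINE_DEVICE
dvClient = 0:4:0                  // hypothetical local IP port on the master

DEFINE_CONSTANT
CHAR cHost[] = 'example.com'      // placeholder host, assumption for the sketch

DEFINE_FUNCTION OpenClient()
{
    // Request the connection; the result arrives later as ONLINE or ONERROR
    IP_CLIENT_OPEN(dvClient.PORT, cHost, 80, IP_TCP)
}

DEFINE_START
OpenClient()

DEFINE_EVENT
DATA_EVENT[dvClient]
{
    ONLINE:
    {
        // The socket is up at this point, so the request is sent here
        // rather than after a WAIT that guesses at the connect time
        SEND_STRING dvClient, "'GET / HTTP/1.1', 13, 10, 'Host: ', cHost, 13, 10, 'Connection: close', 13, 10, 13, 10"
    }
    ONERROR:
    {
        SEND_STRING 0, "'dvClient: error ', ITOA(DATA.NUMBER)"
    }
}
```

The point of the sketch is only that the send is tied to the ONLINE event, so it behaves the same on a slow NI and a fast NX.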

Reese Jacobs · December 2016

Eric - nothing in DEFINE_PROGRAM, everything is event driven. Also, no waits in the code. I have avoided waits as a general rule due to possible timing issues. The code is pretty simple -- just wish there were some additional tools or logs that could help pinpoint the problem. I will keep plugging away and find a subset of the program that I can forward to AMX and see if they can re-create the crash/reboot and hopefully shed some light. Thanks.

ericmedley · December 2016

Reese Jacobs wrote: »

Eric - nothing in DEFINE_PROGRAM, everything is event driven. Also, no waits in the code. I have avoided waits as a general rule due to possible timing issues. The code is pretty simple -- just wish there were some additional tools or logs that could help pinpoint the problem. I will keep plugging away and find a subset of the program that I can forward to AMX and see if they can re-create the crash/reboot and hopefully shed some light. Thanks.

Tools: yeah no kidding. At the Developers Conference we've requested the addition of being able to see IP communications in the diagnostics. What we've been told is that it is very difficult to pull off since IP communications happen in a layer outside the layers of NetLinx. There is no way to port it through for whatever reason. This is from the Firmware engineers. Having said that, I usually just do a send_string 0 for smaller messages, and for bigger messages I create a large global buffer to hold strings. Then I copy/paste out of debug into a text doc to read.

Also it's a good idea to telnet into the master and turn messages on. IP communications messages do happen there and, while in no way comprehensive, they do often provide helpful clues. MSG ON ALL.
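
The logging approach described above (send_string 0 for short messages, a large global buffer for long ones) might look roughly like this. It is a sketch only: the device 0:3:0, the variable name cRxLog and the 20000-byte size are assumptions made for the example.

```netlinx
PROGRAM_NAME='ip_debug_sketch'

DEFINE_DEVICE
dvWeather = 0:3:0                 // hypothetical local IP port for the client socket

DEFINE_VARIABLE
VOLATILE CHAR cRxLog[20000]       // large global buffer; size is an arbitrary assumption

DEFINE_EVENT
DATA_EVENT[dvWeather]
{
    ONLINE:
    {
        SEND_STRING 0, "'dvWeather: connected'"
    }
    STRING:
    {
        // Short summary goes to diagnostics / telnet
        SEND_STRING 0, "'dvWeather: rx ', ITOA(LENGTH_STRING(DATA.TEXT)), ' bytes'"

        // Start over if the next chunk would not fit in the log buffer
        IF ((LENGTH_STRING(cRxLog) + LENGTH_STRING(DATA.TEXT)) > MAX_LENGTH_STRING(cRxLog))
        {
            CLEAR_BUFFER cRxLog
        }

        // Keep the full payload so it can be copied out of the debugger
        cRxLog = "cRxLog, DATA.TEXT"
    }
    OFFLINE:
    {
        SEND_STRING 0, "'dvWeather: closed'"
    }
    ONERROR:
    {
        SEND_STRING 0, "'dvWeather: error ', ITOA(DATA.NUMBER)"
    }
}
```

With MSG ON ALL active in a telnet session, the send_string 0 lines show up alongside the master's own IP messages, and cRxLog can be watched in debug for the full response.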

MLaletas · December 2016

I've done a bunch of IP comms on the NX series processors and have not seen this issue once. I have seen some code programmed by others that does not manage its IP connections properly and it has done some funky things, but nothing that I have created. Sounds like you are doing all the right things with driving everything from events, so I'm very curious as to what the problem is.

fogled@mizzou · December 2016

Can confirm I've got NX controllers running a hundred or more simultaneous outbound and inbound IP connections, without a problem. I HAVE had to send a couple new NX controllers back for repair for memory card replacements though; the problems OP describes are similar to what I was seeing on the NX controllers I had to send in for repair.

Reese Jacobs · December 2016

Well - spent a few days trying to narrow down the problem. I now have a 500 line program that opens 2 TLS connections and 3 standard IP connections to common servers like Yahoo, Fox, CNN, etc. and simply does an HTTP GET / and then confirms the response. The program successfully makes all 5 connections, sends the GET request, and receives the HTTP response from the server. Shortly after the last response, the Netlinx NX-1200 master reboots. This happens every time you run the program so it is 100% repeatable at least on my master. The common theme to my original program which led me to trying to isolate the problem with a test program was the use of a TLS connection. If I remove the one TLS connection from the original program, it works fine. With the TLS connection, the program crashes every time although interestingly it could be several minutes after the TLS connection was made and data successfully retrieved. The delay between the timing of the TLS connection and the closing of the TLS connection (with successful data retrieval) led me to look to other explanations before I circled back to TLS as the possible culprit. Since all of the code being tested works perfectly fine on an NI-3100 (of course there is no TLS on the NI-3100 so that code used standard IP), the TLS code was always a wildcard.

It may turn out this is a hardware problem but I doubt it seriously. It seems to be related to the new TLS functions. I am not using TLS certificate validation nor have I introduced any TLS/SSL certificates onto the master. The master is running standard 1.5.68 firmware. If anyone is interested in giving the test program a spin, I can send you the AXS file so you can examine the code, compile it, and see if your NX handles it any better. I have several NX-4200 processors here so I guess the next step is to try them.

I will separately try to send the test file to AMX Tech Support but since I am only an independent programmer and not a dealer, I am not sure how they will treat the request. Thanks to all who posted for the additional data points.

sling100 · December 2016

There is an issue with TLS connections which I reported to TS a week or two back. I believe that there is a hotfix imminent for it, so you might find that this is about to be sorted.

Simon

Reese Jacobs · December 2016

Simon - thanks for the heads-up. I heard back from Tech Support this morning and they have a copy of my test program that will crash the NX-1200 master every time it is run. I am fairly certain it is a TLS connection issue and probably the same one you have reported. I will post if I hear back from Tech Support on the issue; let me know if you hear something directly. Thanks.

Reese Jacobs · December 2016

Just heard from AMX Tech Support that they were able to re-create the problem using the test program and all of the information has been forwarded to engineering. As Simon noted, there are some issues with TLS connections that crash the master. Hopefully AMX engineering will get it sorted out soon.

a_riot42 · December 2016

I found a bug the other day that would crash my NX1200. I was building a system and had lots of index errors, but one of them made the system reboot. It turned out that it was from executing an index of 0 into an array that was in a switch guard. IE:

var = 0
switch (array[var])

Any chance your parsing code has a similar construct?
Paul
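
One way to guard against the index-of-0 case described above is to range-check the index before it reaches the switch. This is a minimal sketch; nModes, HandleMode and the case values are hypothetical names and data made up for illustration.

```netlinx
PROGRAM_NAME='index_guard_sketch'

DEFINE_VARIABLE
VOLATILE INTEGER nModes[4] = {1, 2, 3, 4}   // hypothetical lookup array

DEFINE_FUNCTION HandleMode(INTEGER nIndex)
{
    // NetLinx arrays are 1-based, so 0 is never a valid index
    IF ((nIndex >= 1) && (nIndex <= MAX_LENGTH_ARRAY(nModes)))
    {
        SWITCH (nModes[nIndex])
        {
            CASE 1:
            {
                SEND_STRING 0, "'HandleMode: mode 1'"
            }
            DEFAULT:
            {
                SEND_STRING 0, "'HandleMode: other mode'"
            }
        }
    }
    ELSE
    {
        // Report the bad index instead of evaluating array[0] in the switch
        SEND_STRING 0, "'HandleMode: index out of range: ', ITOA(nIndex)"
    }
}
```

Calling HandleMode(0) then takes the ELSE branch and logs the bad index without ever touching the array.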

Reese Jacobs · December 2016

No, the source of my problem is definitely the use of TLS connections which is a new feature to the NX series in firmware 1.5.68. Simon posted that he had seen similar issues regarding the use of TLS and AMX Engineering has several problem reports in progress with respect to the TLS feature. Regarding your problem, I would expect Netlinx to throw an exception and provide a diagnostic alerting you to an index of 0 error condition and not for the master to reboot. In a perfect world, since as programmers we have limited tools to debug the cause of a master reboot (logging ....), I would hope a master reboot caused by a programming error to be extremely rare. The most frustrating problem to solve is when the master hangs or reboots particularly when the root cause is network related because of the limited tools for problem analysis and debugging.
