It isn't whining. It is fact. How about this side of the coin:
With source, I could actually debug and find the problem and not have to call tech support to begin with.
The only time I called tech support about a module issue, they couldn't do anything for me but possibly offer me an NDA so I could fix it myself. No ETA for a fix - that was the only way. I couldn't care less about tech support.
I don't understand. If you want to fix it yourself, why not sign an NDA to be able to fix it yourself?
Paul
Having to deal with a module, spend a few hours installing it and using it, determine that it's broken, and then call TS just to have them say, "well, we can't/won't/don't have time to fix it, but if you sign an NDA we'll release the code so you can fix it yourself" is pretty much a slap in the face of reason. So now, a day or two later (job on hold), True fixes it but the AMX version still doesn't work? That's efficient and logical. And to sign an NDA for a Yamaha receiver whose protocol is readily available to anyone who wants it? What's the NDA for? To keep it out of the hands of Crestron, who might copy the parsing routine and modify it to work with their system? Other than some equipment specific to commercial-type jobs, what could Crestron possibly want that they don't already have that isn't already working better? Heck, we don't usually want to spend the time to rewrite an existing AMX module, and we do AMX.
Sounds like they have no desire to support standard modules anymore if they're telling True to fix it himself. So if they don't want to support them, don't. Go through their list and see which ones they had to sign an NDA for with a particular manufacturer and then release/open all the others. Seems they're too paranoid. Must be all that cold war government work. The secret, hush-hush stuff.
Again, AMX could keep an archived original copy and have a point of contact, so that when a broken standard module is fixed, the fix can be reviewed by AMX (or by forum members, for that matter) and the broken archived copy can be replaced. We're a free resource that AMX could/should tap, plus we have a vested interest in good working modules while their 9-5 programmers don't.
If they are willing to give you the code with an NDA that is a lot more helpful than most companies. If you find a legitimate bug in an AMX module and report it, at least AMX will try and fix it. Have you reported the bugs you have found?
The problem I think is that many programmers will report a bug in a module that is really due to programmer error, wiring, networking or some other issue, and blame the module. If AMX had to track down every claimed bug in every module, they wouldn't have much time to do much else. I have found though, that if many people call in about the same issue in a module they often provide a hotfix pretty quickly. While there is always room for improvement, the only time in 7 years that I can recall an AMX module holding a project up was when I had to use an early Polycom duet module instantiated twice in a project that didn't play nicely together. TS was very helpful in tracking down the issue and a hotfix was provided the next day that worked. I believe it has since been fixed but I was very impressed with the way it was handled all things considered.
Paul
I've never called about a module so I don't know first hand what they would do. I do think, however, that the focus has shifted from standard mods to DUET, so I don't think they want to continue working in the past; obviously they would for the right dealers/products. Plus, in this current economic climate I would think resources would be a bit tight as they try to maintain their existence. Things may not be that bad, but they definitely can't be good, especially with iPads taking a good chunk of their business away. Granted, their other business that sells the app is probably doing OK.
Can anyone tell me how the last two++ pages have anything to do with code optimisation?
Come on guys, whine somewhere else will you?
Since when have most threads stayed on topic? Besides, this was a perfectly harmless drift discussing serious problems with AMX modules, and I don't think there was much whining until your post. Just folks stating facts, opinions and constructive ideas. If you don't like it, that's your right, and it's also your right to agree, disagree or describe members' posts as whining. Of course, those whining posts have some relevance and purpose, albeit slightly off the OP topic, while complaining about whining is just more whining without relevance, just an obvious purpose. Notice the 3 smileys, so don't take too much offense. I'm just vining, whining about your whining about whining started by vining. Of course I can't really take credit for the drift but it rhymed and I am a guilty participant. Now that's off topic! Peace.
Um.. Is there any. In any forum anywhere in the world?
Sorry that I forgot mine! (fixed with credit)
You should use a wink not a tongue smiley.
Back on topic, in my own experience with AMX modules (especially the Intercom module) I've come to believe they are all **** and that I shouldn't spend too much time on them, especially not if it's for a device where the module is easily written.
The only module I have used that works well is the iPort module.
I also believe the modules are often far from optimal and written in the shortest timespan possible.
Writing a few lines myself helps remove the clutter.
I.e. the stuff that I don't need that's still in it to support the full device.
I know that my own code is often far from optimal, due to my incomplete knowledge and due to the timespan I have to do it in.
I do think, however, that they should spend more time on their modules and release their source to AMX programmers. We can learn things from those modules and change or remove things we don't need or want done differently.
That will help us optimize those modules as well. They can keep the original module in their archives and we can post our versions in the modpedia.
A different kind of optimization:
- Pick programming guidelines (The common netlinx initiative is a good one!), so your programs are similar.
- Keep revisions of your software.
- Look back at your old code and optimize & use it in new projects. (This way you ensure you write re-usable and useful code)
- Ask for help here when you need it, don't be afraid to post snippets of your code, we all learn something new every day. You can only learn from your mistakes.
Back on topic, in my own experience with AMX modules
Technically that's the "off topic" subject. The OP subject was "Optimize your code" which dealt with techniques to make your code more efficient. AMX modules and the pro/con experiences is the drift.
This is true. Perhaps the moderator can move this thread to another thread. I propose the title.
AMX Modules: The old bone AMX programmers love to gnaw.
OK, so I just finished reading this thread and I realize there is a bit of a controversy as to when Mainline is run.
Is it only run when there is stuff in it? Or is it run when any messages are received that it doesn't know what to do with?
With that in mind I still have a question.
One of the metrics I've used to see how healthy my program is is to count the cycles through mainline each second. A blank NI4100 comes in around 15K, and as you add code that number goes down. I've noticed problems when that number gets close to 200.
So I've minimized what is in mainline. I think I'm doing pretty well at keeping everything happening in events only when needed.
My question is this. Once the processor starts up and the event table is populated shouldn't nothing be happening on the processor as long as nothing is in mainline?
I have a project that has 10 iports in it. And regardless of whether they are hooked up, it seems like the processor is running slowly. In the iport (non-duet) UI module that I'm using there is only one signal check in mainline and I have no idea what's in the comm module.
As I comment out different instances of this module and re-check my cycle metric the numbers don't make sense. If I take them all out then I have ~12K cycles/sec. Once I put in my first COMM it drops to about 6K and then the next one takes like 500 of the cycles.
I feel like there is something I don't understand about how the events are being processed. I can watch the notification on all devices all notification types and the system is quiet.
With all iPort modules in place I'm down at the 200 cycles/sec value and I'm expecting trouble on boot up.
I am looking forward to using the device holdoff feature to help mitigate some of the strange startup behavior, but I'd like to know if my processor is sitting around spinning on code somewhere and being hogged up.
Are there any more ideas out there for pinpointing where the program is spending its time?
Thanks,
Jimi
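One way to read the cycle numbers in the post above: cycles per second is the inverse of the time one pass takes, so equal amounts of added work do not produce equal drops in the count. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and it assumes the counter really does measure complete passes through mainline.

```python
def pass_time_us(cycles_per_sec: float) -> float:
    """Average time of one pass through mainline, in microseconds."""
    return 1_000_000 / cycles_per_sec


def added_cost_us(cycles_before: float, cycles_after: float) -> float:
    """Extra time per pass implied by a drop from cycles_before to cycles_after."""
    return pass_time_us(cycles_after) - pass_time_us(cycles_before)


# Figures quoted in the post (approximate, as reported by the poster).
blank_master = pass_time_us(15_000)         # empty program
trouble_zone = pass_time_us(200)            # where boot problems were noticed
first_comm = added_cost_us(12_000, 6_000)   # first COMM instance added
second_comm = added_cost_us(6_000, 5_500)   # second COMM instance added
```

Comparing time per pass rather than cycles per second makes the instances easier to compare: going from 12K to 6K means roughly 83 microseconds were added to each pass, while going from 6K to 5.5K adds only about 15. At 200 cycles per second each pass is taking about 5 milliseconds.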
There is a continuously running loop in the processor that runs no matter what. This is what is, in most programming languages, referred to as mainline. In AMX programming, it checks hardware interrupts (button presses, relay states, etc.), it checks event tables, it checks WAITs and it runs whatever is in DEFINE_PROGRAM. Most of the time, what people are referring to in these forums when they say "mainline" is really DEFINE_PROGRAM. And, of course, if it's empty, nothing happens at that point. If the event tables are empty, nothing happens there either, but hardware events are still processed ... it's just that if you are looking for an event in an empty table, your search is pretty quick. But the processor is always looping, even if nothing much happens each loop. There are internal matters to deal with: timers and timing, checking IP ports and the web server virtual device. There is still a lot going on, even in an empty program.
Add anything Duet, and a whole bunch more happens, even if nothing else is going on. The Java interpreter is always running, and checking for Java events. It's also always updating itself with the native AMX programming, and making sure they all play together nice. This part is the job of the SNAPI router (as I understand it). But again, even if nothing is actually triggering an event, it is still doing something. In processor time, it doesn't amount to anything significant, but it is there.
Your iPort modules are doing a lot in that comm module. They all have their own event tables, and they all have their own "DEFINE_PROGRAM" section, which is almost certainly being used. They are all continuously checking the hardware on the ports they are assigned to, looking for responses from the iPorts, and they are likely continuously polling the same ports to either make sure nothing is happening, or to process what is happening. And, being that they are not something you have any control of, there is no guarantee whatsoever they are optimized for efficiency in any way (and, if they were written by AMX, they likely are designed more to be complete ... every feature possible ... than efficient).
There is only so much you can do to streamline the running of code. On top of all the above, NetLinx itself is an interpreted language, and there is some overhead to that as well. And this is true of almost any modern, high-order, programming language: resources (memory, processor time) are cheap, and they are profligate with them.
The iport module I'm using isn't a Duet module. But I am using a couple of Duet Autopatch modules.
Does anyone out there have an opinion on the effectiveness of counting the number of times through 'Define Program' per second?
I use a wait that reports the previous counter and clears the counter. Actually now that I write this I realize I don't exactly understand what is going on. Below is what I'm doing and it seems to print out one statement per second.
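DEFINE_PROGRAM
// Incremented on every pass through mainline.
CYCLE_COUNT++;
// Report and reset the counter after 10 tenths of a second (1 s).
WAIT 10
{
    SEND_STRING 0,"'CYCLES THROUGH MAINLINE IN LAST SECOND: ',ITOA(CYCLE_COUNT)";
    CYCLE_COUNT = 0;
}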
Why doesn't this schedule a new wait every time it runs through define program?
Because you can't create multiple instances of the same wait. The first time DEFINE_PROGRAM runs, it creates an instance of the wait, and starts the timer. The second time DEFINE_PROGRAM runs, it knows there is already an instance of that wait active, so it does nothing. Etc. After the wait fires, there is no longer an instance of that wait on the table, so it is eligible to be created again the next time your DEFINE_PROGRAM runs. Each WAIT in your program gets a unique space in memory whether you give it a name or not. This happens at compile time, not run time, which is why you can't use stack_vars inside of waits.
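The behaviour described above can be modelled off-box. What follows is a toy Python model, not NetLinx and not a description of how the master is actually implemented: `WaitTable`, `schedule`, `run_due` and `simulate` are hypothetical names, and time is counted in mainline passes rather than tenths of a second.

```python
class WaitTable:
    """Toy model: at most one pending instance per wait id."""

    def __init__(self):
        self._pending = {}  # wait id -> (due time, callback)

    def schedule(self, wait_id, due, callback):
        # A wait that is already pending is left alone, as the reply describes.
        if wait_id in self._pending:
            return False
        self._pending[wait_id] = (due, callback)
        return True

    def run_due(self, now):
        # Fire and remove every wait whose time has come.
        due_ids = [w for w, (due, _) in self._pending.items() if due <= now]
        for wait_id in due_ids:
            _, callback = self._pending.pop(wait_id)
            callback()


def simulate(passes_per_second, seconds):
    """Run a fake mainline and return the counts the wait would report."""
    table = WaitTable()
    state = {"count": 0}
    reports = []

    def report():
        reports.append(state["count"])
        state["count"] = 0

    for now in range(passes_per_second * seconds):
        table.run_due(now)
        state["count"] += 1  # stands in for CYCLE_COUNT++
        # Attempted on every pass, accepted only when nothing is pending.
        table.schedule("cycle_report", now + passes_per_second, report)
    return reports
```

In this model the `schedule` call is attempted on every pass but only succeeds once per simulated second, which is why the original code prints one line per second instead of thousands.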
There can only be 1 instance of a wait active at a time. If you name the wait, you can cancel it and other things. If you don't name the wait, the processor/compiler names it for you and tracks it that way. This is a feature of the wait, and knowing how to leverage it can save a lot of code writing.
As for the benchmarking question, I am hesitant to believe that a simple counter variable in the define_program section is going to be overly helpful in a running system. The addition of this code causes the processor to perform numerous unnecessary operations and (in my mind) has the potential to slow down the overall performance of the system in a way that negates the benefits of knowing that the rest of the system is running slow.
One approach I have taken is optimizing bits and pieces of code. If I think that a function I write is going to be a little processor intensive and I can come up with more than one way to accomplish the task, I will run each method through a benchmark test (posted somewhere on the forums) and then pick the one that I feel is best. In theory, if all of the code you are running is as efficient as possible and it is necessary, then you shouldn't have to worry about code running slower over time (this is not a computer that surfs the internet and acquires "additional code" to run). The best benchmark is: when I push this button, does it respond in a fashion that makes me think it is working properly? If you are experiencing lag on the UI, then it is time to explore splitting your code among multiple processors.
That makes sense about the waits. I think I knew that at some point and then forgot.
And I'm sure there are areas I could improve the optimization of my code. But I don't really notice lag when I'm using the system under normal circumstances. The only problem I've encountered is that occasionally on bootup the system totally freaks out and it will get 'stuck' updating a channel on another processor repeatedly (like 50 times a second telling a single channel to go on).
I've realized that the closer the number CYCLE_COUNT gets to 200 the more likely this is to happen.
A couple of things that I'm going to try when I'm back on site: one is to use the 'device holdoff on' command. I'm also going to space out some of the data population functions I have located in various define_start sections.
I would love to have some kind of indication of how busy a processor is. My understanding is that CPU Usage is kind of a funny metric also. It judges how many times it goes through mainline and has stuff to do.....or something like that?
One suggestion I have is to move your startup code out of the define_start section and into the online section of a virtual device. This coupled with device holdoff on and spacing the commands a little should clear up most of the issues.
You've got the 'spy' function as well, but I don't think that's very reliable either.
CPU Usage metrics are widely misunderstood. From our engineer:
In case you’re not already familiar with this: The CPU USAGE command (from TELNET) only reports a valid number when the system is started with the “reboot heap watch” command in TELNET. Otherwise it will always give the same numbers, which are a snapshot taken at boot time. This WATCH mode is turned off at the next normal reboot.
The numbers are obviously going to vary based on what your system is currently doing. If you run the CPU USAGE command several times you’ll see the numbers fluctuate. Having a max of 99% is not uncommon. An average of 60% would be pretty normal on a busy system.
Here is what my test processor happened to be at when I looked: (System was handling some traffic but not much.)
CPU usage = 9.71% (30 sec. average = 51.85%, 30 sec. max = 99.96%)
Here are my numbers during a reboot:
CPU usage = 96.61% (30 sec. average = 21.10%, 30 sec. max = 96.61%)
G'day John,
What your engineer is saying is correct but outdated. He is quoting firmware v3.41.414, which was essentially incorrect anyway. You did NOT have to reboot with "reboot heap watch"; you just had to wait for the task to start after running CPU Usage for the first time. You can test this yourself by loading .414 firmware.
Since v3.41.422 the CPU Usage task runs at boot and also gives more information (the average and max statistics).
Well....
Here's what I get at total system idle if all I do is send the command anytime after a normal boot.
Welcome to NetLinx v3.50.430 Copyright AMX LLC 2008
>cpu usage
CPU usage = 99.99% (30 sec. average = 85.94%, 30 sec. max = 99.99%)
That number never, ever changes. Ever. No matter what is or isn't happening.
That can't be right.
However, if I reboot with the command
>reboot heap watch
the netlinx replies:
"Rebooting in heap watch mode..."
After the boot with that command, the numbers make sense. For example
>cpu usage
CPU usage = 12.92% (30 sec. average = 12.61%, 30 sec. max = 13.94%)
But odder still, after a while, I go back to that same device, and the results are back to "stuck" at a high number.
>cpu usage
CPU usage = 98.76% (30 sec. average = 18.15%, 30 sec. max = 98.76%)
And thereafter, this number will not change ever, ever. Which can't be right.
Why would that command have an effect if it weren't intended to be used? And why does it work only for a while? Or does it work at all, would I get the same temporary readings without it? The research continues.
Ask 10 people inside AMX, get 10 answers, apparently.
CPU usage = 99.99% (30 sec. average = 85.94%, 30 sec. max = 99.99%)
That number never, ever changes.
That can't be right.
Yes, it can be right, if you have enough code in mainline (Define_Program) or a very big system in which case you will have to wait much longer (up to 5 minutes) for the master to settle.
Try this, if you will.
1. load empty code into the master.
2. change firmware to v3.41.414
3. reboot the master.
4. connect to the master via telnet ASAP
5. run 'cpu usage' ASAP then again every 10 seconds for 1 minute
6. repeat 1-5 after running 'reboot heap watch'
7. post results of both
8. repeat all of the above with firmware >= v3.41.422 (to be fair you should reboot once after the firmware dumps to eliminate housekeeping from the stats)
My Results
Welcome to NetLinx v3.41.414 Copyright AMX LLC 2008
>cpu usage
Initializing CPU Thread (one time only). Please wait 30 seconds...
CPU usage = 0.00%
CPU usage = 57.02%
CPU usage = 87.72%
CPU usage = 85.70%
CPU usage = 88.23%
CPU usage = 67.70%
CPU usage = 53.87%
CPU usage = 0.00%
>reboot heap watch
Rebooting in heap watch mode...
Welcome to NetLinx v3.41.414 Copyright AMX LLC 2008
>cpu usage
Initializing CPU Thread (one time only). Please wait 30 seconds...
CPU usage = 0.00%
CPU usage = 58.67%
CPU usage = 77.93%
CPU usage = 83.44%
CPU usage = 83.46%
CPU usage = 79.76%
CPU usage = 64.81%
CPU usage = 54.24%
CPU usage = 0.00%
Welcome to NetLinx v3.41.422 Copyright AMX LLC 2008
>cpu usage
CPU usage = 53.64% (30 sec. average = 11.48%, 30 sec. max = 82.62%)
CPU usage = 66.23% (30 sec. average = 41.36%, 30 sec. max = 93.38%)
CPU usage = 92.22% (30 sec. average = 65.59%, 30 sec. max = 95.53%)
CPU usage = 92.88% (30 sec. average = 81.56%, 30 sec. max = 95.53%)
CPU usage = 6.62% (30 sec. average = 79.61%, 30 sec. max = 95.53%)
CPU usage = 6.29% (30 sec. average = 37.23%, 30 sec. max = 95.53%)
CPU usage = 6.29% (30 sec. average = 22.06%, 30 sec. max = 95.20%)
>reboot heap watch
Rebooting in heap watch mode...
Welcome to NetLinx v3.41.422 Copyright AMX LLC 2008
>cpu usage
CPU usage = 63.35% (30 sec. average = 11.40%, 30 sec. max = 76.95%)
CPU usage = 94.20% (30 sec. average = 39.04%, 30 sec. max = 94.20%)
CPU usage = 95.52% (30 sec. average = 65.60%, 30 sec. max = 95.85%)
CPU usage = 85.24% (30 sec. average = 81.80%, 30 sec. max = 95.85%)
CPU usage = 25.21% (30 sec. average = 85.26%, 30 sec. max = 95.85%)
CPU usage = 7.13% (30 sec. average = 56.38%, 30 sec. max = 95.52%)
CPU usage = 6.47% (30 sec. average = 30.53%, 30 sec. max = 95.52%)
Welcome to NetLinx v3.50.439 Copyright AMX LLC 2008
>cpu usage
CPU usage = 82.56% (30 sec. average = 16.73%, 30 sec. max = 82.56%)
CPU usage = 93.36% (30 sec. average = 47.21%, 30 sec. max = 94.02%)
CPU usage = 69.10% (30 sec. average = 77.36%, 30 sec. max = 94.35%)
CPU usage = 85.71% (30 sec. average = 87.07%, 30 sec. max = 96.01%)
CPU usage = 7.31% (30 sec. average = 68.31%, 30 sec. max = 96.01%)
CPU usage = 6.98% (30 sec. average = 42.62%, 30 sec. max = 96.01%)
CPU usage = 7.81% (30 sec. average = 19.77%, 30 sec. max = 94.35%)
>reboot heap watch
Rebooting in heap watch mode...
Welcome to NetLinx v3.50.439 Copyright AMX LLC 2008
>cpu usage
CPU usage = 63.85% (30 sec. average = 9.64%, 30 sec. max = 82.92%)
CPU usage = 92.70% (30 sec. average = 27.56%, 30 sec. max = 92.70%)
CPU usage = 93.03% (30 sec. average = 53.31%, 30 sec. max = 94.03%)
CPU usage = 91.54% (30 sec. average = 74.99%, 30 sec. max = 96.35%)
CPU usage = 96.19% (30 sec. average = 85.62%, 30 sec. max = 96.35%)
CPU usage = 6.14% (30 sec. average = 82.71%, 30 sec. max = 96.35%)
CPU usage = 6.14% (30 sec. average = 62.40%, 30 sec. max = 96.35%)
OK, rebooted again without the HEAP command, and once again, at least for a while, I have responses:
>cpu usage
CPU usage = 12.37% (30 sec. average = 12.56%, 30 sec. max = 13.70%)
Then later, STUCK:
>cpu usage
CPU usage = 98.69% (30 sec. average = 18.02%, 30 sec. max = 98.69%)
forever.
Huh.
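For anyone wanting to compare readings like the ones posted in this thread, here is a small Python sketch that parses pasted `cpu usage` output and flags a run of identical readings. It only handles the two line formats shown above, it works on text you have already captured by hand from the telnet session, and `CpuSample`, `parse_cpu_usage` and `looks_stuck` are names invented for this example.

```python
import re
from typing import Iterable, NamedTuple, Optional


class CpuSample(NamedTuple):
    current: float
    average: Optional[float]  # absent on firmware that prints only one figure
    maximum: Optional[float]


# Matches both "CPU usage = 0.00%" and the longer form with 30 sec. statistics.
_CPU_LINE = re.compile(
    r"CPU usage = (\d+(?:\.\d+)?)%"
    r"(?: \(30 sec\. average = (\d+(?:\.\d+)?)%, 30 sec\. max = (\d+(?:\.\d+)?)%\))?"
)


def parse_cpu_usage(line: str) -> Optional[CpuSample]:
    """Return the figures from one line of output, or None for other lines."""
    match = _CPU_LINE.search(line)
    if match is None:
        return None
    current, average, maximum = match.groups()
    return CpuSample(
        float(current),
        float(average) if average is not None else None,
        float(maximum) if maximum is not None else None,
    )


def looks_stuck(lines: Iterable[str], min_samples: int = 3) -> bool:
    """True if there are at least min_samples readings and all are identical."""
    samples = [s for s in map(parse_cpu_usage, lines) if s is not None]
    return len(samples) >= min_samples and len(set(samples)) == 1
```

Prompt lines such as `>cpu usage` and banner lines are skipped, so a whole captured session can be passed in as-is. A `True` result only says the readings did not change between samples; it does not say whether the master is genuinely busy or the statistic has stopped updating.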
Curiously - are you in an M2M setup? I recently had a job where the CPU usage was like that, > 98%, and I found this alarming. I started removing bits of code and eventually it fell down to 18%, and the cause happened to be a module. I forget exactly what was happening in the module that made it go up that high (I wrote the module), but it was indeed the cause. While the system was extremely responsive, never hiccuped and worked very well, this 98% CPU was annoying, and made me fear what might happen if something did go wrong.
Point is - your system's CPU usage might very well be running at 98% and you just don't know it. I'd suggest removing bits and pieces and see if anything changes.
Use a wink!!
Sssssh, don't go telling on me!!
@eric,
AMX Modules: The old bone AMX Programmers love to gnaw and/or bury. (as that's what I end up doing often :P)
No, that's what I intended.
OK, so I just finished reading this thread and I realize there is a bit of a controversy as to when Mainline is run.
Is it only run when there is stuff it it. Or is it run when any messages are received that it doesn't know what to do with?
With that in mind I still have a question.
One of the metric's I've used to see how healthy my program is to count the cycles through mainline each second. A blank NI4100 comes in around 15K and as you add code that number goes down. I'm noticed problems when that number gets close to 200.
So I've minimized what is in mainline. I think I'm doing pretty well at keeping everything happening in events only when needed.
My question is this. Once the processor starts up and the event table is populated shouldn't nothing be happening on the processor as long as nothing is in mainline?
I have a project that has 10 iports in it. And regardless of whether they are hooked up, it seems like the processor is running slowly. In the iport (non-duet) UI module that I'm using there is only one signal check in mainline and I have no idea what's in the comm module.
As I comment out different instances of this module and re-check my cycle metric the numbers don't make sense. If I take them all out then I have ~12K cycles/sec. Once I put in my first COMM it drops to about 6K and then the next one takes like 500 of the cycles.
I feel like there is something I don't understand about how the events are being processed. I can watch the notification on all devices all notification types and the system is quiet.
With all iPort modules in place I'm down at the 200 cycles/sec value and I'm expecting trouble on boot up.
I am looking forward to using the device holdoff feature to help mitigate some of the strange startup behavior but I'd like to know if my processor is siting around spinning on code somewhere and being hogged up.
Are there any more ideas out there for pinpointing where the program is spending its time?
Thanks,
Jimi
Add anything Duet, and a whole bunch more happens, even if nothing else is going on. The Java interpreter is always running, and checking for Java events. It's also always updating itself with the native AMX programming, and making sure they all play together nice. This part is the job of the SNAPI router (as I understand it). But again, even if nothing is actually triggering an event, it is still doing something. In processor time, it doesn't amount to anything significant, but it is there.
Your iPort modules are doing a lot in that comm module. They all have their own event tables, and they all have their own "DEFINE_PROGRAM" section, which is almost certainly being used. They are all continuously checking the hardware on the ports they are assigned to, looking for responses from the iPorts, and they are likely continuously polling the same ports to either make sure nothing is happening, or to process what is happening. And, being that they are not something you have any control of, there is no guarantee whatsoever they are optimized for efficiency in any way (and, if they were written by AMX, they likely are designed more to be complete ... every feature possible ... than efficient).
There is only so much you can do to streamline the running of code. On top of all the above, NetLinx itself is an interpreted language, and there is some overhead to that as well. And this is true of almost any modern, high-order, programming language: resources (memory, processor time) are cheap, and they are profligate with them.
Does anyone out there have an opinion on the effectiveness of counting the number of times through 'Define Program' per second?
I use a wait that reports the previous counter and clears the counter. Actually now that I write this I realize I don't exactly understand what is going on. Below is what I'm doing and it seems to print out one statement per second.
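DEFINE_PROGRAM
// Count every pass through mainline.
CYCLE_COUNT++;
// WAIT times are in tenths of a second, so WAIT 10 fires after 1 second.
WAIT 10
{
    SEND_STRING 0,"'CYCLES THROUGH MAINLINE IN LAST SECOND: ',ITOA(CYCLE_COUNT)";
    CYCLE_COUNT = 0;
}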
Why doesn't this schedule a new wait every time it runs through define program?
Only one instance of a given wait can be active at a time. If you name the wait, you can cancel it and do other things with it. If you don't name the wait, the processor/compiler names it for you and tracks it that way. This is a feature of the wait, and knowing how to leverage it can save a lot of code writing.
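To make that concrete, here is a rough off-controller model in Python of the behavior described above. It is not NetLinx and not how the firmware is implemented; the class `WaitTable`, the function `simulate`, and the wait name `cycle_report` are all made up for illustration. It only assumes what was just said: a wait that is already pending is not scheduled again, and wait times are in tenths of a second.

```python
class WaitTable:
    """Toy model of WAIT bookkeeping: at most one pending entry per wait name."""

    def __init__(self):
        self.pending = {}  # name -> (due_tick, callback)

    def wait(self, name, now, tenths, callback):
        # A wait that is already pending is left alone, not rescheduled.
        if name in self.pending:
            return False
        self.pending[name] = (now + tenths, callback)
        return True

    def cancel(self, name):
        # Returns True only if there was a pending wait to cancel.
        return self.pending.pop(name, None) is not None

    def fire_due(self, now):
        due_names = [n for n, (due, _) in self.pending.items() if due <= now]
        for name in due_names:
            _, callback = self.pending.pop(name)
            callback()


def simulate(seconds, passes_per_tenth):
    """Run a fake mainline and return the count reported each time the wait fires."""
    table = WaitTable()
    state = {"count": 0}
    reports = []

    def report():
        reports.append(state["count"])
        state["count"] = 0

    for tick in range(seconds * 10):          # one tick = 1/10 second
        table.fire_due(tick)
        for _ in range(passes_per_tenth):     # mainline passes within this tick
            state["count"] += 1
            table.wait("cycle_report", tick, 10, report)
    table.fire_due(seconds * 10)
    return reports
```

Calling `simulate(3, 100)` requests the wait 3000 times but only three reports come out, one per simulated second, which matches the one-line-per-second output seen on the master.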
As for the benchmarking question, I am hesitant to believe that a simple counter variable in the define_program section is going to be overly helpful in a running system. The addition of this code causes the processor to perform numerous unnecessary operations and (in my mind) has the potential to slow down the overall performance of the system in a way that negates the benefits of knowing that the rest of the system is running slow.
One approach I have taken is optimizing bits and pieces of code. If I think that a function I write is going to be a little processor intensive and I can come up with more than one way to accomplish the task, I will run each method through a benchmark test (posted somewhere on the forums) and then pick the one that I feel is best. In theory, if all of the code you are running is as efficient as possible and it is necessary, then you shouldn't have to worry about code running slower over time (this is not a computer that surfs the internet and acquires "additional code" to run). The best benchmark is: When I push this button, does it respond in a fashion that makes me think it is working properly? If you are experiencing lag on the UI, then it is time to explore splitting your code among multiple processors.
Jeff
And I'm sure there are areas I could improve the optimization of my code. But I don't really notice lag when I'm using the system under normal circumstances. The only problem I've encountered is that occasionally on bootup the system totally freaks out and it will get 'stuck' updating a channel on another processor repeatedly (like 50 times a second telling a single channel to go on).
I've realized that the closer the number CYCLE_COUNT gets to 200 the more likely this is to happen.
A couple of things I'm going to try when I'm back on site: use the 'device holdoff on' command, and space out some of the data population functions I have located in various define_start sections.
I would love to have some kind of indication of how busy a processor is. My understanding is that CPU Usage is kind of a funny metric also. It judges how many times it goes through mainline and has stuff to do.....or something like that?
Jeff
You've got the 'spy' function as well, but I don't think that's very reliable either.
In case you’re not already familiar with this: The CPU USAGE command (from TELNET) only reports a valid number when the system is started with the “reboot heap watch” command in TELNET. Otherwise it will always give the same numbers, which are a snapshot taken at boot time. This WATCH mode is turned off at the next normal reboot.
The numbers are obviously going to vary based on what your system is currently doing. If you run the CPU USAGE command several times you’ll see the numbers fluctuate. Having a max of 99% is not uncommon. An average of 60% would be pretty normal on a busy system.
Here is what my test processor happened to be at when I looked: (System was handling some traffic but not much.)
CPU usage = 9.71% (30 sec. average = 51.85%, 30 sec. max = 99.96%)
Here are my numbers during a reboot:
CPU usage = 96.61% (30 sec. average = 21.10%, 30 sec. max = 96.61%)
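If you capture these lines from a telnet session and want to compare them over time, a small off-controller script can pull the three numbers apart. This is only a sketch in Python; the function name `parse_cpu_usage` and the keys `now`, `avg` and `max` are my own choices, and the pattern assumes the line looks exactly like the two samples above.

```python
import re

# Matches the three-number form shown above, e.g.
# "CPU usage = 9.71% (30 sec. average = 51.85%, 30 sec. max = 99.96%)"
CPU_LINE = re.compile(
    r"CPU usage = (?P<now>\d+(?:\.\d+)?)%"
    r" \(30 sec\. average = (?P<avg>\d+(?:\.\d+)?)%,"
    r" 30 sec\. max = (?P<max>\d+(?:\.\d+)?)%\)"
)


def parse_cpu_usage(line):
    """Return the instantaneous, average and max percentages, or None if no match."""
    match = CPU_LINE.search(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}
```

For the idle sample above this returns `now` = 9.71, `avg` = 51.85 and `max` = 99.96; any line in a different format returns `None` rather than a guess.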
G'day John,
What your engineer is saying is correct but outdated. He is quoting firmware v3.41.414, which was essentially incorrect anyway. You did NOT have to reboot with "reboot heap watch"; you just had to wait for the task to start after running CPU Usage for the first time. You can test this yourself by loading .414 firmware.
Since v3.41.422 the CPU Usage task runs at boot and also gives more information (the average and max statistics).
Cheers
Mush
Here's what I get at total system idle if all I do is send the command anytime after a normal boot.
Welcome to NetLinx v3.50.430 Copyright AMX LLC 2008
>cpu usage
CPU usage = 99.99% (30 sec. average = 85.94%, 30 sec. max = 99.99%)
That number never, ever changes. Ever. No matter what is or isn't happening.
That can't be right.
However, if I reboot with the command
>reboot heap watch
the netlinx replies:
"Rebooting in heap watch mode..."
After the boot with that command, the numbers make sense. For example
>cpu usage
CPU usage = 12.92% (30 sec. average = 12.61%, 30 sec. max = 13.94%)
But odder still, after a while, I go back to that same device, and the results are back to "stuck" at a high number.
>cpu usage
CPU usage = 98.76% (30 sec. average = 18.15%, 30 sec. max = 98.76%)
And thereafter, this number will not change ever, ever. Which can't be right.
Why would that command have an effect if it weren't intended to be used? And why does it work only for a while? Or does it work at all, would I get the same temporary readings without it? The research continues.
Ask 10 people inside AMX, get 10 answers, apparently.
>cpu usage
CPU usage = 12.37% (30 sec. average = 12.56%, 30 sec. max = 13.70%)
Then later, STUCK:
>cpu usage
CPU usage = 98.69% (30 sec. average = 18.02%, 30 sec. max = 98.69%)
forever.
Huh.
Yes, it can be right if you have enough code in mainline (Define_Program), or a very big system, in which case you will have to wait much longer (up to 5 minutes) for the master to settle.
Very good question, I have no answer.
____________________________________________________________________
Try this, if you will.
1. load empty code into the master.
2. change firmware to v3.41.414
3. reboot the master.
4. connect to the master via telnet ASAP
5. run 'cpu usage' ASAP then again every 10 seconds for 1 minute
6. repeat 1-5 after running 'reboot heap watch'
7. post results of both
8. repeat all of the above with firmware >= v3.41.422 (to be fair, you should reboot once after the firmware dumps to eliminate housekeeping from the stats)
My Results
Welcome to NetLinx v3.41.414 Copyright AMX LLC 2008
>cpu usage
Initializing CPU Thread (one time only). Please wait 30 seconds...
CPU usage = 0.00%
CPU usage = 57.02%
CPU usage = 87.72%
CPU usage = 85.70%
CPU usage = 88.23%
CPU usage = 67.70%
CPU usage = 53.87%
CPU usage = 0.00%
>reboot heap watch
Rebooting in heap watch mode...
Welcome to NetLinx v3.41.414 Copyright AMX LLC 2008
>cpu usage
Initializing CPU Thread (one time only). Please wait 30 seconds...
CPU usage = 0.00%
CPU usage = 58.67%
CPU usage = 77.93%
CPU usage = 83.44%
CPU usage = 83.46%
CPU usage = 79.76%
CPU usage = 64.81%
CPU usage = 54.24%
CPU usage = 0.00%
____________________________________________________________________
Welcome to NetLinx v3.41.422 Copyright AMX LLC 2008
>cpu usage
CPU usage = 53.64% (30 sec. average = 11.48%, 30 sec. max = 82.62%)
CPU usage = 66.23% (30 sec. average = 41.36%, 30 sec. max = 93.38%)
CPU usage = 92.22% (30 sec. average = 65.59%, 30 sec. max = 95.53%)
CPU usage = 92.88% (30 sec. average = 81.56%, 30 sec. max = 95.53%)
CPU usage = 6.62% (30 sec. average = 79.61%, 30 sec. max = 95.53%)
CPU usage = 6.29% (30 sec. average = 37.23%, 30 sec. max = 95.53%)
CPU usage = 6.29% (30 sec. average = 22.06%, 30 sec. max = 95.20%)
>reboot heap watch
Rebooting in heap watch mode...
Welcome to NetLinx v3.41.422 Copyright AMX LLC 2008
>cpu usage
CPU usage = 63.35% (30 sec. average = 11.40%, 30 sec. max = 76.95%)
CPU usage = 94.20% (30 sec. average = 39.04%, 30 sec. max = 94.20%)
CPU usage = 95.52% (30 sec. average = 65.60%, 30 sec. max = 95.85%)
CPU usage = 85.24% (30 sec. average = 81.80%, 30 sec. max = 95.85%)
CPU usage = 25.21% (30 sec. average = 85.26%, 30 sec. max = 95.85%)
CPU usage = 7.13% (30 sec. average = 56.38%, 30 sec. max = 95.52%)
CPU usage = 6.47% (30 sec. average = 30.53%, 30 sec. max = 95.52%)
____________________________________________________________________
Welcome to NetLinx v3.50.439 Copyright AMX LLC 2008
>cpu usage
CPU usage = 82.56% (30 sec. average = 16.73%, 30 sec. max = 82.56%)
CPU usage = 93.36% (30 sec. average = 47.21%, 30 sec. max = 94.02%)
CPU usage = 69.10% (30 sec. average = 77.36%, 30 sec. max = 94.35%)
CPU usage = 85.71% (30 sec. average = 87.07%, 30 sec. max = 96.01%)
CPU usage = 7.31% (30 sec. average = 68.31%, 30 sec. max = 96.01%)
CPU usage = 6.98% (30 sec. average = 42.62%, 30 sec. max = 96.01%)
CPU usage = 7.81% (30 sec. average = 19.77%, 30 sec. max = 94.35%)
>reboot heap watch
Rebooting in heap watch mode...
Welcome to NetLinx v3.50.439 Copyright AMX LLC 2008
>cpu usage
CPU usage = 63.85% (30 sec. average = 9.64%, 30 sec. max = 82.92%)
CPU usage = 92.70% (30 sec. average = 27.56%, 30 sec. max = 92.70%)
CPU usage = 93.03% (30 sec. average = 53.31%, 30 sec. max = 94.03%)
CPU usage = 91.54% (30 sec. average = 74.99%, 30 sec. max = 96.35%)
CPU usage = 96.19% (30 sec. average = 85.62%, 30 sec. max = 96.35%)
CPU usage = 6.14% (30 sec. average = 82.71%, 30 sec. max = 96.35%)
CPU usage = 6.14% (30 sec. average = 62.40%, 30 sec. max = 96.35%)
____________________________________________________________________
As you can see, there is very little difference between firmware versions, or between a normal boot and heap watch mode.
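For anyone who wants to run the same comparison on their own master, a saved telnet log can be split into runs and summarized off the controller. The Python below is only a sketch: `split_runs` and `summarize` are hypothetical names, it assumes the log has the same banner and "CPU usage = ...%" lines as the results above, and the "stuck" flag (every reading in a run identical) is my own rough heuristic for the never-changing numbers reported earlier in the thread, not anything AMX defines.

```python
import re

BANNER = re.compile(r"Welcome to NetLinx (v[\d.]+)")
READING = re.compile(r"CPU usage = (\d+(?:\.\d+)?)%")


def split_runs(transcript):
    """Group instantaneous CPU readings by the boot banner that precedes them."""
    runs = []
    heap_watch = False
    for line in transcript.splitlines():
        if "Rebooting in heap watch mode" in line:
            heap_watch = True      # applies to the next banner only
            continue
        banner = BANNER.search(line)
        if banner:
            runs.append({"firmware": banner.group(1),
                         "heap_watch": heap_watch,
                         "readings": []})
            heap_watch = False
            continue
        reading = READING.search(line)
        if reading and runs:
            runs[-1]["readings"].append(float(reading.group(1)))
    return runs


def summarize(run):
    """Reduce one run to a few comparable figures."""
    readings = run["readings"]
    return {
        "firmware": run["firmware"],
        "heap_watch": run["heap_watch"],
        "samples": len(readings),
        "peak": max(readings) if readings else None,
        "last": readings[-1] if readings else None,
        # Heuristic only: more than one sample and all of them identical.
        "stuck": len(readings) > 1 and len(set(readings)) == 1,
    }
```

Feeding it the log text and printing `summarize(run)` for each run gives one row per boot, which makes it easier to see whether heap watch mode or the firmware version changes anything on a particular system.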
Cheers
Point is - your system's CPU usage might very well be running at 98% and you just don't know it. I'd suggest removing bits and pieces and seeing if anything changes.
Do you have CineTouch in your master? I imagine that your code could be processor intensive at times.