Optimize your code. (Tips and techniques)
Spire_Jeff
Posts: 1,917
This thread is dedicated to known ways to speed up your code execution, reduce memory usage, or make code easier to reuse. This thread should not be used to debate techniques (create a separate thread to handle this). If possible, post proof of speed increase or reduced memory usage.
############################
# Reduce Memory Consumption #
############################
Tip - When using multiple ports on a virtual device, use sequential ports starting at port 1.
Proof - Tech support says so. The processor allocates memory sequentially for ports. Even if you only use port 1 and port 100, the processor allocates memory to track every port from 1 through 100, not just the two in use.
##########################
# Faster Loop Execution #
##########################
Tip - When using a For loop, reduce the work being done in the evaluation portion of the for loop. The most efficient for loop I have found counts down, not up.
Example - for(x=MaxNum;x;x--)
Proof -
Line 13 (16:16:47):: START...FOR (nLoop = 1; nLoop <= LENGTH_ARRAY(nInfo); nLoop++)
Line 14 (16:16:51):: ...END
Line 15 (16:16:51):: pass 1: 3876ms
Line 16 (16:17:07):: START...FOR (nLoop = 1; nLoop <= MAX_LENGTH_ARRAY(nInfo); nLoop++)
Line 17 (16:17:11):: ...END
Line 18 (16:17:11):: pass 2: 3694ms
Line 19 (16:17:27):: START...FOR (nLoop = 1, nMax = MAX_LENGTH_ARRAY(nInfo); nLoop <= nMax; nLoop++)
Line 20 (16:17:30):: ...END
Line 21 (16:17:30):: pass 3: 2019ms
Line 22 (16:17:48):: START...FOR (nLoop = MAX_LENGTH_ARRAY(nInfo); nLoop > 0; nLoop--)
Line 23 (16:17:50):: ...END
Line 24 (16:17:50):: pass 4: 1855ms
Line 25 (16:18:08):: START...FOR (nLoop = MAX_LENGTH_ARRAY(nInfo); nLoop; nLoop--)
Line 26 (16:18:10):: ...END
Line 27 (16:18:10):: pass 5: 1557ms
##########################
# Faster Loop Execution #
##########################
Tip - Use stack_vars as counters in for loops.
Proof -
Line 13 (16:16:47):: START...FOR (nLoop = 1; nLoop <= LENGTH_ARRAY(nInfo); nLoop++)
Line 14 (16:16:51):: ...END
Line 15 (16:16:51):: pass 1: 3876ms
This is the same format as pass 1, but with a stack_var used for the counter:
Line 23 (16:28:35):: START...FOR (snLoop = 1; snLoop <= LENGTH_ARRAY(nInfo); snLoop++)
Line 24 (16:28:39):: ...END
Line 25 (16:28:39):: pass 0: 3478ms

Line 25 (16:18:08):: START...FOR (nLoop = MAX_LENGTH_ARRAY(nInfo); nLoop; nLoop--)
Line 26 (16:18:10):: ...END
Line 27 (16:18:10):: pass 5: 1557ms
This is the same format as pass 5, but with a stack_var used for the counter:
Line 28 (16:18:28):: START...FOR (snLoop = MAX_LENGTH_ARRAY(nInfo); snLoop; snLoop--)
Line 29 (16:18:29):: ...END
Line 30 (16:18:29):: pass 6: 1213ms
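For anyone who wants to try the pass 6 form, here is a minimal sketch of a loop with a stack_var counter. The function name fnClearInfo and the size of nInfo are made up for illustration; only the loop form comes from the log above.

```netlinx
DEFINE_VARIABLE
INTEGER nInfo[1000]     // hypothetical array, size chosen for illustration

// Hypothetical helper: zeroes every element of nInfo.
DEFINE_FUNCTION fnClearInfo()
{
    STACK_VAR INTEGER snLoop    // counter is local to this call (stack_var)

    // Same shape as pass 6: start at the top, stop when the counter hits 0.
    FOR (snLoop = MAX_LENGTH_ARRAY(nInfo); snLoop; snLoop--)
    {
        nInfo[snLoop] = 0
    }
}
```

The only difference from the pass 5 form is where the counter is declared, so swapping the STACK_VAR line for a global in DEFINE_VARIABLE should give the comparison case.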
These are the few that come to mind right now. I look forward to hearing about other optimizations out there.
Jeff
This is a very interesting topic. I have been doing this for about ten years now and I never even thought about the difference between using an increasing or decreasing value "for loop". It makes total sense though.
Out of curiosity how do you calculate the speed of the program? I have some ideas on how to do it but I have never attempted it so I thought why re-invent the wheel. I'll just ask you.
As for the timing, I have two methods I have used. The method listed above is done by parsing strings and calculating time based on when the string arrives. This was written by someone else (unknown to me), but it was what I had around. I have also done testing using a timeline. I start the timeline just before executing the code, then pause it right after the code finishes. I then use the timeline time to decide how long things took. I am not sure which is more accurate, but I believe that in either case you are only talking about a couple of ms of room for error. I would love to know for sure, but I figure as long as I use the same method for all variations being tested, the overhead involved should be irrelevant in the final comparison.
One thing becomes clear though: the processor is very fast. I like to make things happen a LOT (loops over 50000 times) so that most of the tests take at least 1 second (give or take a little). This helps to magnify the actual difference. Right now, the test is the only code running on the processor, but in the future I might start injecting the test code into a work code chunk to see if there is a difference in speed.
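The timeline method described above could be sketched roughly like this. It is not the actual benchmark code (that is in the other thread); the names TL_BENCH, lBenchTimes and fnTimeWithTimeline are made up, and the 600000 ms timeline length is just an arbitrary value longer than any test should take.

```netlinx
DEFINE_CONSTANT
LONG TL_BENCH = 1                   // hypothetical timeline ID

DEFINE_VARIABLE
LONG lBenchTimes[] = {600000}       // one trigger far in the future (assumed 10 min)

// Hypothetical helper: times an empty 50000-pass loop with a timeline.
DEFINE_FUNCTION fnTimeWithTimeline()
{
    STACK_VAR INTEGER snLoop
    STACK_VAR LONG lElapsed

    // Start the timeline just before the code under test.
    TIMELINE_CREATE(TL_BENCH, lBenchTimes, 1, TIMELINE_ABSOLUTE, TIMELINE_ONCE)

    FOR (snLoop = 50000; snLoop; snLoop--)
    {
        // code under test goes here
    }

    // Pause right after the code finishes, then read the timeline time (ms).
    TIMELINE_PAUSE(TL_BENCH)
    lElapsed = TIMELINE_GET(TL_BENCH)
    TIMELINE_KILL(TL_BENCH)

    SEND_STRING 0, "'elapsed: ', ITOA(lElapsed), 'ms'"
}
```

Whatever overhead the create and pause calls add is the same for every variation, which is the point made above about keeping the method constant across tests.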
Jeff
Well, I don't think you have a choice on non-virtual devices. If you need to use port 17 on an NI-3100, you have to use it. I am not aware of any way to remap the physical device.
I also updated the for loop tests and added benchmark test code to a separate thread in Tips and Tricks.
#########################
# Squeeze EVERY last bit out of the processor
#########################
Tip - If you need to make every single processor cycle count, use ON[variable] and OFF[variable] instead of variable=TRUE and variable=FALSE.
Proof - see the other post as the information is there. Over 70,000 cycles, ON and OFF were 95ms faster.
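To make the two forms concrete, here is a small sketch of what is being compared. The variable nPowerState and both function names are made up for illustration; which form is faster is exactly what the benchmarks are meant to settle.

```netlinx
DEFINE_VARIABLE
INTEGER nPowerState     // hypothetical flag variable

// Form 1: the ON/OFF keywords applied to a variable.
DEFINE_FUNCTION fnSetFlagWithKeywords()
{
    ON[nPowerState]     // sets the variable to 1
    OFF[nPowerState]    // sets the variable back to 0
}

// Form 2: plain assignment using the TRUE/FALSE constants.
DEFINE_FUNCTION fnSetFlagWithAssignment()
{
    nPowerState = TRUE
    nPowerState = FALSE
}
```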
P.S.
Anyone know of any other competing code bits that would be fun to test?
Believe it or not, 1 and 0 are slower than TRUE and FALSE!
Jeff
I did some testing myself and didn't find that to be the case at all. The time it takes to count up or down in a FOR loop is nearly identical, at least using the method I tried.
I was looking at your benchmark code in the other thread and I think the testing method may be introducing too much error into the equation and giving you inaccurate conclusions. Why do all the pausing, killing, and creating of TIMELINEs when a simple GET_TIMER should suffice?
Here is my FOR loop counting code with results. It's just a FOR loop that counts to 1 million and that's it, aside from the GET_TIMER and SEND_STRING 0 statements. I tested both up and down 10 times, and each loop takes about 13 seconds to complete, so it should be a big enough sample size.
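A test along those lines might look something like the sketch below. This is an illustration of the approach, not the code that produced the results; the function name fnTimeCountUp is made up. GET_TIMER counts in tenths of a second, so the elapsed value is in tenths too.

```netlinx
// Hypothetical helper: times a count-up loop of 1 million passes with GET_TIMER.
DEFINE_FUNCTION fnTimeCountUp()
{
    STACK_VAR LONG lStart
    STACK_VAR LONG lElapsed
    STACK_VAR LONG lCount       // LONG, since 1 million does not fit in an INTEGER

    lStart = GET_TIMER          // system timer, in 1/10 second units

    FOR (lCount = 1; lCount <= 1000000; lCount++)
    {
        // intentionally empty: only the loop overhead is being measured
    }

    lElapsed = GET_TIMER - lStart
    SEND_STRING 0, "'count up: ', ITOA(lElapsed), ' tenths of a second'"
}
```

The count-down version would only change the FOR line, for example FOR (lCount = 1000000; lCount > 0; lCount--).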
Here are the results for counting up.
And here are for all practical purposes identical results for counting down.
However, if your physical NI ports aren't defined, the same should hold true: they won't take up resources. So use those ports sequentially as well, rather than, for example, using serial ports 1 and 7 and leaving 2-6 unused.
Look closer at the for loops. The way to make counting down faster is to reduce the operations. Try the following:
Notice how the comparison is just x, not x>0. That is where I think the gain comes from.
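A minimal sketch of that loop form, reusing the example from the first post (the function name fnCountDown and the SEND_STRING body are only there to make it complete):

```netlinx
// Hypothetical helper: counts from MaxNum down to 1.
DEFINE_FUNCTION fnCountDown(INTEGER MaxNum)
{
    STACK_VAR INTEGER x

    // The condition is the bare counter: the loop runs while x is non-zero,
    // so no separate "x > 0" comparison is written into the test.
    FOR (x = MaxNum; x; x--)
    {
        SEND_STRING 0, "'x = ', ITOA(x)"
    }
}
```

Changing the middle expression to x > 0 gives the other variant for a side-by-side timing.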
I could probably change the timeline aspect, but that was my original thought on the task. I may have been trying something else at the time, but that's the way it was so I just kept it. It shouldn't make a difference as long as the overhead is the same from test to test, but if I get a moment, I might try modifying the code to use your suggestion.
Jeff
I see where you are coming from now. Yes, the same holds true for touch panels. While we are mentioning touch panels and ports, I just recalled another thing... I believe that the set_virtual_channel_count() command is done in blocks of 255 or was it 256. I will have to try to find the original post or the documentation to verify this. I guess the general rule of thumb is to start at the lowest number (generally 1) and try to go up sequentially in everything AMX.
The only exception that I have found is in actual device numbers. If it's not defined, it won't be tracked.
Jeff
A FOR loop that counts down is still not faster but I do agree that the comparison of x vs. x>0 is faster which is good to know.
I got the opposite results. Running through a FOR loop one million times I found that TRUE/FALSE (1/0) is faster than ON/OFF by about 5%. The results are below.
Here is the code. The FOR loops each take about 20 seconds to run.
And here are the results which show that ON/OFF is slightly slower than TRUE/FALSE (1/0) and that TRUE/FALSE is the same as 1/0.
When I get into the office, I will try to mix things up a little and see if I can figure out why the results I am getting are different.
Jeff
It's my opinion that the testing method should be as simple and straightforward as possible so that the testing method itself doesn't end up skewing the results.
I think this post is great. Although there has been little consensus, it makes me want to do some testing as well.
It should be noted that the tests I ran were on an old NI-700 with the latest Duet firmware.
Agreed!
This post is at least 10 characters.
Folks,
Here are some testing modules used in Tech Support for many years. Guy Minervini shared these with me.
Have Fun!
-Jamie
it gets even better! some reputation for you
I switched the ID around to see if that changed anything. Here are the results:
Now, I am not sure what else could be different between the tests, but I am about to try your method. I am running these tests on a newer NI-900 and based on some other benchmarks I have, this NI-900 is running almost as fast as an NI-3100 and about 2.25x faster than an NI-3000.
Jeff
That is what I thought. The only thing I can think of is that maybe TRUE and FALSE are treated as boolean instead of integers? Or maybe the compiler is doing some sort of optimization on them... or maybe my testing is flawed.
I just finished playing HVAC guy (had to run a new wire to the roof top unit to hook the new Viewstat... 4 wires is just not enough ). I'm going to try to change the test method now.
Jeff
I think I see the problem and the reason I wound up using timelines. Timelines have a higher resolution than GET_TIMER. GET_TIMER has a resolution of only .1 seconds. Timelines allow for .001 seconds. Here is a sample of both methods head to head:
Tests 8, 9, and 10 are done with GET_TIMER. 11,12, and 13 are done with timelines. Tests 8, 9, 11, and 12 are identical in what they are doing. Tests 8 and 11 are looped 1000 times, tests 9 and 12 are looped 1500 times.
I will post the code so a second set of eyes can see if I am missing something.
Jeff
ADDED:
Here is another set of results. This time I ran the loops 10000 times for the first set and 12000 for the second set.
Jeff
I don't feel loops that only last milliseconds collect anywhere near enough data to draw solid conclusions. The FOR loops I tested with were 1 million times round trip. When tests last for 20 seconds plus, measuring to the tenth of a second or to a thousandth of a second doesn't make much of a difference. If my math is correct, that's a possible margin of difference of only a fraction of 1 percent.
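As a rough check of that margin, assuming the 0.1-second GET_TIMER resolution mentioned earlier in the thread and a 20-second test run, a timing error of one timer tick works out to:

```latex
\[
\frac{0.1\ \text{s}}{20\ \text{s}} = 0.005 = 0.5\%
\]
```

With the 0.001-second timeline resolution, the same ratio is 0.001 / 20 = 0.005%.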
If your test methods still support that 1/0 is slower than TRUE/FALSE, I just can't buy it. I also believe the simpler and least obstructive testing method works best. We'll simply have to agree to disagree.
Whatever the case, I'm not going to be afraid to use ON/OFF, TRUE/FALSE, or 1/0 because all we're really doing in the big scheme of things is contesting (discussing) at most a drop of water in the ocean with this one. It sure isn't going to make any difference in your code that takes 25 minutes to boot. Man, that's over the top crazy unless you have a boatload of Duet modules in that program, even still...