Glaring omission in NetLinx runtime library
DHawthorne
Posts: 4,584
I extensively use my own logging module that writes a file to the NetLinx RAM drive to help with troubleshooting when a master may need to be rebooted and you need a persistent record. It mostly has been working very well, but recently I had a runaway error that quickly filled the available drive space with a rapidly expanding log. I had thought the internal drive full error return on the file routines would be sufficient to prevent this being a problem, but apparently, they are like the engine heat idiot lights on some cars - by the time they light up, it's too late. This overblown log file not only consumed all available memory, but it locked up the master.
Well, I knew I needed more robust memory management anyway, so I set to fixing this flaw in my module. Much to my chagrin, I discovered that there are no available-memory functions in the NetLinx runtime library. No drive space tests, nothing. Not even a way to get the current size of existing files.
This seems to me to be a glaring omission. How can you include file operations, but not provide a method of testing in advance whether your file operations are something you really want to go ahead with?
I've gotten around it by querying the master for its memory availability and parsing the response, then getting a directory listing and parsing that for file sizes. But it's awkward as heck, and it depends on the fact that I'm already telneted into the master itself, from itself. It wouldn't work if I were doing some other file operation that simply manipulated files.
Comments
I've also noticed that when you write to the card a lot, with only a short time in between writes, the card gets corrupted and needs to be formatted again.
To get the file size: after you open the file, use FILE_SEEK to seek to EOF (position -1), and the return value will contain the file's size in bytes.
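For what that looks like in practice, here is a minimal sketch wrapping that trick in a helper. The function name is mine; the FILE_OPEN/FILE_SEEK/FILE_CLOSE calls and the FILE_READ_ONLY flag are standard NetLinx.

```
(* Sketch only: report a file's size using the seek-to-EOF trick. *)
DEFINE_FUNCTION SLONG GetFileSize(CHAR sPath[])
{
    STACK_VAR SLONG hFile
    STACK_VAR SLONG nSize

    hFile = FILE_OPEN(sPath, FILE_READ_ONLY)
    IF (hFile < 0)
        RETURN hFile              // pass the FILE_OPEN error code back to the caller

    nSize = FILE_SEEK(hFile, -1)  // -1 seeks to EOF; return value is the byte offset, i.e. the size
    FILE_CLOSE(hFile)
    RETURN nSize
}
```

Note that this still requires opening the file first; there is no way to ask for the size without a handle.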
For the "for what it's worth" category: the parameters I pass into my logging module are the directory, filename, and the max byte size for the file. When the module opens the file to append a log message, it first seeks to the end of the file to check the size. If the length of the log message plus the current size of the file exceeds the max file size, the module turns the file into a .bak file and then starts a new log file. I should modify the module to accept the number of log files to rotate, but I haven't run across the need yet, so I just stick with a current and a backup.
I'm like yuri and curious as to what is being logged. I've always found there to be oodles of space for log files (although I don't know what else you store on the drive). But I'd say you definitely need to monitor the file size or eventually the program will go boom.
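A minimal sketch of that rotation scheme, assuming a single .bak generation as described above. The function and constant names are illustrative, not the poster's actual module; FILE_RENAME, FILE_DELETE, and FILE_WRITE_LINE are standard NetLinx calls.

```
DEFINE_CONSTANT
LONG MAX_LOG_SIZE = 65536    // illustrative max-size parameter

DEFINE_FUNCTION AppendLog(CHAR sFile[], CHAR sMsg[])
{
    STACK_VAR SLONG hFile
    STACK_VAR SLONG nSize

    hFile = FILE_OPEN(sFile, FILE_RW_APPEND)
    IF (hFile < 0)
        RETURN

    nSize = FILE_SEEK(hFile, -1)  // seek to EOF to learn the current size

    // if this message would push the file over the limit, rotate first
    IF (nSize + LENGTH_STRING(sMsg) > MAX_LOG_SIZE)
    {
        FILE_CLOSE(hFile)
        FILE_DELETE("sFile, '.bak'")        // drop the old backup, if any
        FILE_RENAME(sFile, "sFile, '.bak'") // current log becomes the backup
        hFile = FILE_OPEN(sFile, FILE_RW_NEW)
        IF (hFile < 0)
            RETURN
    }

    FILE_WRITE_LINE(hFile, sMsg, LENGTH_STRING(sMsg))
    FILE_CLOSE(hFile)
}
```

Accepting a rotation count instead of a single backup would just mean renaming log.2 to log.3, log.1 to log.2, and so on, before the final rename.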
You mentioned it logging a runaway error. Is it possible that the master croaked from trying to do too much disk I/O too fast?
I experienced the same problem. Before I started at my current company, a logging routine had been written to deal with a problem with our VBrick MPEG-4 encoders. On error, however, it wrote so many errors (thousands) that eventually the log file would fill all of the available memory on the system. And that's exactly what happened. When I attempted to make a code change on this controller, major problems developed and I could not make the change, because as soon as I rebooted, the log started to grow again and I could not fit the new code.
Eventually this was figured out, and I changed the code to prevent it from happening. But you have to be careful, as there is nothing in NetLinx to prevent this. I also wrote a batch file that, every Monday, takes the logs written to our controllers and pulls them to a network drive.
Just a basic FTP batch file.
It thus keeps track of all events on many controllers. We have over 25 controllers, with hundreds of devices across three complexes.
So you have to be careful.
The size of the files varies with the traffic and the amount of debugging data the code generates. The number of files kept is part of the module parameters, and can be adjusted on the fly by date (i.e., there is an interface command to set how many days' worth of logs are kept). And yes, for most purposes, you can put a lot of stuff in there before memory becomes an issue. But a runaway error can produce dozens of error messages per second. That will grow a file fast.
I wanted a way to get the file size before even attempting to open it, though that is another way to do it. My concept of dealing with available memory and file size is more on a line-by-line basis: check how much memory is available, and check whether there is room for the current write while allowing for a predefined slack percentage; if that slack threshold has been reached, close the file and disable further logging.
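Since NetLinx offers no free-space call, that line-by-line gate has to work against a hand-configured budget. A sketch, with all names and figures assumed rather than taken from the poster's module:

```
DEFINE_CONSTANT
LONG DISK_BUDGET = 1000000   // assumed total bytes the log is allowed to use
LONG SLACK_PCT   = 20        // stop writing once within 20% of that budget

DEFINE_VARIABLE
VOLATILE CHAR bLoggingEnabled = 1

// Returns 1 if a message of nMsgLen bytes may be appended to a file that
// is currently nFileSize bytes long; disables logging once the slack is hit.
DEFINE_FUNCTION CHAR MayLog(SLONG nFileSize, LONG nMsgLen)
{
    STACK_VAR LONG nLimit

    nLimit = DISK_BUDGET - (DISK_BUDGET * SLACK_PCT / 100)
    IF (nFileSize + nMsgLen > nLimit)
    {
        bLoggingEnabled = 0  // caller should close the file and stop logging
        RETURN 0
    }
    RETURN 1
}
```

The point of the slack percentage is that the check and the write are not atomic: other file activity can eat the remaining space in between, so you want to shut down well before the drive is actually full.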
It was not too much I/O that caused the crash ... I could not so much as load a new program; there was insufficient memory. Yet I could connect to the master, it just wouldn't actually run the program. I FTP'd in and deleted the offending log, rebooted, and all was well again. The nature of the error was such that it only happened under specific circumstances, and once you duplicated that circumstance, there was a constant stream of index-zero error messages. I was visiting my sales rep and telneted into his conference room system just for the heck of it, and saw a similar error on that machine. It did not affect operations at all, but the telnet session was a constant stream of errors. So this is exactly the kind of error you can expect to run into.
I ran into a similar problem, but not with a log file (it's a long story that I won't bore you with).
I found too, that if the master locked up, I had to reload the program and reboot. My only problem was that my program itself was so large that I had to include a blank master program file in the project, load it in, reboot, load the big fat program in, reboot and then I was good to go.
Much to my horror, I forgot to include the persistent variables in the blank program, so I lost those values the first time. (this was with an old-skool ME260. Since then we've upgraded to the ME260/64. problem solved)
I agree with you, DHawthorne: with the current framework for file management, you're kinda stuck guessing when you're about to run out of RAM. I've built in about an 80%-full buffer for myself just to be on the safe side. There are obviously a couple of ways to check on what you've got before you hit the button, but it's kinda "quantum physics" in that it can actually change between the point of checking and sending the new file.