We recently ran out of storage space on a very large file server, one with many terabytes of space, and upon closer inspection we found that it was just one employee who had used it all up. The space was taken up almost exclusively by small files that were the result of running some data-analysis scripts. These files were completely unnecessary after they had been read once. The code that generated the files had no good way of cleaning them up once they had been created; it just went on believing that storage was infinite. Now we've had to put quotas on our file servers and, of course, deal with weekly cries for more disk space. Surely there is a better way of dealing with this problem than clamping down on everyone for fear that one of them will do the wrong thing.
Caught Between a Block and a Lack of Space
Yes, there are better ways of handling this problem. You have now discovered one of the drawbacks of cheap storage (and yes, that old adage is true): files will always expand to fill the available storage space, just as programs expand to fill all available memory and spawn more threads until all of your CPU is utilized as well.
Shared storage, such as you are dealing with, presents the thorniest problem because it is shared, and, it would seem (as regular readers of this column are, I'm sure, aware), people simply cannot be trusted to police themselves. In reality most people can, but it takes just one, as you found out, to "ruin it for everybody," as our teachers used to say.
The point you make about the scripts not having a way of cleaning up after themselves is a good one. When you build programs out of many small source files, your tools also generate intermediate files: the objects that then get linked into a final executable. All build systems worthy of the name, however, have some form of "clean" target. Although this target was originally created so that you could start a new build from scratch, it is also a handy way of shrinking down the size of your work area when a project is either complete or on hold. Having a program that would do the same work with intermediate data files is a good start, but there are other things that can be done to improve the situation.
Littering the file system with files that have to be deleted later results in a performance problem. If you need to find all the files via recursive descent of the file system before you can delete them, then you are going to be hammering your file system. In the case of NFS (network file system)-mounted systems, you will also be hammering your network while trying to clean up after yourself. Although it might appear that the best course of action would be to delete the files immediately after use, this would prevent you from debugging problems in your data analysis. Also, if you have to rerun some part of the analysis, then the derived objects you created could come in handy in speeding up the second, or third, or, well, you know, the nth run before you finally get it right. Probably the best compromise position is to place all of the derived objects into their own directory or set of directories, which can be easily located and purged when it is time to free up some space on the file system.
Keeping all the files in one place means you do not have to descend the file system recursively to find all the files that can be safely deleted. That will make the process easier, faster, and therefore more likely to be used by the people on your system. If cleaning up after yourself takes 30 seconds, you are pretty likely to do it; if it requires 30 minutes, you are going to put it off as long as you can, usually long enough for the file system to fill up again.
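To make the idea concrete, here is a minimal sketch of what a "clean" target for data-analysis scripts might look like. It is written in Python only as an assumption, since the letter does not say what language the scripts use. The directory name `derived` and the helpers `derived_dir` and `purge_derived` are hypothetical names invented for illustration, not part of any existing tool.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical convention: every derived file lives under <work area>/derived.
DERIVED_NAME = "derived"


def derived_dir(work_area: Path, run_name: str) -> Path:
    """Return the directory for one run's derived files, creating it if needed."""
    path = work_area / DERIVED_NAME / run_name
    path.mkdir(parents=True, exist_ok=True)
    return path


def purge_derived(work_area: Path) -> bool:
    """Remove the whole derived tree; return True if there was anything to remove."""
    target = work_area / DERIVED_NAME
    if not target.is_dir():
        return False
    # Only the known derived tree is walked, not the rest of the work area.
    shutil.rmtree(target)
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway work area so nothing real is deleted.
    with tempfile.TemporaryDirectory() as tmp:
        work_area = Path(tmp)
        out = derived_dir(work_area, "run-001")
        (out / "partial.dat").write_text("intermediate result\n")
        print("purged:", purge_derived(work_area))
```

Because the scripts only ever write intermediate data through `derived_dir`, the cleanup never has to search the rest of the file system for strays, and the derived files stay available for debugging or reruns until someone decides to purge them.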
You have written in previous columns about not using printf to debug programs, and you recommended using a debugger, but you must admit that there are times when a
Still Pounding on Printf
True, I have written in previous columns about the reasons for not using
The first instance where
The second instance where
I normally don't work on keyboard drivers, but I know the people who did, and I know there is nothing more frustrating than having a whiny user send you an email message saying, "The keyboard doesn't work." The driver itself was not long, and I knew about where the hang would happen in the code, so I just backtracked from where I thought the hang point was and used an Emacs macro I had written for just such an occasion, as shown in Figure 1.
Attaching the code shown in Figure 1 to a key sequence, I could insert a
Yes, there are times when you need or want
The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.