On our cluster, we've seen Pig (http://incubator.apache.org/pig/) fill up /tmp and fail. It is also inefficient, since all the local tasks were spilling to the same disk.
Pig simply uses the Java API File.createTempFile, which writes to the directory named by the java.io.tmpdir system property.
Can we add -Djava.io.tmpdir="./tmp" somewhere, so that:
1) Tasks can utilize all disks when using tmp.
2) Any undeleted tmp files will be deleted by the tasktracker when the task (job?) is done.
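
For reference, here is a minimal Java sketch of the behaviour in question: File.createTempFile called without an explicit directory places the file under whatever java.io.tmpdir pointed to when the JVM started. The class name TmpDirDemo and the "pig-spill" prefix are made up for illustration and are not taken from Pig's code.

```java
import java.io.File;
import java.io.IOException;

public class TmpDirDemo {
    // Creates a temp file without a directory argument, so the JVM
    // places it in the directory named by java.io.tmpdir.
    static File createScratchFile() throws IOException {
        File f = File.createTempFile("pig-spill", ".tmp"); // hypothetical prefix
        f.deleteOnExit(); // clean up when the JVM exits normally
        return f;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("temp file      = " + createScratchFile().getAbsolutePath());
    }
}
```

Started as `java -Djava.io.tmpdir=./tmp TmpDirDemo` from a task's working directory, this should put the file under ./tmp. Note that the directory has to exist already; as far as I know createTempFile throws an IOException otherwise, so something would need to create ./tmp before the task runs.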
The easiest way is to set it inside mapred.child.java.opts in the config, but this is overridden if users set their own value there (for example, to change the task heap size).
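
One way around the override problem would be for the framework to append the flag to whatever child options the user supplied, rather than relying on the default value. A rough sketch of that idea follows; ChildOpts and withTmpDir are hypothetical names, not existing Hadoop code, and a real patch would have to hook into wherever the child JVM command line is built.

```java
public class ChildOpts {
    // Returns the user's child JVM options with -Djava.io.tmpdir appended,
    // unless the user has already set java.io.tmpdir explicitly.
    static String withTmpDir(String userOpts, String tmpDir) {
        String opts = (userOpts == null) ? "" : userOpts.trim();
        if (opts.contains("-Djava.io.tmpdir=")) {
            return opts; // respect an explicit user choice
        }
        String flag = "-Djava.io.tmpdir=" + tmpDir;
        return opts.isEmpty() ? flag : opts + " " + flag;
    }

    public static void main(String[] args) {
        // A user who only wanted a bigger heap still gets the per-task tmp dir.
        System.out.println(withTmpDir("-Xmx512m", "./tmp"));
    }
}
```

With this, a user setting only "-Xmx512m" would still end up with the per-task tmp directory, while a user who deliberately sets java.io.tmpdir keeps their own value.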