
Systemd path unit pile-up

Posted: 2020/01/01 00:15:07
by sawozny
I'm wondering (and I can test this, but I was hoping the gurus on the board might have some words of wisdom) what happens when a systemd path unit starts its related service unit, and then another file that meets the path unit's criteria (a PathExistsGlob directive) shows up while the program executed by the service unit is still running. Does it run another copy of the program, or does it wait until the .service has finished its business before starting the process again from the top?

The reason I'm asking is that I'm envisioning a path unit watching a file delivery folder fed by a downstream system, where 3 related files will be delivered at a time. Once the path unit notices that new files have arrived, it will move them into a processing folder and then do some work on them, which will take an undetermined amount of time. Once it's done processing them, it will archive them and go back to sleep waiting for the next file set. This works fine for a small number of downstream systems delivering file sets. But once I start to ramp up, I can see that while the program called by the service unit is working on one file set, another file set may be delivered to the delivery folder. I want to either avoid processing it until the first file set is processed, OR (if the path unit will have the service unit run the processing program immediately whether I like it or not) find something the path unit can pass to the service, and on to the program, with the names of the discovered files, so the instance of the program called by the service unit knows to work only on ITS files and not on the files already in the processing folder being worked on by a prior instance of the processing program.
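Concretely, I'm picturing a unit pair along these lines (all names and paths here are placeholders I made up for illustration):

    # fileset-watch.path
    [Unit]
    Description=Watch the delivery folder for new file sets

    [Path]
    # Triggers the matching fileset-watch.service while anything
    # matching this glob exists in the delivery folder.
    PathExistsGlob=/srv/delivery/*

    [Install]
    WantedBy=multi-user.target

    # fileset-watch.service
    [Unit]
    Description=Process newly delivered file sets

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/process-filesets.sh

My question is what systemd does when the glob matches again while that ExecStart program is still running.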

Alternatively, if this problem has already been solved by a pre-packaged application in the Linux repos that I can install and configure, I'd love to hear about it, but I haven't been able to find anything that fits, thus far.

Thanks,

Scott

Re: Systemd path unit pile-up

Posted: 2020/02/02 00:14:37
by sawozny
FYI, in case anyone finds this looking for a solution to a similar problem: there is absolutely nothing to prevent a pile-up of systemd service executions as I described. On the systemd mailing list there was a request to somehow pass the inotify results (inotify does the heavy lifting for the path unit), including the name of the triggering file, through to the service unit, but the response was that it would be too racy (sort of my core problem) and that the purpose of the path unit is lazy loading of services, not a fully featured file management system.

Disappointing, but as always, there's more than one way to do it. What I ended up doing was delivering my file set to a "receiving" directory, then doing a mv of each file into a "received" directory, with the last file moved being my "trigger" file. Since a rename within the same partition is about as small and atomic a transaction as you're going to get involving disk, I attached a shell script to the service unit paired with my path unit, which watches for the trigger file. The first thing THAT script does is note the name of the trigger file and move it to a "processing" directory. It then moves the other files in the set into the processing directory and works on them by name, with no wildcarding needed. If another file set gets delivered and a trigger file gets dropped into the "received" directory while I'm working on the first file set, it doesn't matter, because the second instance of the script will only work on the newly delivered files, and so on.
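In sketch form, the service-side script looks something like this (directory names are placeholders, and I'm assuming for illustration that a set shares a basename, e.g. NAME.dat, NAME.idx, NAME.trigger, with the path unit's PathExistsGlob pointed at /srv/received/*.trigger):

    #!/bin/bash
    # process-filesets.sh -- run by the service unit when a trigger
    # file appears in "received". The NAME.dat / NAME.idx naming is
    # just an example; use whatever your delivery convention is.

    RECEIVED=/srv/received
    PROCESSING=/srv/processing
    ARCHIVE=/srv/archive

    for trigger in "$RECEIVED"/*.trigger; do
        [ -e "$trigger" ] || exit 0       # glob matched nothing; all done
        name=$(basename "$trigger" .trigger)

        # Claim the set: a rename within the same partition is atomic,
        # so if another instance races us here, only one mv succeeds
        # and the loser just skips this set.
        mv "$trigger" "$PROCESSING/" || continue

        # Pull the rest of the set over by name -- no wildcards needed.
        mv "$RECEIVED/$name.dat" "$RECEIVED/$name.idx" "$PROCESSING/"

        # ... do the actual processing on $PROCESSING/$name.* here ...

        # Archive the finished set and go back to sleep.
        mv "$PROCESSING/$name".* "$ARCHIVE/"
    done

Moving the trigger file first is the whole trick: it's the claim on the set, and everything after it operates on explicit file names.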

I know that with multiple file sources I still have a very thin possibility of a second file set being delivered while I'm gathering the file names of the first set and relocating the trigger file to the processing directory (it's about 20ms between the trigger file landing in the "received" directory and the service's script getting it moved to "processing"). The only way I can see around that is having the delivery script on the source system check, when it logs in, whether another delivery is in progress and, if so, log out and try back in 10 seconds or so. The "detect another delivery" part of the logic is pretty straightforward, but I still have to figure out how to get the needed files from the source's file system while running as a script on the destination without creating another TCP connection / SSH login. I guess that's a challenge for another day.
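For what it's worth, the straightforward part, the source-side back-off, could be as simple as this (host and paths are made up, and it treats a trigger file still sitting in "received" as a delivery in progress):

    #!/bin/bash
    # Source-side sketch: back off while an earlier set is still
    # waiting to be claimed on the destination.

    DEST=user@dest-host
    RECEIVED=/srv/received

    # A leftover *.trigger in "received" means the prior delivery
    # hasn't been picked up yet; log out and retry in 10 seconds.
    # (Each check is its own login/logout, which is exactly the
    # connection churn I'd still like to avoid.)
    while ssh "$DEST" "ls $RECEIVED/*.trigger >/dev/null 2>&1"; do
        sleep 10
    done

    # ...now deliver the set as usual, trigger file last...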