Automated backups and open files

I have a script that I’m reasonably happy with. It makes a backup any time the Manager database changes and sends it off the server to a remote location. My change test is less than ideal, but it works: I hash the database every half an hour and, if the hash differs from the last one, send the file off. Crude*, but effective (I also keep a log of the hashes). If the hash half an hour later is still the same as the previous one, I assume no changes occurred to the database while it was being copied (I hash both ends).
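In rough terms, the half-hourly check looks like this (a minimal Python sketch; the database path, remote destination and hash log are placeholders, not my actual setup):

```python
import hashlib
import subprocess
from pathlib import Path

DB_FILE = Path("/opt/manager/data/business.manager")   # placeholder path
HASH_LOG = Path("/var/backups/manager-hashes.log")      # placeholder log
REMOTE = "backup@remote.example:/backups/manager/"      # placeholder destination

def sha256_of(path: Path) -> str:
    """Hash the file in chunks so a 400 MB database isn't read into RAM at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def last_logged_hash() -> str:
    if not HASH_LOG.exists():
        return ""
    lines = HASH_LOG.read_text().splitlines()
    return lines[-1].split()[-1] if lines else ""

def main() -> None:
    current = sha256_of(DB_FILE)
    if current != last_logged_hash():
        # Only ship the file when the hash has changed since the last run.
        subprocess.run(["rsync", "-az", str(DB_FILE), REMOTE], check=True)
        with HASH_LOG.open("a") as log:
            log.write(f"{DB_FILE.name} {current}\n")

if __name__ == "__main__":
    main()  # run from cron every half hour
```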

An Electrical Engineer friend of mine insists I have to shut Manager down before making the backup, and whilst I would generally agree with him, that’s a bit of a PITA if I happen to be using it (and I want to keep it automated). But TBH, in this case, I don’t think I really need to stop Manager, do I? Would anyone else bother shutting it down? Do you think it should be shut down?

My thinking is that it doesn’t need to be, but I just can’t satisfy myself of that. It will depend on the underlying filesystem and on what happens when the copy command (rsync in my case) is issued: whether the file is held in memory or only on the drive, whether transactions are immediately written to disk, and so on. I don’t believe the file is locked against rsync, but I’m not sure.

Given that the Manager data file is an archive, I assume the database may be kept in memory and that copying the file would in fact be OK. I believe that arbitrarily copying the database should be relatively fine. But how can I be absolutely sure that what is being copied out is (for want of better terminology) “finalised and complete”? (I am only talking “to that point”: I don’t care if a change is made moments later; I’m concerned about the file changing whilst it is being copied.)
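For what it’s worth, if the data file is a SQLite database under the hood (an assumption on my part, not something I’ve confirmed), SQLite’s own online backup API can produce exactly that kind of “finalised to this point” snapshot without stopping the server, because it copies the database page by page under SQLite’s own locking. A minimal sketch with placeholder paths:

```python
import sqlite3

# Assumption: the Manager data file is a SQLite database; if it isn't, this
# doesn't apply. Connection.backup() copies the database page by page inside
# SQLite's locking, so the destination is a consistent "to that point" image
# even if the server keeps writing while the copy runs.
SRC = "/opt/manager/data/business.manager"            # placeholder path
DEST = "/var/backups/manager/business-snapshot.db"    # placeholder path

src = sqlite3.connect(SRC)
dest = sqlite3.connect(DEST)
try:
    src.backup(dest)
finally:
    src.close()
    dest.close()
```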

When the user presses the “Backup” button, Manager supplies a copy of the database as it is right now. However, making a copy of the database while the server application is running is never going to be a “sure thing”, is it? (My belief is that the file Manager downloads as the backup is just a copy of the archive as it exists in the application directory, but is the file locked while that copy/backup is being made?)

As mentioned, I’m reasonably confident I’m covered and safe.

@lubos is probably the best person to answer this, because he knows how the database transactions, disk writes, file locks, etc. work, but I’m curious about anyone’s (especially a systems engineer’s) thoughts on this.

It would be easy enough to create a 4am cron job that does in fact stop the manager service, create the backup and restart the service. But I wouldn’t want to arbitrarily shut it down half a dozen or more times a day.
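In sketch form, the 4 am job would be something like this (assuming systemd manages the server; the unit name and paths are placeholders):

```python
#!/usr/bin/env python3
# Sketch of a nightly "cold" backup: stop the service, copy the file, restart.
# Assumptions: systemd manages the server under a unit I've called
# "manager-server" here (placeholder name), and rsync is available.
import subprocess

DB_FILE = "/opt/manager/data/business.manager"               # placeholder path
REMOTE = "backup@remote.example:/backups/manager/nightly/"   # placeholder destination

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

try:
    run("systemctl", "stop", "manager-server")
    run("rsync", "-az", DB_FILE, REMOTE)
finally:
    # Always bring the service back up, even if the copy fails.
    run("systemctl", "start", "manager-server")

# crontab entry (4 am daily):  0 4 * * *  /usr/local/bin/manager-cold-backup.py
```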

My file size is about 400MB. Bandwidth and disk space are not an issue.

* I really don’t like the fact that I am constantly hashing the Manager archive, but is there a better way? Do other automators just use time stamps?
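For comparison, a timestamp check would look roughly like this (same placeholder paths as above); it is far cheaper than hashing 400 MB every half hour, but it trusts the filesystem’s mtime rather than the file contents:

```python
from pathlib import Path

DB_FILE = Path("/opt/manager/data/business.manager")   # placeholder path
STAMP = Path("/var/backups/manager.last-mtime")         # placeholder state file

def changed_since_last_run() -> bool:
    """Compare the database's modification time with the one recorded last run."""
    current = DB_FILE.stat().st_mtime_ns
    previous = int(STAMP.read_text()) if STAMP.exists() else -1
    STAMP.write_text(str(current))
    return current != previous
```

(rsync’s default quick check already skips files whose size and modification time are unchanged, so simply running it unconditionally every half hour is another option.)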


If the API supported it, that would be a safer way of creating an archive. It may also change the database (e.g. the time stamp of the last backup), which could reduce your storage efficiency.

The risks of taking a snapshot are the same as those of an arbitrary power-down, which many database programs are designed to be resistant to, but few really like.

Have you looked at general backup solutions?


Based on my observation, the data is written in real time, in step with the loading of the user’s browser: once the page has finished loading, the data has already been completely written to (or deleted from) disk. Since SQLite is small and Manager’s design has no batch process running longer than about 5 seconds, writes are finalised immediately. It really depends on the user’s internet connectivity. It is a different story when the user actually uploads an attachment, which is usually more than 250 KB.

You could set a rule designating a certain period as a “minor maintenance period” during which you discourage users from accessing Manager. What are the chances that every version of the backup will be corrupted? You have many versions: pick a non-corrupted one out of all the versions available.

There is no harm in trying it: test a backup made without shutting down the service, and now and then run the service on another VM against the backup file to see if there is any corruption. Hosting a service always needs a parallel system, so if one fails, run the next.

I haven’t done any testing, but that would be a good way to test the waters. Because I control the script, I could orchestrate an upload and a backup at the same time. In the absence of authoritative information, I might do that and get back to people. Meaning: I’m not trying to break it, I just want to assess the volatility of the data: what could corrupt it, what can ensure the best backups, etc.

My use case is pretty forgiving; I am really the only user. A bonus for me.

This was exactly one of my thought processes: they can’t ALL be corrupt, lol. Quick fact: I haven’t had any that are corrupt, not one, not any. Just had to get that out there.

I have done this a number of times, but in the absence of Manager actually verifying the data, opening it just to display the information in front of you right now does not verify the entirety of it. I usually open the backup in the latest desktop version.
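If the data file really is SQLite (still an assumption on my part), something like PRAGMA integrity_check would be a stronger test than just opening the backup, since it walks every page and index rather than only whatever is needed to render the screens I happen to look at:

```python
import sqlite3

# Assumption: the backup is a SQLite database. "PRAGMA integrity_check" walks
# every page and index and returns a single row containing "ok" if nothing is
# corrupt; otherwise it returns rows describing the problems it found.
def verify(path: str) -> bool:
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)  # open read-only
    try:
        result = con.execute("PRAGMA integrity_check").fetchone()[0]
        return result == "ok"
    finally:
        con.close()

if __name__ == "__main__":
    print(verify("/var/backups/manager/business-snapshot.db"))  # placeholder path
```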

Perhaps when this is implemented it will be the preferred solution.