Discussion:
Replicated filesystem for linux - disconnected use
Tim Watts
2014-06-21 10:43:21 UTC
There are many clustered filesystems for linux - most seem to have HPC
clustering or failover in mind and assume there is solid networking
between the hosts.


I'm after one that would suit multiple client "ordinary/home" usage with
intermittent connectivity.

Right now, I have a central NFS server at home which is backed up
properly. I work mostly on a laptop (which is the way everyone in my
family is going, we have no "desktop" - just a monitor and keyboard for
docking to). I occasionally sync back to base with unison, which is a
great tool.


I'm not looking for a cachefs type thing that depends on the network
being there - I'm after a full-on replicated (at the file level, not the
block level[1]) filesystem, preferably with no concept of a master (unison
handles this quite well).

[1] Replication will always have some clashes. I'd rather have good
files with the possibility one file is not the right version, than have
a buggered FS.



So there seem to be a couple of directions:

1) Run unison as root from a script with a carefully chosen config file
per FS area. Do some DIY so the script runs when (say) at-home WiFi is
detected, so as to avoid syncing over a mobile or work link.

Email errors to me for manual fixing (unison generally "does the right
thing" and baulks before doing something that is not provably correct).
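For example, a minimal sketch of such a script (the SSID, the unison profile name and the mail recipient are all invented):

```shell
#!/bin/sh
# Sketch only: sync when the at-home WiFi is detected, mail the report.
HOME_SSID="${HOME_SSID:-MyHomeNet}"

current_ssid() {
    # iwgetid -r prints just the SSID of the associated wireless interface
    iwgetid -r 2>/dev/null
}

at_home() {
    [ "$(current_ssid)" = "$HOME_SSID" ]
}

sync_and_report() {
    # -batch: never prompt; unison skips anything it cannot prove safe
    # and reports it, which is what gets mailed for manual fixing
    unison -batch home-sync 2>&1 | mail -s "unison sync report" root
}

if at_home; then
    sync_and_report
fi
```

Run from cron or a NetworkManager dispatcher hook, this never fires over a mobile or work link because the SSID test fails there.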


Or

2) Find a more elegant solution that works at the kernel or daemon level.


So:

1 - Anyone done this and did it work out?

2 - Any FSs worth looking at that would behave well in a
WAN-with-intermittent-connectivity context?

Cheers :)

Tim
Theo Markettos
2014-06-24 20:34:53 UTC
Post by Tim Watts
There are many clustered filesystems for linux - most seem to have HPC
clustering or failover in mind and assume there is solid networking
between the hosts.
I'm after one that would suit multiple client "ordinary/home" usage with
intermittent connectivity.
Have you thought about the various version control systems? I've been using
git recently as a way to copy files around between machines: do work on one
machine, push to another. Pull copy from desktop down onto laptop, go on
plane. Any work I do offline can be pushed about like this, and it deals
with clashes between different versions. You do have to manually checkpoint
and add files but it's simple and a good habit to get into (each checkpoint
is annotated with a commit message which is useful as a progress report).

A downside is you're carrying a full copy of all the history about with you,
which might not always be desirable. Though it's easy to do a fresh clone
without the history.
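That workflow can be sketched as commands; this self-contained demo uses two local repositories standing in for 'desktop' and 'laptop' (all names invented):

```shell
#!/bin/sh
# Sketch of push-between-machines with git, entirely in a temp directory.
command -v git >/dev/null || exit 0
set -e
WORK=$(mktemp -d)

# 'laptop' is a bare repository we can push to
git init --quiet --bare "$WORK/laptop.git"

# on the 'desktop': do some work and checkpoint it
git init --quiet "$WORK/desktop"
cd "$WORK/desktop"
git remote add laptop "$WORK/laptop.git"
echo "draft intro" > report.txt
git add report.txt
git -c user.name=tim -c user.email=tim@example.org \
    commit --quiet -m "progress: drafted intro"

# push the checkpoint over to the 'laptop'
git push --quiet laptop HEAD:master

# and a fresh copy without the history, for when carrying it all is unwanted
git clone --quiet --depth 1 -b master "file://$WORK/laptop.git" "$WORK/plane-copy"
```

Between real machines the remote would be an ssh URL rather than a local path, but the push/pull shape is the same.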

Other version control systems do different things, which might suit you
better depending on what you want.

Theo
Tim Watts
2014-06-24 20:57:59 UTC
Post by Theo Markettos
Post by Tim Watts
There are many clustered filesystems for linux - most seem to have HPC
clustering or failover in mind and assume there is solid networking
between the hosts.
I'm after one that would suit multiple client "ordinary/home" usage with
intermittent connectivity.
Have you thought about the various version control systems? I've been using
git recently as a way to copy files around between machines: do work on one
machine, push to another. Pull copy from desktop down onto laptop, go on
plane. Any work I do offline can be pushed about like this, and it deals
with clashes between different versions. You do have to manually checkpoint
and add files but it's simple and a good habit to get into (each checkpoint
is annotated with a commit message which is useful as a progress report).
A downside is you're carrying a full copy of all the history about with you,
which might not always be desirable. Though it's easy to do a fresh clone
without the history.
Other version control systems do different things, which might suit you
better depending on what you want.
Theo
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).

I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.

I'm looking for something I can make fully automated so when the kids
are slightly bigger and all have laptops for school work, I can have
everybody's /home on every device and backed up centrally.

I used to have a nice setup when we had a couple of desktops - even had
dual X sessions on one so me and the wife did not have to kick each other
off - we just did a CTRL-ALT-F7/8.

Been looking to get the networked desktop experience on the laptops but
as you can see it's a little trickier. The good thing is disk is cheap
so there's no real problem with having replicated copies everywhere.
Theo Markettos
2014-06-24 23:27:02 UTC
Post by Tim Watts
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).
etckeeper will do that for you (think it does git and bzr) including
integrating with dpkg or RPM. It's quite neat in that every time you
apt-get update it commits a new revision.
Post by Tim Watts
I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.
We use Subversion for holding binary files - eg every time a document is
changed the PDF also gets committed. It's OK with that, though a full
checkout of the many-GB is not quick. SVN is a centralised VCS of course.
I did wonder if there's a distributed VCS that doesn't insist on keeping a
local copy of the entire history if you care less about old versions, so
multi-GB file churning doesn't waste as much space.

Subversion also plays better with the 'I just want this bit' case, where
git will force you to take the whole repo (and handling multiple repos is
awkward). I think Mercurial is similar to git, but I don't know the others.
Post by Tim Watts
I'm looking for something I can make fully automated so when the kids
are slightly bigger and all have laptops for school work, I can have
everybody's /home on every device and backed up centrally.
Another idea is Time Machine: obviously that's an Apple thing, but there are
Linux clones. While the 'sparse bundles' it makes aren't directly usable,
there's a Linux FS (FUSE-based I think) to mount them and get at the files
inside (and you don't need an Apple drive to do TM backups to).

Time Machine isn't really a sync protocol as such - it's only really one
way. I would be interested to know if you find anything a bit more like
that, which could be useful for 2/3/4-way sync rather than one-way.

Theo
Tim Watts
2014-06-25 06:21:07 UTC
Post by Theo Markettos
Post by Tim Watts
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).
etckeeper will do that for you (think it does git and bzr) including
integrating with dpkg or RPM. It's quite neat in that every time you
apt-get update it commits a new revision.
Must look at that - thanks. Always up for new things that may be useful :)
Post by Theo Markettos
Post by Tim Watts
I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.
We use Subversion for holding binary files - eg every time a document is
changed the PDF also gets committed. It's OK with that, though a full
checkout of the many-GB is not quick. SVN is a centralised VCS of course.
I did wonder if there's a distributed VCS that doesn't insist on keeping a
local copy of the entire history if you care less about old versions, so
multi-GB file churning doesn't waste as much space.
Subversion also plays better with the 'I just want this bit' where git will
force you to take the whole repo (and handling multiple repos is awkward).
I think Mercurial is similar to git, but don't know the others.
Post by Tim Watts
I'm looking for something I can make fully automated so when the kids
are slightly bigger and all have laptops for school work, I can have
everybody's /home on every device and backed up centrally.
Another idea is Time Machine: obviously that's an Apple thing, but there are
Linux clones. While the 'sparse bundles' it makes aren't directly usable,
there's a Linux FS (FUSE-based I think) to mount them and get at the files
inside (and you don't need an Apple drive to do TM backups to).
Hmm - I didn't know that...
Post by Theo Markettos
Time Machine isn't really a sync protocol as such - it's only really one
way. I would be interested to know if you find anything a bit more like
that, which could be useful for 2/3/4-way sync rather than one-way.
Martin Gregorie
2014-06-24 23:49:07 UTC
Post by Tim Watts
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).
I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.
Have you tried rsync?

It's currently my backup/replication system of choice because it's simple
to set up and very fast: the latter is because it does the least possible
work to bring the replicated fs into line with the master fs. It has a
number of clever tricks for dealing with large files with small amounts
of updating scattered through them. It makes short work of backing up
PostgreSQL databases, tarballs etc.

I'm currently using two generations of offline backups with USB disk
drives as the backup medium. The only times the backups are slow are the
first backup after a clean install of a new distro version and when I've
just replaced a backup disk. IOW the first replication with rsync is not
much faster than an uncompressed full copy but subsequent runs are very
much quicker depending only on the amount of change that happened since
the last rsync run.
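For example, a minimal mirror in that spirit (the flags are from rsync(1); the paths are invented):

```shell
#!/bin/sh
# Sketch: mirror a source tree onto a backup disk with rsync.

mirror() {
    src=$1 dest=$2
    # -a  archive mode: permissions, times, symlinks, ownership
    # -H  preserve hard links; -x  don't cross filesystem boundaries
    # --delete makes the copy a true mirror: files removed from the
    # source disappear from the backup too
    rsync -aHx --delete "$src" "$dest"
}

# e.g. mirror /home/ /mnt/usbdisk/home/
```

The trailing slashes matter: with them, rsync copies the *contents* of the source directory into the destination rather than nesting a new directory inside it.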
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
The Natural Philosopher
2014-06-24 23:53:39 UTC
Post by Martin Gregorie
Post by Tim Watts
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).
I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.
Have you tried rsync?
It's currently my backup/replication system of choice because it's simple
to set up and very fast: the latter is because it does the least possible
work to bring the replicated fs into line with the master fs. It has a
number of clever tricks for dealing with large files with small amounts
of updating scattered through them. It makes short work of backing up
PostgreSQL databases, tarballs etc.
I'm currently using two generations of offline backups with USB disk
drives as the backup medium. The only times the backups are slow are the
first backup after a clean install of a new distro version and when I've
just replaced a backup disk. IOW the first replication with rsync is not
much faster than an uncompressed full copy but subsequent runs are very
much quicker depending only on the amount of change that happened since
the last rsync run.
+1
--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.
Tim Watts
2014-06-25 06:54:25 UTC
Post by Martin Gregorie
Post by Tim Watts
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).
I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.
Have you tried rsync?
It's currently my backup/replication system of choice because it's simple
to set up and very fast: the latter is because it does the least possible
work to bring the replicated fs into line with the master fs. It has a
number of clever tricks for dealing with large files with small amounts
of updating scattered through them. It makes short work of backing up
PostgreSQL databases, tarballs etc.
I'm currently using two generations of offline backups with USB disk
drives as the backup medium. The only times the backups are slow are the
first backup after a clean install of a new distro version and when I've
just replaced a backup disk. IOW the first replication with rsync is not
much faster than an uncompressed full copy but subsequent runs are very
much quicker depending only on the amount of change that happened since
the last rsync run.
+1
I like rsync, but this needs to be a 2 way sync...
Tim Watts
2014-06-25 06:51:49 UTC
Post by Martin Gregorie
Post by Tim Watts
It's an interesting idea - and I do often drop git into /etc/ to track
my config updates (where it's not worth bothering with a config file
management system).
I fear it will not suit all the random large binary files I have though
(spreadsheets, media etc). But the concept is very interesting. Thank you.
Have you tried rsync?
Hi,

Yes - I use it a lot for backups.

I think rsync falls into the same category as unison, except that unison
does 2-way syncs with a very good algorithm.

The only fault I see with unison is that the protocol changes with
certain version changes, but Debian are very good about having multiple
versions packaged.
Post by Martin Gregorie
It's currently my backup/replication system of choice because it's simple
to set up and very fast: the latter is because it does the least possible
work to bring the replicated fs into line with the master fs. It has a
number of clever tricks for dealing with large files with small amounts
of updating scattered through them. It makes short work of backing up
PostgreSQL databases, tarballs etc.
I'm currently using two generations of offline backups with USB disk
drives as the backup medium. The only times the backups are slow are the
first backup after a clean install of a new distro version and when I've
just replaced a backup disk. IOW the first replication with rsync is not
much faster than an uncompressed full copy but subsequent runs are very
much quicker depending only on the amount of change that happened since
the last rsync run.
I use rsnapshot (rsync based) with a cheap NAS (the Western Digital
MyCloud units, which are a cheap linux+big disk in a box with root ssh
access, and now Debian based).

I think I should probably try automating unison - no one is jumping out
with "you must try <syncfs>" so I guess I should be looking more at my
first option :)

Allegedly it "does the right thing" running as root, like rsync.

Sounds like time for a perl wrapper to handle emailing conflicts and for
deciding when the network is stable and is the right one on which to
trigger a sync.
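For reference, the per-area configuration could live in a unison profile rather than in the wrapper itself - something along these lines (every name and path here is invented):

```
# ~/.unison/home-sync.prf  (hypothetical)
root = /home/tim
root = ssh://server//home/tim

# never prompt; unison skips anything it cannot prove safe
batch = true
# when one replica is provably newer, prefer it
prefer = newer

ignore = Name *.tmp
ignore = Path .cache
```

Invoked as `unison home-sync` by whatever detects that the home network is up.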

Thanks for the ideas :)
Martin Gregorie
2014-06-25 11:42:34 UTC
Post by Tim Watts
I use rsnapshot (rsync based) with a cheap NAS (The Western Digital
MyCloud units which are a cheap linux+big disk in a box with root ssh
access and now debian based)
My rsync backups are weekly, but I also run an overnight backup that
currently keeps 17 days' worth of backups (compressed tarballs) on another
USB disk. That one is permanently online, though spun down when not in use,
because the backup happens at 0300.

I keep thinking that this backup would be faster, and could put more daily
backups on the disk, if I used rsnapshot rather than my homegrown
script+tar solution.

Thanks for providing a prod in that direction.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Chris Davies
2014-06-25 13:08:58 UTC
Post by Martin Gregorie
My rsync backups are weekly, but I also run an overnight backup that
currently keeps 17 days worth of backups (compressed tarballs) on another
USB disk [...]
I keep thinking that this backup would be faster and could put more daily
backups on the disk if I used rsnapshot rather than my homegrown script
+tar solution.
You can't usefully use rsnapshot with a FAT-based filesystem as the
target, as it requires hard-link capability. However, I have used it (and
continue to do so) very successfully with ext{3,4} as the backup storage
and any/all of ext4, ntfs, and FAT32/vfat as the source.

Chris
Martin Gregorie
2014-06-25 20:04:51 UTC
Post by Chris Davies
Post by Martin Gregorie
My rsync backups are weekly, but I also run an overnight backup that
currently keeps 17 days worth of backups (compressed tarballs) on
another USB disk [...]
I keep thinking that this backup would be faster and could put more
daily backups on the disk if I used rsnapshot rather than my homegrown
script +tar solution.
You can't usefully use rsnapshot to a FAT-based filesystem as it
requires hard links capability. However, I have used it (and continue to
do so) very successfully with ext{3,4} as the backups storage and
any/all of ext4, ntfs, and FAT32/vfat as the source.
Yeah, I should have said that I reformat my USB drives to ext4 as a
matter of course. It's the first thing I do after taking one out of its
packaging.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Tim Watts
2014-06-25 15:21:11 UTC
Post by Martin Gregorie
Post by Tim Watts
I use rsnapshot (rsync based) with a cheap NAS (The Western Digital
MyCloud units which are a cheap linux+big disk in a box wth root ssh
access and now debian based)
My rsync backups are weekly, but I also run an overnight backup that
currently keeps 17 days worth of backups (compressed tarballs) on another
USB disk, but it's permanently online, though spun down when not in use,
because the backup happens at 0300.
I keep thinking that this backup would be faster and could put more daily
backups on the disk if I used rsnapshot rather than my homegrown script
+tar solution.
Thanks for providing a prod in that direction.
I can run rsnapshot 4 times a day - it is quite light (thanks to rsync
being so efficient).

The one thing I don't like, and did differently when I had a perl
script driving rsync, is that rsnapshot uses cp -rl to populate every
new tree with a full set of hard links to the inodes - rsync then
unlink()s and replaces any files that have changed.

The cp -rl is very time consuming and a waste of directory entries - I'd
love to turn it off and just have one full tree plus reverse-increment
trees containing only the files that changed.

Sadly rsnapshot has no option for this...
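For reference, the rotation being complained about can be sketched like this (a simplified two-generation sketch with invented paths; real rsnapshot keeps more generations and handles errors):

```shell
#!/bin/sh
# Sketch of the hard-link rotation that rsnapshot effectively performs.
BASE=/backup

snapshot() {
    # drop the oldest tree, then hard-link the newest into its place
    rm -rf "$BASE/daily.1"
    if [ -d "$BASE/daily.0" ]; then
        # the cp -al step: a whole tree of new directory entries, but
        # every file is a hard link to the existing inode, not a copy
        cp -al "$BASE/daily.0" "$BASE/daily.1"
    fi
    mkdir -p "$BASE/daily.0"
    # rsync then unlink()s and replaces only the files that changed, so
    # daily.1 keeps the old versions via the surviving links
    rsync -a --delete "$1" "$BASE/daily.0/"
}

# e.g. snapshot /home/
```

Every daily.N looks like a complete tree, at the cost of all those extra directory entries.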
Chris Davies
2014-06-26 08:25:41 UTC
Post by Tim Watts
The cp -rl is very time consuming and a waste of directory entries - I'd
love to turn it off and just have one full tree and reverse increment
trees of only the files which had changed.
You could perhaps replace the "cmd cp" option with "cmd true". But I'm
not sure how this would let you identify files that had been removed
since the previous snapshot.

Chris
Tim Watts
2014-06-26 12:29:58 UTC
Post by Chris Davies
Post by Tim Watts
The cp -rl is very time consuming and a waste of directory entries - I'd
love to turn it off and just have one full tree and reverse increment
trees of only the files which had changed.
You could perhaps replace the "cmd cp" option with "cmd true". But I'm
not sure how this would let you identify files that had been removed
since the previous snapshot.
Chris
It's a little trickier than that, Chris.

Based on how I implemented it at work a long time ago (and don't have
the code), you basically tell rsync to resync to the *same* full backup
target, but you give it an option to copy any files it is about to
modify or delete into a second target (your -1 incremental).

Having previously rotated all your incrementals

-2 > -3
-1 > -2,

etc, you can maintain as many changes as you like.

This would probably involve modifying rsnapshot - or writing a
catch-script that happens to be called "rsync" (and ditto for cp) -
though IIRC you can specify the binaries to use in the rsnapshot config.
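A sketch of that scheme, assuming rsync's --backup/--backup-dir options and an invented layout (depth of 3 increments here, purely for illustration):

```shell
#!/bin/sh
# Sketch: one full tree plus reverse increments of changed/deleted files.
BASE=/backup
SRC=/home/

rotate() {
    # -2 > -3, -1 > -2, etc; the oldest increment falls off the end
    rm -rf "$BASE/inc-3"
    if [ -d "$BASE/inc-2" ]; then mv "$BASE/inc-2" "$BASE/inc-3"; fi
    if [ -d "$BASE/inc-1" ]; then mv "$BASE/inc-1" "$BASE/inc-2"; fi
    mkdir -p "$BASE/inc-1"
}

run_backup() {
    rotate
    # re-sync into the *same* full tree; any file rsync is about to
    # modify or delete is first copied into inc-1
    rsync -a --delete --backup --backup-dir="$BASE/inc-1" \
        "$SRC" "$BASE/full/"
}

# e.g. run_backup
```

Restoring to N days ago means laying inc-1 .. inc-N over a copy of full, newest increment first.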
Theo Markettos
2014-06-26 18:08:38 UTC
Post by Tim Watts
It's a little trickier than that Chris.
Based on how I implemented it at work a long time ago (and don't have
the code), you basically tell rsync to resync to the *same* full backup
target, but you give it an option to copy any files it is about to
modify or delete into a second target (your -1 incremental).
The tradeoff is you then have to apply multiple diffs to go back in time:
apply -1 to full0 to make full-1
apply -2 to full-1 to make full-2
etc

Also it doesn't cope with newly created files. Say a file exists in full0
and full-1, at which point it was created. It shouldn't exist in full-2.
But you can't make a negative file that you can copy over a file to make
it disappear.

One solution to the snapshots problem is btrfs, but that didn't work out for
me when I tried it about 3 years ago. It may have improved since. It was
also lacking frontend tools (eg a nice backup program using snapshots) -
that may have improved. ZFS is also nice, but there's never been much
takeup from Linux folks (for various reasons).

Theo
Tim Watts
2014-06-26 20:42:38 UTC
Post by Theo Markettos
Post by Tim Watts
It's a little trickier than that Chris.
Based on how I implemented it at work a long time ago (and don't have
the code), you basically tell rsync to resync to the *same* full backup
target, but you give it an option to copy any files it is about to
modify or delete into a second target (your -1 incremental).
apply -1 to full0 to make full-1
apply -2 to full-1 to make full-2
etc
This is true.
Post by Theo Markettos
Also it doesn't cope with newly created files. Let's say it exists in full0
and full-1, at which point it was created. It shouldn't exist in full-2.
But you can't make a negative file you can copy over a file to make it
disappear.
There is no "full2".

There's only:

full (and most recent)
inc-1
inc-2
...
inc-N
Post by Theo Markettos
One solution to the snapshots problem is btrfs, but that didn't work out for
me when I tried it about 3 years ago. It may have improved since. It was
also lacking frontend tools (eg a nice backup program using snapshots) -
that may have improved. ZFS is also nice, but there's never been much
takeup from Linux folks (for various reasons).
You might have more luck with ZFS - I tried the Debian 7 packages and
have been VERY impressed.
Theo Markettos
2014-06-26 21:42:22 UTC
Post by Tim Watts
There is no "full2".
full (and most recent)
inc-1
inc-2
...
inc-N
Yes, but how do you say 'file X was created at the time of inc-1 but did not
exist earlier?'. There's nothing you can put in inc-2 that, when copied
over full-1 (ie full with inc-1 on top to 'regress' it), will cause the file
to be deleted.

You can have additional shell scripts to delete files, but rsync doesn't
emit those. You can also have empty files, but they aren't the same as
deletion.

This is a big problem for files like date-named logfiles: if you regress
back to the very beginning it'll be full of logfiles from the future.
Post by Tim Watts
You might have more luck with ZFS - I tried the Debian 7 packages and
have been VERY impressed.
Thanks - I think that's enough of a prod for me to have a go :)

Theo
Tim Watts
2014-06-26 22:25:00 UTC
Post by Theo Markettos
Post by Tim Watts
There is no "full2".
full (and most recent)
inc-1
inc-2
...
inc-N
Yes, but how do you say 'file X was created at the time of inc-1 but did not
exist earlier?'. There's nothing you can put in inc-2 that, when copied
over full-1 (ie full with inc-1 on top to 'regress' it), will cause the file
to be deleted.
Indeed - you have spotted a weakness.

I would clarify that the requirements were (when I had to implement this):

1) A full backup from last night be available;

2) The user should have the option of going back (say) 30 days and being
able to retrieve a changed or deleted file.

3) It be suitable for self-help restores (unlike tape).


3 was achieved by making it available RO over NFS as
/vol/recover/<somefs>/{full,inc-00,inc-01,...}/thefiles


It was never a design criterion that a full point-in-time recovery could
be made to N days ago - although you could, with the exception you
rightly noticed: the user might gain some files.

Generally users are less upset by gaining some files[1] compared to
losing them :)

[1] Unless it's their porn browser history and the wife/college
tutor/employer has just noticed...
Post by Theo Markettos
You can have additional shell scripts to delete files, but rsync doesn't
emit those. You can also have empty files, but they aren't the same as
deletion.
This is a big problem for files like date-named logfiles: if you regress
back to the very beginning it'll be full of logfiles from the future.
Indeed :)

The times we used this in such a situation (restoring the webserver and
its logs) we just needed last night's copy (before the server got hacked).

That could, however, be a scenario where we were hacked 3 days ago and
want the now-4 copy. In that case a certain amount of tidying up by hand
could be tolerated.
Post by Theo Markettos
Post by Tim Watts
You might have more luck with ZFS - I tried the Debian 7 packages and
have been VERY impressed.
Thanks - I think that's enough of a prod for me to have a go :)
Theo
Rahul
2014-06-25 05:14:15 UTC
Post by Tim Watts
1 - Anyone done this and did it work out?
2 - Any FSs worth looking at that would behave well in a
WAN-with-intermittent-connectivity context?
Some of the tools that may fit your use case seem to still be at the
research stage, unfortunately.

e.g. Is MIT's Ivy what you need? It's strictly peer to peer.

http://pdos.csail.mit.edu/ivy/

The project has been a bit dead though.


Or the Shark FS project?

http://www.scs.stanford.edu/shark/overview/

Or the Nimb
Tim Watts
2014-06-25 06:53:56 UTC
Post by Rahul
Post by Tim Watts
1 - Anyone done this and did it work out?
2 - Any FSs worth looking at that would behave well in a
WAN-with-intermittent-connectivity context?
Some of the tools that may fit your use case seem still at the research
stage unfortunately.
e.g. Is MIT's Ivy what you need? It's strictly peer to peer.
http://pdos.csail.mit.edu/ivy/
The project has been a bit dead though.
I'm avoiding dead stuff :)
Post by Rahul
Or the Shark FS project?
http://www.scs.stanford.edu/shark/overview/
Ah - just seen this (after my last post, where I gave up hope).

Looks interesting - I will have a long hard look. Thank you sir.
Post by Rahul
Or the Nimb
nimb?
Tim Watts
2014-06-25 07:50:10 UTC
Post by Rahul
Or the Shark FS project?
http://www.scs.stanford.edu/shark/overview/
This *seems* to be yet another caching FS. I may be missing something in
their FAQs, but I'm not seeing any mention of disconnected use.

This seemed to be the case for all the other funky FSs I've googled before.

There does seem to be a gap in the "market" for a clean replicating FS -
perhaps built on top of the unison algorithms, though that brings its own
problems, as some interaction seems to be necessary to resolve the odd
conflict.

I wonder how MS DFS works?
Post by Rahul
Or the Nimb
alexd
2014-06-25 21:14:33 UTC
Post by Tim Watts
I wonder how MS DFS works?
Perhaps that's the answer?

http://www.serverlab.ca/tutorials/linux/storage-file-systems-linux/how-to-deploy-a-distributed-file-system-server-on-centos-6/
--
<http://ale.cx/> (AIM:troffasky) (***@ale.cx)
22:13:34 up 174 days, 23:52, 10 users, load average: 0.42, 0.47, 0.46
"If being trapped in a tropical swamp with Anthony Worral-Thompson and
Christine Hamilton is reality then I say, pass the mind-altering drugs"
-- Humphrey Lyttleton
Tim Watts
2014-06-25 22:19:53 UTC
Post by alexd
Post by Tim Watts
I wonder how MS DFS works?
Perhaps that's the answer?
http://www.serverlab.ca/tutorials/linux/storage-file-systems-linux/how-to-deploy-a-distributed-file-system-server-on-centos-6/
That does indeed seem to handle the unified namespace - but I see no
mention of replication (which I *think* DFS can also do, with WAN links
in mind - I'm not actually an MS person, but I've listened to colleagues
going on about it).

It's the replication I'm interested in.
alexd
2014-06-27 12:37:00 UTC
Post by Tim Watts
It's the replication I'm interested in.
You know, I knew that was exactly what you were after, and I was convinced
at the time that that's what that article was describing, but a second
reading shows that it wasn't :-)

A bit more googling reveals that no, it doesn't support the -R bit of
DFS-R. I suspect the rationale for this is that it handles the
client-facing bit alright, and you can use any number of linux-native
replication schemes on the backend.
--
<http://ale.cx/> (AIM:troffasky) (***@ale.cx)
13:25:44 up 176 days, 15:04, 10 users, load average: 0.46, 0.51, 0.51
"If being trapped in a tropical swamp with Anthony Worral-Thompson and
Christine Hamilton is reality then I say, pass the mind-altering drugs"
-- Humphrey Lyttleton
Rahul
2014-06-26 16:03:15 UTC
Post by Tim Watts
This *seems* to be yet another cachingFS. I may be missing something
in their FAQs but I'm not seeing any mention of disconnected use.
This seemed to be the case for all the other funky FSs I've googled before.
There does seem to be a gap in the "market" for a clean replicating FS
- perhaps built on top of the unison algorithms, though it brings its
own problems, as some interaction seems to be necessary to resolve the
odd conflict.
Just to clarify my understanding: How do you differentiate between a
caching FS and a replicating FS?

On the clustering / HPC end they focus on performance and failover by
striping across several storage hosts & adding redundancy. (well, ok and
some hashing and metadata tricks) Lustre / HPFS / Gluster etc. seem to
follow variations of this basic model.

Now if I understand your use case correctly, would having something like a
lightweight Gluster node on *each* of your devices work? Is that what you
have in mind?

I mean say your network gets partitioned (i.e. one device temporarily
cannot see another) how do you propose to enforce consistency after the
network recovers? Assuming someone else changes the same file. Or is that
not a use case we are bothering about?
Tim Watts
2014-06-26 20:40:07 UTC
Post by Rahul
Just to clarify my understanding: How do you differentiate between a
caching FS and a replicating FS?
Certainly:

Caching is opportunistic and not usually with 100% coverage.

Replicating is deterministic with 100% coverage at the earliest opportunity.

Or, another way:

Caching says: "We don't really care too much, as we know the master has
the authoritative copy - but wouldn't it be nice if we could keep some hot
objects locally to make things a bit faster? We will assume that we can
check with the master whenever we wish."


Replicating says (to me): "There is no master. When we get access to our
peer or peers, we will attempt to synchronise as quickly as possible
because we do not know when we will get the next chance. Our copy is the
most important and we must have 100% of it all of the time."
Post by Rahul
On the clustering / HPC end they focus on performance and failover by
striping across several storage hosts & adding redundancy. (well, ok and
some hashing and metadata tricks) Lustre / HPFS / Gluster etc. seem to
follow variations of this basic model.
Now if I understand your use case correctly, would having something like a
lightweight Gluster node on *each* of your devices work? Is that what you
have in mind.
Could be.
Post by Rahul
I mean say your network gets partitioned (i.e. one device temporarily
cannot see another) how do you propose to enforce consistency after the
network recovers? Assuming someone else changes the same file. Or is that
not a use case we are bothering about?
I would say: using the same algorithm that unison uses. It's a very well
developed algorithm: it will either be able to prove which is the
newer copy (which is often the case, IME), or it will be uncertain, or it
will be able to prove a clash as you suggested.

In the last 2 cases it has little choice but to ask for manual intervention.

Cheers

Tim
Rahul
2014-06-29 17:21:35 UTC
Permalink
Post by Tim Watts
Replicating says (to me): There is no master. When we get access to our
peer or peers, we will attempt to synchronise as quickly as possible
because we do not know when we will get the next chance. Our copy is the
most important and we must have 100% of it all of the time.
Most of this seems like a two-way (or multi-way?) rsync? The only problem seems
to be clash / conflict resolution. Since there's no way a system can resolve
that, the best one could do is conflict identification.

Would something like checksums take care of this?
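Checksums can identify a conflict, though not resolve it. A minimal sketch of the idea (hypothetical names, not any real tool's implementation): each replica remembers the checksum of every file as of the last successful sync; on reconnect, a file changed on one side only is safe to propagate, while a file that diverged on both sides relative to that common base is flagged for manual resolution.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Content fingerprint for a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def classify(base: str, a: str, b: str) -> str:
    """Decide what to do with one file, given the hash recorded at the
    last successful sync (base) and the current hashes on replicas A and B."""
    if a == b:
        return "in-sync"
    if a != base and b == base:
        return "propagate A -> B"   # only A changed: A is provably newer
    if b != base and a == base:
        return "propagate B -> A"
    return "conflict"               # both sides diverged: ask the user

base = file_hash(b"version 1")
print(classify(base, file_hash(b"version 2"), base))                     # propagate A -> B
print(classify(base, file_hash(b"version 2"), file_hash(b"version 3")))  # conflict
```

So the checksums take care of the identification deterministically, but resolution still needs either a policy (e.g. newest wins) or a human.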
Chris Davies
2014-06-30 14:16:35 UTC
Permalink
Post by Tim Watts
Replicating says (to me): There is no master. When we get access to our
peer or peers, we will attempt to synchronise as quickly as possible
because we do not know when we will get the next chance. Our copy is the
most important and we must have 100% of it all of the time.
I would suggest your model (as written here) should be tweaked
slightly. "Our copy" isn't always the most important, because another peer
might have a more recently updated copy. Two way synchronisation usually
defers to file creation/modification time: most recently modified is best.

Chris

Andrzej Adam Filip
2014-06-26 10:25:35 UTC
Permalink
Post by Tim Watts
There are many clustered filesystems for linux - most seem to have HPC
clustering or failover in mind and assume there is solid networking
between the hosts.
I'm after one that would suit multiple client "ordinary/home" usage
with intermittent connectivity.
[...]
Have you considered using distributed version system like git?
Tim Watts
2014-06-26 12:32:15 UTC
Permalink
Post by Andrzej Adam Filip
Post by Tim Watts
There are many clustered filesystems for linux - most seem to have HPC
clustering or failover in mind and assume there is solid networking
between the hosts.
I'm after one that would suit multiple client "ordinary/home" usage
with intermittent connectivity.
[...]
Have you considered using distributed version system like git?
I have and I've rejected this. It has to be automatic, transparent and
ordinary-user friendly.

I am concluding there is no such thing as an open source replicating
filesystem layer for linux... I am surprised - I expected someone to say
"eeblefs does exactly what you need, why DYGFI you moron ;->"

Oh well, I think then some automation to drive unison as root is
probably the way to go...

Thanks anyway (and everyone else)!
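For the record, the "drive unison from a script" route can be sketched quite compactly. A rough outline (the SSID and profile name are placeholders; `iwgetid -r` is from wireless-tools, and `nmcli` would do equally well for SSID detection):

```python
#!/usr/bin/env python3
"""Run unison only when associated with the home WiFi network,
so syncs never happen over a mobile or work link."""
import subprocess
import sys
from typing import Optional

HOME_SSIDS = {"MyHomeNetwork"}   # placeholder SSID(s)
UNISON_PROFILE = "home-sync"     # placeholder unison profile name

def current_ssid() -> Optional[str]:
    """SSID of the current WiFi association, or None if offline."""
    try:
        out = subprocess.run(["iwgetid", "-r"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip() or None
    except (OSError, subprocess.CalledProcessError):
        return None

def should_sync(ssid: Optional[str]) -> bool:
    """Sync only on a known home network."""
    return ssid is not None and ssid in HOME_SSIDS

if __name__ == "__main__":
    if should_sync(current_ssid()):
        # -batch takes all non-conflicting actions automatically;
        # conflicts are skipped and reported for manual fixing.
        sys.exit(subprocess.call(["unison", UNISON_PROFILE, "-batch"]))
```

Run it from cron or a NetworkManager dispatcher hook; whatever unison reports about skipped conflicts can be piped to mail for the "email errors to me" part.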
Theo Markettos
2014-06-26 18:25:55 UTC
Permalink
In uk.comp.os.linux Tim Watts <***@dionic.net> wrote:
[using git]
Post by Tim Watts
I have and I've rejected this. It has to be automatic, transparent and
ordinary-user friendly.
Well, there's gitfs:
https://github.com/rossbiro/GitFS
but that's probably not up to scratch.
Post by Tim Watts
I am concluding there is no such thing as an open source replicating
filesystem layer for linux... I am surprised - I expected someone to say
"eeblefs does exactly what you need, why DYGFI you moron ;->"
I was rather hoping you'd find something too :(

Time for me to start reading about ZFS replication I suspect...

Theo
Tim Watts
2014-06-26 20:44:41 UTC
Permalink
Post by Theo Markettos
[using git]
Post by Tim Watts
I have and I've rejected this. It has to be automatic, transparent and
ordinary-user friendly.
https://github.com/rossbiro/GitFS
but that's probably not up to scratch.
Post by Tim Watts
I am concluding there is no such thing as an open source replicating
filesystem layer for linux... I am surprised - I expected someone to say
"eeblefs does exactly what you need, why DYGFI you moron ;->"
I was rather hoping you'd find something too :(
Seems like a nice Masters or 3rd Year CompSci Project for someone in the
making...
Post by Theo Markettos
Time for me to start reading about ZFS replication I suspect...
I had not considered that - and I'm using ZFS on Debian! I got the hang
of it in an MD-RAID/LVM2/FS use case and did not explore further...

OK - off to look at the manuals...
Theo Markettos
2014-06-26 21:33:30 UTC
Permalink
Post by Tim Watts
Seems like a nice Masters or 3rd Year CompSci Project for someone in the
making...
Good idea... I might be able to do something about that.

If you feel like refining the requirements a bit (eg what it should/should
not do, interesting corner cases, etc) and get them to me by September, I
might be able to set something up.

Theo
Tim Watts
2014-06-26 22:37:18 UTC
Permalink
Post by Theo Markettos
Post by Tim Watts
Seems like a nice Masters or 3rd Year CompSci Project for someone in the
making...
Good idea... I might be able to do something about that.
If you feel like refining the requirements a bit (eg what it should/should
not do, interesting corner cases, etc) and get them to me by September, I
might be able to set something up.
Theo
I can write a top level outline in a week.

That's not to say that whatever points I come up with might not be open to
challenge!

After that I might be able to refine it a bit if I study unison's
algorithm a bit.


I *believe* unison does not rely on system clocks being synced.

It does keep a lot of local state after the first run. I *suspect* it
checks the files against the local (now previous) state (size, mtime
and/or checksums) to decide if object X changed.

It's very fast to do a resync after the first run so I suspect it's not
reliant totally on checksums.


On a subsequent run, if object X changed on FS-A (Filesystem on server
A) but not FS-B, then FS-A must be the newer copy and should be
replicated to FS-B without question.

Vice versa also.

If it detects a change on both ends for object X, then it must ask,
unless you are willing to trade off on some level.




Before you quote me on any of that, I should actually go and look at it!




I think the two most difficult practical problems in a transparent
replicating FS might include:

1) Communicating with the user to manually resolve conflicts.

2) Warning the user the last sync was not completed because the network
went away.


Conflicts of "real" files are probably infrequent - e.g. a user edits a
spreadsheet at both ends before syncing.

However, if an entire $HOME is replicated, there are going to be a *lot*
of "dotfiles" churning constantly: Thunderbird state, web browser caches,
desktop-WM settings.



Cheers

Tim
Chris Davies
2014-06-27 14:07:00 UTC
Permalink
Post by Tim Watts
I am concluding there is no such thing as an open source replicating
filesystem layer for linux... I am surprised - I expected someone to say
"eeblefs does exactly what you need, why DYGFI you moron ;->"
Dropbox / One drive / Google drive / Hubic ?

Or if you want to run your own inhouse, consider Owncloud. Synchronisation
apps available for many platforms.

Chris
Tim Watts
2014-06-27 14:23:08 UTC
Permalink
Post by Chris Davies
Post by Tim Watts
I am concluding there is no such thing as an open source replicating
filesystem layer for linux... I am surprised - I expected someone to say
"eeblefs does exactly what you need, why DYGFI you moron ;->"
Dropbox / One drive / Google drive / Hubic ?
Valid point, though...
Post by Chris Davies
Or if you want to run your own inhouse, consider Owncloud. Synchronisation
apps available for many platforms.
I did try this (with android in mind). Seems you have to get the
commercial version if you want to export arbitrary filesystems rather
than its own little bit hiding in /var/somewhere/

But that does not make it an invalid solution (I didn't say "must be
free")... Does owncloud have a linux client that fully replicates?
Gordon
2014-06-28 05:07:53 UTC
Permalink
Post by Tim Watts
Post by Chris Davies
Post by Tim Watts
I am concluding there is no such thing as an open source replicating
filesystem layer for linux... I am surprised - I expected someone to say
"eeblefs does exactly what you need, why DYGFI you moron ;->"
Dropbox / One drive / Google drive / Hubic ?
Valid point, though...
Post by Chris Davies
Or if you want to run your own inhouse, consider Owncloud. Synchronisation
apps available for many platforms.
I did try this (with android in mind). Seems you have to get the
commercial version if you want to export arbitrary filesystems rather
than its own little bit hiding in /var/somewhere/
But that does not make it an invalid solution (I didn't say "must be
free")... Does owncloud have a linux client that fully replicates?
Rsync, unison. Grsync is a front end for rsync.
Chris Davies
2014-06-28 15:20:39 UTC
Permalink
Post by Tim Watts
Post by Chris Davies
Or if you want to run your own inhouse, consider Owncloud. Synchronisation
apps available for many platforms.
I did try this (with android in mind). Seems you have to get the
commercial version if you want to export arbitrary filesystems rather
than its own little bit hiding in /var/somewhere/
It does its own thing with regard to versioning, etc., in its "internal"
filesystem, yes, but I assume you'd be comfortable mounting a big (empty)
filesystem on /var/somewhere if you took this solution.
Post by Tim Watts
But that does not make it an invalid solution (I didn't say "must be
free")... Does owncloud have a linux client that fully replicates?
Yes. I'm running Owncloud client 1.5 (rather than the website's 1.6). What
I have just realised is that I haven't seen a command-line synchronisation
client.

https://software.opensuse.org/download/package?project=isv:ownCloud:desktop&package=owncloud-client

Chris