Fossil

Fossil is a file server that serves real files from a disk partition (as opposed to file servers that serve synthetic files). Fossil is an archival file server, meaning that a snapshot of the current filesystem is taken periodically, at configurable intervals with one-minute resolution. These (ephemeral) snapshots are available at any time, so files from the "past" can easily be copied or compared to the current version, and deleted files can be recovered.

Fossil can also create archival snapshots, normally one each day, which are backed up to a venti file server.

While normal snapshots are discarded after some configurable time, archival snapshots are never discarded.

Venti

A venti server is a file server optimized for space-efficient storage of archival data, that is, for periodically storing copies of large amounts of data that mostly stay the same over time.

Very roughly explained: if two files in a filesystem have the same content, the content is stored only once. If the same file is stored today and tomorrow, it is stored only once; there are just two references to the same contents.
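
The idea can be illustrated with a minimal sketch of a content-addressed store. Venti addresses each block by the SHA-1 hash of its contents (its "score"); everything else here (the BlockStore class, the in-memory dictionary, the method names) is a hypothetical toy for illustration and not how venti is implemented on disk.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: each distinct block is kept only once."""

    def __init__(self):
        self.blocks = {}  # score (hex SHA-1 of the contents) -> contents

    def write(self, data: bytes) -> str:
        """Store data and return its score; identical data is not stored twice."""
        score = hashlib.sha1(data).hexdigest()
        self.blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        """Return the contents that belong to a score."""
        return self.blocks[score]


store = BlockStore()
today = store.write(b"same contents")        # stored today
tomorrow = store.write(b"same contents")     # stored again tomorrow
other = store.write(b"different contents")   # a genuinely new block

# Both writes of the same contents yield the same score,
# so the store holds two blocks, not three.
print(today == tomorrow, len(store.blocks))
```

Writing the same contents twice only adds a second reference (the returned score), which is why repeated archival snapshots of mostly unchanged data cost little space.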

Usage considerations

On a Plan9 terminal just use a fossil fileserver to access your disk and your files.

On a Plan9 fileserver use fossil+venti. Take backups from the venti partition and possibly combine several disks via the fs(3) device.

If you run your Plan9 terminal standalone, without a separate fileserver, use fossil+venti.

Space considerations

A fossil partition needs space for the file data and the metadata (directories) as a whole. The snapshots are created "copy on write", so the space consumed between two snapshots corresponds to the data changed in between.

The default installation configures one snapshot each hour and an expiration of two days, i.e. 48 snapshots.

We give some examples to illustrate possible approaches to disk space considerations. For simplicity 1G=1000M, 1M=1000k, 1yr=300d. The calculations are at best rule-of-thumb approximations and have to be taken with a grain of salt.

Note that these setups have not been tried in practice; I just want to get a feel for venti and fossil and write down my ideas here.

Example 1: Home Server

We suppose that some users write "word" documents and others develop and compile programs, but all users have a considerable amount of already downloaded or created files which they consult (spreadsheets, documentation, templates, ...).

Our users already have about 1GByte of data each in their home directories. Each of them generates about 100k of data per day, that is about 30M a year. We want the setup to last for 5 years and to accommodate 80 users.

The total disk space will be: 80G+5*30M*80=92G

Within two days the generated data is: 80*2*100k=16M. As we see, the space for ephemeral snapshots is negligible.

When using a 100GByte hard disk, about 8G remain for ephemeral snapshots. At 80*100k=8M of changed data per day, an expiration of 80 days (about 640M) fits with a wide margin.

To estimate the venti partition we assume that 50% of the stored information is duplicated. Initially we have 80G, which reduces to 40GByte when folded into venti. Storage grows by 5*30M*80=12G over five years, which will be compressed to 6G. Thus a 46G disk partition should be sufficient for the venti back-end storage.
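
The arithmetic of this example can be reproduced with a short sketch. It uses the simplified units from above (1G=1000M, 1M=1000k, 1yr=300d); the variable names are made up for illustration, and the 50% figure is the assumption stated in the text, not a measured value.

```python
# Sizes are kept in k; units follow the text: 1M = 1000k, 1G = 1000M, 1yr = 300d.
K, M, G = 1, 1000, 1000 * 1000
DAYS_PER_YEAR = 300

users = 80
initial_per_user = 1 * G        # existing data per home directory
growth_per_user_day = 100 * K   # new data per user and day
years = 5
stored_percent = 50             # assumption from the text: venti keeps 50%

initial = users * initial_per_user
growth = users * growth_per_user_day * DAYS_PER_YEAR * years
fossil_total = initial + growth

# Data changed within the default two-day snapshot expiration.
two_days = users * 2 * growth_per_user_day

# Venti estimate: everything that was ever stored, reduced by duplication.
venti_total = fossil_total * stored_percent // 100

print(f"fossil: {fossil_total / G:g}G")
print(f"two days of snapshots: {two_days / M:g}M")
print(f"venti: {venti_total / G:g}G")
```

Changing users, growth_per_user_day or stored_percent shows quickly how sensitive the estimate is to each assumption.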

Example 2: Mail/News/Database Server

For the same number of users we set up fossil+venti disk space hosting, with very questionable goals:
user mailboxes:
1G/user, 100k of new emails each hour. We'll be able to see all the deleted messages from last month, year ..., however we might not need to search through mailing list archives.
selective News feed:
5G/day, we expire after 20 days. We'll be able to see all the expired articles again.
databases:
up to 5G, data turnover: 1%/hr. We'll be able to recover from a crash or restore back to the past.

We'll set a 10 minute period for ephemeral snapshots and will make archival snapshots four times a day (every six hours).

Total data set: 80*1G+5G*20+5G=185G, which we round up to 200G

Since almost none of the data is repetitive and it changes very fast (creates and deletes), we assume each snapshot to require the full volume of changed data.

Frequency: Let's cover a twenty-day period for ease of history exploration: 6*24*20=2880 snapshots (btw.: 20d=28800min)

Cost per snapshot: 100k/6+5G/(24*6)+0.5/6=34M (roughly; the news feed term 5G/(24*6) dominates)

Total cost for ephemeral snapshots: 34M*2880=98G

With four archival snapshots each day we get the following for five years: 34M*4*300*5=204G

Our fossil partition then needs 298G (200G+98G), and the venti partition 404G (200G+204G).
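
The totals above can be checked with a small sketch. It takes the 34M cost per snapshot and the 200G data set as given by the text and only redoes the multiplication; the variable names are illustrative, and the units are the simplified ones used throughout (1G=1000M, 1yr=300d).

```python
# Sizes are kept in M; units follow the text: 1G = 1000M, 1yr = 300d.
M, G = 1, 1000

snapshot_cost = 34 * M   # cost per snapshot, taken from the text as given
data_set = 200 * G       # total data set, as used in the text

ephemeral_count = 6 * 24 * 20   # one snapshot every 10 minutes for 20 days
archival_count = 4 * 300 * 5    # four a day, 300 days a year, five years

ephemeral_total = snapshot_cost * ephemeral_count
archival_total = snapshot_cost * archival_count

fossil_partition = data_set + ephemeral_total
venti_partition = data_set + archival_total

print(f"ephemeral snapshots: {ephemeral_count}, {ephemeral_total / G:.0f}G")
print(f"archival snapshots: {archival_count}, {archival_total / G:.0f}G")
print(f"fossil: {fossil_partition / G:.0f}G, venti: {venti_partition / G:.0f}G")
```

Because every snapshot is charged the full per-snapshot cost, the result is a pessimistic upper estimate for data that does not deduplicate at all.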

Example 3: play around with Plan9

I have a 20G hard disk and an old PC lying around and want to play around with Plan9. Eventually I'll use it as a fileserver to access all my personal documents and my mailbox from all around the globe.

Since the disk is old, I'll have to replace it in two years. I suppose that my data compresses to 70% of its size when stored on venti, and that about 400k of data changes each day (a lot of emails from mailing lists, etc.). For fun I want three months of ephemeral snapshots.

Archival Snapshots: 400k*300*2*0.7=168M, it's negligible

Ephemeral Snapshots: 400k*3*30=36M, also negligible

So I divide the disk into a fossil partition f and a venti partition v that is 30% smaller than the fossil partition: 0.7f+f=20G => f=11.76G, v=8.23G
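
A minimal sketch of this calculation, assuming the same simplified units (1M=1000k, 1yr=300d, 30 days per month); the variable names are only illustrative.

```python
# Snapshot sizes in k; units follow the text: 1M = 1000k, 1yr = 300d.
K, M = 1, 1000

daily_change = 400 * K
venti_percent = 70   # assumption from the text: data shrinks to 70% on venti

# Two years of archival snapshots, reduced to 70% on venti.
archival = daily_change * 300 * 2 * venti_percent // 100
# Three months (of 30 days) of ephemeral snapshots on fossil.
ephemeral = daily_change * 3 * 30

# Split the disk so that venti is 30% smaller than fossil: 0.7*f + f = disk.
disk_g = 20.0
fossil_g = disk_g / 1.7
venti_g = 0.7 * fossil_g

print(f"archival: {archival / M:g}M, ephemeral: {ephemeral / M:g}M")
print(f"fossil: {fossil_g:.3f}G, venti: {venti_g:.3f}G")
```

The exact split is f≈11.765G and v≈8.235G; the figures in the text are these values cut off after two decimals.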

Disk preparation for venti

The ventiaux(8) man page says:

  • "The total size of the index should be about 2% to 10% of the total size of the arenas, but the exact depends both the index block size and the compressed size of block stored to Venti."

The default installation proposes a 5% sized isect partition.

  • Example 1: venti=46G, isect=2.3G
  • Example 2: venti=404G, isect=20G
  • Example 3: venti=8.23G, isect=410M
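
The isect sizes listed above follow from taking 5% of the venti size of each example, which can be sketched as below. The function name isect_size is made up for illustration; note that the man page relates the index to the size of the arenas, while this sketch, like the figures above, simply takes the fraction of the whole venti size.

```python
ISECT_FRACTION = 0.05  # share proposed by the default installation


def isect_size(venti_size_g, fraction=ISECT_FRACTION):
    """Return the index (isect) size in G for a given venti size in G."""
    return venti_size_g * fraction


# Venti sizes in G, taken from the three examples.
examples = {"Example 1": 46.0, "Example 2": 404.0, "Example 3": 8.23}
sizes = {name: isect_size(venti) for name, venti in examples.items()}

for name, venti in examples.items():
    # Also show the 2% to 10% range quoted from the man page.
    low, high = isect_size(venti, 0.02), isect_size(venti, 0.10)
    print(f"{name}: venti={venti}G isect={sizes[name]:.2f}G "
          f"(2%..10%: {low:.2f}G..{high:.2f}G)")
```

The values for Example 2 and Example 3 come out as about 20.2G and 0.41G, which the list rounds to 20G and 410M.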

Caveats

When fossil runs out of space you get: cacheAllocBlock: xxx1 disk is full

You may be able to recover by:

  1. shutting down the machine
  2. booting from a CD or rescue partition
  3. mounting fossil by hand
  4. discarding some of the older snapshots to free space

If this is not possible, you can restore an archival snapshot with a smaller footprint from the corresponding venti partition. This means reformatting your fossil!

On production systems both procedures seem impractical, so be sure to monitor your disk usage.

Setup example

Note: I wanted to set up a cpu/fileserver with an idea similar to what I am used to in Linux: one disk (partition) for the root filesystem, and another for the user data, which would be shared on the network. This is possible in principle, but not practicable:

Since the /adm and the /usr directory are hardcoded in fossil to store the user authentication data for the fossil fileserver, exporting the "data" disk would export just these. If I enter the cpu/fileserver with drawterm, I would operate on the "root" disk, instead of the "data" disk.

This could be overcome with some namespace setup, but that seems too annoying in the long run. So the following example was never finished, since I just set the cpu server up with the big disk at once.

In this example we assume a disk layout along the lines of example 1 (home server), on a larger disk:

  • disk: 150G
  • fossil: 80G
  • venti: 70G

    • arenas: 66.5G
    • isect: 3.5G

The setup follows these steps:

  • write a mbr

  • setup the partition table with a Plan9 partition
  • 'prep'are the Plan9 partition, i.e. 'Plan9'-partition it into
    • fossil
    • arenas
    • isect
  • format the arenas
  • format the isect
  • combine them into a venti setup
  • start venti
  • TO BE CONTINUED

So: we put a disk into our computer and see it when starting at '/dev/sd01'. Fir

Wishlist for fossil+venti

  • reserve a certain amount of space for the "filesystem owner" (like ext3), to be able to clean up a partition
  • expanded fossilcons df command to list the space required by each snapshot/epoch
  • automatically discard snapshots when disk gets full
  • signal (via the console?) when disk gets full
  • write some kind of log to a 'secure' place, especially the archival snapshot signatures with time/date to find out from where to restore a fossil if fossil, fossil/last is damaged.
  • venti could have a separate vac tree, where fossil blocks get stored when fossil runs out of space

I should be able to accomplish this on my own:

  • write a df command which shows occupied space on a fossil partition

  • send fscons messages via email to the sysadm