A Filesystem on Noms

nomsfs in action (for non-readers)

What’s Modern?

When people talk about a filesystem being “modern,” there’s a list of features they often have in mind. Let’s look at how the Noms database stacks up:

Snapshots

A filesystem snapshot preserves the state of the filesystem for some future use, typically data recovery or fast cloning. Since Noms is append-only, every version is preserved, so snapshots are a natural side effect. You can make a Noms “snapshot” (any commit in a dataset’s history) writable by syncing it to a new dataset. Easy.

Dedup

The essence of dedup is that unique data should be stored exactly once. If you duplicate a file, a folder, or an entire filesystem, the additional storage consumption should be close to zero. Because Noms is content-addressable, unique data is only ever stored once, so every Noms dataset intrinsically removes duplicated data.
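Here is a minimal sketch of content addressing. It assumes a hypothetical `ChunkStore` keyed by the SHA-256 of each chunk's bytes; SHA-256 is a stand-in and not Noms's own hash scheme. Storing the same bytes a second time costs nothing:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChunkStore is a hypothetical content-addressed store: each chunk is
// keyed by the hash of its bytes (SHA-256 here, as a stand-in).
type ChunkStore struct {
	chunks map[string][]byte
}

func NewChunkStore() *ChunkStore {
	return &ChunkStore{chunks: map[string][]byte{}}
}

// Put stores data under its hash and returns the hash. Writing bytes that
// are already present is a no-op, which is where dedup comes from.
func (s *ChunkStore) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.chunks[key]; !ok {
		s.chunks[key] = append([]byte(nil), data...) // private copy
	}
	return key
}

// Size reports the total number of bytes actually stored.
func (s *ChunkStore) Size() int {
	n := 0
	for _, c := range s.chunks {
		n += len(c)
	}
	return n
}

func main() {
	s := NewChunkStore()
	a := s.Put([]byte("the same file contents"))
	b := s.Put([]byte("the same file contents")) // a "copy" of the file
	fmt.Println(a == b, len(s.chunks), s.Size()) // true 1 22
}
```

Real content-addressed stores typically also split large files into smaller chunks. That way, a copy with a small edit still shares most of its storage. This sketch treats each value as a single chunk.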

Consistency

A feature of a filesystem, arguably the feature of a filesystem, is that it should never lose or corrupt your data. One common technique to ensure consistency is to write new data to a new location rather than overwriting old data, so-called copy-on-write (COW). Noms is append-only: it doesn’t throw out (or overwrite) old data, so copying modified data is required and explicit. Noms also recursively checksums all data, a feature of ZFS and btrfs that is notably absent from APFS.
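Recursive checksumming amounts to a Merkle tree. Each node's hash covers its own bytes plus the hashes of its children, so a flipped bit anywhere below changes the root hash. The `Node` type and `Hash` function below are a hypothetical illustration using SHA-256, not how Noms actually encodes its data.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Node is a hypothetical tree node standing in for a file or directory.
type Node struct {
	Data     []byte
	Children []*Node
}

// Hash is a recursive (Merkle) checksum: it covers the node's own bytes
// and, through their hashes, every descendant. A production scheme would
// also encode types and lengths so distinct trees can't yield the same input.
func Hash(n *Node) [32]byte {
	h := sha256.New()
	h.Write(n.Data)
	for _, c := range n.Children {
		ch := Hash(c)
		h.Write(ch[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	file := &Node{Data: []byte("hello")}
	root := &Node{Data: []byte("dir"), Children: []*Node{file}}
	saved := Hash(root) // checksum recorded when the data was written

	file.Data[0] = 'j'               // simulate silent corruption of a leaf
	fmt.Println(Hash(root) == saved) // false: detected at the root
}
```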

Backup

The ability to back up your data from a filesystem is almost as important as keeping it intact in the first place. ZFS, for example, lets you efficiently serialize and send the latest changes between systems. When pulling or pushing changes, git also efficiently serializes just the changed data. Noms does something similar with its structured data: differences are computed efficiently so that a minimal amount of data needs to be transferred.

Designing a Schema

Initially, Badly

It’s in the name: Noms eats all the data. Feed it whatever data you like, and let Noms infer a schema as you go. For a filesystem, though, I wanted to define a fixed structure. I started with a schema modeled on a simplified ZFS. Filesystems keep track of files and directories with a structure called an “inode,” each of which has a unique integer identifier, the “inode number.” ZFS keeps track of files and directories with DMU objects named by their integer ID. The schema would use a Map<number, Inode> to serve the same function (spoiler: read on and don’t copy this!).
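The original schema isn't reproduced here. As a rough, hypothetical illustration of the Map<number, Inode> approach, this Go sketch keeps a flat inode table. Directories refer to their children by number. The type and field names are assumptions, not the author's code.

```go
package main

import "fmt"

// Inode is a hypothetical entry in a flat, ZFS-style inode table.
type Inode struct {
	Mode    uint32
	Data    []byte            // file contents (files only)
	Entries map[string]uint64 // name -> inode number (directories only)
}

// Filesystem mirrors the Map<number, Inode> idea: every inode lives in one
// table keyed by an integer, and directories refer to children by number.
type Filesystem struct {
	Inodes  map[uint64]*Inode
	NextIno uint64
}

// alloc assigns the next free inode number, as a traditional filesystem would.
func (fs *Filesystem) alloc(in *Inode) uint64 {
	fs.NextIno++
	fs.Inodes[fs.NextIno] = in
	return fs.NextIno
}

func main() {
	fs := &Filesystem{Inodes: map[uint64]*Inode{}}
	root := fs.alloc(&Inode{Mode: 0755, Entries: map[string]uint64{}})
	file := fs.alloc(&Inode{Mode: 0644, Data: []byte("hi")})
	fs.Inodes[root].Entries["hello.txt"] = file

	// Resolving a name takes an extra hop through the inode table.
	ino := fs.Inodes[root].Entries["hello.txt"]
	fmt.Println(ino, string(fs.Inodes[ino].Data)) // 2 hi
}
```

Compare this with the nested Directory/Inode structure in the demo output below. There, a directory's entries contain the inodes themselves.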

Schema philosophy

This made sense for a filesystem, but did it make sense for Noms? I wasn’t trying to put the APFS team out of work; rather, I was creating a portal from the shell or Finder into Noms. To evaluate the schema, I had the benefit of direct access to the Noms team (as do all developers, at http://slack.noms.io/). I learned two guiding principles for data in Noms:

A Better Schema

My first try made for a fine filesystem, just not a Noms filesystem. With a better understanding of the principles, and with help from the Noms team, I built this schema:

struct LinkedList {
  data: Blob
  next: Cycle<0>
}
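In Noms type notation, `Cycle<N>` refers back to an enclosing struct, counting outward from the nearest one. So `next: Cycle<0>` makes `LinkedList` a recursive type. Likewise, in the demo output, `entries: Map<String, Cycle<1>>` inside `Directory` refers one level up to `Inode`. A rough Go analog uses self-referential pointers. The types below are hypothetical mirrors for illustration, not the Noms Go API.

```go
package main

import "fmt"

// LinkedList is a hypothetical Go analog of
// struct LinkedList { data: Blob, next: Cycle<0> }.
// Cycle<0> points at the nearest enclosing struct (LinkedList itself);
// in this Go analog a nil pointer ends the list.
type LinkedList struct {
	Data []byte
	Next *LinkedList
}

// Inode loosely mirrors the nested schema in the demo output: a directory's
// entries map names to Inodes (Map<String, Cycle<1>>).
type Inode struct {
	Mode    uint32
	Entries map[string]*Inode // directories only
	Data    []byte            // files only
}

// Len counts the nodes in a list.
func (l *LinkedList) Len() int {
	n := 0
	for ; l != nil; l = l.Next {
		n++
	}
	return n
}

// Count counts an inode and everything nested beneath it.
func Count(in *Inode) int {
	n := 1
	for _, child := range in.Entries {
		n += Count(child)
	}
	return n
}

func main() {
	list := &LinkedList{Data: []byte("a"), Next: &LinkedList{Data: []byte("b")}}
	root := &Inode{Mode: 0777, Entries: map[string]*Inode{
		"usenix_winter91_faulkner.pdf": {Mode: 0644},
	}}
	fmt.Println(list.Len(), Count(root)) // 2 2
}
```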

Writing It

To build the filesystem, I picked a FUSE binding for Go, dug into the Noms APIs, and wrestled my way through some Go heartache.
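The FUSE plumbing isn't shown in this post, so the sketch below skips it entirely. Instead, it assumes an immutable nested tree like the one in the demo output below and shows one way a write could be handled: path copying. Only the directories along the path are copied, and every other subtree is shared, which leaves the old root intact as a snapshot. `Inode`, `WriteFile`, and `write` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Inode is a hypothetical immutable node: directories use Entries, files
// use Data. Nodes are never modified after they are created.
type Inode struct {
	Entries map[string]*Inode
	Data    string
}

// WriteFile returns a new root in which path holds data. Only directories
// on the path are copied; all other subtrees are shared with the old root,
// which stays intact as a snapshot.
func WriteFile(root *Inode, path, data string) *Inode {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	return write(root, parts, data)
}

func write(dir *Inode, parts []string, data string) *Inode {
	if len(parts) == 0 {
		return &Inode{Data: data} // a brand-new file node
	}
	var old map[string]*Inode
	if dir != nil {
		old = dir.Entries
	}
	next := &Inode{Entries: make(map[string]*Inode, len(old)+1)}
	for name, child := range old {
		next.Entries[name] = child // share untouched children by pointer
	}
	next.Entries[parts[0]] = write(old[parts[0]], parts[1:], data)
	return next
}

func main() {
	docs := &Inode{Entries: map[string]*Inode{"a.txt": {Data: "A"}}}
	tmp := &Inode{Entries: map[string]*Inode{}}
	v1 := &Inode{Entries: map[string]*Inode{"docs": docs, "tmp": tmp}}

	v2 := WriteFile(v1, "/docs/b.txt", "B")

	fmt.Println(len(v1.Entries["docs"].Entries))        // 1: old version untouched
	fmt.Println(len(v2.Entries["docs"].Entries))        // 2
	fmt.Println(v1.Entries["tmp"] == v2.Entries["tmp"]) // true: subtree shared
}
```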

Demo

Showing it off has all the normal glory of a systems demo! Check out the documentation for requirements.

$ go build
$ mkdir /var/tmp/mnt
$ ./nomsfs /var/tmp/nomsfs::fs /var/tmp/mnt
running…
Your database fell into my filesystem!
$ noms show http://demo.noms.io/ahl_blog::fs
struct Commit {
  meta: struct {},
  parents: Set<Ref<Cycle<0>>>,
  value: struct Filesystem {
    root: struct Inode {
      attr: struct Attr {
        ctime: Number,
        gid: Number,
        mode: Number,
        mtime: Number,
        uid: Number,
        xattr: Map<String, Blob>,
      },
      contents: struct Directory {
        entries: Map<String, Cycle<1>>,
      } | struct Symlink {
        targetPath: String,
      } | struct File {
        data: Ref<Blob>,
      },
    },
  },
}({
  meta: {},
  parents: {
    5v82rie0be68915n1q7pmcdi54i9tmgs,
  },
  value: Filesystem {
    root: Inode {
      attr: Attr {
        ctime: 1.4705227450393803e+09,
        gid: 502,
        mode: 511,
        mtime: 1.4705227450393803e+09,
        uid: 110853,
        xattr: {},
      },
      contents: Directory {
        entries: {
          "usenix_winter91_faulkner.pdf": Inode {
            attr: Attr {
              ctime: 1.4705228859273868e+09,
              gid: 502,
              mode: 420,
              mtime: 1.468425783e+09,
              uid: 110853,
              xattr: {
                "com.apple.FinderInfo": 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // 32 B
                  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00,
                "com.apple.quarantine": 30 30 30 31 3b 35 37 38 36 36 36 33 37 3b 53 61 // 21 B
                  66 61 72 69 3b,
              },
            },
            contents: File {
              data: dmc45152ie46mn3ls92vvhnm41ianehn,
            },
          },
        },
      },
    },
  },
})
http://splore.noms.io/?db=https://demo.noms.io/ahl_blog&hash=2nhi5utm4s38hka22vt9ilv5i3l8r2ol

Nom Nom Nom

It took fewer than 1000 lines of Go code to make Noms appear as a window in the Finder, eating data as quickly as I could drag and drop (try it!). Imagine what Noms might look like behind other familiar data interfaces; it could bring git semantics to existing islands of data. Noms could form the basis of a new type of data lake, maybe one that’s simple and powerful enough to bring real results. Beyond the marquee features, Noms is fun to build with. I’m already working on my next Noms application.

Building computers at Oxide; past: DTrace, ZFS, Delphix CTO, Transposit founder, CEO

Adam Leventhal