Say one needs to create objects in one namespace in a controlled fashion (i.e. avoid duplicates, avoid loss); what strategy can one employ?
The Problem
The basic issue stems from the fact that multiple processes may be actively accessing a given namespace, so collisions can occur: in namespace N, process 1 creates object O1 while process 2 does the same (i.e. also creates object O1).
The left-hand side diagram illustrates the problem: the competing processes race to win the creation event, and for the duration of the race a meta-stable state exists, i.e. no process can be certain (to a reasonable degree) which one will win.
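The collision can be reproduced with a hand-written interleaving. This is only a sketch: the dictionary `namespace` and the helpers `exists` and `create` are hypothetical stand-ins for whatever store is actually shared, and the two "processes" are simulated in sequence so that the outcome is deterministic.

```python
# Hypothetical in-memory namespace: object name -> owning process.
namespace = {}

def exists(name):
    return name in namespace

def create(name, owner):
    namespace[name] = owner  # silently overwrites an existing object

# Both processes check before either one has created anything.
p1_free = not exists("O1")
p2_free = not exists("O1")

# Both saw a free name, so both create; the later write replaces the earlier.
if p1_free:
    create("O1", "process 1")
if p2_free:
    create("O1", "process 2")

both_believed_free = p1_free and p2_free
final_owner = namespace["O1"]
```

Both processes believed the name was free, and the object created by process 1 is lost without either of them being told.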
Two main strategies
There are (at least) two main strategies to address the problem: an optimistic and a pessimistic one.
The optimistic strategy can be used in patterns where a process is considered reasonably alone in accessing a given namespace. Example: a user accessing his private file folder.
The pessimistic strategy must be employed in situations where multiple processes potentially access a given namespace. Example: an application's user creation flow.
Optimistic Strategy
The optimistic strategy consists of a simple procedure. The system:
- verifies if object[X] exists
- creates object[X] if it does not exist
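The two steps above can be sketched as a single check-then-create helper. The function name `create_if_absent` and the dictionary used as the namespace are assumptions made for illustration; the check and the creation are not atomic here, which is acceptable only because the process is assumed to be alone in the namespace.

```python
def create_if_absent(namespace, name, value):
    """Check-then-create; returns True if this call created the object."""
    if name in namespace:        # step 1: verify whether the object exists
        return False
    namespace[name] = value      # step 2: create it if it does not
    return True

# A single user working in a private folder.
folder = {}
first = create_if_absent(folder, "notes.txt", "v1")
second = create_if_absent(folder, "notes.txt", "v2")
```

The second call leaves the existing object untouched and reports that nothing was created.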
Pessimistic Strategy
The pessimistic strategy is outfitted with a probabilistic equivocation reduction process. The system:
- verifies if object X exists in the namespace
- generates a reasonably unique identifier UID
- creates a new identifier [X;UID]
- verifies if an object with identifier [X;UID] exists in the namespace
- following the previous check, if no such object exists, creates the object [X;UID] and waits for Y units of time
- lists the objects whose identifier starts with prefix X
- if the process's object [X;UID] turns out to be the first created (assuming multiple hits), then the process wins the creation race
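The procedure above can be sketched against an in-memory store. Everything named here is an assumption made for illustration: the `Namespace` class, its creation counter (standing in for whatever creation timestamp the real store exposes), and the helpers `stake_claim`, `has_won` and `create_pessimistic`. The sketch also assumes that a claim is actually written to the namespace before the waiting period starts.

```python
import itertools
import time
import uuid

class Namespace:
    """Hypothetical in-memory store that remembers creation order."""

    def __init__(self):
        self._objects = {}               # identifier -> creation sequence number
        self._clock = itertools.count()

    def exists(self, identifier):
        return identifier in self._objects

    def create(self, identifier):
        self._objects[identifier] = next(self._clock)

    def list_prefix(self, name):
        """Return the identifiers of the form name;UID, oldest first."""
        hits = [i for i in self._objects if i.startswith(name + ";")]
        return sorted(hits, key=self._objects.get)

def stake_claim(ns, name):
    """Generate a UID, build [X;UID] and create it if it is not taken."""
    while True:
        identifier = f"{name};{uuid.uuid4().hex}"
        if not ns.exists(identifier):    # guard against a UID collision
            ns.create(identifier)
            return identifier

def has_won(ns, name, identifier):
    """The process wins if its object is the oldest one under the prefix."""
    hits = ns.list_prefix(name)
    return bool(hits) and hits[0] == identifier

def create_pessimistic(ns, name, wait_seconds=0.01):
    if ns.list_prefix(name):             # X already exists or is being claimed
        return False
    identifier = stake_claim(ns, name)
    time.sleep(wait_seconds)             # let competing claims become visible
    return has_won(ns, name, identifier)

# Two processes interleaved by hand: both stake a claim, then both check.
ns = Namespace()
claim_1 = stake_claim(ns, "O1")
claim_2 = stake_claim(ns, "O1")
results = (has_won(ns, "O1", claim_1), has_won(ns, "O1", claim_2))
```

Unlike the optimistic check, both competitors reach the same verdict: only the process holding the oldest claim considers itself the winner.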
Say an application needs a system for registering users. This process must be controlled: the system must be designed to deal with conditions such as:
Use Case: user X tries to register to the application with user-name UN1 at the same time as user Y goes for UN1.
One way I propose to handle the creation of unique user records on Amazon S3 is:
- Generate a long unique identifier UID (e.g. 32 bytes)
- Prepare a string identifier I/UID, where I is the wanted identifier
- PUT-OBJECT with URI /I/UID
- Wait for Y seconds (e.g. 3 seconds)
- LIST-OBJECTS with prefix /I
- If the object with URI /I/UID turns out to be the first created, then the process wins
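A minimal sketch of these steps follows. The function name `claim_name` and its parameters are hypothetical; `s3` is assumed to be a boto3 S3 client (or anything exposing the same `put_object` and `list_objects_v2` calls). Two further assumptions are made because the steps above leave them open: "first created" is decided by the `LastModified` timestamp, since S3 lists keys in alphabetical order rather than creation order, and ties are broken by key so that every competitor reaches the same verdict. Keys are written without a leading slash, and only the first page of the listing (up to 1000 keys) is inspected.

```python
import secrets
import time

def claim_name(s3, bucket, name, wait_seconds=3.0):
    """Return True if this process wins the creation race for `name`."""
    uid = secrets.token_hex(32)          # 32 random bytes, hex-encoded
    key = f"{name}/{uid}"                # the I/UID identifier

    # PUT-OBJECT: stake the claim with an empty object.
    s3.put_object(Bucket=bucket, Key=key, Body=b"")

    # Wait so that competing claims have time to show up in listings.
    time.sleep(wait_seconds)

    # LIST-OBJECTS: fetch every claim made under the wanted identifier.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=f"{name}/")
    contents = listing.get("Contents", [])
    if not contents:
        return False                     # our own claim is not visible

    # Assumption: the earliest LastModified wins; the key breaks ties.
    oldest = min(contents, key=lambda obj: (obj["LastModified"], obj["Key"]))
    return oldest["Key"] == key
```

With boto3 installed and credentials configured, a call could look like `claim_name(boto3.client("s3"), "my-bucket", "UN1")`, where the bucket name is a placeholder.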