Sunday, March 9, 2008

Optimistic and Pessimistic Creation

Say one needs to create objects in one namespace in a controlled fashion (i.e. avoid duplicates, avoid loss); what strategy can one employ?

The Problem

The basic issue one has to deal with comes from the fact that possibly multiple processes are actively accessing a given namespace thereby collisions can occur: in namespace N, process 1 creates object O1 whilst process 2 does the same (i.e. creates object O1).

The left-hand side diagram illustrates the problem: the competing processes race to win the creation event during which time a meta-stable state exists i.e. no one process can be certain (to a reasonable degree) which will win the race.

Two main strategies
There are (at least) two main strategies to address the problem: an optimistic and a pessimistic one.

The optimistic strategy can be used in patterns where a process is considered reasonably alone in accessing a given namespace. Example: a user accessing is private file folder.

The pessimistic strategy must be employed in situations where multiple processes potentially access a given namespace. Example: an application's user creation flow.

Optimistic Strategy
The optimistic strategy consists in a simple procedure. The system:

  1. verifies if object[X] exists
  2. creates object[X] if it does not exist
The notation object[X] just means: an object with for identifier X.

Pessimistic Strategy
The pessimistic strategy is outfitted with a probabilistic equivocation reduction process. The system:
  1. verifies if object X exists in the namespace
  2. generate a reasonable unique identifier UID
  3. creates a new identifier [X;UID]
  4. verifies if an object with for identifier [X;UID] exists in the namespace
  5. following #4, if no object exists, wait for Y units of time
  6. verifies if any object with identifier prefix X exists
  7. if the process' object with [X:UID] comes as first created (assuming multiple hits), then the process wins the creation race
An Application: Amazon S3 WEB Service

Say an application must be crafted with a system for registering users. Obviously, this process must be controlled: the system must be designed to deal with conditions such as:
Use Case: user X tries to register to the application with user-name UN1 at the same time as user Y goes for UN1.

One way to deal with the creation of unique user records on Amazon S3 I propose is:
  1. Generate a long unique identifier (e.g. 32bytes)
  2. Prepare a string identifier I/UID where I is the wanted identifier
  3. PUT-OBJECT with for URI /I/UID
  4. Wait for Y seconds (e.g. 3seconds)
  5. LIST-OBJECTS with for prefix /I
  6. If the object with for prefix /I/UID comes out to be the first created, then the process wins
The advantage of this algorithm stems from the fact that only 2 HTTP requests are necessary for each creation instance.