Jean-Lou Dupont's WEBlog: Mar 9, 2008

Say one needs to create objects in one namespace in a controlled fashion (i.e. avoid duplicates, avoid loss); what strategy can one employ?

The Problem

The basic issue one has to deal with comes from the fact that possibly multiple processes are actively accessing a given namespace thereby collisions can occur: in namespace N, process 1 creates object O1 whilst process 2 does the same (i.e. creates object O1).

The left-hand side diagram illustrates the problem: the competing processes race to win the creation event during which time a meta-stable state exists i.e. no one process can be certain (to a reasonable degree) which will win the race.

Two main strategies
There are (at least) two main strategies to address the problem: an optimistic and a pessimistic one.

The optimistic strategy can be used in patterns where a process is considered reasonably alone in accessing a given namespace. Example: a user accessing is private file folder.

The pessimistic strategy must be employed in situations where multiple processes potentially access a given namespace. Example: an application's user creation flow.

Optimistic Strategy
The optimistic strategy consists in a simple procedure. The system:

verifies if object[X] exists
creates object[X] if it does not exist

The notation object[X] just means: an object with for identifier X.

Pessimistic Strategy
The pessimistic strategy is outfitted with a probabilistic equivocation reduction process. The system:

verifies if object X exists in the namespace
generate a reasonable unique identifier UID
creates a new identifier [X;UID]
verifies if an object with for identifier [X;UID] exists in the namespace
following #4, if no object exists, wait for Y units of time
verifies if any object with identifier prefix X exists
if the process' object with [X:UID] comes as first created (assuming multiple hits), then the process wins the creation race

An Application: Amazon S3 WEB Service

Say an application must be crafted with a system for registering users. Obviously, this process must be controlled: the system must be designed to deal with conditions such as:
Use Case: user X tries to register to the application with user-name UN1 at the same time as user Y goes for UN1.

One way to deal with the creation of unique user records on Amazon S3 I propose is:

Generate a long unique identifier (e.g. 32bytes)
Prepare a string identifier I/UID where I is the wanted identifier
PUT-OBJECT with for URI /I/UID
Wait for Y seconds (e.g. 3seconds)
LIST-OBJECTS with for prefix /I
If the object with for prefix /I/UID comes out to be the first created, then the process wins

The advantage of this algorithm stems from the fact that only 2 HTTP requests are necessary for each creation instance.

Jean-Lou Dupont's WEBlog

Sunday, March 9, 2008

Optimistic and Pessimistic Creation

About Me

LinkedIn

Ohloh

Labels

Twitter

Followers

Stackoverflow

Blog Archive