Global IDs – The More You Know

October 2, 2019 — Stephen Hudak

I was reading a Google review near Johnson City, TN by a guy named John Johns who rated the chemical plant loading facilities at only 4 stars because of how “jacked up” the entrance was. Apparently you need to swing wide and use both lanes to get your rig in there—but they did unload him very fast. Apparently.

I wanted to know more about this chemical plant and so began Wikipedia binging. As I was reading about solenoids and transducers more generally, I realized I knew next to nothing about electromagnetism. I can describe the behavior, conduct experiments, predict phenomena based on what I know from observation and high-school physics. But I don’t really know what’s going on.

Yet every day I rely on so much electromagnetism and mechano-electro-magnetic dynamic functioning going just right to carry on a normal existence on this, our wonderful hollow earth. Just moments into a developing thought about the ethical (or unethical?) act of literally beating a dead horse, I had cause to think about another thing I take for granted in my everyday life: Global IDs.

What about the nature of these Global IDs we work with all the time? The SDE database we use as the foundation for most data in our GIS manages them nicely. Right-click and enable it or run some code and there it is. Add a new record—there it is. Query for the value to figure out relationships. It’s the bee’s knees.

I remember when it was common to use Object IDs to track relationships or impart ‘uniqueness’ to a record in a table, shapefile, or database object. Serial data types like Object IDs in a geodatabase or a sequence in a SQL database are unique only within their own table or database. Move that data somewhere else, however, and those values cease to be unique.
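To make the Object ID problem concrete, here is a small Python sketch. The two "tables" are hypothetical lists of dictionaries, each numbering its rows from 1 the way an Object ID or a SQL sequence would, and each row also gets a random UUID from Python's standard `uuid` module. Appending one table to the other stands in for moving or merging data.

```python
import uuid

def make_table(rows: int) -> list:
    """Build a hypothetical table whose 'objectid' restarts at 1, like a serial column."""
    return [{"objectid": i, "globalid": uuid.uuid4()} for i in range(1, rows + 1)]

table_a = make_table(3)
table_b = make_table(3)

# Simulate loading table_b's records into table_a's dataset
merged = table_a + table_b
object_ids = [row["objectid"] for row in merged]
global_ids = [row["globalid"] for row in merged]

print(len(object_ids), len(set(object_ids)))  # 6 3: serial IDs now repeat
print(len(global_ids), len(set(global_ids)))  # 6 6: UUIDs stay distinct
```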

Those flimflam data types cause all sorts of problems. Do some sort of manipulation without paying attention and cry a little as you realize your folly a week later.

Global IDs are indispensable. They aren’t hard to use, and they solve these problems—you just can’t remember the values unless you are Rain Man. That’s all. But since they are managed so nicely there isn’t any real friction to using them anyway. Also, they are required when working with Esri product functions like offline sync for a feature service or any kind of replication.

To be clear: GUID fields in a geodatabase are fields that can store Global IDs. You can edit and delete the values. But the Global ID field is fully managed by the database/software. You can’t edit those and you can’t force a duplicate value in there without some serious effort.

But what is it? How can it be unique when loads of people on thousands of disconnected machines are creating tons of them all the time? How does that work?

To start, we need to cover what it is. A Global ID is a universally unique identifier (UUID). In Microsoft-based software, the identifier is usually called a globally unique identifier (GUID). UUIDs are generated according to standards (most commonly RFC 4122) that have evolved over time and define several versions, each generated differently for different use cases.

There is an age-old debate about pronunciation that runs along the lines of ‘goo-id’ versus ‘gwid’. I truly don’t care about this one; I have no dog in this fight. A Stack Exchange post is filled with pure gold on the pronunciation question. One user lists the 4 different pronunciations I’ve seen, and it is interesting to note there are one-, two-, three-, and four-syllable variations on how to say “GUID!”

You can view the discussion here.

These Global IDs are represented as 32 hexadecimal digits. They are displayed in 5 groups with hyphens in between like the following:

f50b59ce-7463-44c3-9e0c-a346a93b2839

Note that it follows an 8-4-4-4-12 pattern: 32 hexadecimal digits plus 4 hyphens, totaling 36 characters. The first character of the third group indicates the UUID version number. The UUID above has a 4 in that position, meaning it is version 4, which is generated from random or pseudo-random numbers.
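Python's standard `uuid` module can read the version straight out of a value, which is a handy way to check the rule described above. This is a minimal sketch; `describe_global_id` is a hypothetical helper name, and the variant check (the top bits of the first digit in the fourth group) is an extra detail beyond the version digit.

```python
import uuid

def describe_global_id(value: str) -> dict:
    """Parse a Global ID string and report the pieces that identify its version."""
    u = uuid.UUID(value)                   # raises ValueError if it is not a valid UUID
    third_group = str(u).split("-")[2]
    return {
        "version_digit": third_group[0],   # first character of the 3rd group
        "version": u.version,              # same information, read from the bits
        "rfc4122_variant": u.variant == uuid.RFC_4122,
    }

print(describe_global_id("f50b59ce-7463-44c3-9e0c-a346a93b2839"))
# {'version_digit': '4', 'version': 4, 'rfc4122_variant': True}
```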

Microsoft’s GUIDs are commonly represented with surrounding braces. This explains why we sometimes get different formats of Global IDs. For instance, if I use the ArcGIS API for Python to access the Global ID of a record in a feature service that came from a SQL Server-based SDE database, I get the following:

{75974B12-6432-4FEA-BB32-F96C41AD1A8F}

But if I access a feature layer hosted on ArcGIS Online or hosted in my Portal’s Data Store, I’ll get this:

90251196-145e-43ce-a513-666a9318f14e

Notice the first value is flanked by braces and its letters are all uppercase, whereas the second has no braces and uses lowercase. Portal and Data Store both utilize PostgreSQL databases in one form or another. The PostgreSQL function to generate a UUID from random numbers is uuid_generate_v4() (provided by the uuid-ossp extension), and its output is always in the lowercase, hyphenated standard form we see above.
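Because the two sources spell the same kind of value differently, comparing Global IDs as raw strings can fail. A minimal Python sketch, assuming you already have the values as strings (the helper names here are hypothetical, not part of any Esri API), is to parse both into `uuid.UUID` objects and compare or reformat those:

```python
import uuid

def same_global_id(a: str, b: str) -> bool:
    """True if two Global ID strings encode the same 128-bit value, ignoring braces and case."""
    return uuid.UUID(a) == uuid.UUID(b)

def to_braced_upper(value: str) -> str:
    """Format like the SQL Server example: braces and uppercase hex digits."""
    return "{" + str(uuid.UUID(value)).upper() + "}"

def to_plain_lower(value: str) -> str:
    """Format like the PostgreSQL example: no braces, lowercase hex digits."""
    return str(uuid.UUID(value))

sde_value = "{75974B12-6432-4FEA-BB32-F96C41AD1A8F}"
print(to_plain_lower(sde_value))  # 75974b12-6432-4fea-bb32-f96c41ad1a8f
print(same_global_id(sde_value, "75974b12-6432-4fea-bb32-f96c41ad1a8f"))  # True
```

Parsing first also means a malformed value raises a `ValueError` instead of quietly failing to match.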

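PostgreSQL accepts multiple formats for input, however. Here are some acceptable formats, as given in the PostgreSQL documentation:

A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11
{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
a0eebc999c0b4ef8bb6d6bb9bd380a11
a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}

The third variant above, with no hyphens at all, is the format we observe when working with ArcGIS Online or Portal content, as seen in the URL of an item like the following: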

8cf80ed09a3e4b26bb1f93ea6f4dbdb3
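Python's `uuid` module is lenient in a similar way: it strips braces and hyphens before reading the 32 hex digits. This sketch (an illustration, not a statement about how PostgreSQL itself parses input) checks that the alternative forms from the PostgreSQL documentation all decode to one value, then converts the hyphenless item ID above to and from the hyphenated form. Treating an item ID as a UUID is an assumption here; it simply has the same 32-digit shape.

```python
import uuid

# Alternative spellings of one UUID, as listed in the PostgreSQL documentation
forms = [
    "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11",
    "{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}",
    "a0eebc999c0b4ef8bb6d6bb9bd380a11",
    "a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11",
    "{a0eebc99-9c0b4ef8-bb6d6bb9-bd380a11}",
]
distinct = {uuid.UUID(f) for f in forms}
print(len(distinct))  # 1

# Hyphenless item ID <-> standard 8-4-4-4-12 form
item_id = "8cf80ed09a3e4b26bb1f93ea6f4dbdb3"
hyphenated = str(uuid.UUID(item_id))
print(hyphenated)                             # 8cf80ed0-9a3e-4b26-bb1f-93ea6f4dbdb3
print(uuid.UUID(hyphenated).hex == item_id)   # True
```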

Those 32 hexadecimal digits encode a 128-bit number, which can take 2^128 (about 3.4 × 10^38) different values. In a version 4 UUID, 6 of those bits are fixed to mark the version and variant, which leaves 122 random bits. This is where we understand how these can be unique: the number of possible values is so large that the chance of duplicating one at random is considered negligible. It’s interesting to note that the chance is still non-zero!
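A quick check in Python shows where the 122 comes from. The masks below pick out the 4 version bits and the 2 variant bits; they are written for this example, not taken from any library constant.

```python
import uuid

TOTAL_VALUES = 2 ** 128
VERSION_MASK = 0xF << 76    # 4 bits: first hex digit of the third group
VARIANT_MASK = 0x3 << 62    # 2 bits: top of the first hex digit of the fourth group
FIXED_BITS = bin(VERSION_MASK | VARIANT_MASK).count("1")
RANDOM_BITS = 128 - FIXED_BITS

print(f"{TOTAL_VALUES:.2e}")    # 3.40e+38
print(FIXED_BITS, RANDOM_BITS)  # 6 122

# Sample some version 4 UUIDs: the fixed bits never change
for _ in range(1_000):
    u = uuid.uuid4()
    assert (u.int & VERSION_MASK) >> 76 == 4
    assert (u.int & VARIANT_MASK) >> 62 == 0b10
```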

These duplications are called collisions. For the version of GUIDs we use in our GIS, we would need to generate 2.71 quintillion random UUIDs to reach a 50% probability of at least one collision. This is equivalent to generating 1 billion UUIDs per second for about 85 years, which would use… a fair amount of disk.
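The 2.71 quintillion figure comes from the birthday problem. Using the common approximation n ≈ sqrt(2 · ln 2 · N) for the number of draws that gives a 50% chance of at least one repeat among N equally likely values, with N = 2^122 for version 4 UUIDs, a few lines of Python reproduce it. The 16 bytes per UUID is an assumption (binary storage, not the 36-character text form).

```python
import math

RANDOM_BITS = 122                 # random bits in a version 4 UUID
N = 2 ** RANDOM_BITS              # number of equally likely values

# Birthday approximation for a 50% chance of at least one collision
n_half = math.sqrt(2 * math.log(2) * N)

rate_per_second = 1e9                             # one billion UUIDs per second
years = n_half / rate_per_second / (365.25 * 24 * 3600)
storage_exabytes = n_half * 16 / 1e18             # assuming 16 bytes per UUID, decimal exabytes

print(f"{n_half:.3e}")            # 2.715e+18
print(f"{years:.0f} years")
print(f"{storage_exabytes:.0f} EB")
```

The year count lands in the mid-80s, consistent with the rough figure above, and the storage estimate is in the tens of exabytes.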

Collisions are unlikely, so everyone can keep generating away and, in practice, no one will collide. One downside, however, is poor locality and performance when UUIDs are used as primary keys and the database gets large. Because the keys are random, each new row lands at an arbitrary spot in the index rather than at the end, so index maintenance and cache efficiency suffer as the table grows. Some forms of UUIDs replace part of the value with, or add a prefix of, a non-random (for example, time-based) series of values for better query performance.
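As an illustration of that last point, here is a hypothetical time-prefixed layout in Python: the top 48 bits hold a millisecond Unix timestamp and the rest are random, apart from the version and variant bits. This mirrors the idea behind the layout later standardized as UUID version 7; it is a sketch, not how ArcGIS or any particular database generates Global IDs. Because the timestamp leads, values created later sort after earlier ones, so new keys land near the end of an index.

```python
import os
import time
import uuid
from typing import Optional

def time_prefixed_uuid(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID whose leading 48 bits are a millisecond timestamp (hypothetical helper)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80         # timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")     # 80 random bits below it
    value = (value & ~(0xF << 76)) | (0x7 << 76)       # version digit 7 (1st char of 3rd group)
    value = (value & ~(0x3 << 62)) | (0x2 << 62)       # RFC 4122 variant bits "10"
    return uuid.UUID(int=value)

# IDs created at increasing times also sort in increasing order
ids = [time_prefixed_uuid(ms) for ms in (1_000, 2_000, 3_000)]
print(ids == sorted(ids))  # True
print(ids[0].version)      # 7
```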

In any case, maybe you know more about Global IDs now than you did 5 minutes ago. Or maybe you are Rich Zamorski and you mastered this stuff when you were 5. Do I smell coffee? It’s like a mix between coffee and Fruity Pebbles. Remember the breakfast cereal Fruity Pebbles? It turned your milk sweeter than boilt peanuts. For the record, it’s properly pronounced “goo-id,” you Philistines. I’d also accept G-U-I-D if you insist.


