I was recently working with records in a database that were identified by a Universally Unique Identifier, aka a UUID. These IDs are strings of characters that look something like “31ae75f0-cbe0-11e8-b568-0800200c9a66”.
I needed to know which records were generated during in a particular time period, but sadly there was no field about dates to be found. Unambiguously a database design flaw given what I needed to do – but it did lead me to discover that at least “version 1 UUIDs” have a date and time of creation embedded within them.
So how do we get from 31ae75f0-cbe0-11e8-b568-0800200c9a66 to the date and time the UUID was generated? I’d say “simple”, but it isn’t exactly. Thanks to Wikipedia, and famkruithof.net for the behind-the-scenes info of how this works.
So, the key components of the UUID to note by position are those highlighted below:
Take the highlighted parts of the UUID aside, reversing the order of the chunks, so as to get:
There’s your “60-bit timestamp, being the number of 100-nanosecond intervals since midnight 15 October 1582” (thanks Wikipedia).
Rambling sidenote alert:
In case you’re wondering why 15 October 1582 is so special, then it’s because that was the date that the previous “Julian” calendar system was first replaced with the newer “Gregorian” calendar system, now the most widely used calendar throughout the world, under the diktat of Pope Gregory XIII.
Why? The Julian calendar had worked on the premise that the average year is 365.25 days long (the 0.25 days being compensated for by the existence of a leap day every 4 years).
However, that’s slightly different to the true length of the solar year, which is 365 days, 5 hours, 48 minutes, 45.25 seconds. Consequently. it was noticed that there was some “drift” whereby the date the calendar noted that the equinoxes should occur slowly became out of sync with real life observations of the equinox. Effectively, as the Britannica notes, this discrepancy causes the “calendar dates of the seasons to regress almost one day per century”. This is relevant to religions such as Catholicism in that it affects the calculation of, for instance, when to celebrate Easter.
The Gregorian calendar made a couple of changes, perhaps most notably introducing the rule that a century year (e.g. 1900) only counts as a leap year if its number is divisible by 400 with no remainder, instead of adhering to the Julian system where it only needs to be divisible by 4. On average, this shortens the length of the measured year by 0.0075 days, which keeps it in better sync with the reality of the solar year. It’s still not perfect, but leads to a much lower rate of drift of around 1 day per 3,030 years.
In order to account for the drift that had occurred by the time of this realisation, the Pope also advocated for fast forwarding the date by a few days to catch up with reality. So for Gregorian advocates, there was literally never a 14 October 1582. Overnight, the date skipped from October 4th 1582 through to October 15 1582, at least in countries where this system was accepted right away (and subsequently by the relevant ISO standards that most computing systems adhere to).
Not everywhere was keen to adopt this system right away – perhaps unsurprisingly Catholic countries were more likely to take the leap quicker. But type 1 UUIDs presumably don’t care too much about religious politics.
End of rambling side note
Note that the base value of this timestamp, 15 October 1582, is a different date than the classic January 1st, 1970-based timestamp you may know and love from Unix-type systems, which many databases, including Google BigQuery, work with. So it needs conversion.
Let’s start by getting it into decimal (using one of many web-based converters – I picked this one for no particular reason).
1e8cbe031ae75f0 hex = 137583952401430000 decimal
This is in units of 100 nanoseconds. I really don’t care about nanosecond granularity for my use case, so let’s divide it by 10,000,000 to get to seconds.
137583952401430000 100-nanoseconds intervals = 13758395240.1 seconds
This is now the number of seconds that have elapsed between the start of October 15 1582 and the date / time my UUID was created.
To get it into a more conventional 0 = 1970-01-01 based timestamp format, we then need to subtract the number of seconds between October 15 1582 and January 1st 1970 from it (12,219,292,800, as it happens):
13758395240 - 12219292800 = 1539102440
So now you have the number of seconds since Jan 1st 1970 and the time your UUID was generated. Pop it into a Unix timestamp interpreter (this one, for example) to translate it into something human-understandable, and you’ll discover I generated that UUID on October 9th 2018. Hooray.
Sidenote: I generated my test UUID on this site, if you ever need a valid one a hurry.
Flicking between websites and calculators is clearly a painful and inefficient way to do this operation, especially if you have a whole column of UUIDs to decode. Luckily many databases, including Bigquery, have the necessary functions to do it en masse.
In the below, imagine your UUID is stored in the field “uuid”. Pop this into your query, and the “extracted_timestamp” field will now contain the timestamp showing when the UUID was generated.
TIMESTAMP_SECONDS ( CAST ( ( CAST( CONCAT('0x', -- starting the value with 0x will ensure Bigquery realises it's a hex string SUBSTR(uuid, 16, 3), -- extract relevant parts of the UUID needed to recreate timestamp SUBSTR(uuid, 10, 4), SUBSTR(uuid, 1, 8) ) AS INT64) -- convert hex to dec / 10000000) -- divide to get from 100 nanoseconds periods to seconds - (141427 * 24 * 60 * 60) -- rebase from start of Gregorian calendar (1582-10-15) to unixtime (1970-01-01) AS INT64 ) ) AS extracted_timestamp