Universally unique identifier

From Wikipedia, the free encyclopedia
Jump to: navigation, search

A universally unique identifier (UUID) is a 128-bit number used to identify information in computer systems. Microsoft uses the term globally unique identifier (GUID), either as a synonym for UUID or to refer to a particular UUID variant.

When generated according to the standard methods, UUIDs are for practical purposes unique, without requiring a central registration authority or coordination between the parties generating them. The probability that a UUID will be duplicated is not zero, but is so close to zero as to be negligible.

Thus, anyone can create a UUID and use it to identify something with near certainty that the identifier does not duplicate one that has already been created to identify something else, and will not be duplicated in the future. Information labeled with UUIDs by independent parties can therefore be later combined into a single database, or transmitted on the same channel, without needing to resolve conflicts between identifiers.

Adoption of UUIDs and GUIDs is widespread, with many computing platforms providing support for generating them, and for parsing their textual representation.

History[edit]

UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE). The initial design of DCE UUIDs was based on UUIDs as defined in the Apollo Network Computing System,[1] whose design was in turn inspired by the (64-bit) unique identifiers defined and used pervasively in Domain/OS, an operating system also designed by Apollo Computer. Later, the Microsoft Windows platforms adopted that design as globally unique identifiers (GUIDs). RFC 4122 registered a URN namespace for UUIDs, and recapitulated the earlier specifications, with the same technical content. By the time RFC 4122 was adopted as a standard, the ITU had also standardized UUIDs, based on the previous standards, and earlier versions of RFC 4122.

Standards[edit]

UUIDs are standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE).

UUIDs are documented as part of ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call (RPC)" and more recently in ITU-T Rec. X.667 | ISO/IEC 9834-8:2005.[2]

The Internet Engineering Task Force (IETF) published the Standards-Track, RFC 4122, technically equivalent to ITU-T Rec. X.667 | ISO/IEC 9834-8.

Format[edit]

In its canonical textual representation, the sixteen octets of a UUID are represented as 32 lowercase hexadecimal (base 16) digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters (32 alphanumeric characters and four hyphens). For example:

123e4567-e89b-12d3-a456-426655440000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

The one to three most significant bits of digit N indicate the UUID variant and the four bits of digit M indicate the UUID version. In the example, M is 1 and N is a (10xx), meaning that the UUID is a variant 1, version 1 UUID; that is, a time-based DCE/RFC 4122 UUID.

The canonical 8-4-4-4-12 format string is based on the "record layout" for the 16 bytes of the UUID:

  • A 4-byte (8 hex digits) "time_low" integer giving the low 32 bits of the time
  • A 2-byte (4 hex digits) "time_mid" integer giving the middle 16 bits of the time
  • A 2-byte (4 hex digits) "time_hi_and_version", with the 4-bit "version" in the most significant bits, followed by the high 12 bits of the time
  • Two 1-byte fields (totalling 4 hex digits) called "clock_seq_hi_res" and "clock_seq_lo", with the "variant" multiplexed into the most significant 1 to 3 bits of clock_seq_hi_res
  • Six bytes (12 hex digits) with the 48-bit "node"

These fields mainly reflect those in version 1 and 2 time-based UUIDs, but the same 8-4-4-4-12 representation is used for all UUIDs, even if the semantics of the UUID is different from the time-based versions 1 and 2, or is opaque.

Microsoft GUIDs are sometimes represented with surrounding braces:

{123e4567-e89b-12d3-a456-426655440000}

This format should not be confused with "registry format", which refers to the format within the curly braces.[3]

RFC 4122 defines a Uniform Resource Name (URN) namespace for UUIDs. A UUID presented as a URN appears as follows:

urn:uuid:123e4567-e89b-12d3-a456-426655440000

Variants[edit]

One of the variants defined by RFC 4122, variant 0 (indicated by the bit pattern 0xxx N=0..7), is for backwards compatibility with the now obsolete Apollo Network Computing System UUID format, and the 3-bit variant bit pattern 111x N=e..f is reserved for possible future variants.

The other two variants, variants 1 and 2, are covered by the UUID specification. Variant 1 UUIDs (10xx N=8..a, 2 bits) are referred to as RFC 4122/DCE 1.1 UUIDs, or "Leach-Salz" UUIDs, after the authors of the original Internet Draft. Variant 2 (110x N=c..d, 3 bits) is characterized in the RFC as "Microsoft GUIDs". Variant bits aside, the two variants are the same except that when reduced to a binary form for storage or transmission, with variant 1 UUIDs using "network" (big-endian) byte order, while variant 2 GUIDs uses "native" (little-endian) byte order. In their textual representations, variants 1 and 2 are the same except for the variant (N) digit.

When byte swapping is required to convert between the network byte order of variant 1 and the little-endian "native" order of variant 2, the fields above define the swapping. The first three fields are unsigned 32- and 16-bit integers and are subject to swapping, while the last two fields consist of uninterpreted bytes, not subject to swapping. This is so even for version 3, 4, and 5 UUID's where the canonical fields do not correspond to the content of the UUID.

Note that while some important GUIDs, such as the identifier for the Component Object Model IUnknown interface, are nominally variant 2 UUIDs, many identifiers generated and used in Microsoft Windows software and referred to as "GUIDs" are actually standard variant 1 RFC 4122/DCE 1.1 network byte-order UUIDs, rather than little-endian variant 2 UUIDs. The current version of the Microsoft guidgen tool produces standard variant 1 UUIDs. Some Microsoft documentation states that "GUID" is a synonym for "UUID",[4] as standardized in RFC 4122. RFC 4122 itself states that UUIDs "are also known as GUIDs". All this suggests that "GUID", while originally referring to a variant of UUID used by Microsoft, has become simply an alternative name for UUID, with both variant 1 and variant 2 GUIDs being extant.

Versions[edit]

For both variants 1 and 2, five "versions" are defined in the standards, and each version may be more appropriate than the others in specific use cases. Version is indicated by the M in the string representation.

Version 1 UUIDs are generated from a time and a node id (usually the MAC address); version 2 UUIDs are generated from an identifier (usually a group or user id), time, and a node id; versions 3 and 5 produce deterministic UUIDs generated by hashing a namespace identifier and name; and version 4 UUIDs are generated using a random or pseudo-random number.

Version 1 (date-time and MAC address)[edit]

Version 1 concatenates the 48-bit MAC address of the "node" (that is, the computer generating the UUID), with a 60-bit timestamp, being the number of 100-nanosecond intervals since midnight October 15, 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted. A 14-bit "clock sequence" extends the timestamp in order to handle cases where the processor clock does not advance, does not advance fast enough, or where there are multiple processors and UUID generators per node. Thus 16,384 UUID's can be generated for each 100-nanosecond tick of the clock, and since the time and clock sequence total 74 bits, 274 (1.8x1022 or 18 sextillion) version 1 UUIDs can be generated per node id. By representing a single point in space (the node) and time (the number of intervals and clock sequence), the chance of two properly-generated version 1 UUID's being unintentionally the same is practically nil.

In the OSF-specified algorithm for generating new (V1) GUIDs, the user's network card MAC address is used as a base for the last group of GUID digits, which means, for example, that a document can be tracked back to the computer that created it. This privacy hole was used when locating the creator of the Melissa virus.[5]

RFC 4122 does allow the MAC address in a version 1 (or 2) UUID to be replaced by a random 48-bit node id.[6] In that case, the RFC requires that the least significant bit of the first octet of the node id should be set to 1.[6] This corresponds to the multicast bit in MAC addresses and setting it serves to differentiate UUIDs where the node id is randomly-generated from those based on MAC addresses from network cards, which typically have unicast MAC addresses.

Version 2 (date-time and MAC Address, DCE security version)[edit]

RFC 4122 reserves version 2 for "DCE security" UUIDs; but it does not provide any details. For this reason, many UUID implementatons omit version 2. However, more information is provided by the DCE 1.1 Authentication and Security Services specification.

Version 2 UUIDs are similar to version 1, except that the least significant 8 bits of the clock sequence (clock_seq_low) are replaced by a "local domain" number, and the least significant 32 bits of the timestamp (time_low) are replaced by an integer identifier meaningful within the specified local domain. On POSIX systems, local domain numbers 1 and 2 are for user ids (UIDs), and group ids (GIDs), respectively, and other local domain numbers are site-defined. On non-POSIX systems, all local domain numbers are site-defined.

The ability to include a 40-bit domain/identifier in the UUID comes with a tradeoff. On the one hand, 40 bits allow about 1 trillion domain/identifier values. On the other hand, with the clock value truncated to the 28 most significant bits, compared to 60 bits in version 1, the clock in a version 2 UUID will "tick" only once every 429.49 seconds, a little more than 7 minutes, as opposed to every 100 nanoseconds for version 1. And with a clock sequence of only 6 bits, compared to 14 bits in version 1, only 64 unique UUID's per node/domain/id can be generated per 7 minute clock tick, compared to 16,384 clock sequence values for version 1.[7][8][9]

Versions 3 and 5 (namespace name-based)[edit]

Version 3 and 5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA1.

The namespace identifier is itself a UUID. The specification provides UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but other UUIDs may be used as namespace designators.

To determine the version 3 UUID corresponding to a given namespace and name, the UUID of the namespace is transformed to a string of bytes, concatenated with the input name, then hashed with MD5, yielding 128 bits. Six bits are then replaced by fixed values, the 4-bit version (e.g. 0011 for version 3), and the 2-bit UUID "variant" (e.g 10 indicating RFC 4122 UUIDs). Since 6 bits are thus predetermined, only 122 bits contribute to the uniqueness of the UUID.

Version 5 UUIDs are similar, but SHA1 is used instead of MD5. Since SHA1 generates 160-bit digests, the digest is truncated to 128-bits before the version and variant bits are inserted.

Version 3 and 5 UUIDs have the property that the same namespace and name will map to the same UUID. However, the namespace and name cannot be determined from the version 3 or 5 UUID alone.

RFC 4122 recommends version 5 (SHA1) over version 3 (MD5) and counsels against use of UUIDs of either version as security credentials.

Version 4 (random)[edit]

Version 4 UUIDs are generated using random or pseudo-random numbers.

This method sets the version number (4 bits) and the UUID variant (2 bits) into a 128-bit randomly-generated integer. Therefore, since these 6 bits are predetermined, version 4 UUIDs have 122 random bits, and the total number of possible version 4 UUIDs is 2122, or 5.3x1036 (5.3 undecillion).

Pseudorandom number generation often lacks necessary entropy, and RFC 4122 recommends that when a high-grade source of randomness is not available, that one of the other UUID versions be used instead. Some implementations of version 4 UUIDs do not satisfy this requirement. For example, the WinAPI GUID generator, which uses a pseudo-random number generator, has been shown to produce UUIDs which follow a predictable pattern.[citation needed]

Collisions[edit]

Collision occurs when the same UUID is generated more than once and assigned to different referents. In the case of standard version 1 and 2 UUIDs using MAC addresses from network cards, collisions can occur only when an implementation varies from the standards, either inadvertently or intentionally.

In contrast, with version 1 and 2 UUIDs using randomly-generated node ids, hash-based version 3 and 5 UUIDs, and random version 4 UUIDs, collisions can occur even without implementation problems, albeit with a probability so small that it can normally be ignored. This probability can be computed precisely based on analysis of the birthday problem.[10]

For example, the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of one collision is 2.71 quintillion, computed as follows:

[11]

The smallest number of version 4 UUIDs which must be generated before the probability for finding a collision is at least  p is approximated by the formula:

Thus, for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.

Uses[edit]

Significant uses include ext2/ext3/ext4 filesystem userspace tools (e2fsprogs uses libuuid provided by util-linux), LUKS encrypted partitions, GNOME, KDE, and Mac OS X,[12] most of which are derived from the original implementation by Theodore Ts'o.[13]

One of the uses of UUIDs in Solaris (using Open Software Foundation implementation) is identification of a running operating system instance for the purpose of pairing crash dump data with Fault Management Event in the case of kernel panic.[14]

In COM[edit]

There are several flavors of GUIDs used in Microsoft's Component Object Model (COM):

  • IID – interface identifier; (The ones that are registered on a system are stored in the Windows Registry at [HKEY_CLASSES_ROOT\Interface][15] )
  • CLSID – class identifier; (Stored at [HKEY_CLASSES_ROOT\CLSID])
  • LIBID – type library identifier; (Stored at [HKEY_CLASSES_ROOT\TypeLib][16])
  • CATID – category identifier; (its presence on a class identifies it as belonging to certain class categories, listed at [HKEY_CLASSES_ROOT\Component Categories][17])

As database keys[edit]

UUIDs are commonly used as a unique key in database tables. The NEWID function in Microsoft SQL Server version 4 Transact-SQL returns standard random version 4 UUIDs, while the NEWSEQUENTIALID function returns non-standard UUIDs which are committed to ascend in sequence until the next system reboot.[18] The Oracle Database SYS_GUID function returns a 16-byte 128-bit RAW value, based on a host identifier and a process or thread identifier, somewhat similar to a version 1 GUID.[19] PostgreSQL contains a UUID datatype,[20] though it leaves generation of the UUIDs themselves to the user. MySQL provides a UUID function, which generates standard version 1 UUIDs.[21]

The random nature of standard version 3,4, and 5 UUIDs and the ordering of the fields within standard version 1 and 2 UUIDs may create problems with database locality or performance when UUIDs are used as primary keys. For example, in 2002 Jimmy Nilsson reported a significant improvement in performance with Microsoft SQL Server when the version 4 UUIDs being used as keys were modified to include a non-random suffix based on system time. This so-called "COMB" (combined time-GUID) approach made the UUIDs non-standard and significantly more likely to be duplicated, as Nilsson acknowledged, but Nilsson only required uniqueness within the application.[22]

See also[edit]

References[edit]

  1. ^ Zahn, Lisa (1990). Network Computing Architecture. Prentice Hall. p. 10. ISBN 0-13-611674-4. 
  2. ^ tsbedh. "ITU-T Study Group 17 - Object Identifiers (OID) and Registration Authorities Recommendations". ITU.int. Retrieved 2016-12-20. 
  3. ^ "Registry Keys and Entries for a Type 1 Online Store". Microsoft Developer Network. Microsoft. 
  4. ^ "Globally Unique Identifiers". Microsoft Developer Network. Microsoft. 
  5. ^ Reiter, Luke (1999-04-02). "Tracking Melissa's Alter Egos". ZDNet. CBS Interactive. Retrieved 2017-01-16. 
  6. ^ a b Leach, P.; Mealling, M.; Salz, R. (July 2005). A Universally Unique IDentifier (UUID) URN Namespace. Internet Engineering Task Force. RFC 4122. https://tools.ietf.org/html/rfc4122. Retrieved 2017-01-17. 
  7. ^ "CDE 1.1: Remote Procedure Call". The Open Group. 1997. 
  8. ^ "DCE 1.1: Authentication and Security Services". The Open Group. 1997. 
  9. ^ Kuchling, A.M. "What's New in Python 2.5". Python.org. Retrieved 23 January 2016. 
  10. ^ Jesus, Paulo; Baquero, Carlos; Almaeida, Paula. "ID Generation in Mobile Environments" (PDF). Repositorium.Sdum.Uminho.pt. 
  11. ^ Mathis, Frank H. (June 1991). "A Generalized Birthday Problem". SIAM Review. Society for Industrial and Applied Mathematics. 33 (2): 265–270. doi:10.1137/1033051. ISSN 0036-1445. JSTOR 2031144. OCLC 37699182. 
  12. ^ gen_uuid.c in Apple's Libc-391, corresponding to Mac OS X 10.4
  13. ^ "ext2/e2fsprogs.git - Ext2/3/4 filesystem userspace utilities". Kernel.org. Retrieved 9 January 2017. 
  14. ^ "Crashdump Restructuring in Solaris". Blogs.Oracle.com. Oracle. Retrieved 9 January 2017. 
  15. ^ "Interface Pointers and Interfaces". Windows Dev Center - Desktop app technologies. Microsoft. Retrieved December 15, 2015. You reference an interface at run time with a globally unique interface identifier (IID). This IID, which is a specific instance of a globally unique identifier (GUID) supported by COM, allows a client to ask an object precisely whether it supports the semantics of the interface, without unnecessary overhead and without the confusion that could arise in a system from having multiple versions of the same interface with the same name. 
  16. ^ "Registering a Type Library". Microsoft Developer Network. Microsoft. Retrieved December 15, 2015. 
  17. ^ "Categorizing by Component Capabilities". Windows Dev Center - Desktop app technologies. Microsoft. Retrieved December 15, 2015. A listing of the CATIDs and the human-readable names is stored in a well-known location in the registry. 
  18. ^ "NEWSEQUENTIALID (Transact-SQL)". Microsoft Developer Network. Microsoft. 2015-08-08. Retrieved 2017-01-14. 
  19. ^ "Oracle Database SQL Reference". Oracle. 
  20. ^ "PostgreSQL 9.4.10 Documentation, Section 8.12 UUID Type". PostgreSQL.org. PostgreSQL Global Development Group. 
  21. ^ "MySQL 5.7 Reference Manual, Section 13.20 Miscellaneous Functions". MySQL.com. Oracle. 
  22. ^ Nilsson, Jimmy. "InformIT". InformIT. Retrieved 2012-06-20. 

External links[edit]