Captivating Internet Minutiae

I can’t claim to know much about how the Internet works. Every time I research the subject, I come across new complexities that have always been there but that I never knew existed. I’m just glad that there were (and still are) a few geniuses who were astute enough to put all these technologies together to make the World Wide Web for all of us to enjoy.

While researching ideas for new domain names, I came across an interesting find. More specifically, I was wondering which Top-Level Domains were available for creating new domain names. Top-Level Domains (also known as TLDs) are the part of a domain name after the last “dot,” for example .com, .org, .gov, and so on. I thought it would be cool to have a custom TLD like .sarmiento; then the address of my Web site could be mh.sarmiento or just hiram.sarmiento.
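In code, picking out the TLD is just a matter of taking whatever follows the last dot. Here is a minimal Python sketch; the function name `top_level_domain` is my own invention, and it assumes a bare domain name (no `http://`, no path, no trailing dot).

```python
def top_level_domain(domain: str) -> str:
    """Hypothetical helper: return the part of a domain name after the last dot."""
    # rpartition splits on the LAST dot; with no dot at all, the whole name comes back
    _, _, tld = domain.rpartition(".")
    # Domain names are case-insensitive, so normalize to lowercase
    return tld.lower()


print(top_level_domain("www.example.com"))  # com
print(top_level_domain("hiram.sarmiento"))  # sarmiento
```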

Much to my surprise, I found that ICANN (the organization responsible for coordinating the Internet’s domain name system) announced last June that it will begin accepting applications for new TLDs. So Microsoft could, for example, apply for .msn, Apple could apply for .mac, and I could apply for .sarmiento. Well, I’m not exactly sure whether ICANN will accept applications from individuals, but it would be nice. The applications are expected to be available in the second quarter of this year.

Though creating new TLDs might not seem important at first (no one wants more ways to cause chaos on the Internet, after all), it could actually help organize the Internet better. For example, people who want to research photography could look for sites ending in .photo instead of the generic .com. Another benefit of new TLDs is internationalization (meaning that people in other countries can register custom TLDs in their own language), and that leads me to my second interesting find.

If you haven’t noticed, domain names are limited to the 26 Roman letters (A-Z, a-z), the 10 Arabic numerals (0-9), and the hyphen. (All of these characters belong to ASCII, the American Standard Code for Information Interchange; note the American part.) This makes it pretty difficult for anyone wanting to create a domain name in a language whose alphabet uses characters outside the ASCII set.
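To make that restriction concrete, here is a rough Python check of whether a domain name sticks to letters, digits, and hyphens. The function name is hypothetical, and the extra rules it enforces (labels of 1 to 63 characters that don’t start or end with a hyphen) come from the traditional host name rules; real registries layer more policies on top, so treat this as a simplified sketch.

```python
import re

# One label: 1 to 63 letters, digits, or hyphens, not starting or ending with a hyphen.
# This follows the traditional "LDH" host name rules; registries add more policies on top.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ascii_hostname(domain: str) -> bool:
    """Hypothetical helper: True if every dot-separated label follows the LDH rules."""
    return all(LDH_LABEL.fullmatch(label) for label in domain.split("."))


print(is_ascii_hostname("example.com"))  # True
print(is_ascii_hostname("prüfung.de"))   # False, because "ü" is not allowed
```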

This is where Internationalized Domain Names (IDN) and Internationalizing Domain Names in Applications (IDNA) step in. Under IDNA, a domain name containing non-ASCII characters is run through two processes that convert it into an equivalent name made up solely of ASCII characters. The best way to explain this is probably with an example.

Let’s say I wanted my Web site’s domain name to be Prüfung.de (Prüfung means “test” in German). Notice that the u with the umlaut (or diaeresis, those two little dots above it) is not an ASCII character. Before IDNA was invented, this domain name would have been impossible to create, but now we can convert it into an Internet-compatible one. First, the domain name is broken apart at the dot into two parts: Prüfung and de. The second part is left alone; it doesn’t need any conversion. (In case you’re wondering, de is the TLD for Germany.) The first part, however, is in need of a little makeover.

We begin that makeover by first running it through a process called Nameprep, which (among other things) converts it to lowercase, giving prüfung. We then run that through another process called Punycode, which keeps the plain ASCII letters and appends a hyphen followed by a short code describing which non-ASCII character goes where; the result is prfung-4ya. (Notice that the name now consists of only ASCII characters.) We then add xn-- to the front of the result to get xn--prfung-4ya. At long last, we combine this output with the second part we broke off above to get our final domain name:

xn--prfung-4ya.de
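The same steps can be traced in Python, whose standard library happens to ship the original (2003) version of IDNA, the one that uses Nameprep. This is a simplified sketch: `to_ascii_label` and `to_ascii_domain` are names I made up, it leans on `nameprep` from Python’s `encodings.idna` module and the built-in `punycode` codec, and it skips some checks the full standard requires (such as the 63-character label limit).

```python
from encodings.idna import nameprep  # Python's built-in IDNA 2003 helper


def to_ascii_label(label: str) -> str:
    """Hypothetical helper: Nameprep, then Punycode, then the 'xn--' prefix."""
    prepared = nameprep(label)  # e.g. "Prüfung" becomes "prüfung"
    if prepared.isascii():
        return prepared  # plain ASCII labels (like "de") need no conversion
    encoded = prepared.encode("punycode").decode("ascii")  # "prfung-4ya"
    return "xn--" + encoded


def to_ascii_domain(domain: str) -> str:
    # Break the name apart at the dots, convert each part, and glue it back together
    return ".".join(to_ascii_label(label) for label in domain.split("."))


print(to_ascii_domain("Prüfung.de"))  # xn--prfung-4ya.de
```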

I know. That whole explanation can seem rather boring and winding, but it’s actually quite interesting. The conversion is necessary because the Domain Name System (the network of servers that translates domain names into the numeric addresses computers use) was built around names made only of ASCII letters, digits, and hyphens. Rather than upgrading all of those servers, IDNA performs the conversion inside applications such as your Web browser, so the servers only ever see the ASCII version. That is how IDNA accommodates the Internet users of the world whose language is something other than English.

If you want to try out these conversions yourself, I found a site that does IDN conversions. Just enter a test domain name that uses non-ASCII characters, and it will output the corresponding ASCII domain name. If you’re interested in actually registering an IDN, VeriSign has provided a list of registrars that are accredited to handle IDN registrations. Going back to the discussion of TLDs, there are currently only a few internationalized TLDs; they have been set up as tests and are likely to be temporary.
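If you’d rather not depend on a Web site, Python’s built-in `idna` codec (which implements the Nameprep and Punycode rules of the original IDNA standard) can do the conversion in both directions. The two function names below are my own; this is a quick sketch, not a full-featured converter.

```python
def idn_to_ascii(domain: str) -> str:
    # The built-in "idna" codec runs Nameprep + Punycode + "xn--" on each label
    return domain.encode("idna").decode("ascii")


def ascii_to_idn(domain: str) -> str:
    # Decoding reverses the process, turning "xn--" labels back into Unicode
    return domain.encode("ascii").decode("idna")


print(idn_to_ascii("Prüfung.de"))         # xn--prfung-4ya.de
print(ascii_to_idn("xn--prfung-4ya.de"))  # prüfung.de
```

Notice that the round trip comes back as prüfung.de with a lowercase p: Nameprep lowercased the name on the way in, and the ASCII form doesn’t remember the original capitalization.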

With the advent of ICANN’s custom TLD program and the use of IDNA to create non-ASCII domain names, it won’t be long before the Internet is full of Web sites with non-English addresses.
