Internet Basics - Hypertext, HTTP, URL and URI

In my journey to better understand how current web-based technology works, I run into a lot of technical terms. Many of them are abbreviations that I recognize, but when I start thinking about one of them (for instance HTTP), I realize I have no idea what it precisely does or why it’s used. To get a better understanding of this terminology and its use, I decided to write these terms down now and then and explain them in one of my posts. The first items I would like to dive into are the key concepts of the web: Hypertext, HTTP(S) and URI/URL/URN.

Hypertext

Hypertext is text displayed on an electronic device that contains references to other text. Such a reference is called a “hyperlink”. By clicking on a hyperlink, the reader can immediately access the location the reference points to. On the web, hypertext documents are usually written in Hypertext Markup Language (HTML). They are interconnected by hyperlinks, which can be activated by clicking on them with a mouse or by tapping them with your finger (in case you have a touchscreen). Hypertext enables the publication of information over the internet. Hypertext documents can be either static or dynamic.
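To make the idea of a hyperlink a bit more concrete for myself, I wrote a small sketch that pulls the link targets out of a piece of HTML, using only Python’s standard library. The sample HTML and the names LinkCollector and extract_links are my own and purely illustrative.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href target of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up sample document with two hyperlinks
SAMPLE_HTML = """
<p>Read more about <a href="https://example.com/http">HTTP</a>
and <a href="/uri.html">URIs</a>.</p>
"""


def extract_links(html):
    """Return the hyperlink targets found in an HTML string, in order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


print(extract_links(SAMPLE_HTML))
```

The first link is a full web address and the second one is relative to the page it appears on. A browser resolves both to a location it can open.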

Static web pages contain information that does not change automatically. It remains the same for every viewer on the site and has to be manually updated by someone if new content needs to be added. To do this, you have to know a lot about HTML and programming.

Dynamic web pages contain information that changes, depending on the viewer, the time of the day, the time zone, the viewer’s native language, etc. A dynamic web page can contain client-side scripting or server-side scripting.

With client-side scripting, the page uses a scripting language (for instance JavaScript) that runs in the viewer’s browser and changes the content of the page while it’s being displayed.

With server-side scripting, scripts are run on the server that hosts the page. How the page is built is determined by parameters that are defined in the server-side script.
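As a rough illustration of server-side scripting, the sketch below builds a slightly different HTML page depending on two parameters: the viewer’s name and the hour of the day. The function name render_page and both parameters are assumptions of mine. A real site would run something like this inside a web framework or a CMS.

```python
from html import escape


def render_page(visitor_name, hour):
    """Build an HTML page whose content depends on the viewer and the time."""
    if hour < 12:
        greeting = "Good morning"
    elif hour < 18:
        greeting = "Good afternoon"
    else:
        greeting = "Good evening"
    # escape() keeps characters such as < and > in the name from being read as HTML
    return f"<html><body><h1>{greeting}, {escape(visitor_name)}!</h1></body></html>"


print(render_page("Ada", 9))
print(render_page("Grace", 20))
```

The same script produces a different page for every viewer, which is the difference with a static page that someone has to edit by hand.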

Hypertext can support very complex and dynamic systems by linking them and by implementing cross-references.

Dynamic pages are chosen when a page needs access to a database or an external file to get information that changes. A search engine such as Google, for example, sends a search query to many databases and combines the information from those databases into a search result page. News sites might also use dynamic web pages. This way, reporters can submit their stories, and scripts then automatically update the home page. This is far more efficient than someone manually editing a static HTML page each time a new story is added. A dynamic site is also very user-friendly and allows people without a lot of HTML and programming knowledge to contribute to the World Wide Web as well. This is done with a Content Management System (CMS). WordPress (which I use: see one of my previous posts for more information about how WordPress/CMS works) is a good example of this. I don’t know a lot about HTML and programming, but by using the online editor it’s very easy to post to and update my website.

HTTP

HTTP is short for Hypertext Transfer Protocol and is a set of standards that allows users of the World Wide Web to exchange information that is available on web pages. The browser (the client) sends a request and the web server answers with a response. When you access a web page, you tell the browser to communicate over HTTP by starting the address with http://
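To see what such an exchange looks like in text form, here is a minimal sketch of an HTTP/1.1 request and the first line of a response. It doesn’t contact a real server: the host name, the sample response and the function names build_get_request and parse_status_line are all made up for illustration.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # header naming the site the request is meant for
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response):
    """Split the first line of a response into version, status code and reason."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# Hand-written example of what a server could send back
SAMPLE_RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"

print(build_get_request("example.com", "/index.html").decode("ascii"))
print(parse_status_line(SAMPLE_RESPONSE))
```

In practice you would leave this to a library or the browser, but it shows that at this level HTTP/1.1 is just readable lines of text going back and forth.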

HTTP has had different versions:

HTTP/0.9: this was the first version of HTTP, introduced in 1991
HTTP/1.0: introduced in 1996
HTTP/1.1: released in January 1997
HTTP/2: released in May 2015. It improved the page load times in the browser by compressing HTTP headers and prioritizing and multiplexing data requests
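HTTP/3: published as a standard in June 2022 (RFC 9114), after browsers had already been experimenting with draft versions for a few years. It runs over QUIC, a transport protocol that is built on top of UDP. The goal of this version is to set up connections faster and to avoid that one lost packet holds up all other requests on the same connection.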

Because we are currently at HTTP/3, I want to give some extra attention to the improvements. The big change is that HTTP/3 no longer runs on top of TCP but on top of QUIC, a newer transport protocol that itself uses UDP. To understand why that matters, it helps to look at TCP/IP first. TCP/IP is short for Transmission Control Protocol/Internet Protocol and is a set of rules governing communications among all computers on the internet. TCP/IP dictates how information should be packed, sent and received and how it gets to its destination. The Internet Protocol dictates the logistics of the information packets that are sent out over the networks: it tells the packets where to go and how to get there. IP allows any computer on the internet to forward a packet to another computer that is one or more hops closer to the packet’s recipient. You can compare it with people in a long line, handing over a package to the next person until it has travelled from the start to the end of the line. The Transmission Control Protocol is responsible for the reliable transmission of data across the networks that together form the internet. TCP checks the information packets for errors and requests re-transmission if any are found.

UDP is short for User Datagram Protocol. It sacrifices reliability for speed and simplicity. Like TCP, UDP transfers packets of information by using the Internet Protocol (IP). But there is a difference in the way the packets are handled by the sender and receiver, and in the information that the packets contain. TCP is connection-oriented. It requires that a communication session is established and that the sender and receiver agree about what data is transferred. When TCP packets are received and pass an error check, the receiver responds with an acknowledgment. If TCP packets are corrupted or lost in transit, the receiver does not send an acknowledgment, and the sender re-sends the packets. UDP is connectionless. The receiver can listen for UDP packets, but no session is established: there is no “beginning” or “end”, data is just sent and received. If information packets are corrupted or lost in transit, the sender will not find out, because the receiver does not report errors to the sender and does not acknowledge that a packet was received.

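Both TCP and UDP are currently used. TCP is almost always used when data has to arrive complete and in order, such as files. UDP is used when speed is the most important factor, which is why it’s used a lot for video streams and online games. HTTP/3 gets its reliability from QUIC, a protocol that adds its own acknowledgments and re-transmissions on top of UDP.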
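The difference between connectionless and connection-oriented also shows up in code. The sketch below sends a short message to a socket on my own machine (127.0.0.1), first over UDP and then over TCP, using Python’s standard socket module. The function names udp_roundtrip and tcp_roundtrip and the port handling are my own choices. On a real network the UDP message could simply get lost, which is why the receiver gets a timeout.

```python
import socket


def udp_roundtrip(message):
    """Send one datagram over UDP: no connection is set up beforehand."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the system pick a free port
        receiver.settimeout(2.0)         # give up if the datagram never arrives
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(message):
    """Send the same bytes over TCP: a connection is established first."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = None
    conn = None
    try:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server.settimeout(2.0)
        # connecting and accepting together form the start of the session
        client = socket.create_connection(server.getsockname(), timeout=2.0)
        conn, _ = server.accept()
        conn.settimeout(2.0)
        client.sendall(message)
        return conn.recv(1024)
    finally:
        for sock in (conn, client, server):
            if sock is not None:
                sock.close()


if __name__ == "__main__":
    print(udp_roundtrip(b"hello over UDP"))
    print(tcp_roundtrip(b"hello over TCP"))
```

With UDP the sender just fires off a datagram with sendto. With TCP there is an explicit connect and accept step before any data flows, and the acknowledgments and re-sends happen out of sight in the operating system.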

HTTPS

HTTPS stands for Hypertext Transfer Protocol Secure and is a protocol for transmitting HTTP over a connection that is encrypted by Transport Layer Security (TLS). HTTPS is used to protect transmitted data from eavesdropping and from being altered on the way. I advise always using HTTPS to increase your security. Because the content of the connection is encrypted, HTTPS also makes it harder for a government or an Internet Service Provider (ISP) to censor individual pages, although they can usually still see which website you connect to.
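A small sketch of what “secure” means on the client side: Python’s standard ssl module can create a TLS configuration with sensible defaults, and those defaults insist on a valid certificate that matches the host name. The function name default_tls_settings is my own. The sketch only inspects the settings and doesn’t open a connection.

```python
import ssl


def default_tls_settings():
    """Report two checks that the default client-side TLS context enforces."""
    context = ssl.create_default_context()
    return {
        # the server must present a certificate signed by a trusted authority
        "certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
        # the certificate must also match the host name being visited
        "hostname_checked": context.check_hostname,
    }


print(default_tls_settings())
```

Encryption alone is not enough. These two checks are what give some assurance that you are talking to the real website and not to someone in between.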

URI/URL/URN

URI is short for Uniform Resource Identifier and is intended to identify abstract or physical resources on the internet. The type of resources can vary, depending on the situation. It can be a website but also a sender of an e-mail or a recipient of an e-mail. Applications use this fixed designation to identify a resource or to request data from it.

URL is short for Uniform Resource Locator and is a form of URI that also tells you where a resource is located and how to reach it. The URL is located at the top of the browser window in the address bar. On a desktop computer or a laptop, the URL is usually visible all the time. In smartphone or tablet browsers, the address bar containing the URL often disappears when you scroll down, and when it is visible it may only show the domain. When the address bar is not visible, scroll up the page. If only the domain is shown, tapping on the address bar shows the full address.

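You can open a URL by clicking on a hyperlink or by manually typing a URL in the browser address bar. Spaces are not allowed in a URL: the URL string can only contain alphanumeric characters and a limited set of special characters such as !$-_+*'() Any other character, including a space, has to be encoded first. A space, for instance, is written as %20, or as + in a search query.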
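Python’s standard urllib.parse module can do this encoding for you. The snippet below is a small illustration with example strings of my own.

```python
from urllib.parse import quote, quote_plus

# quote() replaces characters that are not allowed with a % code
print(quote("star wars"))       # star%20wars
print(quote("café"))            # caf%C3%A9

# quote_plus() is meant for search queries and writes a space as +
print(quote_plus("star wars"))  # star+wars
```

The result contains only characters that are safe to put in a URL.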

When a URL points to a script that performs additional functions, additional parameters (information) are added to the end of the URL. A search engine URL pointing to a search results page includes a parameter with the search query words. A lot of websites have a search option and this could look like:

https://store.steampowered.com/search/?term=star+wars
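To see which parts such a URL is made of, it can be taken apart with Python’s standard urllib.parse module. This is only a small sketch: the variable names are mine, and nothing is requested from the website.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://store.steampowered.com/search/?term=star+wars"

parts = urlsplit(url)
print(parts.scheme)   # https
print(parts.netloc)   # store.steampowered.com
print(parts.path)     # /search/
print(parts.query)    # term=star+wars

# parse_qs turns the query into a dictionary and decodes the + back to a space
parameters = parse_qs(parts.query)
print(parameters)     # {'term': ['star wars']}
```

The part after the question mark is the parameter that the search script on the server receives.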

URN is short for Uniform Resource Name and is also a form of URI. The URN is location-independent and permanently designates a resource. URLs are primarily known in the form of web addresses but a URN can also appear as an ISBN to permanently identify a book for instance.

Both URL and URN follow the URI syntax, so both are URI subsets: they are always URIs. However, the reverse is not true: not every URI is a URL or a URN.
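The relationship can be sketched with a rough rule of thumb that looks only at the scheme, the part before the first colon. This is my own simplification and not an official definition: the short list of “locator” schemes and the function name classify_uri are assumptions for the sake of the example.

```python
from urllib.parse import urlsplit

# Assumption: a few well-known schemes that point to a location on a network
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify_uri(uri):
    """Roughly label a string as URL, URN or other URI, based on its scheme."""
    scheme = urlsplit(uri).scheme.lower()
    if not scheme:
        return "no scheme"
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    return "other URI"


for example in (
    "https://store.steampowered.com/search/?term=star+wars",
    "urn:isbn:0451450523",
    "mailto:someone@example.com",
):
    print(example, "->", classify_uri(example))
```

All three examples are URIs, but only the first one is treated as a URL here and only the second one is a URN.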

Final thoughts

There are so many terms in tech that it is almost impossible to keep up. I hope this information was useful and gives a better understanding of a few basic terms that are used in relation to the internet. Feel free to contact me if you have any questions or if you have any additional advice/tips about this subject. If you want to stay in the loop when I upload a new post, don’t forget to subscribe to receive a notification by e-mail.
