HTML basics
Background / history
- HTML = HyperText Markup Language
- HyperText means text with links to other text.
- HyperLinks = links
- HTML is (or at least began life as) an application of SGML = Standard Generalized Markup Language, a standard for defining markup languages.
- Another popular SGML application is DocBook, for writing technical manuals.
- “Markup” refers originally to the marks a human editor makes on a document.
- HTTP/HTML invented together (early 1990s) by Tim Berners-Lee. He was working at CERN and wanted a way for researchers to share results. First web browser was just called WorldWideWeb, implemented on a NeXT Computer (which is now in the Computer History Museum in Mountain View, California).
- Other organizations started introducing HTTP/HTML-compatible browsers: Mosaic, Netscape (Mozilla → Firefox). Later: IE, Opera, Safari, Chrome, Edge, etc.
- Some of these competed by introducing their own extensions to HTTP/HTML, leading to the “browser wars.” Cross-browser compatibility is still a concern, but not nearly as problematic as it was in earlier days.
Spaces in HTML
- A plain text file can also be treated as HTML (a browser will render it); however, the rules about spacing are special.
- HTML does not honor more than one consecutive space. Any sequence of whitespace characters (spaces, tabs, newlines, etc.) is collapsed into a single space.
- Here is an example demonstrating the spacing issue. The first box is
the HTML source, the second is how it looks rendered in a browser.
These words are really spaced out!
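The collapsing rule can be approximated outside the browser. The following is a minimal Python sketch, not how any browser is actually implemented; the function name `collapse_whitespace` and the exact spacing in `source` are assumptions made for illustration.

```python
import re


def collapse_whitespace(source: str) -> str:
    """Roughly mimic how normal HTML text renders runs of whitespace."""
    # Any run of spaces, tabs, newlines, form feeds, or carriage returns
    # becomes a single space. Stripping the ends is a simplification of
    # what happens at the start and end of a block of text.
    return re.sub(r"[ \t\n\f\r]+", " ", source).strip()


# Hypothetical source text with extra spaces, a tab, and blank lines.
source = "These   words\tare\n\nreally\n   spaced      out!"
rendered = collapse_whitespace(source)
print(rendered)  # These words are really spaced out!
```

However the source is spaced, the rendered text comes out the same, which is why explicit markup is needed to force extra spacing.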
- So our first bit of markup will be to enforce vertical spacing (a line break). It is written <br>, with a normal ASCII less-than and greater-than sign. This syntax makes it a tag. Tag names like br are not case-sensitive.
- Here is a similar example, with break tags added:
These words are <br> really <br><br> spaced out!
- Enforced horizontal spaces use a different syntax: &nbsp; where the ampersand and semicolon are the delimiters for an entity. The name of the entity, nbsp, stands for non-breaking space. Example:
These words are <br> really <br><br> spaced out!
- The main purpose of &nbsp; is to indicate a space that will not become a line break when wrapping words at the right margin. For example, I might want “CS 120 Spring 2019” to appear on my page, but I don’t want the numbers to wrap to the next line unless the corresponding letters go with them.
- Here’s a version without non-breaking spaces, set at different widths so you can probably see some line breaks appear between “CS” and “120” or “Spring” and “2019”.
- Now here is a version with non-breaking spaces, set at the same widths. The browser will only break at regular spaces, not at &nbsp;.
My favorite course to teach is probably CS 120 in Spring 2019.
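The wrapping behavior can be imitated in Python as a rough analogy. This sketch assumes that the standard `textwrap` module, which only breaks at ASCII whitespace, is a fair stand-in for a browser's line wrapping. The width of 42 characters and the placement of the entities in `with_nbsp_source` are assumptions chosen to make the difference visible.

```python
import html
import textwrap

plain = "My favorite course to teach is probably CS 120 in Spring 2019."
# Hypothetical HTML source using non-breaking spaces inside the pairs.
with_nbsp_source = (
    "My favorite course to teach is probably CS&nbsp;120 in Spring&nbsp;2019."
)
# html.unescape turns &nbsp; into the character U+00A0.
with_nbsp = html.unescape(with_nbsp_source)

WIDTH = 42  # assumed column width, in characters
plain_lines = textwrap.wrap(plain, width=WIDTH)
nbsp_lines = textwrap.wrap(with_nbsp, width=WIDTH)

# The plain version may end a line right after "CS".
print(plain_lines)
# The version with U+00A0 keeps "CS 120" and "Spring 2019" together.
print(nbsp_lines)
```

At this width the plain text splits after "CS", while the version with non-breaking spaces moves "CS 120" to the next line as a unit.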
Entities
- The entity syntax &____; can be used to insert all kinds of special characters into the text.
- Most notably, if you need to write the characters that make up the syntax of HTML tags and entities (such as ampersand, less-than, greater-than), you should use these:
Entity | Character
&lt; | <
&gt; | >
&amp; | &
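When HTML is generated by a program, these three replacements are usually done by a library rather than by hand. As a small sketch, Python's standard `html` module can do the escaping and the reverse; the sample string `raw` is made up for illustration.

```python
import html

# Hypothetical text containing all three special characters.
raw = "if a < b && b > c"

# quote=False leaves quotation marks alone and escapes only &, <, and >.
escaped = html.escape(raw, quote=False)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# Unescaping recovers the original text.
restored = html.unescape(escaped)
print(restored == raw)  # True
```

The ampersand has to be escaped too, since otherwise a literal "&lt;" in the text could not be told apart from an entity.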
- There are also entities for most non-ASCII characters. This helps with issues of character encodings, so that it’s possible to transmit the HTML file as pure ASCII, even if it contains words like these:
I went to the café on Kantstraße with Iñigo
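To see what a pure-ASCII version of that sentence might look like, here is a short Python sketch. It uses numeric character references (such as `&#233;`), which are an alternative to named entities like `&eacute;`; the named-entity string `named` is written by hand as an assumed equivalent source.

```python
import html

text = "I went to the café on Kantstraße with Iñigo"

# Replace every non-ASCII character with a numeric character reference.
ascii_source = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_source)
# I went to the caf&#233; on Kantstra&#223;e with I&#241;igo

# The same sentence written with named entities decodes to the same text.
named = "I went to the caf&eacute; on Kantstra&szlig;e with I&ntilde;igo"
print(html.unescape(named) == text)          # True
print(html.unescape(ascii_source) == text)   # True
```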
- It’s also possible just to write those characters directly in your editor and save the file using a particular text encoding, ideally UTF-8. The trouble is that if the user’s browser assumes a different encoding by default, the characters could get corrupted.
- To ensure that the browser is also using UTF-8, you want to add this metadata tag near the top of your file:
<meta charset="utf-8">
I went to the café on Kantstraße with Iñigo
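The kind of corruption that the charset declaration prevents can be reproduced in a few lines of Python. This is only a sketch of the mismatch: it assumes the file was saved as UTF-8 and that the reader wrongly decodes it as Latin-1, which is one plausible default among several.

```python
text = "café with Iñigo"

# The file on disk: the text encoded as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# A reader that assumes Latin-1 turns each two-byte character into two.
misread = utf8_bytes.decode("latin-1")
print(misread)  # cafÃ© with IÃ±igo

# Decoding with the right encoding gives back the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
```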
Recommended document structure
<!DOCTYPE HTML>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My first page</title>
  </head>
  <body>
    <h1>My first page</h1>
    <p>This is a paragraph about me.</p>
  </body>
</html>
✓ valid HTML
- The <title> that appears within the <head> section is only used as metadata. It may appear in the browser tab or in bookmarks, but it does not appear on the page itself. That’s why we duplicate that title as an <h1> tag (top-level heading) in the <body> section.
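The two places where the title appears can be checked programmatically. The sketch below uses Python's standard `html.parser` module to record which section each one lives in. The class name `TitleFinder` and the small `VOID_TAGS` set are assumptions for this example, and the parser is far simpler than a real browser.

```python
from html.parser import HTMLParser

PAGE = """<!DOCTYPE HTML>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My first page</title>
  </head>
  <body>
    <h1>My first page</h1>
    <p>This is a paragraph about me.</p>
  </body>
</html>"""

# Tags that never have a closing tag; only the ones this sketch needs.
VOID_TAGS = {"meta", "br"}


class TitleFinder(HTMLParser):
    """Record the text and parent section of <title> and <h1>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.found = {}      # tag name -> (parent tag, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if len(self.open_tags) >= 2 and self.open_tags[-1] in ("title", "h1"):
            parent = self.open_tags[-2]
            self.found[self.open_tags[-1]] = (parent, data.strip())


finder = TitleFinder()
finder.feed(PAGE)
finder.close()
print(finder.found)
```

For this page the same text is found once under head (as metadata) and once under body (as visible content).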