HTML basics

Background / history

  • HTML = HyperText Markup Language
  • HyperText means text with links to other text.
  • HyperLinks = links
  • HTML is (or at least began life as) an application of SGML (Standard Generalized Markup Language), a standard for defining document markup formats.
  • Another popular SGML application is DocBook, for writing technical manuals.
  • “Markup” refers originally to the marks a human editor makes on a document.

proofreading_symbols.jpg

Figure 1: Proofreading and copyediting markup

  • HTTP and HTML were invented together (early 1990s) by Tim Berners-Lee, who was working at CERN and wanted a way for researchers to share results. The first web browser was just called WorldWideWeb, implemented on a NeXT Computer (which is now in the Computer History Museum in Mountain View, California).
  • Other organizations started introducing HTTP/HTML-compatible browsers: Mosaic, Netscape (Mozilla → Firefox). Later: IE, Opera, Safari, Chrome, Edge, etc.
  • Some of these competed by introducing their own extensions to HTTP/HTML, leading to the “browser wars.” Cross-browser compatibility is still a concern, but it is not nearly as problematic as it was in earlier days.

Spaces in HTML

  • A plain text file can also be treated as HTML (a browser will render it); however, the rules about spacing are special.
  • HTML does not preserve runs of whitespace. Any sequence of whitespace characters (spaces, tabs, newlines, etc.) is collapsed into a single space.
  • Here is an example demonstrating the spacing issue. The first box is the HTML source, the second is how it looks rendered in a browser.
    These        words           are
    
           really
    
    spaced                      out!
    
    

    [Image: browser rendering, all on one line: These words are really spaced out!]

  • So our first bit of markup will be to enforce vertical spacing (a line break). It is written <br>, using the ordinary ASCII less-than and greater-than signs. This syntax makes it a tag. Tag names like br are not case-sensitive.
  • Here is a similar example, with break tags added:
    These        words           are
    <br>
           really
    <br><br>
    spaced                      out!
    
    

    [Image: browser rendering, with “These words are”, “really”, and “spaced out!” on separate lines and a blank line before the last]

  • Enforced horizontal spaces use a different syntax: &nbsp;, where the ampersand and semicolon are the delimiters for an entity. The name of the entity, nbsp, stands for non-breaking space. Example:
    These &nbsp;  words           are
    <br>
    &nbsp;&nbsp;&nbsp;really
    <br><br>
    spaced   &nbsp;&nbsp;         out!
    

    [Image: browser rendering, now with extra horizontal space wherever &nbsp; appears]

  • The main purpose of &nbsp; is to indicate a space that will not become a line-break when wrapping words at the right margin. For example, I might want “CS 120 Spring 2019” to appear on my page, but I don’t want the numbers to wrap to the next line unless the corresponding letters go with them.
  • Here’s a version without non-breaking spaces, set at different widths so you can see line breaks appear between “CS” and “120” or between “Spring” and “2019”.

    [Images: the text without &nbsp;, rendered in boxes of four different widths]

  • Now here is a version with non-breaking spaces, set at the same widths. The browser will only break at regular spaces, not at &nbsp;.
    My favorite course to teach is probably
    CS&nbsp;120 in Spring&nbsp;2019.
    

    [Images: the source above, rendered in boxes of the same four widths]
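
The collapsing rule described in this section can be imitated outside a browser. The following is a minimal Python sketch, not how any browser is actually implemented: the function name collapse_whitespace is made up for illustration, and it assumes that only ASCII space, tab, newline, form feed, and carriage return count as collapsible whitespace. The character that &nbsp; stands for (U+00A0) is left alone, which is why it survives as visible spacing.

```python
import re

NBSP = "\u00a0"  # the character that the entity &nbsp; stands for


def collapse_whitespace(text: str) -> str:
    """Collapse each run of ASCII whitespace into a single space.

    Hypothetical helper for illustration only. U+00A0 is deliberately
    not in the character class, so it is never collapsed.
    """
    collapsed = re.sub(r"[ \t\n\f\r]+", " ", text)
    # Drop the leftover space at the very start or end, if any.
    return collapsed.strip(" ")


source = (
    "These        words           are\n"
    "\n"
    "       really\n"
    "\n"
    "spaced                      out!\n"
)
print(collapse_whitespace(source))

# Ordinary spaces around the two non-breaking spaces still collapse,
# but the non-breaking spaces themselves are kept.
with_nbsp = f"spaced   {NBSP}{NBSP}         out!"
print(repr(collapse_whitespace(with_nbsp)))
```

The first call prints the text on a single line with one space between words, matching the first rendering in this section. A real browser additionally decides where to wrap lines, which this sketch does not attempt.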

Entities

  • The entity syntax &____; can be used to insert all kinds of special characters into the text.
  • Most notably, if you need to write the characters that make up the syntax of HTML tags and entities (such as ampersand, less-than, greater-than), you should use these:
    Entity   Character
    &lt;     <
    &gt;     >
    &amp;    &
  • There are also named entities for many non-ASCII characters. This helps with character-encoding issues, because the HTML file can be transmitted as pure ASCII even if it contains words like these:
    I went to the caf&eacute; on
    Kantstra&szlig;e with I&ntilde;igo
    

    [Image: browser rendering, showing café, Kantstraße, and Iñigo with their special characters]

  • It’s also possible just to write those characters directly in your editor, and save it using a particular text encoding, ideally UTF-8. The trouble is that if the user’s browser uses a different encoding by default, the characters could get corrupted:

    [Image: the same text with its non-ASCII characters corrupted]

  • To ensure that the browser is also using UTF-8, you want to add this metadata tag near the top of your file:
    <meta charset="utf-8">
    I went to the café on
    Kantstraße with Iñigo
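
The entity and encoding behavior described in this section can be tried out with Python's standard html module. This is only an illustrative sketch: the sample strings are taken from or modeled on the examples above, and Latin-1 is an assumed stand-in for whatever "different encoding" a browser might fall back to.

```python
import html

# Decoding: named entities turn back into the characters they stand for.
ascii_source = "I went to the caf&eacute; on Kantstra&szlig;e with I&ntilde;igo"
decoded = html.unescape(ascii_source)
print(decoded)

# Escaping: characters that are part of HTML syntax become entities.
# quote=False leaves quotation marks alone, since they only matter
# inside attribute values.
escaped = html.escape("if a < b && b > c", quote=False)
print(escaped)

# Encoding mismatch: the text is saved as UTF-8 but read as Latin-1
# (an assumed fallback encoding, chosen only for this demonstration).
utf8_bytes = "caf\u00e9".encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(garbled)
```

The last line prints four letters followed by two stray characters instead of one accented letter, which is the kind of corruption that declaring the charset prevents.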
    

Tags with attributes

  • The meta tag just introduced is an example of a tag with an attribute value. Attributes are keywords that appear within the angle brackets (less-than/greater-than) that make up the tag. They usually have an equals sign, and then quoted content.
  • Like tag names, attribute names are case-insensitive. There can be any number of spaces or newlines around the attribute name, around the equals sign, or between attributes.
  • The attribute value must be quoted if it contains spaces or other special characters. It is a good idea to quote every value, but some coders omit quotes for very simple values like width=32.
  • You can use ASCII double " or single ' quotes, as long as they match. You cannot, however, use any kind of “smart” (curly) quotes around attribute values. If your editor is inserting those for you, make it stop.
  • Besides meta, here’s a very commonly used tag for including images:
    My cat (not really)<br>
    <img width="150" height="134"
     src="./data/17/2d394e-dbdb-4366-ba0a-64a7a6210cfb/kitty.jpg">
    

    [Image: browser rendering, showing the caption followed by the cat photo]
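
To see the attribute rules in action, here is a small sketch using Python's standard html.parser module, which is a lenient parser but not a browser. The class name AttributeLister and the shortened file name kitty.jpg are made up for this illustration. The tag is written with mixed case, spaces around one equals sign, and all three quoting styles.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Record each start tag together with its attributes.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag and attribute names in lowercase,
        # with the surrounding quotes already removed from the values.
        self.seen.append((tag, dict(attrs)))


parser = AttributeLister()
# Double quotes, single quotes, and an unquoted simple value.
parser.feed("<IMG Width=\"150\" HEIGHT = '134' src=kitty.jpg>")
parser.close()
print(parser.seen)
```

All three attributes come out the same way regardless of case, spacing, or quote style. Curly quotes are not treated as quotes at all, so they would end up as part of the value.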

Open/close (begin/end) tags

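  • Many tags come in pairs: an opening tag such as <p> and a matching closing tag such as </p>, which has a slash before the tag name. Paired tags can nest and be intermingled with text. For example: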
    <p>This is a <i>paragraph</i>
      containing <b>nested tags</b>
    </p>
    <p>Another paragraph appears
      below.</p>
    

    [Image: browser rendering, showing two paragraphs with “paragraph” in italics and “nested tags” in bold]

  • These tag names stand for paragraph (p), italic (i), and bold (b).
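
The nesting of open and close tags can be made visible by tracking them on a stack. The sketch below again uses Python's standard html.parser module. The names NestingOutline and outline are invented for this example, and it assumes every tag in the input is a paired tag (a standalone tag such as br would need special handling that is omitted here).

```python
from html.parser import HTMLParser


class NestingOutline(HTMLParser):
    """Build an indented outline of paired tags, checking that they nest.

    Hypothetical helper for illustration only; text content is ignored.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.lines = []   # one outline line per open or close tag

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * len(self.stack) + f"<{tag}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A close tag must match the most recently opened tag.
        if not self.stack or self.stack[-1] != tag:
            raise ValueError(f"unexpected </{tag}>")
        self.stack.pop()
        self.lines.append("  " * len(self.stack) + f"</{tag}>")


def outline(source: str) -> list[str]:
    """Return the outline lines, or raise ValueError on bad nesting."""
    parser = NestingOutline()
    parser.feed(source)
    parser.close()
    if parser.stack:
        raise ValueError(f"unclosed <{parser.stack[-1]}>")
    return parser.lines


example = """<p>This is a <i>paragraph</i>
  containing <b>nested tags</b>
</p>
<p>Another paragraph appears
  below.</p>"""
print("\n".join(outline(example)))
```

For the example above, the i and b pairs appear indented inside the first p pair, and the second p pair stands on its own. Overlapping tags such as <b><i>text</b></i> make this sketch raise an error, whereas browsers try to recover from them.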