 
| Technical Level: | Basic/Beginner | Published: | |
|---|---|---|---|
| Author: | Nigel Peck | Last Updated: | - |

This article is a continuation of part one.
Advanced XHTML Building Blocks
Before we look at any more elements there are a few more basic building blocks of XHTML that we need to cover in order for you to understand the topics we will examine. Hopefully you now have an understanding of elements, start tags, end tags, the basic structure of an XHTML document and the text elements we looked at in the previous section.
In this section we will be looking at character references and entity references, white space, and comments. Don't worry if the topic titles look a bit scary; they'll make sense when you get to them, and the titles will make it easier to check back later.
Character References and Entity References
Character references aren't as scary as they sound (no need to sweat). Let's find out why they exist, and then we can look at how you code them and use them.
Take a look at your keyboard. Can you type a copyright symbol © or an inverted exclamation mark ¡? Unless you're using a pretty strange keyboard, the answer is no.
Imagine you are a Web browser (User Agent) reading a Web page file and you come across a left angle bracket <. How do you know if it is the start of a tag or an angle bracket used in the content of the document? Answer: you don't.
The solution to these two problems? Entity references and character references (funny, that's also the title of this section).
Entity references and character references are extremely similar in XHTML, and people often confuse the two names. Basically they tell a Web browser (User Agent) that it should insert a certain character in their place.
If you don't know what a character is, it's a catch-all word for a letter, number, punctuation mark etc. A is one character, AB is two characters, N!P 3 is five characters (four? you forgot to count the space). You get the idea.
A character reference or entity reference represents one character in XHTML. Entity references can represent more than one character in SGML or XML, but that's another story that you don't need to worry about right now.
The difference between a character reference and an entity reference is this. Character references use numbers while entity references use names. Let's look at the copyright symbol we saw above. To insert a copyright symbol into your document you would use either of the following:
&copy;
Try the &copy; entity reference
&#169;
Try the &#169; character reference
If you try the examples above (and your Web browser (User Agent) isn't broken) you will see a copyright symbol for both examples. As I said before, the entity reference uses names (copy), the character reference uses numbers (169). Observant readers will notice that the character reference also has a sharp symbol #. Let's take a closer look.
An entity reference begins with an ampersand. This is followed by the name of the entity reference, which is followed by a semi-colon, in much the same way that you use a left angle bracket and a right angle bracket to denote (delimit) the start and finish of a tag.

Character references begin with an ampersand followed by a sharp symbol. This is then followed by the number of the character reference, which is again followed by a semi-colon.
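As a rough sketch of the two forms just described, the fragment below writes the same two characters both ways. The copyright values (copy and 169) come from the example above; the name iexcl and the number 161 for the inverted exclamation mark are not given in this article and are taken from the standard XHTML Latin-1 entity set.

```html
<!-- Entity references: ampersand, name, semi-colon -->
<p>Copyright symbol: &copy;</p>
<p>Inverted exclamation mark: &iexcl;</p>

<!-- Character references: ampersand, sharp symbol, number, semi-colon -->
<p>Copyright symbol: &#169;</p>
<p>Inverted exclamation mark: &#161;</p>
```

Both pairs of paragraphs should display the same characters in a Web browser (User Agent) that supports them.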

Whether you use an entity reference or a character reference is up to you. I tend to use entity references because I find names easier to remember than numbers but the choice is yours. Just don't forget that you need the sharp symbol with the character reference and not with the entity reference.
I will be explaining some of the entity and character references available to you in later sections, but I will not be showing you all of them individually as there are too many (approximately two hundred and fifty). For your reference I have prepared three articles detailing the three sets available to you.
Not all of them work in all browsers, so be sure to test the ones you choose to use.
Ampersands and Left Angle Brackets
Although it is possible to type ampersands & and left angle brackets < on most keyboards, you should always use an entity or character reference when they appear in your content, for the reason I have already mentioned: a computer cannot tell an ampersand in your content from the start of an entity or character reference, or a left angle bracket in your content from the start of a tag. Using character or entity references for those characters avoids this problem.
The following code contains an ampersand and a left angle bracket:
<p>Never use a < or an & directly in your content.</p>
The above code is wrong and should be written in one of the two following ways, firstly with entity references and then with character references:
<p>Never use a &lt; or an &amp; directly in your content.</p>
<p>Never use a &#60; or an &#38; directly in your content.</p>
White Space
White space means any characters in your document that do not serve any purpose other than creating space. This includes spaces, tabs, line breaks and zero-width spaces. A line break is the character (or two) at the end of each line that tells the computer to start a new line. A zero-width space is used to separate words in languages such as Thai.
There are two issues relating to white space that you need to be aware of.
White Space Between Words
No matter how much space you use between your words, Web browsers (User Agents) will always reduce it to a single space character. There is one exception to this that we will cover in the next section. When I say words I mean any characters that are not white space and have no white space between them.
That might sound a bit complicated, but it's not; it just sounds complicated when you try to describe it. An example should help you to understand.
<p>This		content
   has 	  a 				lot
  of		 white   space
between 			the
words.</p>
If you view the above example in a visual Web browser (User Agent) you will see that all of the content is on a single line with a single space between each word. That's all there is to it.
This feature comes in handy. It means that you can use tabs, spaces and new lines to make your code easier to read and not worry about your document looking funny in a visual Web browser (User Agent).
Space Around Tags
You need to be careful about where you put white space around your tags. Once you get used to the rule below it will become second nature.
If you want a space before or after a word that is contained by an element you should put that space outside the element. By this I mean before the start tag and after the end tag. If you put it inside you might not get any white space between your words.
<p>Always leave white space <strong>outside</strong> your elements when you want it and not<strong> inside </strong>.</p>
In the example above the strong element containing the word outside has white space outside the tags, which is the way it should be. The strong element containing the word inside has white space inside the tags and not outside. On some Web browsers (User Agents) there may not be any space displayed between the words not and inside.
I have not linked to an example for this because most Web browsers will display the content without problems, but they don't have to, so it's better to get into the habit of doing it right.
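For comparison, here is a sketch of how the same sentence could be written with the white space moved outside both strong elements. Only the position of the spaces has changed from the example above.

```html
<p>Always leave white space <strong>outside</strong> your elements when you want it and not <strong>inside</strong>.</p>
```

Written this way, the space between the words not and inside sits between the tags rather than within the element, so it does not depend on how a particular Web browser (User Agent) treats white space at the edges of an element.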
Comments
When you are creating your documents you may want to leave information for yourself or for others viewing the document code but not viewing the document in a Web browser (User Agent). To do this you use what we call a comment. A comment begins with a left angle bracket, an exclamation mark and two dashes, and ends with two dashes and a right angle bracket:
<!-- comment text -->
You should be careful not to use two dashes together within your comments as this could be thought to be the end of the comment (even without the right angle bracket).
Here's an example:
<!-- This is the first Web page I ever created. -->
<p>My first Web page.</p>
<!-- This is a comment
			spread over two lines. -->
As you will see if you view the above example, the text in the comments is ignored. Comments are useful for leaving yourself reminders for later, such as what still needs doing to a document.
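The warning about two dashes together can be illustrated with a small sketch. The reminder text is made up for this illustration; the point is only where the dashes appear.

```html
<!-- Risky: to do -- add a second paragraph -->
<p>My first Web page.</p>

<!-- Safer: to do: add a second paragraph -->
<p>My first Web page.</p>
```

In the first comment the two dashes in the middle could be read as the end of the comment. The second comment says the same thing with a colon instead, so the only double dashes are the ones that open and close it.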
Summary
In this section we have completed our look at the basic building blocks of XHTML. We've seen how to use special characters in our pages with character references and entity references, we've looked at the way white space is handled and we've also seen how you can add comments to your code.
In the next section we're going to continue our coverage of the elements you can use that relate to text including, amongst others, headings, line breaks and pre-formatted text.
Text That Says Something 2
In this section we will be looking at more of the elements (and a couple of entity references) in the XHTML arsenal that relate to text, further to those covered in the section "Text That Says Something".
Specifically we will be covering:
- Headings with <h1> through to <h6>,
- Subscripts and Superscripts with <sub> and <sup>,
- Line breaks with <br>,
- Non-breaking space with &nbsp;,
- Soft Hyphens with &shy; and
- Pre-formatted text with <pre>.
Before we start I would like to re-iterate an important point: all elements should be used for their meaning and not their visual effect. You can make any element look any way you want using style sheets, and we'll be covering them later on. So please, do yourself a favour and use elements for the reason they're intended.
There are many benefits to this, the two most important being that it makes your site much more accessible to disabled users and those who are using alternative browsers such as Personal Digital Assistants and in-car browsers. It also helps your search engine placement.
So now that rant's over and done with let's get on with it.
Headings with <h1> through to <h6>
Any document longer than a few sentences needs to be split up into sections to be usable. This is not a concept invented for the Web; it was probably conceived not long after writing was invented.
To mark headings in your XHTML there are six elements that each relate to deeper levels of subheadings as the number goes up. For clarity the six elements are:
- <h1>,
- <h2>,
- <h3>,
- <h4>,
- <h5> and
- <h6>.
You should always start with <h1>, followed by <h2> for sub-headings, <h3> for sub-sub-headings, and so on. You get the idea. You should never start with <h1> and then go straight to <h3>, or start with <h2>.
In the past Web designers have started with <h2> or <h3> because they wanted the visual effect of smaller text than commonly offered by <h1> but, as already mentioned (getting sick of it yet?), this can be achieved with style sheets and is not a valid reason for starting your headings with anything other than <h1>.
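As a sketch of the ordering rule, the fragment below contrasts a heading structure that starts at the wrong level and skips a level with one that follows the rule. The heading text is invented for this illustration.

```html
<!-- Not recommended: starts with h2, then skips straight to h4 -->
<h2>My Holiday</h2>
<h4>Getting There</h4>

<!-- Preferred: starts with h1 and goes down one level at a time -->
<h1>My Holiday</h1>
<h2>Getting There</h2>
<h3>The Flight</h3>
```

If the first version was chosen only because the text looks smaller, the same appearance can be given to the second version with a style sheet.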
Headings are block level elements: they have space above and below them, as you'd expect.
It is important that you use the heading elements to mark your headings as it ensures users of all user agents can understand your document structure. It also helps you get higher rankings in search engines as the search engines have a better idea what the document is about by examining the headings.
Here's a sample three-level document. I'm sure you can work out what a document with deeper levels would look like.
<h1>XHTML Web Design for Beginners: Introduction</h1>
<h2>Introduction</h2>
<p>This article is for readers who have either no prior experience...</p>
<h3>Colour</h3>
<p>I have used colour in the example...</p>
<h3>No Programs</h3>
<p>I will not be showing you how...</p>
In general, most XHTML documents should have only a single <h1> element. If you decide to use more than one then you should be sure that they are two separate topics and you have a good reason for having them on the same page. If two topics are on the same page then usually they are connected, and you should have a single <h1> describing both topics and then <h2>s for each sub-topic. It is very rare, if it happens at all, that a page should have two <h1> elements.
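A minimal sketch of a page covering two connected topics under a single <h1> might look like the following. The topics and headings are made up for illustration.

```html
<h1>Looking After Cats and Dogs</h1>

<h2>Cats</h2>
<p>Cats need...</p>

<h2>Dogs</h2>
<p>Dogs need...</p>
```

The single <h1> describes the page as a whole, and each sub-topic gets its own <h2>.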
A user agent for the blind will often use headings as a way of giving the user an overview of the document so they can decide which part they wish to hear.
Subscripts and Superscripts with <sub> and <sup>
Subscripts are letters or digits which appear smaller and at the bottom of the line such as the 2 in H2O. Superscripts are again smaller and appear at the top such as the th in the 13th of February.
To mark subscripts and superscripts in XHTML you use the <sub> and <sup> elements respectively. An example should make it clear:
<p>The symbol for water is H<sub>2</sub>O.</p>
<p>This example was written on the 13<sup>th</sup> of February.</p>
Line Breaks with <br>
When you are writing your documents you may want to indicate that there should be a new line started without closing a paragraph. To do this you can use the <br> element. <br> is an empty element so you must ensure that you use the empty element syntax by writing it as <br />.
Here's an example:
<p>
The Road goes ever on and on<br />
	Down from the door where it began.<br />
	Now far ahead the Road has gone,<br />
	And I must follow, if I can,<br />
	Pursuing it with eager feet,<br />
	Until it joins some larger way<br />
	Where many paths and errands meet.<br />
	And whither then? I cannot say.
</p>
This is an element that has no effect outside visual browsers.
Non-breaking space with &nbsp;
Web browsers may split a set of words onto two lines. Sometimes this is not what you want. The solution is the entity reference &nbsp; which stands for non-breaking space.
If you insert &nbsp; between your words instead of a space, with no spaces on either side, those words will be kept together on a single line and never be broken up. Here's an example:
<p>This&nbsp;is&nbsp;a&nbsp;solid&nbsp;line.</p>
If you view the example in a visual browser, try making your browser window narrow and see if you can make the text go on to two lines. You can't. Now try with normal spaces.
<p>This is not a solid line.</p>
This is an entity reference that has no effect outside visual browsers.
Soft Hyphens with &shy;
Soft Hyphens are used to indicate a point in a word where you would like it to be split on to two lines if that is necessary. It simply makes for a nicer appearance when space is limited, such as when you have text in a thin column (which we'll be covering later).
To use one you simply insert the entity reference &shy; in the word at the point where you would like the potential split to be. Here's an example:
<p>I have no idea what antidisestablishment&shy;arianism means.</p>
In a visual browser, if you collapse your browser window so that the long word (which I won't repeat) is against the right hand edge of the window then it should split the word onto two lines at the point where the soft hyphen occurs.
This is another entity reference that has no effect outside visual browsers.
Pre-formatted text with <pre>
Remember when we covered white space in the last section and I told you that it is always collapsed into a single space? Well, there's one exception: the <pre> element allows you to lay out your text in the same way you want it to appear in a visual user agent. <pre> is a block level element which, to remind you, means that it has space above it and below it.
Using <pre> is simple. Let's redo the example we did with <br> above using <pre> instead:
<pre>The Road goes ever on and on
Down from the door where it began.
Now far ahead the Road has gone,
And I must follow, if I can,
Pursuing it with eager feet,
Until it joins some larger way
Where many paths and errands meet.
And whither then? I cannot say.</pre>
You've guessed it: this is another element that has no effect outside visual browsers.
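The poem above only relies on <pre> keeping its line breaks. As a further sketch, the fragment below uses runs of spaces to line text up in columns, which would be collapsed to single spaces anywhere else. The items and prices are invented for illustration.

```html
<pre>Item        Price
Tea         1.20
Coffee      1.50</pre>
```

Visual browsers commonly show <pre> content in a fixed-width font, which is what lets the columns line up, but that is a default rather than a guarantee.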
Summary
That's nearly it for text elements. Hopefully you now understand most of the elements and entity references that you can use in your XHTML documents to mark up your text.