XHTML Web Design for Beginners: Advanced XHTML Building Blocks

This section of the site features articles published between 2002 and 2004. They remain here for reference purposes and may contain information that is out of date.

Advanced XHTML Building Blocks

Before we look at any more elements there are a few more basic building blocks of XHTML that we need to cover in order for you to understand the topics we will examine. Hopefully you now have an understanding of elements, start tags, end tags, the basic structure of an XHTML document and the text elements we looked at in the previous section.

In this section we will be looking at the topics listed below. Don't worry if the topic titles look a bit scary; they'll make sense when you get to them, and the titles will make it easier to check back later.

Character References and Entity References

Character references aren't as scary as they sound (no need to sweat). Let's find out why they exist, and then we can look at how you code them and use them.

Take a look at your keyboard. Can you type a copyright symbol © or an inverted exclamation mark ¡? Unless you're using a pretty strange keyboard, the answer is no.

Imagine you are a Web browser (User Agent) reading a Web page file and you come across a left angle bracket <. How do you know if it is the start of a tag or an angle bracket used in the content of the document? Answer: you don't.

The solution to these two problems? Entity references and character references (funny, that's also the title of this section).

Entity references and character references are extremely similar in XHTML, and people often confuse the two names. Basically they tell a Web browser (User Agent) that it should insert a certain character in their place.

If you don't know what a character is, it's a catch-all word for a letter, number, punctuation mark and so on. A is one character, AB is two characters, and N!P 3 is five characters (if you counted four, you forgot the space). You get the idea.

A character reference or entity reference represents one character in XHTML. Entity references can represent more than one character in SGML or XML, but that's another story that you don't need to worry about right now.

The difference between a character reference and an entity reference is this. Character references use numbers while entity references use names. Let's look at the copyright symbol we saw above. To insert a copyright symbol into your document you would use either of the following:

&copy;

Try the &copy; entity reference

&#169;

Try the &#169; character reference

If you try the examples above (and your Web browser (User Agent) isn't broken) you will see a copyright symbol for both examples. As I said before, the entity reference uses names (copy), the character reference uses numbers (169). Observant readers will notice that the character reference also has a sharp symbol #. Let's take a closer look.

An entity reference begins with an ampersand. This is followed by the name of the entity reference, which is followed by a semi-colon. The ampersand and semi-colon mark (delimit) the start and finish of the reference in much the same way that a left angle bracket and right angle bracket mark the start and finish of a tag.

XHTML Entity Reference Syntax - An ampersand & followed by a name (e.g. copy) followed by a semi-colon ;

Character references begin with an ampersand followed by a sharp symbol. This is then followed by the number of the character reference, which is again followed by a semi-colon.

XHTML Character Reference Syntax - An ampersand & followed by a sharp symbol # followed by a number (e.g. 169) followed by a semi-colon ;

Whether you use an entity reference or a character reference is up to you. I tend to use entity references because I find names easier to remember than numbers, but the choice is yours. Just don't forget that you need the sharp symbol with the character reference and not with the entity reference.
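
As a quick sketch of both styles in context, the two paragraphs below use entity references first and then the equivalent character references. The sample text is my own and is not one of the linked examples; 169 and 161 are the numbers for the copyright symbol and the inverted exclamation mark.

```html
<!-- Entity references: an ampersand, a name, a semi-colon -->
<p>&copy; 2004 Example Company. &iexcl;Hola!</p>

<!-- Character references: an ampersand, a sharp symbol, a number, a semi-colon -->
<p>&#169; 2004 Example Company. &#161;Hola!</p>
```

Both paragraphs should display exactly the same text in a working Web browser (User Agent).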

I will be explaining some of the entity and character references available to you in later sections, but I will not be showing you all of them individually as there are too many (approximately two hundred and fifty). For your reference I have prepared three articles detailing the three sets available to you. These are at the following locations.

Not all of them work in all browsers so be sure to test the ones you choose to use.

Ampersands and Left Angle Brackets

Although it is possible to enter ampersands & and left angle brackets < with most keyboards, you should always use an entity or character reference when they appear in your content. This is for the reason I have already mentioned: a computer cannot tell a literal ampersand from the start of an entity or character reference, or a literal left angle bracket from the start of a tag. Using character or entity references for those characters avoids this problem.

The following code contains an ampersand and a left angle bracket:

<p>Never use a < or an & directly in your content.</p>

The above code is wrong. It should be written in one of the following two ways, first with entity references and then with character references:

<p>Never use a &lt; or an &amp; directly in your content.</p>

View example 2

<p>Never use a &#60; or an &#38; directly in your content.</p>

View example 3
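
The same care is needed when an ampersand appears inside an attribute value, something we have not covered yet. The sketch below assumes a made-up link address with two parameters; the file name and parameter names are purely illustrative.

```html
<!-- Wrong: the raw ampersand looks like the start of a reference -->
<p><a href="search.html?topic=xhtml&page=2">Next page</a></p>

<!-- Right: the ampersand is written as an entity reference -->
<p><a href="search.html?topic=xhtml&amp;page=2">Next page</a></p>
```

The Web browser (User Agent) turns the reference back into a plain ampersand before it follows the link, so the address itself is unchanged.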

White Space

White space means any characters in your document that do not serve any purpose other than creating space. This includes spaces, tabs, line breaks and zero width spaces. A line break is the character (or pair of characters) at the end of each line that tells the computer to start a new line. A zero width space is used to separate words in languages such as Thai.

There are two issues relating to white space that you need to be aware of.

White Space Between Words

No matter how much space you use between your words, Web browsers (User Agents) will always reduce it to a single space character. There is one exception to this that we will cover in the next section. When I say words I mean any characters that are not white space and have no white space between them.

That might sound a bit complicated, but it's not; it only sounds that way when you try to describe it. An example should help you to understand.

<p>This		content 

   has 	  a 				lot
  of		 white   space     
between 			the 

words.</p>

View example 4

If you view the above example in a visual Web browser (User Agent) you will see that all of the content is on a single line with a single space between each word. That's all there is to it.

This feature comes in handy. It means that you can use tabs, spaces and new lines to make your code easier to read without worrying about your document looking funny in a visual Web browser (User Agent).
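
As a sketch of what that means in practice, the two paragraphs below are laid out differently in the code but should look the same in a visual Web browser (User Agent). The layout is my own and is not one of the linked examples.

```html
<!-- Compact: everything on one line -->
<p>This content has a lot of white space between the words.</p>

<!-- Spread out: indented and split over several lines for readability -->
<p>
	This content has a lot of white space
	between the words.
</p>
```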

Space Around Tags

You need to be careful about where you put white space around your tags. Once you get used to the following rule it will become second nature.

If you want a space before or after a word that is contained by an element you should put that space outside the element. By this I mean before the start tag and after the end tag. If you put it inside you might not get any white space between your words.

<p>Always leave white space <strong>outside</strong> your elements when you want it and not<strong> inside </strong>.</p>

In the example above the strong element containing the word outside has white space outside the tags, which is the way it should be. The strong element containing the word inside has white space inside the tags and not outside. On some Web browsers (User Agents) there may not be any space displayed between the words not and inside.

I have not linked to an example for this because most Web browsers will display the content without problems, but they don't have to, so it's better to get into the habit of doing it right.
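
For comparison, here is a sketch of the same sentence with the white space moved outside the second strong element, which is the habit to aim for:

```html
<p>Always leave white space <strong>outside</strong> your elements when you want it and not <strong>inside</strong>.</p>
```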

Comments

When you are creating your documents you may want to leave information for yourself or for others viewing the document code but not viewing the document in a Web browser (User Agent). To do this you use what we call a comment. A comment has the following syntax:

XHTML Comment Syntax - A left angle bracket < followed by an exclamation mark ! and two dashes --. Your comment text. Then two dashes -- and a right angle bracket >.

You must not use two dashes together within your comment text. XHTML does not allow it, and some Web browsers (User Agents) may treat the two dashes as the end of the comment (even without the right angle bracket).

Here's an example:

<!-- This is the first Web page I ever created. -->
<p>My first Web page.</p>
<!-- This is a comment
			spread over two lines. -->

View example 5

As you will see if you view the above example, the text in the comments is ignored. Comments are useful for leaving yourself reminders for later such as what still needs doing to a document.
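
To illustrate the earlier warning about dashes, here is a small sketch with comment text of my own. The first comment contains two dashes together and should be avoided; the second leaves the same reminder safely.

```html
<!-- Wrong: check the links -- some are broken -->

<!-- Right: check the links, some are broken -->
<p>My list of links.</p>
```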

Summary

In this section we have completed our look at the basic building blocks of XHTML. We've seen how to use special characters in our pages with character references and entity references, we've looked at the way white space is handled, and we've seen how you can add comments to your code.

In the next section we're going to continue our coverage of the elements you can use that relate to text including, amongst others, headings, line breaks and pre-formatted text.
