Chapter 2: XML basics – The Metadata Manual

2

XML basics

S.Y. Zoe Chao

Abstract:

The goal of this chapter is to give the reader a basic understanding of eXtensible Markup Language, or XML, which will make it easier to learn and use the metadata schemas in the following chapters. XML is a markup language used to store and exchange data behind the scenes, and is known for its simplicity and flexibility. It is an open standard, and no special software is required to read or write an XML record. Even though you may never have to create XML records directly, you should be familiar with XML and able to read an XML record. Example XML records are provided in this chapter.

Key words

XML

eXtensible Markup Language

metadata

encoding

tags

What is XML?

XML, or eXtensible Markup Language, is the language most metadata is stored in behind the scenes. You might never deal directly with XML while creating metadata because most content management systems provide forms to enter your metadata. They may store the metadata in a database or in XML, and transfer metadata to other systems using XML. Understanding XML will give you a useful foundation for understanding metadata. XML is a universal language and many metadata examples are expressed in XML.

XML was developed from Standard Generalized Markup Language (SGML) as a flexible mark-up language. A mark-up language is a system of annotating a resource that distinguishes the annotations (comments, instructions, etc.) from the content that is intended to be viewed. Historically, publishers marked up texts with printing instructions. This tradition was carried forward into the World Wide Web, and mark-up languages, such as XML and Hyper Text Markup Language (HTML), differentiate instructions for the display of the content from the content itself.

According to the World Wide Web Consortium (W3C),

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. (W3C, 2003)

XML is simple and flexible, and an open standard. You can purchase XML editors such as <oXygen/>, or download free software, such as Notepad++, to create XML records. You can even create XML records in simple text editors such as Notepad. However, XML editors give you color-coded syntax and many more features to make debugging XML records much easier than using a simple text editor.

XML is made up of elements and attributes, and uses angle brackets (< >) to separate the elements. An element is also commonly referred to as a field. For example, if you encoded the title of this book in XML, it would look like this:

<title>The Metadata Manual</title>

As you can see, the element name, “title,” is enclosed in angle brackets to open the element, and is enclosed in angle brackets again, this time preceded by a slash (/), to close the element. You can continue this simple XML record with the authors’ names:

<author>Rebecca L. Lubas</author>

<author>Amy S. Jackson</author>

<author>Ingrid Schneider</author>

XML is so simple and flexible that anything can be an element. As a librarian or cataloger, you can probably see how easy it would be to create a catalog record in XML. We can add the publisher of the book, a date of publication, and a subject to the catalog record, and have a basic XML record for this book:

<title>The Metadata Manual: A practical workbook</title>

<author>Rebecca L. Lubas</author>

<author>Amy S. Jackson</author>

<author>Ingrid Schneider</author>

<publisher>Chandos Publishing</publisher>

<date>2013</date>

<subject>Metadata</subject>

This is XML at its simplest. This record can be pulled into a database or displayed (using eXtensible Stylesheet Language Transformation, or XSLT) on a web page. For a web display, XSLT tells the browser font size, placement, background color, etc., while XML only carries the text. XML itself does not specify how the elements should be written. For example, XML, or the XML schema you’re using, will not tell you if the author’s last name should be first or last. It does not tell you how to encode a date, or how much of the title should be in the <title> element. These types of details are specified in content standards such as AACR2, CCO, and DACS (Describing Archives: A Content Standard).

More information about the XML elements can be provided by using attributes. For example, if we want to specify that the subject is from the LCSH, we can use an attribute in the <subject> element.

<subject type="LCSH">Metadata</subject>

An attribute in the date element could help clarify that the date is a date of publication, and not the date of printing.

<date type="publication">2013</date>

Attributes always come from a controlled vocabulary defined by the metadata schema that you choose. We’ll discuss several metadata schemas later in this book.

As you can see, XML as a container for carrying and transporting metadata is very simple and flexible. However, for XML to be truly powerful, we need to agree on a set of elements, attributes, controlled vocabularies, and encoding standards so that we’re all speaking the same language. For example, if I use the element <author>, but someone else uses the element <creator>, the computer will not recognize them as the same concept. Both of us would need to use the same element name. The metadata schema that you choose will define the elements as well as the attributes.

MARCXML, EAD, VRA Core 4.0, and CDWA Lite are metadata schemas created by metadata professionals in the cultural heritage community to define how XML should be used in specific instances. These metadata schemas are represented through machine-readable XML Schema Documents (XSDs) that specify how the schema is to be used. Dublin Core does not include an XSD, but is commonly encoded in XML using the oai_dc XSD for Dublin Core.

How are XML records created?

XML documents are used to transfer data over the internet. XML is text based, so it can be opened with any program that can read text files. Additionally, an XML file can be created and edited in a simple text editor, such as Notepad, which is one big advantage of using XML. Here is a simple XML file example:

<book>

 Harry Potter

</book>

Even though there is only one element here, it is a well-formed XML document. We can name this file “book.xml,” “harrypotter.xml,” or anything else, as long as it’s saved with the “.xml” extension.

W3Schools defines an XML element as “everything from (including) the element’s start tag to (including) the element’s end tag” (W3Schools, 2012). As in HTML, the start tag consists of two angle brackets, “<” and “>”, wrapped around the element name: book. The end tag is exactly the same, except that a slash “/” is added right after the first angle bracket. Everything between the start and the end tags, including the white space before and between the words, is the content of this element. An XML parser can parse it and will understand that the text “Harry Potter” describes a book, not a person. The tags <book> and </book> are “markup.”

In the context of HTML, to mark up is to specify how the document will display in the browser. The tags in HTML documents are predefined. For example, in HTML, the heading tags <h1> and <h2> designate different font sizes, and content surrounded by paragraph tags, <p>, will have space before and after it. XML tags are not predefined. Your markup is not limited to display instructions, and you can create tags to fit your needs. Therefore, you can mark “Harry Potter” up as:

<my_favorite_book>

 Harry Potter

</my_favorite_book>

As mentioned above, the white space is part of the element; it is “character data,” just like the string “Harry Potter.” Character data, in the context of XML, means text that is not markup, that is, text that does not contain any tags. The whitespace before the text is mainly for aesthetics: the parser passes it along as part of the content, but most applications ignore it. It is fine to write the element on one line with no space: <my_favorite_book>Harry Potter</my_favorite_book>.
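To see how a parser treats that whitespace, here is a minimal sketch using Python’s standard xml.etree.ElementTree module. The choice of Python and of this particular parser is our assumption for illustration; other XML parsers should hand over the same character data.

```python
import xml.etree.ElementTree as ET

# The one-element document from the text, with its line breaks and indent.
doc = "<book>\n Harry Potter\n</book>"

root = ET.fromstring(doc)

# The parser returns the element name and all of its character data,
# including the whitespace around the words.
raw_text = root.text

# It is the application, not the parser, that decides to drop the padding.
clean_text = raw_text.strip()

print(root.tag)        # book
print(repr(raw_text))  # '\n Harry Potter\n'
print(clean_text)      # Harry Potter
```

The parser reports the whitespace faithfully; stripping it is a decision made by the program that consumes the record.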

While XML is very flexible and allows you to design your own tags and schema, there are several rules that must be complied with in order to create a well-formed XML document.

Rule 1: Open and Close Tags

In XML, it is essential that open and close tags balance on either side of the content, even if the element is empty. For an empty element such as <br></br>, it is acceptable to combine the opening and closing tags into a single tag: <br/>.

Rule 2: Tags are Case Sensitive

It’s important to note that, unlike HTML, XML is case sensitive. In HTML, <H1> is the same as <h1>. However, in XML, <book> is not the same as <Book>, and neither is the same as <BOOK>. If the element is started with <book>, you have to close it with </book>; you cannot close it with </Book>.
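As a quick check of this rule, the sketch below feeds a matching and a mismatched pair of tags to Python’s standard xml.etree.ElementTree parser (our choice of tool for illustration). The exact wording of the error message depends on the parser, so the sketch only records that an error occurred.

```python
import xml.etree.ElementTree as ET

# Start and end tags agree in case: this parses.
good = ET.fromstring("<book>Harry Potter</book>")

# The end tag uses a capital B, so it does not match the start tag.
try:
    ET.fromstring("<book>Harry Potter</Book>")
    error = None
except ET.ParseError as err:
    error = err

print(good.tag)  # book
print(error)     # a parse error reporting the mismatch
```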

Rule 3: Tree with One Root

An XML document is structured like a tree. There is a root element at the beginning of every document. As the description of the object becomes more detailed, the XML will grow “branches” and “leaves.” Let’s use the same example, “Harry Potter.” Suppose you would like to add the book’s author to your description. You will need to use a more granular markup to differentiate “Harry Potter” and “J K. Rowling.” It could look like this:

<book>

 <title> Harry Potter</title>

 <author> J K. Rowling</author>

</book>

The root element here is <book>, which is the first element in this document that contains all other elements. We can say that <book> is the parent element to both <title> and <author>; and that <title> and <author> are siblings. There will be only one root in every XML document.

In addition to having more tags, another difference you may notice between the first and the second example is that the element <book> has different types of content. In our first example, the content is just text (or character data), “Harry Potter”. However, in the second example, <book> has element content because it has two child elements. It is common for an XML file to be composed of some elements that have only child elements and some that have only text in the content. It is not against the rules to mix markup and text within a tag like this:

<book>

 <title>Harry Potter</title> is my favorite book written by <author>J K. Rowling</author>.

</book>

In a case like this, we can say that the <book> element contains text and child elements, or “mixed content.” Mixed content is common in online articles, reports, and web pages. Because it is harder to parse mixed content in order to extract information, mixed content is much less common in the XML documents that are used for data exchange processes, like the metadata schemas we’ll talk about later.
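The following sketch illustrates why mixed content takes more work to extract. It uses Python’s standard xml.etree.ElementTree module, which is our assumption for illustration; the way it splits text into text and tail is specific to that library, and other tools model mixed content differently.

```python
import xml.etree.ElementTree as ET

doc = (
    "<book>"
    "<title>Harry Potter</title> is my favorite book written by "
    "<author>J K. Rowling</author>."
    "</book>"
)
root = ET.fromstring(doc)
title, author = root  # the two child elements, in document order

# ElementTree stores text that follows a child's end tag in that child's tail.
print(repr(root.text))    # None: nothing sits between <book> and <title>
print(repr(title.tail))   # ' is my favorite book written by '
print(repr(author.tail))  # '.'

# itertext() walks the element and yields every piece of text in order.
sentence = "".join(root.itertext())
print(sentence)
```

The sentence has to be reassembled from four separate pieces, whereas an element with only text content gives up its value in a single step.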

Since only one root element is the parent of all other elements, the structure of the XML document relies on the scope of the subject you want to describe. The root element <book> is suitable if the scope is only one book (Figure 2.1).

Figure 2.1 <book> as XML root element

However, if the scope is all the books in your collection, you will need to change the root element, for example, <my_library>, in order to encompass all other elements (Figure 2.2).

Figure 2.2 <my_library> as XML root element

Certainly, you can start with the smaller content blocks and build up to a bigger structure as you go. However, it is critical to neatly assemble the blocks in an XML document, meaning that the child element needs to be completely enclosed by its parent element.

Rule 4: Elements must be correctly nested

It is not uncommon to see tags overlapping like this in an HTML file:

<b><i>The text is bold and italic.</b></i>

Often this type of ill-formed HTML document will display in the browser just fine. However, in XML, the open and close tags must nest properly like a Russian doll, or the document will not parse properly. In an XML document, every child can only have one parent. If the start tag of element B is inside the element A, the end tag of element B must also be inside element A. The element A is the only parent of the element B.

<A><B>value</B></A>

In our example, you must finish describing one aspect of your subject, such as the books you have in your collection, before you can go on to talk about the CDs you have. The trunks and the branches need to have a defined, not a messy, structure.

<my_library>

 <books>

 <book>

  <title>Harry Potter</title>

  <author>J K. Rowling</author>

 </book>

 <book>

  <title>Naked</title>

  <author>David Sedaris</author>

 </book>

 </books>

 <CDs>

 <CD>

  <title>Yellow Submarine</title>

  <performer>The Beatles</performer>

 </CD>

 <CD>

  <title>The Essential Duke Ellington</title>

  <performer>Duke Ellington</performer>

 </CD>

 </CDs>

</my_library>

Rule 5: Attribute values must be quoted

An attribute is a name-value pair attached to the element’s start tag. It can be used to describe the element. Following the previous example, if you would like to differentiate the novels and essays in your book collection, we can add an attribute to the book element:

<my_library>

 <books>

 <book category="novel">

  <title>Harry Potter</title>

  <author>J K. Rowling</author>

 </book>

 <book category="essays">

  <title>Naked</title>

  <author>David Sedaris</author>

 </book>

 </books>

</my_library>

You can see that the name of the attribute is separated from the value by an equals sign. The attribute values are enclosed in quotation marks. For an XML parser, as long as the attribute value is quoted, it does not matter whether single or double quotation marks are used. <book category="novel"> means the same as <book category='novel'>. The other rule for attributes is that no element can have more than one attribute with a given name. For example, it is wrong to have the attribute “category” show up twice in the book element, like this:

<book category="novel" category="fantasy"> ← wrong
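The attribute rules can be checked with a short sketch. It assumes Python’s standard xml.etree.ElementTree parser, chosen here only for illustration.

```python
import xml.etree.ElementTree as ET

# Single and double quotation marks produce the same attribute.
double = ET.fromstring('<book category="novel"/>')
single = ET.fromstring("<book category='novel'/>")
print(double.attrib == single.attrib)  # True


def is_accepted(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An unquoted value and a repeated attribute name are both rejected.
unquoted_ok = is_accepted("<book category=novel/>")
duplicate_ok = is_accepted('<book category="novel" category="fantasy"/>')
print(unquoted_ok, duplicate_ok)  # False False
```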

You may feel uncertain about when to use an attribute. According to the book “XML in a Nutshell” by Harold and Means, “when and whether one should use child elements or attributes to hold information [is] a subject of heated debate” (2001, p. 16). The rule of thumb is to use attributes only for the information that is not relevant to the data you are describing, because attributes have the following limitations:

 Attributes cannot contain multiple values.

 Attributes cannot contain tree structures.

 Attributes are difficult to read and maintain.

 Attributes are not easily expandable for future changes.

In the previous example, we can pull out the attribute “category” and use it as a child element and solve the problem:

<book>

 <category>novel</category>

 <category>fantasy</category>

 <title>Harry Potter</title>

 <author>J K. Rowling</author>

</book>
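Because child elements can repeat, a program can collect all of the values in one pass. This is a minimal sketch with Python’s standard xml.etree.ElementTree module (an assumed tool, used for illustration only).

```python
import xml.etree.ElementTree as ET

doc = """
<book>
 <category>novel</category>
 <category>fantasy</category>
 <title>Harry Potter</title>
 <author>J K. Rowling</author>
</book>
"""
book = ET.fromstring(doc)

# findall() returns every direct child with the given name, in document order.
categories = [c.text for c in book.findall("category")]
print(categories)  # ['novel', 'fantasy']
```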

There are situations in which you need to be more specific with the element itself, such as when adding an ID number. In that case, adding attributes is the way to go. In the following example, the id attributes are to differentiate the two book elements; they’re not a part of the information of the books themselves.

<my_library>

 <books>

 <book id="1">

  <title>Harry Potter</title>

  <author>J K. Rowling</author>

 </book>

 <book id="2">

  <title>Naked</title>

  <author>David Sedaris</author>

 </book>

 </books>

</my_library>
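An id attribute gives software a handle for picking out one element. The sketch below uses Python’s standard xml.etree.ElementTree module, which supports a limited subset of XPath expressions; the tool is our choice for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<my_library>
 <books>
  <book id="1">
   <title>Harry Potter</title>
   <author>J K. Rowling</author>
  </book>
  <book id="2">
   <title>Naked</title>
   <author>David Sedaris</author>
  </book>
 </books>
</my_library>
"""
library = ET.fromstring(doc)

# Select the book element whose id attribute equals "2".
book = library.find(".//book[@id='2']")
title = book.findtext("title")
print(title)  # Naked

# Build a lookup table from id to title for every book in the library.
titles_by_id = {b.get("id"): b.findtext("title") for b in library.iter("book")}
print(titles_by_id)  # {'1': 'Harry Potter', '2': 'Naked'}
```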

You may have noticed that we’re using an underscore (_) to bridge multiple words in the element in our examples. That’s because element names must not contain a space. However, not all characters are allowed in the element names. We’ll talk about that soon, but first we need to talk about some specific characters in the XML document.

Rule 6: Certain characters have a special meaning in XML

Suppose we want to put “< 500 pages” (fewer than 500 pages) in a <page_count> element for the Harry Potter book.

<book>

 <title>Harry Potter</title>

 <author>J K. Rowling</author>

 <page_count> < 500 pages</page_count>

</book>

A record like this will generate an error because the “less than” character (<) has a special meaning in XML and will be interpreted as the start of a new tag. The characters for “greater than” (>), “ampersand” (&), “apostrophe” ('), and “quotation mark” (") also have special meanings in XML syntax. So, instead of using these five characters directly in our XML document, we replace them with the predefined entity references:

Character    Entity reference    Meaning
<            &lt;                less than
>            &gt;                greater than
&            &amp;               ampersand
'            &apos;              apostrophe
"            &quot;              quotation mark

The XML should look like this:

<book>

 <title>Harry Potter</title>

 <author>J K. Rowling</author>

 <page_count> &lt; 500 pages</page_count>

</book>
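In practice, software usually handles entity references for you in both directions. The sketch below assumes Python’s standard library (xml.etree.ElementTree and xml.sax.saxutils), chosen for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser turns the entity reference back into the character.
elem = ET.fromstring("<page_count> &lt; 500 pages</page_count>")
parsed_text = elem.text
print(repr(parsed_text))  # ' < 500 pages'

# Writing by hand: escape() replaces &, < and > in text you plan to embed.
escaped = escape("< 500 pages")
print(escaped)  # &lt; 500 pages

# Writing with ElementTree: serialization escapes the text for you.
out = ET.Element("page_count")
out.text = "< 500 pages"
serialized = ET.tostring(out, encoding="unicode")
print(serialized)  # <page_count>&lt; 500 pages</page_count>
```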

Strictly speaking, only the characters “<” and “&” are illegal in XML. The other three will not cause errors in element content, but it is good practice to replace them with their entity references.

Rule 7: Proper Element Names

From our examples, we know that English letters, both A to Z and a to z, and “_” (underscore) are allowed. We can use digits 0 through 9, and certainly, in other countries, people can use the characters in their language. However, besides the characters “-” (hyphen) and “.” (period), XML names cannot have other punctuation characters, such as percent symbols, dollar signs, slashes, semicolons, or, of course, quotation marks and apostrophes. Element names must start with a letter or an underscore. (Be sure you don’t leave a space between “<” and the element name.) You cannot start an element name with a number, a hyphen, or a period. Also, “xml” (or “XML”, or any case combination) cannot be used as an element name or as the start of an element name.

The following examples are valid element names:

<My.Element.Name>

<_element10>

<my-Element>
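The naming rules above can be approximated with a small checking function. This is a simplified sketch in Python: the function name is_valid_name is hypothetical, and the pattern covers only English letters, digits, underscore, hyphen, and period, whereas real XML names also allow letters from many other languages. It also rejects colons, since they are reserved for namespaces.

```python
import re

# ASCII-only simplification (an assumption): first character is a letter or
# underscore; later characters may also be digits, periods, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_name(name):
    """Return True if name follows the simplified naming rules in the text."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" in any case are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["My.Element.Name", "_element10", "my-Element",
                  "10element", "my element", "xmlRecord", "price$"]:
    print(candidate, is_valid_name(candidate))
```

The first three candidates are the valid examples from the text; the remaining four each break one of the rules.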

Though you can be creative with the combination of all the legitimate characters within rules, W3C recommends that we follow these best practices for element names:

 Make names descriptive. It’s best to use an underscore for two or more words: <first_name>.

 Make names short and simple.

 Avoid “-” and “.” Some software may have different interpretations for these two characters.

 Avoid “:” Colons are reserved to be used in namespaces.

Other content in XML

The XML declaration

You may notice that, in metadata examples in this book, the first line of the XML record looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

This is the XML declaration, not an element. It specifies which character encoding the document uses and whether the file needs to refer to an external document. The XML declaration is optional, but W3C recommends including it at the beginning of an XML document. When you include an XML declaration, make sure that it is at the very beginning of the document; even a single white space before the declaration will cause an XML parsing error.

There are three attributes that can be in the XML declaration: version, encoding, and standalone. The version attribute must be included in the declaration. Though there is an XML version 1.1, “1.0” is the value you should assign to the version attribute; a parser that supports only XML 1.0 may report an error otherwise.

By default, the XML document is assumed to be encoded in UTF-8 (Unicode) when the encoding attribute is not specified. We recommend not including an encoding attribute if you are unsure. It causes an error when the attribute specification does not agree with the characters the document uses.

The standalone attribute indicates whether the XML document needs to refer to an external document, such as a DTD (Document Type Definition) or an XSD (XML Schema Document). For the book collection examples we have in this chapter, there are no external files to be referred to, and the standalone attribute will have the value “yes.”

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

<my_library>

 <books>

 <book>

  <title>Harry Potter</title>

  <author>J K. Rowling</author>

 </book>

 <book>

  <title>Naked</title>

  <author>David Sedaris</author>

 </book>

 </books>

</my_library>

The standalone attribute is optional. If it is omitted, the value “no” is assumed.
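The rule that the declaration must come first is easy to confirm. This sketch assumes Python’s standard xml.etree.ElementTree parser and a shortened version of the record above with an empty root element.

```python
import xml.etree.ElementTree as ET

record = b'<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n<my_library/>'

# With the declaration at the very start, the document parses.
root = ET.fromstring(record)
print(root.tag)  # my_library

# A single space in front of the declaration is enough to break it.
try:
    ET.fromstring(b" " + record)
    leading_space_ok = True
except ET.ParseError as err:
    leading_space_ok = False
    print("rejected:", err)
```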

Comments

Often we need to leave comments in an XML document, either to serve as notes for ourselves or to communicate with co-workers who work on the same file. As in HTML, the comment syntax starts with <!-- and ends with -->. For example:

<!-- This is my comment. -->

Any characters are allowed in the comments except the double hyphen (--), because it’s already used to set aside the comment. Comments can be anywhere in the document except inside a tag, like this:

<book <!-- my comment. -->> ← wrong

It should be:

<book> <!-- my comment -->

 <title>Harry Potter</title>

 <author>J K. Rowling</author>

</book>

CDATA Sections

A CDATA section lets you include character data in an XML document that you don’t want the parser to parse. Unlike comments, in which the character data will be completely ignored, the content of a CDATA section is still part of the data, but will not be treated as regular XML data. For example, you might want to include a link from Amazon for the book “Harry Potter.”

<book>

 <title>Harry Potter</title>

 <author>J K. Rowling</author>

 <related_link> http://www.amazon.com/s/&field-keywords=harry+potter </related_link>

</book>

From the XML rules, we learned that “&” needs to be replaced by “&amp;” or there will be a parsing error. However, there is another way to do it: tell the parser not to parse this part of the text.

<book>

 <title>Harry Potter</title>

 <author>J K. Rowling</author>

 <related_link><![CDATA[ http://www.amazon.com/s/&field-keywords=harry+potter]]>

 </related_link>

</book>
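Both approaches give the application the same text. The sketch below compares them using Python’s standard xml.etree.ElementTree parser (an assumed tool, for illustration), with the link written without surrounding spaces.

```python
import xml.etree.ElementTree as ET

url = "http://www.amazon.com/s/&field-keywords=harry+potter"

# Option 1: replace the ampersand with its entity reference.
escaped = ET.fromstring(
    "<related_link>http://www.amazon.com/s/&amp;field-keywords=harry+potter"
    "</related_link>"
)

# Option 2: wrap the link in a CDATA section and leave the ampersand alone.
cdata = ET.fromstring(
    "<related_link><![CDATA[http://www.amazon.com/s/&field-keywords=harry+potter]]>"
    "</related_link>"
)
print(escaped.text == cdata.text == url)  # True

# Leaving the bare ampersand in ordinary content is rejected.
try:
    ET.fromstring("<related_link>" + url + "</related_link>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False
```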

A CDATA section comes in handy when you want to include scripting codes in your XML file, such as this example from W3C:

 <script>

 <![CDATA[

 function matchwo(a,b)

 {

 if (a < b && a < 0) then

  {

 return 1;

  }

 else

  {

 return 0;

 }

 }

 ]]>

 </script>

We don’t need to worry about replacing “&” and “<” once we put them inside the CDATA section.

There are two restrictions on CDATA sections. First, “]]>” is not allowed in the CDATA section because it marks the end of the section. Second, you cannot nest a CDATA section inside another CDATA section.

Well-formed vs. valid XML

What we have learned in this chapter will help us to create a well-formed XML document. However, to create a valid XML file, depending on the type of metadata, you will need to follow a specific set of rules, such as a DTD or one of the schemas covered in later chapters. For now, remember to adhere to these rules for well-formed XML documents:

 Open and close elements with angle brackets: <element>text</element>.

 Element names are case-sensitive – these are different: name, Name, NAME, nAmE.

 Structure the document like a tree with one root.

 Elements must be correctly nested.

 Attributes’ values must always be quoted.

 Use entity references (&lt;, &gt;, &amp;, &quot;, and &apos;) for these characters: <, >, &, ", '

 Element names must follow the naming guideline.
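Several of these rules can be tested at once by handing sample documents to a parser. The sketch below assumes Python’s standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical. It checks well-formedness only and says nothing about validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "balanced tags": "<book><title>Harry Potter</title></book>",
    "missing close tag": "<book><title>Harry Potter</title>",
    "two roots": "<book/><book/>",
    "overlapping tags": "<b><i>The text is bold and italic.</b></i>",
    "bare less-than": "<page_count> < 500 pages</page_count>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, ok)
```

Only the first sample passes; each of the others breaks one rule from the list.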

Why do we use XML?

As you can see, XML is a very simple, flexible, and open format. When used correctly, it’s also very powerful.

Open standards are significant in a data-driven environment because this means that your data will not be locked in any proprietary format. A proprietary format relies on specific software to read the data, and could not be read without that software. If, for some reason, the company that created the software disappeared in a few years, and you no longer had access to the software, your data would be locked in that format and lost forever. For example, if you use Microsoft Excel to create and store data, and keep the data in the .xls (or .xlsx) format, only Microsoft Excel will be able to read the data in the future. Other programs (such as Open Office) can often make a guess at what the files say, but, ultimately, only Excel will be able to read all of the data. However, if you save the data as an XML file, almost any program will be able to read it, because XML is an open standard that can be read on multiple platforms. Eric Lease Morgan describes XML as “an open standard providing the means to share data and information between computers and computer programs as unambiguously as possible” (Morgan, 2008).

In the cultural heritage community, standards are important to maintain and follow so that data can be exchanged, shared, and reused. The XML specification is a W3C standard, and, as such, has the support of a significant organization for maintenance and development. By following these standards, the cultural heritage community demonstrates willingness to participate in international standards to share data across domains.

XML also makes it easy to exchange information (records) across platforms and systems. In the libraries, archives, and museums community, a protocol called the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is used to pass XML metadata records about items in our collections to other platforms for harvesting and sharing. More information about OAI-PMH is in the last chapter of this book. As soon as XML records are created or harvested, they can be searched in their XML form using XPath (a language designed for navigating through parts of an XML document), or imported into a database and searched as database records.

Another advantage of XML is its simplicity. Individual communities with special needs can easily create schemas to fit their data, and new standards are easy to share within a community. XML is easy to access and update in bulk; and individual files, especially for descriptive metadata, are small and easy to store.

XML also offers a way to separate data and design. In the libraries, archives, and museum community, we may need access to data, but the way it’s displayed is not important. Or, we may want to change the way our data is displayed without having to re-enter all of our data. XML makes this possible.

A final advantage of XML is that multiple applications can use (and reuse) the same data. The data only has to be entered once, and, through the use of XSLT, it can be transformed to fit different environments and needs.

XML example records

Below are examples of the same image cataloged using each of the standards discussed later in this book:

Figure 2.3 “Bennie” Photograph: Lee Marmon Pictorial Collection (PICT 2000-017), Center for Southwest Research, University Libraries, University of New Mexico. 2000-017-0012. Available at: http://econtent.unm.edu/u?/Marmon,9

Dublin Core

Dublin Core uses 15 core elements to describe resources.

<?xml version="1.0" encoding="UTF-8"?>

<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

 <dc:title>Bennie</dc:title>

 <dc:creator>Marmon, Lee</dc:creator>

 <dc:subject>Indians of North America – New Mexico</dc:subject>

 <dc:description>Portrait of Bennie, with sheep in background</dc:description>

 <dc:publisher>Center for Southwest Research, University Libraries, University of New Mexico</dc:publisher>

 <dc:date>1984</dc:date>

 <dc:type>Still image; photograph</dc:type>

 <dc:format>Image/jpeg</dc:format>

 <dc:identifier>2000-017-0012.tif</dc:identifier>

 <dc:source>ZIM CSWR Pict Colls PICT 2000–017 </dc:source>

 <dc:relation>Lee Marmon Pictorial Collection http://rmoa.unm.edu/docviewer.php?docId=nmupict2000–017.xml</dc:relation>

 <dc:rights>Copyright of the Lee Marmon Pictorial Collection has been transferred to the CSWR. No institutional restrictions placed on use of this collection. Rights to the digital resource are held by the University of New Mexico http://www.unm.edu/disclaimer.html</dc:rights>

 <dc:identifier>http://econtent.unm.edu/u?/Marmon,9 </dc:identifier>

</oai_dc:dc>
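Elements with a namespace prefix, such as dc:title, need the namespace spelled out when you query them. This is a minimal sketch using Python’s standard xml.etree.ElementTree module and a shortened copy of the record; the tool is our choice for illustration, and the namespace URIs in the code are the ones published for oai_dc and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Map each prefix used in our queries to its namespace URI.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
 <dc:title>Bennie</dc:title>
 <dc:creator>Marmon, Lee</dc:creator>
 <dc:date>1984</dc:date>
 <dc:identifier>2000-017-0012.tif</dc:identifier>
 <dc:identifier>http://econtent.unm.edu/u?/Marmon,9</dc:identifier>
</oai_dc:dc>"""

root = ET.fromstring(record)

# The prefix in the query is resolved through NS, not through the document.
title = root.findtext("dc:title", namespaces=NS)
identifiers = [e.text for e in root.findall("dc:identifier", NS)]

print(title)        # Bennie
print(identifiers)  # both identifier values, in document order
```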

MARCXML record

Although MARCXML is not discussed in the book, an example record is shown below for those familiar with the MARC standard.

<?xml version="1.0" encoding="utf-8"?>

<record xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">

 <leader> am 3u </leader>

 <datafield tag="042" ind1=" " ind2=" ">

 <subfield code="a">dc</subfield>

 </datafield>

 <datafield tag="720" ind1=" " ind2=" ">

 <subfield code="a">Marmon, Lee</subfield>

 <subfield code="e">author</subfield>

 </datafield>

 <datafield tag="260" ind1=" " ind2=" ">

 <subfield code="c">1984</subfield>

 </datafield>

 <datafield tag="520" ind1=" " ind2=" ">

 <subfield code="a">Portrait of Bennie, with sheep in background</subfield>

 </datafield>

 <datafield tag="856" ind1=" " ind2=" ">

 <subfield code="q">Image/jpeg</subfield>

 </datafield>

 <datafield tag="024" ind1="8" ind2=" ">

 <subfield code="a">2000-017-0012.tif</subfield>

 </datafield>

 <datafield tag="024" ind1="8" ind2=" ">

 <subfield code="a"> http://econtent.unm.edu/u?/Marmon,9</subfield>

 </datafield>

 <datafield tag="260" ind1=" " ind2=" ">

 <subfield code="b">Center for Southwest Research, University Libraries, University of New Mexico</subfield>

 </datafield>

 <datafield tag="787" ind1="0" ind2=" ">

 <subfield code="n">Lee Marmon Pictorial Collection http://rmoa.unm.edu/docviewer.php?docId=nmupict2000-017.xml </subfield>

 </datafield>

 <datafield tag="540" ind1=" " ind2=" ">

 <subfield code="a">Copyright of the Lee Marmon Pictorial Collection has been transferred to the CSWR. No institutional restrictions placed on use of this collection. Rights to the digital resource are held by the University of New Mexico http://www.unm.edu/disclaimer.html </subfield>

 </datafield>

 <datafield tag="786" ind1="0" ind2=" ">

 <subfield code="n">ZIM CSWR Pict Colls PICT 2000-017</subfield>

 </datafield>

 <datafield tag="653" ind1=" " ind2=" ">

 <subfield code="a">Indians of North America -- New Mexico</subfield>

 </datafield>

 <datafield tag="245" ind1="0" ind2="0">

 <subfield code="a">Bennie</subfield>

 </datafield>

 <datafield tag="655" ind1="7" ind2=" ">

 <subfield code="a">Still image; photograph</subfield>

 <subfield code="2">local</subfield>

 </datafield>

</record>
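MARCXML puts the meaning in attributes (tag and code) rather than in element names, so lookups combine a default namespace with attribute tests. The sketch below uses Python’s standard xml.etree.ElementTree module and two fields from the record; the helper name subfield and the prefix marc are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# The record uses a default namespace, so we bind it to a prefix of our own.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

record = """<record xmlns="http://www.loc.gov/MARC21/slim">
 <datafield tag="245" ind1="0" ind2="0">
  <subfield code="a">Bennie</subfield>
 </datafield>
 <datafield tag="720" ind1=" " ind2=" ">
  <subfield code="a">Marmon, Lee</subfield>
  <subfield code="e">author</subfield>
 </datafield>
</record>"""

root = ET.fromstring(record)


def subfield(tag, code):
    """Return the text of the first matching subfield, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return root.findtext(path, namespaces=NS)


print(subfield("245", "a"))  # Bennie
print(subfield("720", "a"))  # Marmon, Lee
print(subfield("100", "a"))  # None: this record has no 100 field
```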

Abbreviated EAD record for the Lee Marmon Collection

Note that the EAD record describes the entire collection, not just the single image. Also, this is an abbreviated record. The full record is over 7700 lines. Remember that your software will supply forms for you to enter this information, and very rarely, if ever, will you have to key an entire XML record from scratch.

<ead>

 <eadheader findaidstatus="edited-full-draft" langencoding="iso639-2b" audience="internal" repositoryencoding="iso15511" countryencoding="iso3166-1" scriptencoding="iso15924" dateencoding="iso8601" relatedencoding="Dublin Core">

 <eadid publicid="-//University of New Mexico Center for Southwest Research//TEXT(US::NmU::PICT 2000–017)//EN" countrycode="us" mainagencycode="NmU" encodinganalog="Identifier"/>

 <filedesc>

  <titlestmt>

  <titleproper encodinganalog="Title">Inventory of the Lee Marmon Pictorial Collection, <date>1936–2010</date></titleproper>

  </titlestmt>

  <publicationstmt>

  <publisher>University of New Mexico, University Libraries, Center for Southwest Research</publisher>

  <date era="ce" calendar="gregorian" encodinganalog="Date">© 2007</date>

   <p>The University of New Mexico</p>

  </publicationstmt>

 </filedesc>

 <profiledesc>

  <langusage>Finding aid is in <language encodinganalog="Language" langcode="eng">English</language></langusage>

 </profiledesc>

 </eadheader>

 <archdesc level="collection" relatedencoding="MARC 21">

 <did>

 <head>Collection Summary</head>

 <unittitle encodinganalog="245" label="Title">Lee Marmon Pictorial Collection</unittitle>

 <unitdate type="inclusive" era="ce" calendar="gregorian" normal="1936/2008">1936–2010</unitdate>

 <unitid countrycode="us" label="Collection Number">PICT 2000-017</unitid>

 <origination label="Creator">

 <persname>Marmon, Lee</persname>

 </origination>

 <physdesc encodinganalog="300" label="Size">

 <extent>36 boxes</extent>

 </physdesc>

 <physloc>B2. Filed by Accession Number.</physloc>

 <repository encodinganalog="852" label="Repository">

 <corpname>University of New Mexico Center for Southwest Research</corpname>

 </repository>

 <abstract>This collection contains photographs taken by Lee Marmon throughout his life. These include images of elders and community members from Laguna and Acoma Pueblos, visual documentation of uranium mines and mills throughout New Mexico, photos of fashion and social life in 1960s and 1970s Palm Springs, CA, among other things.</abstract>

 </did>

 <arrangement>

 <head>Arrangement of the Collection:</head>

 <p>The Lee Marmon Pictorial Collection is arranged into series: </p>

 <list type="marked">

  <item>Original Lee Marmon Collection (2000)</item>

  <item>American Indian Colleges</item>

  <item>General Photographs</item>

  <item>Moving Images</item>

 </list>

 </arrangement>

 <dsc type="in-depth">

 <head>Contents List</head>

 <c01 level="series">

 <did>

  <unittitle id="orig">ORIGINAL MARMON COLLECTION (2000)</unittitle>

  <unitdate>1949–1999</unitdate>

 </did>

 <c02 level="file">

  <did>

  <container type="box">1</container>

  <container type="folder">1</container>

  <unittitle>Portraits – Men </unittitle>

  <unitdate>1949–1963</unitdate>

 </did>

 <scopecontent><p>0001: Lee Marmon with Station Wagon, 1949; 0002: Jeff Sousea "White Man’s Moccasins," 1954; 0003: Gov. James Solomon w/ Lincoln cane, 1958; 0004: Mateos Mexicano, 1962; 0005: Jose Sanshu, 1963; 0006: Jose Teofilo, 1961; 0007: Fernando, 1950; 0008: John Riley, 1949</p></scopecontent>

 </c02>

 <c02 level="file">

 <did>

 <container type="box">2</container>

 <container type="folder">1</container>

 <unittitle>Portraits – Men and Women</unittitle>

 <unitdate>1952–1987</unitdate>

 </did>

 <scopecontent><p>0009: Benson, Navajo Sheepherder, 1985; 0010: Bronco Martinez, 1984; 0011: Platero, Navajo, 1962; 0012: Bennie, 1984; 0013: Fr. Kenneth, Acoma, 1952; 0014: Esther – Zuni Pueblo, 1975; 0015: Susie Rayos Marmon, 110th birthday, 1987; 0016: Lucy Louis – Acoma, 1960</p></scopecontent>

  </c02>

 </c01>

 </dsc>

 </ead>

VRA Core 4.0 Record

A key feature of VRA Core 4.0 is the ability to distinguish between the original photograph (all elements inside the <work> wrapper) and the digitized image (all elements inside the <image> wrapper).

<?xml version="1.0" encoding="UTF-8"?>

<vra xmlns="http://www.vraweb.org/vracore4.htm" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vraweb.org/vracore4.htm http://www.loc.gov/standards/vracore/vra-strict.xsd">

 <work>

 <agentSet>

  <agent>

  <name vocab="NAF" refid="nr 97009000">Marmon, Lee </name>

  <culture>American</culture>

  <role>photographer</role>

  </agent>

 </agentSet>

 <culturalContextSet>

  <culturalContext>American</culturalContext>

 </culturalContextSet>

 <dateSet>

  <display>1984</display>

  <date type="creation">

  <earliestDate>1984</earliestDate>

  <latestDate>1984</latestDate>

  </date>

 </dateSet>

 <descriptionSet>

  <description>Portrait of Bennie, with sheep in background</description>

 </descriptionSet>

 <locationSet>

  <location type="creation">

  <name type="geographic" vocab="TGN" refid="7007566" extent="state">New Mexico</name>

  <name type="geographic" vocab="TGN" refid="7012149" extent="nation">United States</name>

  </location>

 </locationSet>

 <materialSet>

  <display>black and white film</display>

  <material/>

 </materialSet>

 <measurementsSet>

 <display>4 in (height) × 3.25 in (width)</display>

 <measurements type="height" unit="in">4</measurements>

 <measurements type="width" unit="in">3.25</measurements>

 </measurementsSet>

 <relationSet>

  <relation type="partOf">Lee Marmon Pictorial Collection</relation>

 </relationSet>

 <rightsSet>

 <display>Copyright of the Lee Marmon Pictorial Collection has been transferred to the CSWR. No institutional restrictions placed on use of this collection. Rights to the digital resource are held by the University of New Mexico. http://www.unm.edu/disclaimer.html</display>

 <rights/>

 </rightsSet>

 <subjectSet>

 <display>Indians of North America -- New Mexico</display>

 <subject>

  <term type="descriptiveTopic" vocab="LCSH" refid="sh85065489">Indians of North America -- New Mexico</term>

 </subject>

 </subjectSet>

 <techniqueSet>

  <display>photography</display>

  <technique vocab="AAT" refid="300054225">photography</technique>

 </techniqueSet>

 <titleSet>

  <display>Bennie</display>

  <title type="descriptive" pref="true" xml:lang="en">Bennie</title>

 </titleSet>

 <worktypeSet>

  <display>black-and-white photographs</display>

  <worktype vocab="AAT" refid="300128347">black-and-white photographs</worktype>

 </worktypeSet>

 </work>

 <image>

 <dateSet>

  <display>2008-11-21</display>

  <date type="creation">

  <earliestDate>2008-11-21</earliestDate>

  <latestDate>2008-11-21</latestDate>

  </date>

 </dateSet>

 <descriptionSet>

  <description>Created on an Epson Expression 1640XL, 500 ppi, 24 bit</description>

 </descriptionSet>

 <measurementsSet>

  <display>56.28 KB</display>

  <measurements/>

 </measurementsSet>

 <relationSet>

  <relation type="imageOf" refid="2000-017-0012"/>

 </relationSet>

 <rightsSet>

  <display>Copyright of the Lee Marmon Pictorial Collection has been transferred to the CSWR. No institutional restrictions placed on use of this collection. Rights to the digital resource are held by the University of New Mexico. http://www.unm.edu/disclaimer.html</display>

  <rights/>

 </rightsSet>

 <techniqueSet>

  <display>digital imaging</display>

  <technique/>

 </techniqueSet>

 <titleSet>

  <title>Digitized image from photograph</title>

 </titleSet>

 <worktypeSet>

  <display>digital image</display>

  <worktype/>

 </worktypeSet>

 </image>

 </vra>

CDWALite record

CDWALite is primarily used to describe museum resources. This record describes the original photograph.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<cdwa:cdwalite xmlns:cdwa="http://www.getty.edu/CDWA/CDWALite" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.getty.edu/CDWA/CDWALite http://www.getty.edu/CDWA/CDWALite/CDWALite-xsd-public-v1-1.xsd">

 <cdwa:descriptiveMetadata>

  <cdwa:objectWorkTypeWrap>

  <cdwa:objectWorkType>photograph</cdwa:objectWorkType>

 </cdwa:objectWorkTypeWrap>

 <cdwa:titleWrap>

 <cdwa:titleSet>

  <cdwa:title>Bennie</cdwa:title>

 </cdwa:titleSet>

 </cdwa:titleWrap>

 <cdwa:displayCreator>Lee Marmon </cdwa:displayCreator>

 <cdwa:indexingCreatorWrap>

 <cdwa:indexingCreatorSet>

  <cdwa:nameCreatorSet>

  <cdwa:nameCreator>Marmon, Lee</cdwa:nameCreator>

  </cdwa:nameCreatorSet>

  <cdwa:roleCreator>Photographer</cdwa:roleCreator>

 </cdwa:indexingCreatorSet>

 </cdwa:indexingCreatorWrap>

 <cdwa:displayMaterialsTech>black-and-white photograph</cdwa:displayMaterialsTech>

 <cdwa:displayCreationDate>1984</cdwa:displayCreationDate>

 <cdwa:indexingDatesWrap>

 <cdwa:indexingDatesSet>

 <cdwa:earliestDate>1984</cdwa:earliestDate>

 <cdwa:latestDate>1984</cdwa:latestDate>

 </cdwa:indexingDatesSet>

 </cdwa:indexingDatesWrap>

 <cdwa:locationWrap>

 <cdwa:locationSet>

  <cdwa:locationName>Center for Southwest Research, University Libraries, University of New Mexico, Albuquerque, New Mexico</cdwa:locationName>

 </cdwa:locationSet>

 </cdwa:locationWrap>

 </cdwa:descriptiveMetadata>

 <cdwa:administrativeMetadata>

 </cdwa:administrativeMetadata>

 </cdwa:cdwalite>

Example exercise

Create your own XML documents, using a free XML editor (if you don’t have one, you can still create an XML document in any text editor). Get comfortable with the coding. Don’t worry about following any schemas yet. Schemas will be discussed thoroughly in the following chapters.
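
A practice document for this exercise might look something like the sketch below. The element and attribute names (photograph, creator, role, id, and so on) are invented for illustration and do not come from any schema; the values are borrowed from the Bennie records shown earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Practice record: element and attribute names are made up, not taken from any schema -->
<photograph id="practice-001">
  <title>Bennie</title>
  <creator role="photographer">Marmon, Lee</creator>
  <date>1984</date>
  <subject>Indians of North America -- New Mexico</subject>
  <!-- An element with no content can be written as a single empty tag -->
  <note/>
</photograph>
```

As you experiment, check that your document has a single root element, that every opening tag has a matching closing tag, that elements are properly nested, and that all attribute values are enclosed in quotation marks.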