List of Tables
Chapter 1. The case for the digital Babel fish
Table 1.1. Tika’s main methods of media type detection. These techniques can be performed in isolation or combined to form a powerful and comprehensive automatic file detection mechanism.
Chapter 2. Getting started with Tika
Chapter 3. The information landscape
Table 3.1. Some underlying principles of the REST architecture and their influence on the web’s scalability. These are only a cross-section of the full description of REST from Fielding’s dissertation.
Table 3.2. Information representative of the type collected about users of e-commerce sites. This would then be fed into a collaborative filtering, clustering, or categorization technique to provide recommendations, find similarities between your purchasing history and that of other users, and so on.
Chapter 4. Document type detection
Table 4.1. Top-level media types officially specified by IANA. These types form the basis for a detailed classification framework of available document types. Children are allowed for each top-level type, indicating some specialization of the parent (a more specific schema, a slightly different encoding format, and so on).
Chapter 5. Content extraction
Table 5.1. The arguments for the org.apache.tika.parser.Parser’s parse() method. Some of the arguments are only read, such as the InputStream and the ParseContext; some are callbacks (such as the ContentHandler); and some objects are actually written to, such as the Metadata argument.
Table 5.2. Potential problems that can be encountered during the parse() method. Outside of SAX parsing errors and I/O errors, Tika wraps the remaining parsing exceptions in its own custom TikaException class.
Chapter 6. Understanding metadata
Table 6.1. Relevant components of a metadata standard (or metadata model). Metadata standards help to differentiate between metadata fields, allow for their comparison and validation, and ultimately clearly describe the use of metadata fields in software.
Chapter 8. What’s in a file?
Table 8.1. Simplified representation of content within Hierarchical Data Format (HDF) files. HDF represents observational data and metadata information using a small set of constructs: named scalars, vectors, and matrices.
Chapter 10. Tika and the Lucene search stack
Chapter 12. Powering NASA science data systems
Appendix A. Tika quick reference