Consider a set of documents on the Web that contain hyperlinks to other documents. These documents, although not completely unstructured, cannot be modelled naturally in the relational data model because the pattern of hyperlinks is not regular across documents.

While some data is completely unstructured (for example, video streams, audio streams, and image data), a lot of data is neither completely unstructured nor completely structured. We refer to data with partial structure as semistructured data.

There are several reasons why data might be semistructured. First, the structure of the data might be implicit, hidden, or unknown, or the user might choose to ignore it. Second, when data is integrated from several heterogeneous sources, each with its own structure, data exchange and transformation become important problems. Third, we cannot query a structured database without knowing its schema, but sometimes we want to query the data without full knowledge of the schema.

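As a minimal sketch of the third point, the hypothetical records below (plain Python dicts and lists, chosen only for illustration) do not share a fixed schema: fields differ from record to record and nest at different depths. A recursive search can still collect every value stored under a given field name without knowing where, or whether, that field appears.

```python
from typing import Any, Iterator

# Hypothetical records with no shared schema: fields differ and nest differently.
records: list[dict[str, Any]] = [
    {"title": "QED", "author": {"firstName": "Richard", "lastName": "Feynman"}},
    {"title": "Lectures", "authors": [{"lastName": "Leighton"}, {"lastName": "Sands"}]},
    {"title": "Untitled note", "year": 1965},
]

def find_values(data: Any, field: str) -> Iterator[Any]:
    """Yield every value stored under `field`, at any depth, without a schema."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == field:
                yield value
            # Keep descending: the field may also occur further down.
            yield from find_values(value, field)
    elif isinstance(data, list):
        for item in data:
            yield from find_values(item, field)
    # Atomic values (strings, numbers) contain no fields, so nothing is yielded.

last_names = [name for record in records for name in find_values(record, "lastName")]
print(last_names)  # ['Feynman', 'Leighton', 'Sands']
```

The query works even though one record has a single `author`, another has a list of `authors`, and the third has no author at all; a relational query would need all three shapes declared in advance.
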
We now discuss one of the proposed data models for semistructured data, the object exchange model (OEM). Each object is described by a triple consisting of a label, a type, and the value of the object. Since an object's label can be thought of as the column name in the relational model, and its type as the column type, the object exchange model is self-describing. Labels in the object exchange model should be as informative as possible, since they serve two purposes: they identify an object, and they convey its meaning. For example, we can represent the last name of an author as follows:

<lastName, string, "Feynman">
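
The triple can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the class `OEMObject` and the helper `find_by_label` are hypothetical names, and treating a complex object as one whose type is `"set"` and whose value is a list of sub-objects is an assumption based on common descriptions of OEM, not something stated above.

```python
from dataclasses import dataclass
from typing import Any, Iterator

@dataclass
class OEMObject:
    """An OEM-style object: a <label, type, value> triple."""
    label: str  # identifies the object and conveys its meaning, like a column name
    type: str   # e.g. "string" or "integer"; "set" is ASSUMED here for complex objects
    value: Any  # an atomic value, or a list of OEMObject when type == "set"

    def __str__(self) -> str:
        if self.type == "set":
            inner = ", ".join(str(child) for child in self.value)
            return f"<{self.label}, set, {{{inner}}}>"
        shown = f'"{self.value}"' if self.type == "string" else str(self.value)
        return f"<{self.label}, {self.type}, {shown}>"

def find_by_label(obj: OEMObject, label: str) -> Iterator[OEMObject]:
    """Yield obj and every nested object whose label matches, at any depth."""
    if obj.label == label:
        yield obj
    if obj.type == "set":
        for child in obj.value:
            yield from find_by_label(child, label)

last_name = OEMObject("lastName", "string", "Feynman")
author = OEMObject("author", "set", [
    OEMObject("firstName", "string", "Richard"),
    last_name,
])
print(last_name)  # <lastName, string, "Feynman">
print(author)     # <author, set, {<firstName, string, "Richard">, <lastName, string, "Feynman">}>
print([o.value for o in find_by_label(author, "lastName")])  # ['Feynman']
```

Storing the label and type with every value is what makes the representation self-describing; the trade-off is that labels are repeated in each object instead of being stored once in a schema.
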