ABSTRACT
An immeasurable amount of multimedia
information is available today in digital archives, on the Web, in broadcast
data streams, and in personal and professional databases and this amount
continues to grow. Yet, the value of that information depends on how easily we
can manage, find, retrieve, access, and filter it. The transition between two
millennia abounds with new ways to produce, offer, filter, search, and manage
digitized multimedia information. Broadband is being offered with increasing
audio and video quality, using ever-improving access speeds on both fixed and
mobile networks. As a result, users are confronted with numerous content sources.
Wading through these sources, and finding what you need and what you like in
the vast content sea, is becoming a daunting task.
MPEG-7—developed
by the Moving Picture Experts Group (MPEG)—addresses this content management
challenge. (This same International Organization for Standardization [ISO]
committee also developed the successful standards known as MPEG-1 [1992],
MPEG-2 [1995], and MPEG-4 [version 1 in 1998 and version 2 in 1999].) The
recently completed ISO/IEC International Standard 15938, formally called the
Multimedia Content Description Interface (but better known as MPEG-7), provides
a rich set of tools for completely describing multimedia content. The standard
wasn’t just designed from a content management viewpoint (classical archival
information). It includes an innovative description of the media’s content,
which we can extract via content analysis and processing. MPEG-7 also isn’t
aimed at any one application; rather, the elements that MPEG-7 standardizes
support as broad a range of applications as possible. This is one of the key
differences between MPEG-7 and other metadata standards; it aims to be generic,
not targeted to a specific application or application domain. This article
provides a comprehensive overview of MPEG-7’s motivation, objectives, scope,
and components. MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements and their structure and relationships, defined by the standard in the form of Descriptors and Description Schemes) for creating descriptions (i.e., sets of instantiated Description Schemes and their corresponding Descriptors, assembled at the user's will), which form the basis for applications enabling effective and efficient access (search, filtering, and browsing) to multimedia content.
INTRODUCTION
Established in 1988, the Moving Picture Experts Group (MPEG) has
developed digital audiovisual compression standards that have changed the way
audiovisual content is produced by manifold industries, delivered through all
sorts of distribution channels and consumed by a variety of devices.
Accessing audio and video used to be a simple matter, simple because of the simplicity of the access mechanisms and the poverty of the sources. Today, an immense amount of audiovisual information is becoming available in digital form, in digital archives, on the World Wide Web, in broadcast data streams, and in personal and professional databases, and this amount is only growing. The value of information often depends on how easily it can be found, retrieved, accessed, filtered, and managed.
The transition between the second and third
millennium abounds with new ways to produce, offer, filter, search, and manage
digitized multimedia information. Broadband is being offered with increasing
audio and video quality and speed of access. The trend is clear: in the next few years, users will be confronted with so much content from so many sources that efficient and accurate access to this almost infinite amount of content seems unimaginable today. Despite the fact that users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult because of the sheer volume. This applies to professional users as well as end users. The question of identifying and managing content is not restricted to database retrieval applications such as digital libraries; it extends to areas like broadcast channel selection, multimedia editing, and multimedia directory services.
This challenging situation demands a timely solution
to the problem. MPEG-7 is the answer to this need.
MPEG-7 is an ISO/IEC standard developed by MPEG
(Moving Picture Experts Group), the committee that also developed the
successful standards known as MPEG-1 (1992) and MPEG-2 (1994), and the MPEG-4
standard (Version 1 in 1998, and version 2 in 1999). The MPEG-1 and MPEG-2
standards have enabled the production of widely adopted commercial products,
such as Video CD, MP3, digital audio broadcasting (DAB), DVD, digital
television (DVB and ATSC), and many video-on-demand trials and commercial
services. MPEG-4 is the first real multimedia representation standard, allowing
interactivity and a combination of natural and synthetic material coded in the
form of objects (it models audiovisual data as a composition of these objects).
MPEG-4 provides the standardized technological elements enabling the
integration of the production, distribution and content access paradigms of the
fields of interactive multimedia, mobile multimedia, interactive graphics and
enhanced digital television.
The MPEG-7 standard, formally named "Multimedia
Content Description Interface", provides a rich set of standardized tools
to describe multimedia content. Both human users and automatic systems that
process audiovisual information are within the scope of MPEG-7.
MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements and their structure and relationships, defined by the standard in the form of Descriptors and Description Schemes) for creating descriptions (i.e., sets of instantiated Description Schemes and their corresponding Descriptors, assembled at the user's will), which form the basis for applications enabling effective and efficient access (search, filtering, and browsing) to multimedia content. This is a challenging task given the broad spectrum of requirements and targeted multimedia applications, and the large number of audiovisual features of importance in such a context.
MPEG-7 has been developed by experts representing
broadcasters, electronics manufacturers, content creators and managers,
publishers, intellectual property rights managers, telecommunication service
providers and academia.
The family of MPEG standards
MPEG is a working group of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), in charge of developing international standards for compression, decompression, processing, and coded representation of moving pictures, audio, and their combination. So far, MPEG has produced MPEG-1, MPEG-2, and MPEG-4 version 1, and is currently working on MPEG-4 version 2 and MPEG-7.
MPEG-1: Storage and retrieval
MPEG-1 is the standard for storage and
retrieval of moving pictures and audio on storage media. MPEG-1 provides a
nominal data stream compression rate of about 1.2 Mbits per second—the typical
CD-ROM data transfer rate—but can deliver data at a rate of up to 1,856,000
bps. MPEG-1 distinguishes four types of image coding: I (intra-coded) pictures, P (predictive-coded) pictures, B (bidirectionally predictive-coded) pictures, and D pictures (coded using only the DC coefficients of the discrete cosine transform). To allow audio compression at acceptable quality, MPEG-1
enables audio data rates between 32 and 448 Kbps. MPEG-1 explicitly considers
other standards and functionalities, such as JPEG and H.261, suitable for
symmetric and asymmetric compression. It also provides a system definition to
specify the combination of several individual data streams. Note that MPEG-1
doesn’t prescribe compression in real time. Furthermore, though MPEG-1 defines
the process of decoding, it doesn’t define the decoder itself. The quality of
an MPEG-1 video without sound at roughly 1.2 Mbps (the single speed CD-ROM
transfer rate) is equivalent to a VHS recording. We should mention that MPEG-1 provides a means for transmitting metadata. In general, two mechanisms exist: the transmission of user data extensions within a video stream, or of data in a separate private data stream that gets multiplexed with the audio and video streams as part of the system stream. Since both methods attach additional data to the MPEG-1 stream, they either increase the demand for bandwidth for transmission and storage or reduce the quality of the audiovisual streams at a given bandwidth. No format for the coding of those extra streams was defined, which led to proprietary solutions. This might explain why these mechanisms aren't widely adopted.
MPEG-2: Digital television
MPEG-2, the digital television standard, strives for higher quality, at data rates of up to 100 Mbps, resembling the digital video studio standard CCIR 601 and the video quality needed for HDTV. As a compatible
extension to MPEG-1, MPEG-2 supports interlaced video formats and a number of
other advanced features, such as those to support HDTV. As a generic standard,
MPEG-2 was defined in terms of extensible profiles, each supporting the features required by an important application class. The Main Profile, for example, supports digital video transmission at a range of 2 to 80 Mbps over cable, satellite, and other broadcast channels. Furthermore, it supports digital storage and other communications applications. An essential extension from MPEG-1 to MPEG-2 is the ability to scale the compressed video, which allows the encoding of video at different qualities (spatial-, rate-, and amplitude-based scaling [2]). MPEG-2 audio coding was developed for low bit-rate coding of multichannel audio. MPEG-2 extends the MPEG-1 standard by providing five full-bandwidth channels, two surround channels, one channel to improve low frequencies, and/or seven multilingual channels, and the coding of mono and stereo (at 16 kHz, 22.05 kHz, and 24 kHz).
Nevertheless, MPEG-2 is still backward compatible with MPEG-1. MPEG-2
provides an MPEG-2 system with definitions of how video, audio, and other data
combine into single or multiple streams suitable for storage and transmission.
Furthermore, it provides syntactic and semantic rules that synchronize the decoding and presentation of audio and video information.
With respect to transmission and storage, the same mechanisms developed for MPEG-1 were carried over to MPEG-2. Additionally, part of the MPEG-2 header contains a structured information block, covering such application-related information as copyright and conditional access. The amount of information is restricted to a small number of bytes. Reimers [3] described an extensive structuring of content, coding, and access of such metadata within MPEG-2. Originally, there were plans to specify MPEG-3 as a standard targeting HDTV. However, during the development of MPEG-2, researchers found that it scaled up adequately to meet HDTV requirements. Thus, MPEG-3 was dropped.
MPEG-4: Multimedia production, distribution, and content access
Though the results of MPEG-1 and MPEG-2 served well
for wide-ranging developments in such fields as
interactive video, CD-ROM, and digital TV, it soon became apparent that
multimedia applications required more than the established achievements. Thus,
in 1993 MPEG started working to provide the standardized technological elements
enabling the integration of the production, distribution, and content access
paradigms of digital TV, interactive graphics applications (synthetic content),
and interactive multimedia (distribution of and access to enhanced content on
the Web). MPEG-4 version 1, formally called ISO/IEC 14496, has been available
as an international standard since December 1998. The second version will be
finished in December 1999. MPEG-4 aims to provide a set of technologies to
satisfy the needs of authors, service providers, and end users, by avoiding the
emergence of a multitude of proprietary, incompatible formats and players. The
standard should allow the development of systems that can be configured for a
vast number of applications (among others, real-time communications,
surveillance, and mobile multimedia). To achieve this requires providing standardized ways to:
·
Interact with the material, based on encoding units of aural, visual, or audio-visual content, called media objects. These media objects can be natural or synthetic, which means they could be recorded with a camera or microphone, or generated with a computer.
·
Interact with the content composed of these media objects.
Introduction to MPEG-7
MPEG-7, formally named "Multimedia Content Description Interface," is the standard that describes multimedia content so users can search, browse, and retrieve that content more efficiently and effectively than they could using today's mainly text-based search engines. In short, it's a standard for describing the features of multimedia content.
Qualifying MPEG-7
MPEG-7 provides the world's richest set of audio-visual descriptions.
These descriptions are based on catalogue features (e.g., title, creator, rights), semantic features (e.g., the who, what, when, and where information about objects and events), and structural features (e.g., the colour histogram, a measurement of the amount of colour associated with an image, or the timbre of a recorded instrument) of the AV content, and they leverage the AV data representations defined by MPEG-1, MPEG-2, and MPEG-4.
Comprehensive Scope of Data Interoperability
MPEG-7 uses XML (Extensible Markup Language) Schema as the language of choice for content description. MPEG-7 will be interoperable with other leading standards, such as the SMPTE Metadata Dictionary, Dublin Core, EBU P/Meta, and TV-Anytime.
XML was not designed to deal ideally with real-time, constrained, and streamed environments such as the multimedia or mobile industries. As long as structured documents (HTML, for instance) were composed of only a few embedded tags, the overhead induced by textual representation was not critical. MPEG-7, however, standardizes an XML language for audiovisual metadata and uses XML to model this rich and structured data. To overcome the inefficiency of textual XML, MPEG-7 Systems defines a generic framework to facilitate the carriage and processing of MPEG-7 descriptions: BiM (Binary Format for MPEG-7). It enables the streaming and compression of any XML document.
BiM coders and decoders can deal with any XML language. Technically, the schema definition (DTD or XML Schema) of the XML document is processed and used to generate a binary format. This binary format has two main properties. First, thanks to the schema knowledge, structural redundancy (element names, attribute names, and so on) is removed from the document; the document structure is therefore highly compressed (98% on average). Second, element and attribute values are encoded with dedicated codecs. A library of basic datatype codecs is provided by the specification (IEEE 754, UTF-8, compact integers, VLC integers, lists of values, and so on). Other codecs can easily be plugged in using the type-codec mapping mechanism.
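The schema-based compression idea can be illustrated with a toy sketch. This is not the normative BiM bitstream: the element codes and the length-prefixed UTF-8 value codec below are invented for illustration only, to show how shared schema knowledge lets tag names shrink to small integer codes.

```python
import struct

# Toy illustration of the BiM idea (NOT the normative BiM format):
# since encoder and decoder both know the schema, element names can be
# replaced by 1-byte codes, and values use a dedicated codec
# (here: 2-byte length prefix + UTF-8).
SCHEMA_CODES = {"Mpeg7": 0, "Description": 1, "Title": 2}  # assumed schema

def encode(events):
    """events: list of (element_name, text_value) pairs."""
    out = bytearray()
    for name, value in events:
        out.append(SCHEMA_CODES[name])          # structural part: 1 byte, not the tag text
        data = value.encode("utf-8")            # value part: length-prefixed UTF-8
        out += struct.pack(">H", len(data)) + data
    return bytes(out)

def decode(blob):
    names = {v: k for k, v in SCHEMA_CODES.items()}
    events, i = [], 0
    while i < len(blob):
        name = names[blob[i]]
        (n,) = struct.unpack_from(">H", blob, i + 1)
        events.append((name, blob[i + 3:i + 3 + n].decode("utf-8")))
        i += 3 + n
    return events

doc = [("Mpeg7", ""), ("Description", ""), ("Title", "News broadcast")]
assert decode(encode(doc)) == doc
```

A real BiM decoder additionally reconstructs the full tree structure and supports streaming fragments; the sketch only captures the name-substitution and typed-value ideas.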
MPEG-7 Elements
In October 1996, MPEG started a new work item to provide a solution to the questions described above. The new member of the MPEG family, named "Multimedia Content Description Interface" (MPEG-7 for short), provides standardized core technologies allowing the description of audiovisual data content in multimedia environments. It extends the limited capabilities of today's proprietary solutions for identifying content, notably by including more data types.
The main elements of the MPEG-7 standard are:
·
Description Tools: Descriptors (D), which define the syntax and the semantics of each feature (metadata element), and Description Schemes (DS), which specify the structure and semantics of the relationships between their components (which may be both Descriptors and Description Schemes). For example, for the color feature, the color histogram is a Descriptor, as is the text of a title; the director of a multimedia document or a texture in a single picture are further examples of features captured by Descriptors.
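The nesting of Descriptors inside Description Schemes can be shown with a schematic description. The element names below are modeled on MPEG-7 but simplified; a real description uses the normative schema and its namespace.

```python
import xml.etree.ElementTree as ET

# A schematic MPEG-7-style description (simplified, not the normative
# schema): a Description Scheme (CreationInformation) groups a Descriptor
# (Title) with a nested scheme (Creator).
xml_doc = """
<Mpeg7>
  <Description>
    <CreationInformation>
      <Creation>
        <Title>Evening News</Title>
        <Creator><Name>J. Doe</Name></Creator>
      </Creation>
    </CreationInformation>
  </Description>
</Mpeg7>
"""
root = ET.fromstring(xml_doc)
title = root.find(".//Title").text
assert title == "Evening News"
```

The point of the structure is that an application can navigate to any Descriptor (here, `Title`) through the standardized scheme hierarchy, regardless of who produced the description.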
·
A Description
Definition Language (DDL) to define the syntax of the MPEG-7 Description Tools
and to allow the creation of new Description Schemes and, possibly, Descriptors
and to allow the extension and modification of existing Description Schemes;
·
System tools, to support binary-coded representation for efficient storage and transmission, transmission mechanisms (for both textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
Basic structures
There are five Visual-related basic structures: the Grid layout, the Time series, the 2D/3D Multiple view, the Spatial 2D coordinates, and the Temporal interpolation.
Grid layout
The grid layout splits the image into a set of equally sized rectangular regions, so that each region can be described separately. Each region of the grid can be described in terms of other Descriptors, such as color or texture. Furthermore, the descriptor allows the sub-Descriptors to be assigned to all rectangular regions, as well as to an arbitrary subset of them.
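The idea can be sketched as follows. This is a minimal illustration only, using the mean gray value of each cell as a stand-in for a real color or texture Descriptor.

```python
# Sketch of the grid layout idea: split an image into equally sized
# rectangular regions and describe each region separately (here, by its
# mean gray value as a stand-in for a color/texture Descriptor).
def grid_descriptors(image, rows, cols):
    h, w = len(image), len(image[0])
    ch, cw = h // rows, w // cols          # cell height and width
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cell = [image[y][x]
                    for y in range(r * ch, (r + 1) * ch)
                    for x in range(c * cw, (c + 1) * cw)]
            row.append(sum(cell) / len(cell))   # per-region descriptor
        grid.append(row)
    return grid

image = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [10, 10, 20, 20],
         [10, 10, 20, 20]]
assert grid_descriptors(image, 2, 2) == [[0.0, 255.0], [10.0, 20.0]]
```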
Time Series
This descriptor defines a temporal series of Descriptors in a video segment and provides image-to-video-frame and video-frame-to-video-frame matching functionalities. Two types of TimeSeries are available: Regular TimeSeries and Irregular TimeSeries. In the former, Descriptors are located regularly (with constant intervals) within a given time span; this enables a simple representation for applications that require low complexity. In the latter, Descriptors are located irregularly (with varying intervals) within a given time span; this enables an efficient representation for applications constrained by narrow transmission bandwidth or low storage capacity. These structures are useful in particular for building Descriptors that contain time series of other Descriptors.
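The two flavors can be sketched as follows. This is an illustration of the data-layout trade-off only, not the normative TimeSeries syntax: a regular series stores one constant interval plus the values, while an irregular series pays for an explicit timestamp per value.

```python
# Regular TimeSeries: one start time + one constant interval + values.
def regular_series(start, interval, values):
    return [(start + i * interval, v) for i, v in enumerate(values)]

# Irregular TimeSeries: an explicit timestamp per value.
def irregular_series(times, values):
    return list(zip(times, values))

# Both reconstruct the same (time, descriptor) pairs.
assert regular_series(0, 10, ["a", "b", "c"]) == [(0, "a"), (10, "b"), (20, "c")]
assert irregular_series([0, 3, 25], ["a", "b", "c"]) == [(0, "a"), (3, "b"), (25, "c")]
```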
2D-3D Multiple View
The 2D/3D Descriptor specifies a structure that combines 2D Descriptors representing a visual feature of a 3D object seen from different view angles. The descriptor forms a complete 3D view-based representation of the object. Any 2D visual descriptor, such as contour shape, region shape, colour, or texture, can be used. The 2D/3D descriptor supports the integration of the 2D Descriptors used in the image plane to describe features of the 3D (real-world) objects. The descriptor allows the matching of 3D objects by comparing their views, as well as the comparison of pure 2D views to 3D objects.
Spatial 2D Coordinates
This structure supports two kinds of coordinate systems: "local" and "integrated" (see Figure 3.2). In a "local" coordinate system, the coordinates used for the calculation of the description are mapped to the current applicable coordinate system. In an "integrated" coordinate system, each image (frame) of, e.g., a video may be mapped to different areas with respect to the first frame of a shot or video. The integrated coordinate system can, for instance, be used to represent coordinates on a mosaic of a video shot.
Temporal Interpolation
The Temporal Interpolation describes a temporal interpolation using connected polynomials. This can be used to approximate multi-dimensional variable values that change with time, such as an object's position in a video. The description size of the temporal interpolation is usually much smaller than describing all values. In Figure 3.3, real values are represented by five linear interpolation functions and two quadratic interpolation functions. The beginning of the temporal interpolation is always aligned to time 0.
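The linear case can be sketched as follows. This illustrates only the principle (store a few key points, interpolate between them) and is not the normative Descriptor syntax.

```python
# Sketch of temporal interpolation with connected polynomials (linear
# case): store only key points instead of the value at every frame.
def interpolate(keys, t):
    """keys: list of (time, value) pairs, sorted by time."""
    for (t0, v0), (t1, v1) in zip(keys, keys[1:]):
        if t0 <= t <= t1:
            # linear polynomial connecting the two key points
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside the described time span")

# Three key points describe an object's x-position over 20 frames.
position = [(0, 0.0), (10, 5.0), (20, 5.0)]
assert interpolate(position, 5) == 2.5
assert interpolate(position, 15) == 5.0
```

The description cost is three key points rather than 21 per-frame values, which is exactly the size advantage the text describes.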
Encoding and Delivery
As described above, BiM (Binary Format for MPEG-7) coders and decoders can deal with any XML language: the schema definition (DTD or XML Schema) of the XML document is processed and used to generate a binary format in which structural redundancy (element and attribute names) is removed and element and attribute values are encoded with dedicated codecs.
This section contains a detailed overview of the different MPEG-7 technologies that are currently standardized. First, the MPEG-7 Multimedia Description Schemes are described, as the other Description Tools (the Visual and Audio ones) are always used wrapped in MPEG-7 MDS descriptions. Afterwards, the Visual and Audio Description Tools are described in detail. The DDL is then described, paving the way for describing the MPEG-7 formats, both textual (TeM) and binary (BiM). The MPEG-7 terminal architecture is presented next, followed by the Reference Software. Finally, the MPEG-7 Conformance specification and the Extraction and Use of Descriptions Technical Report are explained.
MPEG-7 Multimedia Description Schemes
MPEG-7 Multimedia Description
Schemes (DSs) are metadata structures for describing and annotating
audio-visual (AV) content. The DSs provide a standardized way of describing in
XML the important concepts related to AV content description and content
management in order to facilitate searching, indexing, filtering, and access.
The DSs are defined using the MPEG-7 Description Definition Language (DDL),
which is based on the XML Schema Language, and are instantiated as documents or
streams. The resulting descriptions can be expressed in a textual form (i.e.,
human readable XML for editing, searching, filtering) or compressed binary form
(i.e., for storage or transmission). This section provides an overview of the MPEG-7 Multimedia DSs and describes their targeted functionality and use in multimedia applications.
The goal of the MPEG-7 standard is
to allow interoperable searching, indexing, filtering and access of audio-visual
(AV) content by enabling interoperability among devices and applications that
deal with AV content description. MPEG-7 describes specific features of AV
content as well as information related to AV content management. MPEG-7
descriptions take two possible forms: (1) a textual XML form suitable for
editing, searching, and filtering, and (2) a binary form suitable for storage,
transmission, and streaming delivery. Overall, the standard specifies four
types of normative elements: Descriptors, Description Schemes (DSs), a
Description Definition Language (DDL), and coding schemes.
The MPEG-7 Descriptors are designed
primarily to describe low-level audio or visual features such as color,
texture, motion, audio energy, and so forth, as well as attributes of AV
content such as location, time, quality, and so forth. It is expected that most Descriptors for low-level features will be extracted automatically in applications.
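As an illustration of such automatic extraction, a coarse color histogram (one of the classic low-level features named above) can be computed directly from pixel data with no human intervention. The 4-bin layout here is arbitrary and is not a normative MPEG-7 Descriptor.

```python
# Sketch of automatic low-level feature extraction: a coarse histogram
# over gray values 0-255, binned into `bins` equal-width buckets.
def color_histogram(pixels, bins=4):
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return hist

pixels = [0, 10, 100, 200, 255, 255]
assert color_histogram(pixels) == [2, 1, 0, 3]
```

A real extractor would also normalize and quantize the histogram so that descriptions from different tools remain comparable, which is precisely what standardizing the Descriptor buys.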
On the other hand, the MPEG-7 DSs
are designed primarily to describe higher-level AV features such as regions,
segments, objects, events; and other immutable metadata related to creation and
production, usage, and so forth. The DSs produce more complex descriptions by
integrating together multiple Descriptors and DSs, and by declaring relationships
among the description components. In MPEG-7, the DSs are categorized as
pertaining to the multimedia, audio, or visual domain. Typically, the
multimedia DSs describe content consisting of a combination of audio, visual
data, and possibly textual data, whereas, the audio or visual DSs refer
specifically to features unique to the audio or visual domain, respectively. In
some cases, automatic tools can be used for instantiating the DSs, but in many
cases instantiating DSs requires human assisted extraction or authoring tools.
Content Management
MPEG-7 provides DSs for AV content management.
These tools describe the following information: (1) creation and production,
(2) media coding, storage and file formats, and (3) content usage. More details
about the MPEG-7 tools for content management are described as follows. (Many of the components of the content management DSs are optional; the instantiation of the optional components is often decided in view of the specific multimedia application.)

The Creation Information describes the creation and classification of the AV content and of other related materials. The Creation information provides a title (which may itself be textual or another piece of AV content), textual annotation, and information such as creators, creation locations, and dates. The classification information describes how the AV material is classified into categories such as genre, subject, purpose, language, and so forth. It also provides review and guidance information such as age classification, parental guidance, and subjective review. Finally, the Related Material information describes whether there exist other AV materials related to the content being described.

The Media Information describes the storage media, such as the format, compression, and coding of the AV content. The Media Information DS identifies the master media, which is the original source from which different instances of the AV content are produced. The instances of the AV content are referred to as Media Profiles, which are versions of the master obtained, perhaps, by using different encodings or storage and delivery formats. Each Media Profile is described individually in terms of its encoding parameters, storage media information, and location.

The Usage Information describes usage information related to the AV content, such as usage rights, usage record, and financial information. The rights information is not explicitly included in the MPEG-7 description; instead, links are provided to the rights holders and other information related to rights management and protection. The Rights DS provides these references in the form of unique identifiers that are under management by external authorities. The underlying strategy is to enable MPEG-7 descriptions to provide access to current rights-owner information without dealing with that information and its negotiation directly. The Usage Record DS and Availability DS provide information related to the use of the content, such as broadcasting, on-demand delivery, CD sales, and so forth. Finally, the Financial DS provides information related to the cost of production and the income resulting from content use. The Usage Information is typically dynamic in that it is subject to change during the lifetime of the AV content.
The Content Management Description Tools allow the description of the life cycle of the content, from creation to consumption. The content described by MPEG-7 descriptions can be available in different modalities, formats, and Coding Schemes, and there can be several instances. For example, a concert can be recorded in two different modalities: audio and audio-visual. Each of these modalities can be encoded with different Coding Schemes, which creates several media profiles. Finally, several instances of the same encoded content may be available. These concepts of modality, profile, and instance are described as follows:
·
Content: One reality, such as a concert in the world, can be represented as several types of media, e.g., audio media or audio-visual media. A content is an entity that has a specific structure to represent that reality.
·
Media Information: The physical format of a content entity is described by the Media Information DS. One description instance of the DS is attached to one content entity to describe it. The DS is centered on an identifier for the content entity and also has sets of Descriptors for the storage format of the entity.
·
Media Profile: One content entity can have one or more media profiles that correspond to different Coding Schemes of the entity. One of the profiles is the original one, called the master profile, which corresponds to the initially created or recorded entity. The others are transcoded from the master. If the content is encoded with the same encoding tool but with different parameters, different media profiles are created.
·
Media Instance: A content entity can be instantiated as physical entities called media instances. An identifier and a locator specify the media instance.
·
Creation Information: Information about the creation process of a content entity is described by the Creation Information DS. One description instance of the DS is attached to one content entity to describe it.
·
Usage Information: Information about the usage of a content entity is described by the Usage Information DS. One description instance of the DS is attached to one content entity to describe it.
The only part of the description that depends on the storage media
or the encoding format is the Media Information described in this section. The
remaining part of the MPEG-7 description does not depend on the various
profiles or instances and, as a result, can be used to describe jointly all
possible copies of the content.
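The content / profile / instance hierarchy above can be sketched as a small data model. The class and field names are illustrative only, not the normative MPEG-7 types.

```python
from dataclasses import dataclass, field

# Illustrative model: one content entity has several media profiles
# (different Coding Schemes), each with physical media instances.
@dataclass
class MediaInstance:
    instance_id: str
    locator: str            # where this physical copy lives

@dataclass
class MediaProfile:
    coding: str             # e.g. "MPEG-2", "MPEG-4"
    master: bool = False    # the originally created/recorded profile
    instances: list = field(default_factory=list)

@dataclass
class Content:
    content_id: str
    profiles: list = field(default_factory=list)

# A concert recorded once, transcoded into a second profile.
concert = Content("concert-42", [
    MediaProfile("MPEG-2", master=True,
                 instances=[MediaInstance("i1", "tape://archive/42")]),
    MediaProfile("MPEG-4",
                 instances=[MediaInstance("i2", "http://example.com/42.mp4")]),
])
assert [p.master for p in concert.profiles] == [True, False]
```

Note how only the profile/instance level carries storage and coding detail; everything hung off `Content` itself would be shared by all copies, mirroring the point made in the paragraph above.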
MPEG-7 Application Domains
The elements that MPEG-7 standardizes will support a broad range of applications (for example, multimedia digital libraries, broadcast media selection, multimedia editing, and home entertainment devices). MPEG-7 will also make the Web as searchable for multimedia content as it is searchable for text today. This would apply especially to large content archives being made accessible to the public, as well as to multimedia catalogues enabling people to identify content for purchase. The information used for content retrieval may also be used by agents, for the selection and filtering of broadcast "push" material or for personalized advertising. Additionally, MPEG-7 descriptions will allow fast and cost-effective usage of the underlying data by enabling semi-automatic multimedia presentation and editing. All domains making use of multimedia will benefit from MPEG-7, including:
Ø Digital libraries and education (image catalogues, musical dictionaries, bio-medical imaging catalogues, ...)
Ø Multimedia editing (personalised electronic news services, media authoring)
Ø Cultural services (history museums, art galleries, etc.)
Ø Multimedia directory services (e.g. yellow pages, tourist information, geographical information systems)
Ø Broadcast media selection (radio channels, TV channels, ...)
Ø Journalism (e.g. searching for speeches of a certain politician using his name, his voice, or his face)
Ø E-commerce (personalised advertising, on-line catalogues, directories of e-shops, ...)
Ø Surveillance (traffic control, surface transportation, non-destructive testing in hostile environments, etc.)
Ø Investigation services (human characteristics recognition, forensics)
Ø Home entertainment (systems for the management of personal multimedia collections, including manipulation of content, e.g. home video editing, searching a game, karaoke, ...)
Ø Social (e.g. dating services)
Typical applications enabled by MPEG-7 technology include:
• Audio: I want to search for songs by humming or whistling a tune; or, using an excerpt of Pavarotti's voice, get a list of Pavarotti's records and of video clips in which Pavarotti sings or simply makes an appearance. Or, play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g. in terms of emotions.
• Graphics: Sketch a few lines on a screen and
get a set of images containing similar graphics, logos, and ideograms.
• Image: Define objects, including color
patches or textures, and get examples from which you select items to compose
your image. Or check if your company logo was advertised on a TV channel as
contracted.
• Visual: Allow mobile phone access to video clips of goals scored in a soccer
game, or automatically search and retrieve any unusual movements from surveillance
videos.
• Multimedia: On a given set of multimedia
objects, describe movements and relations between objects and so search for
animations fulfilling the described temporal and spatial relations. Or,
describe actions and get a list of scenarios containing such actions.
This section briefly introduces some potential application areas and real-world applications for MPEG-7. Basically, all application domains making use of multimedia can benefit from MPEG-7. The list below shows some application areas and examples that MPEG-7 is capable of boosting [1].
Ø Broadcast media selection: media selection
for radio and TV channels
Ø E-commerce: personalized advertising,
on-line catalogues
Ø Home entertainment: systems for the
management of personal multimedia collections
Ø Multimedia editing: personalized electronic
news services
Ø Shopping: searching clothes that one likes
Ø Surveillance: traffic control
CONCLUSION
MPEG-7 is an ambitious standardization effort from the Moving Picture Experts Group. A number of open questions still exist, but the established results point to a promising future. However, the most important question still needs to be answered: what is the balance between flexibility and compatibility within MPEG-7?
The MPEG-7 working group has to decide whether to follow a specific, bottom-up approach for a few individual domains, or whether the intention is to let anyone create their own MPEG-7 solution. The group's decision will have a clear influence on the option of standardizing only the DDL, or a DDL plus a core set of Descriptors and Description Schemes. MPEG-7 should make a strong showing in some more applications by establishing Description Schemes and variants that would serve the video, image, music, speech, and sound indexing communities well, allowing a number of initial products to target those basic standards. MPEG-7 should also provide a level of genericity (in the Descriptors) and power (in the DDL) that will let specialized communities (such as biomedical or remote-sensing imaging) adapt the standard to their uses.
Furthermore, MPEG-7's core goal is to provide interoperability. At the end of the MPEG-7 effort, whether version 1 or 2, there should exist a single DDL, a generic set of Descriptors for audio and visual features, and specific Description Schemes that serve specific applications. However, even the authors are divided on the question of how to handle cases where a feature cannot be captured by simply structuring existing Descriptors into a novel Description Scheme. The problem is that a Descriptor built using the DDL might allow the novel Description Scheme to be perfectly parsable, yet the newly defined Descriptor at the bottom of whatever structure might provide semantic information that other computers can't understand. On the other hand, introducing a registration body seems more problematic, especially since this might also lead to forced incompatibilities due to a variety of competing but incompatible Descriptors. Ultimately, struggling with these sorts of questions makes the MPEG-7 process intellectually stimulating and rewarding.
We have faith that we will see a standard that provides the compatibility of content descriptions, allowing a given community to adopt it early. MPEG-7 should also offer the flexibility for that community to grow and include other special interests.