Moving Picture Experts Group-7 (MPEG-7) - Seminar Report

Moving Picture Experts Group-7 (MPEG-7)
An immeasurable amount of multimedia information is available today in digital archives, on the Web, in broadcast data streams, and in personal and professional databases and this amount continues to grow. Yet, the value of that information depends on how easily we can manage, find, retrieve, access, and filter it. The transition between two millennia abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information. Broadband is being offered with increasing audio and video quality, using ever-improving access speeds on both fixed and mobile networks. As a result, users are confronted with numerous content sources. Wading through these sources, and finding what you need and what you like in the vast content sea, is becoming a daunting task.
MPEG-7—developed by the Moving Picture Experts Group (MPEG)—addresses this content management challenge. (This same International Organization for Standardization [ISO] committee also developed the successful standards known as MPEG-1 [1992], MPEG-2 [1995], and MPEG-4 [version 1 in 1998 and version 2 in 1999].) The recently completed ISO/IEC International Standard 15938, formally called the Multimedia Content Description Interface (but better known as MPEG-7), provides a rich set of tools for completely describing multimedia content. The standard wasn’t just designed from a content management viewpoint (classical archival information). It includes an innovative description of the media’s content, which we can extract via content analysis and processing. MPEG-7 also isn’t aimed at any one application; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible. This is one of the key differences between MPEG-7 and other metadata standards: it aims to be generic, not targeted to a specific application or application domain. This article provides a comprehensive overview of MPEG-7’s motivation, objectives, scope, and components. MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements and their structure and relationships, defined by the standard in the form of Descriptors and Description Schemes) to create descriptions (i.e., a set of instantiated Description Schemes and their corresponding Descriptors, assembled at the user's will), which form the basis for applications enabling the needed effective and efficient access (search, filtering, and browsing) to multimedia content.

Established in 1988, the Moving Picture Experts Group (MPEG) has developed digital audiovisual compression standards that have changed the way audiovisual content is produced by manifold industries, delivered through all sorts of distribution channels and consumed by a variety of devices.

Accessing audio and video used to be a simple matter - simple because of the simplicity of the access mechanisms and because of the poverty of the sources. An immense amount of audiovisual information is becoming available in digital form, in digital archives, on the World Wide Web, in broadcast data streams and in personal and professional databases, and this amount is only growing. The value of information often depends on how easily it can be found, retrieved, accessed, filtered and managed.

The transition between the second and third millennium abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information. Broadband is being offered with increasing audio and video quality and speed of access. The trend is clear: in the next few years, users will be confronted with such a large amount of content, provided by multiple sources, that efficient and accurate access to this almost infinite amount of content seems unimaginable today. In spite of the fact that users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult because of the sheer volume. This applies to professional as well as end users. The question of identifying and managing content is not restricted to database retrieval applications such as digital libraries; it extends to areas like broadcast channel selection, multimedia editing, and multimedia directory services.
This challenging situation demands a timely solution to the problem. MPEG-7 is the answer to this need.
MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group), the committee that also developed the successful standards known as MPEG-1 (1992) and MPEG-2 (1994), and the MPEG-4 standard (version 1 in 1998 and version 2 in 1999). The MPEG-1 and MPEG-2 standards have enabled the production of widely adopted commercial products, such as Video CD, MP3, digital audio broadcasting (DAB), DVD, digital television (DVB and ATSC), and many video-on-demand trials and commercial services. MPEG-4 is the first real multimedia representation standard, allowing interactivity and a combination of natural and synthetic material coded in the form of objects (it models audiovisual data as a composition of these objects). MPEG-4 provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of the fields of interactive multimedia, mobile multimedia, interactive graphics and enhanced digital television.

The MPEG-7 standard, formally named "Multimedia Content Description Interface", provides a rich set of standardized tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7.

MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements and their structure and relationships, defined by the standard in the form of Descriptors and Description Schemes) to create descriptions (i.e., a set of instantiated Description Schemes and their corresponding Descriptors, assembled at the user's will), which form the basis for applications enabling the needed effective and efficient access (search, filtering and browsing) to multimedia content. This is a challenging task given the broad spectrum of requirements and targeted multimedia applications, and the large number of audiovisual features of importance in this context.

MPEG-7 has been developed by experts representing broadcasters, electronics manufacturers, content creators and managers, publishers, intellectual property rights managers, telecommunication service providers and academia.

The family of MPEG standards
MPEG is a working group of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), in charge of developing international standards for compression, decompression, processing, and coded representation of moving pictures, audio, and their combination. So far, MPEG has produced MPEG-1, MPEG-2, and MPEG-4 version 1, and is currently working on MPEG-4 version 2 and MPEG-7.

MPEG-1: Storage and retrieval
MPEG-1 is the standard for storage and retrieval of moving pictures and audio on storage media. MPEG-1 provides a nominal data stream compression rate of about 1.2 Mbits per second—the typical CD-ROM data transfer rate—but can deliver data at a rate of up to 1,856,000 bps. MPEG-1 distinguishes four types of image coding for processing: I (intra-coded pictures), P (predictive coded pictures), B (bidirectionally predictive pictures), and D (pictures coded using only the DC coefficients of the discrete cosine transform). To allow audio compression at acceptable quality, MPEG-1 enables audio data rates between 32 and 448 Kbps. MPEG-1 explicitly considers other standards and functionalities, such as JPEG and H.261, suitable for symmetric and asymmetric compression. It also provides a system definition to specify the combination of several individual data streams. Note that MPEG-1 doesn’t prescribe compression in real time. Furthermore, though MPEG-1 defines the process of decoding, it doesn’t define the decoder itself. The quality of an MPEG-1 video without sound at roughly 1.2 Mbps (the single-speed CD-ROM transfer rate) is equivalent to a VHS recording.

We should mention that MPEG-1 provides a means for transmitting metadata. In general, two mechanisms exist: the transmission of user data extensions within a video stream, or of data in a separate private data stream that gets multiplexed with the audio and video streams as part of the system stream. Since both methods attach additional data to the MPEG-1 stream, they either increase the demand for bandwidth for transmission/storage or reduce the quality of the audio-visual streams for a given bandwidth. No format for the coding of those extra streams was defined, which led to proprietary solutions. This might explain why these mechanisms aren’t widely adopted.

MPEG-2: Digital television
MPEG-2, the digital television standard, strives for higher resolutions and data rates—up to 100 Mbps—resembling the digital video studio standard CCIR 601 and the video quality needed in HDTV. As a compatible extension to MPEG-1, MPEG-2 supports interlaced video formats and a number of other advanced features, such as those to support HDTV. As a generic standard, MPEG-2 was defined in terms of extensible profiles, each supporting the features required by an important application class. The Main Profile, for example, supports digital video transmission at a range of 2 to 80 Mbps over cable, satellite, and other broadcast channels. Furthermore, it supports digital storage and other communications applications. An essential extension from MPEG-1 to MPEG-2 is the ability to scale the compressed video, which allows the encoding of video at different qualities (spatial-, rate-, and amplitude-based scaling). The MPEG-2 audio coding was developed for low bit-rate coding of multichannel audio. MPEG-2 extends the MPEG-1 standard by providing five full-bandwidth channels, two surround channels, one low-frequency enhancement channel, and/or seven multilingual channels, as well as the coding of mono and stereo (at 16 kHz, 22.05 kHz, and 24 kHz). Nevertheless, MPEG-2 is still backward compatible with MPEG-1. MPEG-2 provides a systems specification that defines how video, audio, and other data combine into single or multiple streams suitable for storage and transmission. Furthermore, it provides syntactic and semantic rules that synchronize the decoding and presentation of audio and video information.
With respect to transmission/storage, the same mechanisms developed for MPEG-1 were carried over to MPEG-2. Additionally, some MPEG-2 headers contain a structured information block covering application-related information such as copyright and conditional access. The amount of information is restricted to a small number of bytes. Reimers described an extensive structuring of content, coding, and access of such metadata within MPEG-2. Originally, there were plans to specify MPEG-3 as a standard approaching HDTV. However, during the development of MPEG-2, researchers found that it scaled up adequately to meet HDTV requirements, so MPEG-3 was dropped.

MPEG-4: Multimedia production, distribution, and content access
Though the results of MPEG-1 and MPEG-2 served well for wide-ranging developments in such fields as interactive video, CD-ROM, and digital TV, it soon became apparent that multimedia applications required more than the established achievements. Thus, in 1993 MPEG started working to provide the standardized technological elements enabling the integration of the production, distribution, and content access paradigms of digital TV, interactive graphics applications (synthetic content), and interactive multimedia (distribution of and access to enhanced content on the Web). MPEG-4 version 1, formally called ISO/IEC 14496, has been available as an international standard since December 1998. The second version was finished in December 1999. MPEG-4 aims to provide a set of technologies that satisfies the needs of authors, service providers, and end users, by avoiding the emergence of a multitude of proprietary, incompatible formats and players. The standard should allow the development of systems that can be configured for a vast number of applications (among others, real-time communications, surveillance, and mobile multimedia). Achieving this requires standardized ways to represent and interact with units of aural, visual, or audio-visual content, called media objects. These media objects can be natural or synthetic, which means they could be recorded with a camera or microphone, or generated with a computer.

Introduction to MPEG-7

MPEG-7, formally named “Multimedia Content Description Interface,” is a standard for describing the features of multimedia content so that users can search, browse, and retrieve that content more efficiently and effectively than they could using today’s mainly text-based search engines.

 Qualifying MPEG-7

MPEG-7 provides the world’s richest set of audio-visual descriptions.
These descriptions are based on cataloguing features (e.g., title, creator, rights), semantic features (e.g., the who, what, when and where information about objects and events) and structural features (e.g., the colour histogram, a measure of the amount of colour associated with an image, or the timbre of a recorded instrument) of the AV content, and they leverage the AV data representations defined by MPEG-1, -2 and -4.
Comprehensive Scope of Data Interoperability:
MPEG-7 uses XML (Extensible Markup Language) Schema as the language of choice for content description. MPEG-7 will be interoperable with other leading standards such as the SMPTE Metadata Dictionary, Dublin Core, EBU P/Meta, and TV Anytime. XML, however, was not designed for real-time, constrained, and streamed environments such as the multimedia or mobile industries. As long as structured documents (HTML, for instance) were composed of only a few embedded tags, the overhead induced by the textual representation was not critical. MPEG-7 standardizes an XML language for audiovisual metadata and uses XML to model this rich and structured data. To overcome the inefficiency of textual XML, MPEG-7 Systems defines a generic framework to facilitate the carriage and processing of MPEG-7 descriptions: BiM (Binary Format for MPEG-7). It enables the streaming and compression of any XML document.
BiM coders and decoders can deal with any XML language. Technically, the schema definition (DTD or XML Schema) of the XML document is processed and used to generate a binary format. This binary format has two main properties. First, thanks to the schema knowledge, structural redundancy (element names, attribute names, and so on) is removed from the document; the document structure is therefore highly compressed (about 98% on average). Second, element and attribute values are encoded with dedicated codecs. A library of basic datatype codecs is provided by the specification (IEEE 754 floats, UTF-8 strings, compact integers, VLC integers, lists of values, and so on). Other codecs can easily be plugged in using the type-codec mapping mechanism.
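To illustrate the schema-aware idea behind this, the following sketch (not the actual BiM bitstream format; the vocabulary, element names, and framing are invented for illustration) shows how an encoder and decoder that share a schema-derived vocabulary can exchange small integer codes instead of textual tags, removing the structural redundancy:

```python
# Simplified illustration of schema-aware binarization: because both
# encoder and decoder know the schema, element names can be replaced
# by small integer codes. This is the principle, not the BiM format.

# Hypothetical vocabulary derived from a schema
VOCAB = ["Description", "Video", "Title", "Duration"]
CODE = {name: i for i, name in enumerate(VOCAB)}

def encode(tree):
    """Encode a (name, children_or_text) tree as a flat list of codes."""
    name, payload = tree
    if isinstance(payload, str):              # leaf: code + text value
        return [CODE[name], payload]
    out = [CODE[name], len(payload)]          # node: code + child count
    for child in payload:
        out.extend(encode(child))
    return out

def decode(stream, pos=0):
    """Rebuild the tree; names are recovered from the shared vocabulary."""
    name = VOCAB[stream[pos]]
    payload = stream[pos + 1]
    if isinstance(payload, str):
        return (name, payload), pos + 2
    children = []
    pos += 2
    for _ in range(payload):
        child, pos = decode(stream, pos)
        children.append(child)
    return (name, children), pos

doc = ("Description", [("Video", [("Title", "Concert"), ("Duration", "PT1H")])])
encoded = encode(doc)
decoded, _ = decode(encoded)
assert decoded == doc   # lossless round trip, with no tag names on the wire
```

Real BiM additionally encodes the leaf values themselves with the typed codecs mentioned above; here the values are left as plain strings to keep the sketch short.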
MPEG-7 Elements
In October 1996, MPEG started a new work item to provide a solution to the questions described above. The new member of the MPEG family, named "Multimedia Content Description Interface" (MPEG-7 for short), provides standardized core technologies allowing the description of audiovisual data content in multimedia environments. It extends the limited content-identification capabilities of today's proprietary solutions, notably by including more data types.
The main elements of the MPEG-7 standard are:
·         Description Tools: Descriptors (D), which define the syntax and the semantics of each feature (metadata element), and Description Schemes (DS), which specify the structure and semantics of the relationships between their components (these components may be both Descriptors and Description Schemes). For example, for the color feature, the color histogram is a Descriptor; the text of a title, the director of a multimedia document, or a texture in a single picture are also examples of Descriptors.
·         A Description Definition Language (DDL) to define the syntax of the MPEG-7 Description Tools, to allow the creation of new Description Schemes and, possibly, Descriptors, and to allow the extension and modification of existing Description Schemes;
·         System tools, to support the binary coded representation for efficient storage and transmission, transmission mechanisms (both for textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
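The relationship between Descriptors and Description Schemes described above can be sketched in a few lines; this is a hypothetical, simplified Python model for intuition only, not the normative MPEG-7 type hierarchy (names such as "VideoShot" and "DominantColor" are invented):

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Descriptor:
    """Defines one feature value (syntax + semantics of a feature)."""
    name: str
    value: object

@dataclass
class DescriptionScheme:
    """Structures components, which may be Descriptors or other DSs."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)

# A tiny description: a DS combining a low-level feature with a
# nested DS carrying creation metadata.
shot = DescriptionScheme("VideoShot", [
    Descriptor("DominantColor", (255, 0, 0)),
    DescriptionScheme("Creation", [Descriptor("Director", "A. Director")]),
])

def features(ds):
    """Flatten a description into the names of its Descriptors."""
    out = []
    for c in ds.components:
        if isinstance(c, DescriptionScheme):
            out.extend(features(c))
        else:
            out.append(c.name)
    return out

print(features(shot))   # → ['DominantColor', 'Director']
```

The point of the sketch is the recursion: a DS can contain both plain Descriptors and further DSs, which is how MPEG-7 builds complex descriptions out of simple features.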

Basic structures

There are five basic structures related to the Visual Descriptors: the Grid Layout, the Time Series, the 2D/3D Multiple View, the Spatial 2D Coordinates, and the Temporal Interpolation.

Grid layout

The grid layout splits the image into a set of equally sized rectangular regions, so that each region can be described separately. Each region of the grid can be described in terms of other Descriptors such as color or texture. Furthermore, the descriptor allows the sub-Descriptors to be assigned either to all rectangular regions or to an arbitrary subset of them.
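A minimal sketch of the idea, assuming a plain 2D list of grayscale pixels and using the per-region mean value as a stand-in for a real sub-Descriptor such as a color histogram:

```python
# Split an image into rows x cols equally sized rectangular regions and
# attach one sub-descriptor (here, just the mean pixel value) per region.

def grid_layout(image, rows, cols):
    h, w = len(image), len(image[0])
    rh, rw = h // rows, w // cols          # region size (assumes exact division)
    grid = []
    for r in range(rows):
        for c in range(cols):
            cells = [image[y][x]
                     for y in range(r * rh, (r + 1) * rh)
                     for x in range(c * rw, (c + 1) * rw)]
            grid.append(sum(cells) / len(cells))   # per-region descriptor
    return grid

image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 2, 2],
         [4, 4, 2, 2]]
print(grid_layout(image, 2, 2))   # → [0.0, 8.0, 4.0, 2.0]
```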

Time Series

This descriptor defines a temporal series of Descriptors in a video segment and provides image-to-video-frame and video-frames-to-video-frames matching functionalities. Two types of TimeSeries are available: Regular TimeSeries and Irregular TimeSeries. In the former, Descriptors are located at regular (constant) intervals within a given time span; this enables a simple representation for applications that require low complexity. In the latter, Descriptors are located at irregular (varying) intervals within a given time span; this enables an efficient representation for applications constrained by narrow transmission bandwidth or low storage capacity. These structures are useful in particular for building Descriptors that contain time series of other Descriptors.
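The difference between the two flavours can be sketched as follows; the dictionary shapes and field names here are invented for illustration, not the normative TimeSeries syntax:

```python
# Regular series: one constant interval shared by all descriptor samples.
# Irregular series: an explicit time offset stored per sample.

def regular_series(descriptors, interval):
    """All descriptors are spaced by the same interval (e.g., in ms)."""
    return {"interval": interval, "descriptors": descriptors}

def irregular_series(samples):
    """samples: list of (time_offset, descriptor) pairs."""
    return {"samples": samples}

def timestamps(series):
    """Recover absolute sample times from either representation."""
    if "interval" in series:
        return [i * series["interval"] for i in range(len(series["descriptors"]))]
    return [t for t, _ in series["samples"]]

reg = regular_series(["d0", "d1", "d2"], interval=40)       # one sample every 40 ms
irr = irregular_series([(0, "d0"), (40, "d1"), (200, "d2")])
print(timestamps(reg))   # → [0, 40, 80]
print(timestamps(irr))   # → [0, 40, 200]
```

The regular form needs only one interval value for the whole span, while the irregular form pays one timestamp per sample but can skip uneventful stretches, which is the bandwidth/storage trade-off the text describes.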

2D-3D Multiple View

The 2D/3D Descriptor specifies a structure that combines 2D Descriptors representing a visual feature of a 3D object seen from different view angles. The descriptor forms a complete 3D view-based representation of the object. Any 2D visual descriptor, such as contour shape, region shape, colour or texture, can be used. The 2D/3D descriptor supports the integration of the 2D Descriptors used in the image plane to describe features of the 3D (real-world) objects. The descriptor allows the matching of 3D objects by comparing their views, as well as the comparison of pure 2D views to 3D objects.
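View-based matching can be sketched as follows; the feature vectors and the best-pair strategy are invented for illustration (a real system would use the standardized 2D descriptors and a more careful aggregation over views):

```python
# Each 3D object is represented by a set of 2D descriptors, one per view.
# Two objects are compared via the best-matching pair of views, so a
# single 2D view can also be matched against a full 3D representation.

def view_distance(a, b):
    """Euclidean distance between two 2D descriptors (plain vectors here)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def object_distance(views_a, views_b):
    """Distance between objects: best match over all view pairs."""
    return min(view_distance(a, b) for a in views_a for b in views_b)

cube   = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]   # hypothetical view descriptors
sphere = [[0.0, 1.0], [0.1, 0.9]]
query  = [[0.95, 0.05]]                          # a single 2D view to match

# The query view is closer to the cube's views than to the sphere's:
print(object_distance(query, cube) < object_distance(query, sphere))  # → True
```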

Spatial 2D Coordinates

The Spatial 2D Coordinates description supports two kinds of coordinate systems: "local" and "integrated" (see Figure 3.2). In a "local" coordinate system, the coordinates used for the calculation of the description are mapped to the currently applicable coordinate system. In an "integrated" coordinate system, each image (frame) of, for example, a video may be mapped to a different area with respect to the first frame of the shot or video. The integrated coordinate system can, for instance, be used to represent coordinates on a mosaic of a video shot.
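The mapping between the two systems can be sketched as below; the per-frame offsets are invented numbers standing in for estimated camera motion relative to the first frame of the shot:

```python
# "Local" coordinates are relative to the current frame; "integrated"
# coordinates place each frame at an offset with respect to frame 0,
# so local points can be mapped onto a common mosaic of the shot.

frame_offsets = {0: (0, 0), 1: (15, 2), 2: (31, 5)}   # shift of each frame vs frame 0

def to_integrated(frame, local_xy):
    ox, oy = frame_offsets[frame]
    return (local_xy[0] + ox, local_xy[1] + oy)

# The same local position in two different frames lands at different
# spots on the mosaic, reflecting the camera motion:
print(to_integrated(0, (100, 50)))   # → (100, 50)
print(to_integrated(2, (100, 50)))   # → (131, 55)
```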
Temporal Interpolation
The Temporal Interpolation descriptor describes a temporal interpolation using connected polynomials. This can be used to approximate multi-dimensional variable values that change with time, such as an object's position in a video. The description size of the temporal interpolation is usually much smaller than a description of all the values. In Figure 3.3, real values are represented by five linear interpolation functions and two quadratic interpolation functions. The beginning of the temporal interpolation is always aligned to time 0.
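A minimal sketch of the idea with linear pieces only (the piece boundaries and values are invented; real descriptions may also use higher-order polynomials, as in the figure):

```python
# Instead of storing a value at every frame, store a few connected
# polynomial pieces and evaluate them at any time t. The interpolation
# starts at time 0, as in the descriptor.

# Each piece: (start_time, end_time, start_value, end_value)
pieces = [(0, 10, 0.0, 5.0),    # rises from 0 to 5
          (10, 30, 5.0, 5.0),   # stays constant
          (30, 40, 5.0, 1.0)]   # falls to 1

def interpolate(t):
    for t0, t1, v0, v1 in pieces:
        if t0 <= t <= t1:
            # linear interpolation within the matching piece
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside the described time span")

print(interpolate(5))    # → 2.5
print(interpolate(20))   # → 5.0
print(interpolate(35))   # → 3.0
```

Three pieces (eight numbers) here replace a value stored at each of 40 time steps, which is why the description is usually much smaller than the raw values.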

Technical description of MPEG-7
This section contains a detailed overview of the different MPEG-7 technologies that are currently standardized. First the MPEG-7 Multimedia Description Schemes are described, as the other Description Tools (the Visual and Audio ones) are always used wrapped in MPEG-7 MDS descriptions. Afterwards the Visual and Audio Description Tools are described in detail. Then the DDL is described, paving the way for describing the MPEG-7 formats, both textual (TeM) and binary (BiM). Then the MPEG-7 terminal architecture is presented, followed by the Reference Software. Finally the MPEG-7 Conformance specification and the Extraction and Use of Descriptions Technical Report are explained.

MPEG-7 Multimedia Description Schemes
MPEG-7 Multimedia Description Schemes (DSs) are metadata structures for describing and annotating audio-visual (AV) content. The DSs provide a standardized way of describing in XML the important concepts related to AV content description and content management, in order to facilitate searching, indexing, filtering, and access. The DSs are defined using the MPEG-7 Description Definition Language (DDL), which is based on the XML Schema Language, and are instantiated as documents or streams. The resulting descriptions can be expressed in a textual form (i.e., human-readable XML for editing, searching, and filtering) or a compressed binary form (i.e., for storage or transmission). This section provides an overview of the MPEG-7 Multimedia DSs and describes their targeted functionality and use in multimedia applications.
The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual (AV) content by enabling interoperability among devices and applications that deal with AV content description. MPEG-7 describes specific features of AV content as well as information related to AV content management. MPEG-7 descriptions take two possible forms: (1) a textual XML form suitable for editing, searching, and filtering, and (2) a binary form suitable for storage, transmission, and streaming delivery. Overall, the standard specifies four types of normative elements: Descriptors, Description Schemes (DSs), a Description Definition Language (DDL), and coding schemes.
The MPEG-7 Descriptors are designed primarily to describe low-level audio or visual features such as color, texture, motion, and audio energy, as well as attributes of AV content such as location, time, and quality. It is expected that most Descriptors for low-level features will be extracted automatically in applications.
On the other hand, the MPEG-7 DSs are designed primarily to describe higher-level AV features such as regions, segments, objects, and events, as well as other immutable metadata related to creation and production, usage, and so forth. The DSs produce more complex descriptions by integrating multiple Descriptors and DSs and by declaring relationships among the description components. In MPEG-7, the DSs are categorized as pertaining to the multimedia, audio, or visual domain. Typically, the multimedia DSs describe content consisting of a combination of audio, visual, and possibly textual data, whereas the audio or visual DSs refer specifically to features unique to the audio or visual domain, respectively. In some cases automatic tools can be used for instantiating the DSs, but in many cases instantiating DSs requires human-assisted extraction or authoring tools.

Content Management

MPEG-7 provides DSs for AV content management. These tools describe the following information: (1) creation and production, (2) media coding, storage and file formats, and (3) content usage. (Many of the components of the content management DSs are optional; the instantiation of the optional components is often decided in view of the specific multimedia application.) More details about the MPEG-7 tools for content management follow.

The Creation Information describes the creation and classification of the AV content and of other related materials. The Creation information provides a title (which may itself be textual or another piece of AV content), textual annotation, and information such as creators, creation locations, and dates. The classification information describes how the AV material is classified into categories such as genre, subject, purpose, and language. It also provides review and guidance information such as age classification, parental guidance, and subjective reviews. Finally, the Related Material information describes whether other AV materials related to the content being described exist.

The Media Information describes the storage media, such as the format, compression, and coding of the AV content. The Media Information DS identifies the master media, which is the original source from which different instances of the AV content are produced. The instances of the AV content are referred to as Media Profiles, which are versions of the master obtained, for example, by using different encodings or storage and delivery formats. Each Media Profile is described individually in terms of its encoding parameters, storage media information, and location.

The Usage Information describes usage information related to the AV content, such as usage rights, usage record, and financial information.
The rights information is not explicitly included in the MPEG-7 description, instead, links are provided to the rights holders and other information related to rights management and protection. The Rights DS provides these references in the form of unique identifiers that are under management by external authorities. The underlying strategy is to enable MPEG-7 descriptions to provide access to current rights owner information without dealing with information and negotiation directly. The Usage Record DS and Availability DSs provide information related to the use of the content such as broadcasting, on demand delivery, CD sales, and so forth. Finally, the Financial DS provides information related to the cost of production and the income resulting from content use. The Usage Information is typically dynamic in that it is subject to change during the lifetime of the AV content.
The Content Management Description Tools allow the description of the life cycle of the content, from creation to consumption.
The content described by MPEG-7 descriptions can be available in different modalities, formats, and Coding Schemes, and there can be several instances. For example, a concert can be recorded in two different modalities: audio and audio-visual. Each of these modalities can be encoded with different Coding Schemes, which creates several media profiles. Finally, several instances of the same encoded content may be available. The concepts of content, modality, profile, and instance are captured by the following elements:
·         Content: One reality, such as a concert, can be represented as several types of media, e.g., audio media or audio-visual media. A content entity has a specific structure to represent that reality.
·         Media Information: The physical format of a content entity is described by the Media Information DS. One description instance of the DS is attached to each content entity it describes. The DS is centered on an identifier for the content entity and also holds sets of Descriptors for the entity's storage format.
·         Media Profile: One content entity can have one or more media profiles that correspond to different Coding Schemes of the entity. One of the profiles, called the master profile, corresponds to the version initially created or recorded; the others are transcoded from the master. If the content is encoded with the same encoding tool but with different parameters, different media profiles are also created.

·         Media Instance: A content entity can be instantiated as physical entities called media instances. An identifier and a locator specify each media instance.

·         Creation Information: Information about the creation process of a content entity is described by Creation Information DS. One description instance of the DS will be attached to one content entity to describe it.

·         Usage Information: Information about the usage of a content entity is described by Usage Information DS. One description instance of the DS will be attached to one content entity to describe it.
The only part of the description that depends on the storage media or the encoding format is the Media Information described in this section. The remaining part of the MPEG-7 description does not depend on the various profiles or instances and, as a result, can be used to describe jointly all possible copies of the content.
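The profile/instance concepts above can be made concrete with a small sketch. The description below is built with Python's `xml.etree.ElementTree`; the element and attribute names follow the concepts in this section but are simplified placeholders, not the normative MPEG-7 schema, and the locators are invented:

```python
import xml.etree.ElementTree as ET

# One content entity with a master profile and a transcoded profile,
# each carrying a format and pointing at one media instance.
media_info = ET.Element("MediaInformation", id="concert-2001")

master = ET.SubElement(media_info, "MediaProfile", master="true")
ET.SubElement(master, "MediaFormat").text = "MPEG-2, 8 Mbps"
ET.SubElement(master, "MediaInstance", locator="file://archive/concert.m2v")

proxy = ET.SubElement(media_info, "MediaProfile", master="false")
ET.SubElement(proxy, "MediaFormat").text = "MPEG-4, 512 kbps"
ET.SubElement(proxy, "MediaInstance", locator="http://example.com/concert.mp4")

xml_text = ET.tostring(media_info, encoding="unicode")
print(xml_text)
```

Note that only this Media Information part varies across profiles and instances; the creation, usage, and semantic parts of a description would be attached once to the content entity and shared by every copy.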

MPEG-7 Application Domains

The elements that MPEG-7 standardizes will support a broad range of applications (for example, multimedia digital libraries, broadcast media selection, multimedia editing, and home entertainment devices). MPEG-7 will also make the Web as searchable for multimedia content as it is searchable for text today. This applies especially to large content archives being made accessible to the public, as well as to multimedia catalogues that enable people to identify content for purchase. The information used for content retrieval may also be used by agents, for the selection and filtering of broadcast "push" material or for personalized advertising. Additionally, MPEG-7 descriptions will allow fast and cost-effective use of the underlying data, by enabling semi-automatic multimedia presentation and editing. All domains making use of multimedia will benefit from MPEG-7, including:
Ø  Digital libraries and education (image catalogues, musical dictionaries, bio-medical imaging catalogues, ...)
Ø  Multimedia editing (personalised electronic news services, media authoring)
Ø  Cultural services (history museums, art galleries, etc.)
Ø  Multimedia directory services (e.g. yellow pages, tourist information, geographical information systems)
Ø  Broadcast media selection (radio channels, TV channels, ...)
Ø  Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face)
Ø  E-commerce (personalised advertising, on-line catalogues, directories of e-shops, ...)
Ø  Surveillance (traffic control, surface transportation, non-destructive testing in hostile environments, etc.)
Ø  Investigation services (human characteristics recognition, forensics)
Ø  Home entertainment (systems for the management of personal multimedia collections, including manipulation of content, e.g. home video editing, searching a game, karaoke, ...)
Ø  Social (e.g. dating services)

Typical applications enabled by MPEG-7 technology include

• Audio: I want to search for songs by humming or whistling a tune or, using an excerpt of Pavarotti’s voice, get a list of Pavarotti’s records and video clips in which Pavarotti sings or simply makes an appearance. Or, play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g. in terms of emotions.
• Graphics: Sketch a few lines on a screen and get a set of images containing similar graphics, logos, and ideograms.
• Image: Define objects, including color patches or textures, and get examples from which you select items to compose your image. Or check if your company logo was advertised on a TV channel as contracted.
• Visual: Allow mobile phone access to video clips of goals scored in a soccer game, or automatically search and retrieve any unusual movements from surveillance videos.
• Multimedia: On a given set of multimedia objects, describe movements and relations between objects and so search for animations fulfilling the described temporal and spatial relations. Or, describe actions and get a list of scenarios containing such actions.
This chapter briefly introduces some potential application areas and real-world applications for MPEG-7. Basically, all application domains making use of multimedia can benefit from MPEG-7. The list below shows some application areas and examples that MPEG-7 is capable of boosting [1].
Ø  Broadcast media selection: media selection for radio and TV channels
Ø  E-commerce: personalized advertising, on-line catalogues
Ø  Home entertainment: systems for the management of personal multimedia collections
Ø  Multimedia editing: personalized electronic news services
Ø  Shopping: searching clothes that one likes
Ø  Surveillance: traffic control

MPEG-7 is an ambitious standardization effort from the Moving Picture Experts Group. A number of open questions still exist, but the established results point to a promising future. However, the most important question still needs to be answered: what is the balance between flexibility and compatibility within MPEG-7?
The MPEG-7 working group has to decide whether to follow a specific, bottom-up approach for a few individual domains, or to let anyone create their own MPEG-7 solution. The group’s decision will have a clear influence on the option of standardizing only the DDL, or the DDL plus a core set of Descriptors and Description Schemes. MPEG-7 should make a strong showing in some more applications by establishing Description Schemes and variants that would serve the video, image, music, speech, and sound indexing communities well, allowing a number of initial products to target those basic standards. MPEG-7 should also provide a level of genericity (in the Descriptors) and power (in the DDL) that will let specialized communities (such as biomedical or remote-sensing imaging) adapt the standard to their uses.
Furthermore, MPEG-7’s core goal is to provide interoperability. At the end of MPEG-7, whether version 1 or 2, there should exist a single DDL, a generic set of Descriptors for audio and visual features, and specific Description Schemes that serve specific applications. However, even the authors are divided on the question of how to handle cases where a Feature cannot be captured by simply structuring existing Descriptors into a novel Description Scheme. The problem is that a Descriptor built using the DDL might make the novel Description Scheme perfectly parsable, yet the newly defined Descriptor at the bottom of the structure might carry semantic information that other systems can’t understand. On the other hand, introducing a registration body seems more problematic, especially since it might also lead to forced incompatibilities due to a variety of competing but incompatible Descriptors. Ultimately, struggling with these sorts of questions makes the MPEG-7 process intellectually stimulating and rewarding.
We have faith that we will see a standard that provides the compatibility of content descriptions, allowing a given community to adopt it early. MPEG-7 should also offer the flexibility for that community to grow and include other special interests.
