Stenography goal is to conceal the presence of a secret message within an innocuous-looking communication.In other words, steganography consists of hiding a secret

If Eve detects any sign of conspiracy, she will ution, except that it is generated from independently repeated experiments.

Alice uses a covertext that is known to Eve and modifies it for embedding hidden information.Shannon 's pioneering work. Information theory is today regarded as the \right" approach

Alice operates in one of two modes. In the first case, Alice is inactive and sends an innocent,

*hidden text*message within a public*cover text*to obtain a*stego text*in such a way that any observer (except, of c_{the}se, the intended recipient) is unable to distinguish between a covertext*with*a hiddentext and one*without*. The model is perhaps best illustrated by Simmons”Prisoners' Problem". Alice and Bob are in jail,locked up in separate cells far apart from each other, and wish to devise an escape plan. Theyare allowed to communicate by means of sending authenticated messages via trusted c_{the}iers,provided they do not deal with escape plans. The c_{the}iers are agents of the warden Eve (the adversary) and will leak all communication to her.If Eve detects any sign of conspiracy, she will ution, except that it is generated from independently repeated experiments.

This paper views steganography as information hiding with a passive adversary thwart the escape plans by transferring both prisoners to high-security cells from which nobody has ever escaped. Alice and Bob are well aware of these facts, so that before getting locked up, they have shared a few secret codewords that they are now going to exploit for adding a hidden meaning to their seemingly innocent messages. Alice and Bob succeed if they can exchange information allowing them to coordinate their escape and Eve does not become suspicious.

Of c

_{the}se, Eve knows what a \legitimate" conversation among prisoners is like, and shealso knows about the tricks that prisoners apply to embed a hidden meaning in a seemingly

innocent message. Following the approach of information theory, we capture this knowledge

by a probabilistic model, and view Eve's task of detecting hidden messages as a problem of

We consider only the scenario where Alice sends a message to Bob. Eve

models an innocent message from Alice as a covertext C with probability distribution

_{PC}. Amessage with embedded hidden information is called stegotext and denoted by S.

In general, Eve might not know the process by which stegotext is generated; thus, Eve's

task would be to decide whether the observed message has been produced under the known

covertext distribution or under another distribution unknown to her. However, we adopt a

stronger model and assume that Eve has complete knowledge of the embedding and extraction

processes in a steganographic system, except for a short secret key K shared by Alice and Bob.

Upon observing the message sent by Alice , Eve has to decide whether it is covertext or

stegotext. This is the problem of choosing one of two different explanations for observed data,

known as ” hypothesis testing" in information theory Recall that Eve knows the probability distributions of covertext and stegotext, and draws her conclusion about the observed

message only from this knowledge. However, Eve does not know if Alice produced the message

according to

_{Pc}or_{PS}, nor is she willing to assign any a priori probabilities to these two explanations.We define the security of the steganographic system used by Alice and Bob (or stegosystem

for short) in terms of the relative entropy D(

_{PC}||_{PS}) between_{PC}and_{PS}, which yields quantitativebounds on Eve's detection performance. If the covertext and the stegotext distributions

are equal, D(

_{PC}||_{PS}) = 0 and we have a perfectly secure stegosystem; Eve cannot distinguishthe two distributions and has no information at all about the presence of an embedded message.

This parallels Shannon 's notion of perfect secrecy for cryptosystems .

Note how

_{the}model differs from the scenario sometimes considered for steganography, whereSuch schemes can only o_er protection against adversaries with limited capability of comparing

the modi_ed stegotext to the covertext. For instance, this applies to the popular use of

steganography on visual images, where a stegoimage may be perceptually indistinguishable from

the coverimage for humans, but not for an algorithm with access to the coverimage.

Limitations. How well

_{the}information-theoretic model covers real-world steganographic applications depends crucially on the assumption that there is a probabilistic model of the covertext.Moreover, the users of a stegosystem need at least some knowledge about the covertext

distribution, as will become clear in the description of

_{the}stegosystems.Probabilistic modeling of information is the subject of information theory, originating with

to quantifying information and to reasoning about the performance of communication channels.

This confidence in the theory stems from many practical coding schemes that have been built

according to the theory and perform well in real applications.

But the situation in steganography is more involved, since even a perfectly secure stegosystem

requires that the users and the adversary share the same probabilistic model of the covertext.

For instance, if the covertext distribution consists of uniformly random bits, then encrypting

a message under a one-time pad results in a perfectly secure stegosystem according to

_{the}notion of security. But no reasonable warden would allow the prisoners to exchange randomlooking messages in the Prisoners' Problem, since the use of encryption is clearly forbidden! Thus, the validity of a formal treatment of steganography is determined by the accuracy of a probabilistic model for the real data. Assuming knowledge of the covertext distribution seems to render

_{the}model somewhat unrealistic for the practical purposes of steganography. But what are the alternatives? Should we rather study the perception and detection capabilities of the human cognition since most coverdata (images, text, and sound) is ultimately addressed to humans? Viewed in this way, steganography could fall entirely into the realms of image, language, and audio processing. However, it seems that an information-theoretic model, or any other formal approach, is more useful for deriving statements about the security of steganography schemes and a formal security notion is one of the main reasons for introducing a mathematical model of steganography. So far most formal models of information hiding address the case of active adversaries. This problem is different from the one considered here since the existence of a hidden message is typically known publicly, as for example in copyright protection schemes. Information hiding with active adversaries can be divided into watermarking and fingerprinting. Watermarking supplies digital objects with an identification of origin; all objects are marked in the same way. Fingerprinting, conversely, attempts to identify individual copies of an object by means of embedding a unique marker in every copy that is distributed to a user. It is proposed a slightly different terminology and define watermarking in general as hiding covertext-dependent information, regardless of the adversary model. As most objects to be protected by watermarking consist of audio, image, or video data, these domains have received the most attention so far. A large number of hiding techniques and domain-specific models have been developed for robust, imperceptible information hiding .

**: A general model for information hiding with active adversaries was formulated by Mittelholzer, but its hiding property also relies on the similarity of stegodata and coverdatain terms of a perceptually motivated distortion measure. Zollner use information theoretic methods to conclude that the embedding process in steganography must involve uncertainty. A complexity-theoretic model for steganography, which shares**

*Ettinger models active adversaries with game-theoretic techniques*_{the}focus on the indistinguishability of the stegotext from a given covertext distribution, has recently been proposed by Hopper, Langford, and van Ahn.

Another related work was by Maurer on unconditionally secure authentication in

cryptography, which demonstrates the generality of the hypothesis testing approach.

A large number of techniques for undetectable communication originate in the military

domain, where they have found many applications. This includes radar, spread-spectrum communication, and covert channels. It is likely that

_{the}model is also applicable to those areas.**Model**

**Preliminaries**.

We define the basic properties of a stegosystem using the notions of entropy,

mutual information, and relative entropy [2, 3].

The entropy of a probability distribution P

_{X }over an alphabet X is defined asH(X) = -∑ P

_{X}(x) log P_{X}(x): xєX

When X denotes a random variable with distribution P

_{X}, the quantity H(X) is simply calledthe entropy of the random variable X (with the standard convention 0 log 0 = 0 and logarithms

to the base 2). Similarly, the conditional entropy of a random variable X given a random

variable Y is

H(X|Y ) = ∑ P

_{Y}(y)H(X|Y = y); yєY

where H(X|Y = y) denotes the entropy of the conditional probability distribution P(X|Y) =y.

The entropy of any distribution satisfies 0 <= H(X)<= log |X |, where |X | denotes the cardinality

of X.

The mutual information between X and Y is defined as the reduction of entropy that

Y provides about X, i.e., I(X; Y ) = H(X)- H(X|Y ). It is symmetric in X and Y , i.e.,

I(X; Y ) = I(Y ;X), and always non-negative.

The relative entropy or discrimination between two probability distributions PQ0 and PQ1

is defined as

D(P

_{Q0}||P_{Q1}) = ∑ P_{Q0}(q) log PQ0(q)/P_{Q1}(q) qєQ

(with 0 log 0/0 = 0 and p log p/0 = ∞ if p > 0).

The conditional relative entropy between PQ0 and PQ1 given a random variable V de_ned

in both probability spaces is

D(P

_{Q0||V}||P_{Q1|V}) =∑ P_{V}(v) ∑ P_{Q0|V}_{=v(q) }log( P_{Q0|V =v(q) })/ P_{Q1|V =v(q)} vєV qєQ

The relative entropy between two distributions is non-negative and it is equal to 0 if and

only if the distributions are equal. Although relative entropy is not a true distance measure in

the mathematical sense, because it is not symmetric and does not satisfy the triangle inequality,

it may be useful to think of it as a distance.

**Stegosystems**.

We adopt the standard terminology of information hiding for the model of a stegosystem. There are two parties, Alice and Bob, who are the users of the stegosystem. Alice wishes to send an innocent-looking message with a hidden meaning over a public channel to Bob, such that the presence of hidden information goes unnoticed by a third party, the adversary Eve, who has perfect read-only access to the public channel.

legitimate message containing no hidden information, called covertext and denoted by C; it is

generated according to a distribution

_{PC}known to Eve. One may imagine that the covertextis generated by a sAlice has access. In the second case, Alice is active

_{the}ce to which only. When Alice is active, the switch is in position 1 and she outputs stegotext S, which contains a hidden message E and was produced using knowledge of the secret key K shared by Alice and Bob. When Alice is inactive, the switch is in position 0, no embedding occurs, and Alice outputs covertext C sends stegotext S computed from an embedding function A, containing an embedded message E for Bob. The message is a random variable drawn from a message space E. Alice 's embedding algorithm may access a private random s

_{the}ce R; we assume that R is independent of E and C. The embedding function A and the distributions of C, E, and R are known to Eve. For the moment, we assume that Alice and Bob know the covertext and stegotext distributions; this will be relaxed later. In addition, Alice and Bob share a secret key K, which is unknown to Eve. The key has been chosen at random and communicated over a secure channel prior to the use of the stegosystem| in any case before the message E that Alice wants to communicate to Bob becomes known.Thus, we assume that K is independent of E, R, and C.Figure 1 shows the operation of a stegosystem in more detail. The switch at Alice 's end of

the public channel determines if Alice is active or not.

In the first case (switch in position 0), Alice is inactive and sends only legitimate covertext

C to Bob over the public channel. The covertext is generated by a covertext s

_{the}ce;no embedding takes place. The adversary Eve observes C.

In the second case (switch in position 1), Alice is active and is given a message E that she

\embeds" into the given covertext C using the embedding function A. More precisely, A is an algorithm that takes C, the shared key K, and private randomness R as inputs and

produces stegotext S. The stegotext is sent to Bob over the public channel. The adversary

Eve and the receiver Bob observe S. Using his extracting algorithm B, Bob extracts a

decision value ^ E from S and K, in the hope that this gives him some information about E.

The embedding algorithm may exploit knowledge about the covertext distribution. However,

we require that given a covertext distribution, the embedding function A is universal, i.e.,

works for any distribution of E over E. Thus, A must not depend on knowledge of the message

distribution PE. This makes the stegosystem robust in the sense that the legitimate users do

not have to worry about the adversary's knowledge of E. Such a precaution is often made in

cryptographic contexts.Furthermore, we assume that Bob has an oracle that tells him if Alice is active or not.This is a strong assumption, and we make it here in order to focus on the security properties of a stegosystem in general. Removing it does not hurt the security of a stegosystem with respect to Eve's detection capability|if Bob was trying to extract an embedded message from

the covertext when Alice is inactive, he would merely obtain garbage. As discussed below, the

oracle also does not open the way to trivial stegosystems. Later on in Example 2, we discuss

a more practical class of stegosystems, in which this assumption is not necessary, because Bob

may detect the presence of Alice 's message from redundancy in the embedded information.

From the point of view of Eve, who does not know if Alice is active, the two cases above

look similar: she observes data that is sent from Alice to Bob over the public channel. Let

M denote the message on the channel. If Alice is active, then M = S and M = C if Alice is

inactive. Thus, M was generated either according to

_{PC}or according to_{PS}; these are the twoexplanations that Eve has for the observation. Eve, upon observing the message sent by Alice , has to decide whether it was covertext C or stegotext S, i.e., whether Alice is inactive or active. We quantify the security of a stegosystem in terms of the relative entropy between

_{PC}and_{PS}.**Definition 1.**

Fix a covertext distribution C and a message space E. A pair of algorithms

(A; B) is called a stegosystem if there exist random variables K and R as described above such

that for all random variables E over E with H(E) > 0, it holds I(E^;E) > 0.

Moreover, a stegosystem is called perfectly secure (against passive adversaries) if

D(

_{PC}||_{PS}) = 0;and a stegosystem is called є -secure (against passive adversaries) if

D(

_{PC}||_{PS})<=є This model describes a stegosystem for one-time use, where Alice is always active or not.

If Alice sends multiple dependent messages to Bob and at least one of them contains hidden

information, she is considered to be active at all times and S consists of the concatenation of

all her messages.

Some remarks on the definition.

1. The condition in the definition of a stegosystem, I(E^;E) > 0, implies that a stegosystem

is “useful" in the sense that Bob obtains at least some information about E. We chose

not to model “useless" stegosystems.

2. It would be natural to require explicitly that a perfectly secure stegosystem provides also

perfect secrecy for E in the sense of Shannon by demanding that S and E are statistically

independent However, this is not necessary since we required Alice 's embedding algorithm to work for any distribution PE, without depending on the distribution itself. This guarantees perfect secrecy for E against Eve as follows. Fix a covertext distribution and an embedding function A. For all possible distributions of E, A must produce S with the same distribution as C. Since a

concrete message value corresponds to a particular distribution of E but the distribution

of S is the same for all values, S is statistically independent from E.

Analogously, we do not impose a secrecy constraint on E for stegosystems with _ > 0. The

implications for the secrecy of E are more involved and not investigate here; however, it is

easy to construct stegosystems with perfect secrecy also in this case

3. In Alice is active or not.

_{the}definition of a stegosystem, Bob knows from an oracle ifHence, one might be tempted to construct the following “perfect" stegosystem that exploits

this knowledge for transmitting hidden information without using a shared secret

key. W.l.o.g. consider a deterministic embedding algorithm A consisting of an ideal s

_{the}ceencoder that manages to compress some message E1 into stegotext S1, which consists of

independent and uniformly random bits. If the given C is a sequence of independent

and uniformly random bits of the same length, the two distributions are the same and

Eve cannot distinguish a compressed message from covertext. In this case, Bob obtains

E1 without any secret key. His advantage to distinguish stegotext from covertext stems

entirely from the oracle, and one might conclude that assuming such an oracle allows for

trivial stegosystems.

However, this conclusion does not hold because the described stegosystem is not perfectly

secure according to Definition 1. Recall that A is required to work for any message

distribution, so it must work also for some E2 with strictly less entropy than E1|for

instance, when Eve has partial knowledge of the message. Let S2 = A(E2). Then it is

intuitively clear that the deterministic A will not output enough random bits and the

distributions of C and S2 will differ.

Formally, this can be seen by expanding the mutual information between the message

and the stegotext in two different ways. Since the encoder is deterministic and perfect,

we have H(S1) = H(E1) from expanding I(E1; S1). The same encoder applied to E2

also uniquely determines S2, and therefore H(S2) = H(E2) -H(E2/S2) <=H(E2) from

expanding I(E2; S2). Hence, H(S2) / H(E2) < H(E1) = H(S1) by the assumption

on E2, which implies that the distributions of S1 and S2 differ and this contradicts the

assumption that the stegosystem is perfect.

Let all random variables in

_{ }the model above be extended to stochastic processes and letn denote the number of repetitions. Assume that the covertext is generated by a stationary

information source. Hence, the normalized relative entropy between the covertext and stegotext

processes determines the security in cases where Eve is restricted to see a finite part of the

covertext sequence

.

Definition 2.

A stegosystem for stochastic processes with stationary covertext is called perfectly

secure on average (against passive adversaries) whenever

lim 1/n D(

_{PC}||_{PS}) = 0: n->∞

Analogously, a stegosystem for stochastic processes is called _-secure on average (against passive

adversaries) whenever)

lim 1/n D(

_{PC}||_{PS}) <= є n->∞

Notice that Alice is still either active or inactive during the entire experiment, and the

stegotext distribution will not be ergodic in general.

**Detection Performance**

This section analyzes Eve's capabilities of detecting an embedded message. Basic bounds on her

performance are obtained from the theory of hypothesis testing. A brief review of hypothesis

testing is given first.

**Hypothesis Testing**

Hypothesis testing is the task of deciding which one of two hypotheses H0 or H1 is the true

explanation for an observed measurement Q. In other words, there are two plausible probability

distributions, denoted by PQ0 and PQ1 , over the space Q of possible measurements. If H0 is

true, then Q was generated according to PQ0 , and if H1 is true, then Q was generated according

to PQ1 . A decision rule is a binary partition of Q that assigns one of the two hypotheses to each

possible measurement q 2 Q. The two errors that can be made in a decision are called a type I

error for accepting hypothesis H1 when H0 is actually true and a type II error for accepting H0

when H1 is true. The probability of a type I error is denoted by α , the probability of a type II

error by β .

A basic property in hypothesis testing is that deterministic processing cannot increase the

relative entropy between two distributions. For any function f : Q -> T , if T0 = f(Q0) and

T1 = f(Q1), then

D(Pt

_{0}||P_{T1}) <=D(P_{Q0}||P_{Q1})Let d(α,β) denote the binary relative entropy of two distributions with parameters (α; 1- α)

and (1 - β; β), respectively,

d (α,β) = α log α /(1- β)+ (1 - α) log(1 - α)/ β

Because deciding between H

_{0}and H_{1 }is a special form of processing by a binary function,the type I and type II error probabilities _ and _ satisfy

d( α,β) <= D(P

_{Q0}||P_{Q1}):This bound is typically used as follows: Suppose that D(P

_{Q0}||P_{Q1}):< ∞ and that an upperbound α on the type I error probability is given. Then this yields a lower bound on the type II

error probability β. The first one connects entropy, relative entropy, and the size of the alphabet for any rand`om variable X 2 X: If P

_{U}is the uniform distribution over X, thenH(X) + D(P

_{X}||P_{U}) = log |X |The second property states that conditioning on derived information (side information, which

has the same distribution in both cases) can only increase the discrimination: If there is a

deterministic function f : Q -> V such that the random variables f(Q0) and f(Q1) have the

same distribution P

_{V}, thenD(Pq

_{0}||Pq_{1}) <=D(P_{Q0\v}||P_{Q1\v})**Bounds for Secure Stegosystems**

Consider Eve's decision process for detecting a hidden message in a stegosystem as a hypothesis

testing problem. Any particular decision rule is a binary partition (C0; C1) of the set C of possible

covertexts. She decides that Alice is active if and only if the observed message c is contained in

C1. Ideally, she would always detect a hidden message. (But this occurs only if Alice chooses

an encoding such that valid covertexts and stegotexts are disjoint.) If Eve fails to detect that

she observed stegotext S, she makes a type II error; its probability is denoted by β

The opposite error, which usually receives less attention, is the type I error: Eve decides

that Alice sent stegotext although it was a legitimate cover message C; its probability is denoted

by β. An important special case is that Eve makes no type I error and never accuses Alice of

sending hidden information when she is inactive (α = 0). Such a restriction might be imposed

on Eve by external mechanisms, justi_ed by the desire to protect innocent users.

The deterministic processing property (1) bounds the detection performance achievable by

Eve. From (2) we obtain the following result.

Theorem 1. In a stegosystem that is Є-secure against passive adversaries, the probability β that the adversary does not detect the presence of the embedded message and the probability β

that the adversary falsely announces the presence of an embedded message satisfy

d (α,β) <=Є

In particular, if α = 0, then β≥ 2

^{-є}:In a perfectly secure system, we have D(

_{PC||PS}) = 0 and therefore_{PC}=_{PS}; thus, theobserved message does not give Eve any information about whether Alice is active or not.

**Secure Stegosystems**

According to the model, we obtain a secure stegosystem whenever the stegotext distribution

is close to the covertext distribution for an observer with no knowledge of the secret key. The

embedding function depends crucially on the covertext distribution. We assume in this section

that the covertext distribution is known to the users Alice and Bob, and describe how secure

stegosystems can be constructed.

One-time pad. As already mentioned in the introduction, the one-time pad is a perfectly

secure stegosystem whenever the covertext consists of uniformly random bits. Assuming such

a covertext would be rather unrealistic, but in order to illustrate the model, we briey describe

this system formally.The one-time pad stegosystem is equivalent to the basic scheme of visual cryptography.This technique hides a monochrome picture by splitting it into two random

layers of dots. When these are superimposed, the picture appears. Using a slight modification

of the basic scheme, it is also possible to produce two innocent-looking pictures such that both

of them together reveal a hidden embedded message that is perfectly secure against an observer

who has only one picture.

**General distributions**.

For arbitrary covertext distributions, we now describe a system that embeds a one-bit message in the stegotext. The extension to larger message spaces is straightforward, but might require even more detailed knowledge of the covertext distribution.

**Theorem 2.**

The one-bit stegosystem has security

1/ln 2( Pr[C є C0] - Pr[C є C1]

^{2}against passive adversaries.

In general, determining the optimal embedding function from a covertext distribution is an

NP-hard combinatorial optimization problem. We can also find efficient embedding

algorithmsľ for the above one-bit stegosystem that achieves perfect security whenever possible.

**Universal Stegosystems**

The stegosystems described above require that the covertext distribution is known to the users

Alice and Bob. This seems not realistic for many applications. In this section, we describe

a method for obtaining a universal stegosystem where such knowledge is not needed. The

idea is that Alice and Bob exploit a covertext signal that is produced by an infinite sequence

of independent repetitions of the same experiment. Alice applies a universal data compression

scheme to compute an approximation of the covertext distribution. She then produces stegotext

with the approximate distribution of the covertext from her own randomness and embeds a

message into the stegotext using the method of the one-time pad. Eve may have complete

knowledge of the covertext distribution, but as long as she is restricted to observe only a finite

part of the covertext sequence, this stegosystem achieves perfect average security asymptotically.

There are many practical universal data compression algorithms and most encoding

methods for perceptual data rely on them in some form. It is conceivable to combine them

with

_{the}universal stegosystem for embedding messages in perceptual coverdata such as audioor video.

**A universal stegosystem**.

Suppose the covertext, which is given as input to Alice , consists

of n independent realizations of a random variable X.

_{The}universal stegosystem applies theabove data compression scheme to the covertext. If Alice is active, she generates stegotext

containing hidden information using the derived encoder and her private random s

_{the}ce.More precisely, given ľ > H(X) and n, A maps

_{s}the incoming covertext X^{n}to its encodingZ = E(XAlice 's random

^{n}). W.l.o.g. assume the output of the encoder is a binary m-bit string for m = [log |A|] and the shared key K is a uniformly random l-bit string with l<=m.Furthermore, let the message E to be embedded be an `-bit string and lets

_{the}ce R generate uniformly random (m -l)-bit strings.If E outputs Z = Є(XAlice sends S = X

^{n}),^{n}and no message is embedded. Otherwise, shecomputes the m-bit string

T = (E Ø K)||R;

where k denotes the concatenation of bit strings, and sends S = D(T).

Bob extracts the embedded message from the received stegotext S as follows. If E(S) = ð,

he declares a transmission failure and outputs a default value. Otherwise, he outputs

E^ = E(S

_{)[1…,l`]}Ø K;where Z[1…,l] stands for the prefix of length l of a binary string Z.

Note that this stegosystem relies on Alice 's private random source in a crucial way.

**Conclusion:**

The approach of this paper is to view steganography with a passive adversary as a problem of

hypothesis testing because the adversary succeeds if he merely detects the presence of hidden

information.Although these conditions may be necessary, they are not sufficient to guarantee undetectable communication, as can be seen from the following insecure stegosystem.

## No comments:

## Post a Comment

leave your opinion