### DATA SECURITY An Information-Theoretic Model for Steganography

Stenography goal is to conceal the presence of a secret message within  an innocuous-looking communication.In other words, steganography consists of hiding a secret hidden text message within a public cover text to obtain a stego text in such a way that any observer (except, of cthese, the intended recipient) is unable to distinguish between a covertext with a hiddentext and one without.  The model is perhaps best illustrated by Simmons”Prisoners' Problem". Alice and Bob are in jail,locked up in separate cells far apart from each other, and wish to devise an escape plan. Theyare allowed to communicate by means of sending authenticated messages via trusted ctheiers,provided they do not deal with escape plans. The ctheiers are agents of the warden Eve (the adversary) and will leak all communication to her.
If Eve detects any sign of conspiracy, she will ution, except that it is generated from independently repeated experiments.

This paper views steganography as information hiding with a passive adversary thwart the escape plans by transferring both prisoners to high-security cells from which nobody has ever escaped. Alice and Bob are well aware of these facts, so that before getting locked up, they have shared a few secret codewords that they are now going to exploit for adding a hidden meaning to their seemingly innocent messages. Alice and Bob succeed if they can exchange information allowing them to coordinate their escape and Eve does not become suspicious.
Of cthese, Eve knows what a \legitimate" conversation among prisoners is like, and she
also knows about the tricks that prisoners apply to embed a hidden meaning in a seemingly
innocent message. Following the approach of information theory, we capture this knowledge
by a probabilistic model, and view Eve's task of detecting hidden messages as a problem of
hypothesis testing.

We consider only the scenario where Alice sends a message to Bob. Eve
models an innocent message from Alice as a covertext C with probability distribution PC. A
message with embedded hidden information is called stegotext and denoted by S.
In general, Eve might not know the process by which stegotext is generated; thus, Eve's
task would be to decide whether the observed message has been produced under the known
covertext distribution or under another distribution unknown to her. However, we adopt a
stronger model and assume that Eve has complete knowledge of the embedding and extraction
processes in a steganographic system, except for a short secret key K shared by Alice and Bob.
Upon observing the message sent by Alice, Eve has to decide whether it is covertext or
stegotext. This is the problem of choosing one of two different explanations for observed data,
known as  ” hypothesis testing" in information theory Recall that Eve knows the probability distributions of covertext and stegotext, and draws her conclusion about the observed
message only from this knowledge. However, Eve does not know if Alice produced the message
according to Pcor PS, nor is she willing to assign any a priori probabilities to these two explanations.
We define the security of the steganographic system used by Alice and Bob (or stegosystem
for short) in terms of the relative entropy D(PC||PS) between PC and PS, which yields quantitative
bounds on Eve's detection performance. If the covertext and the stegotext distributions
are equal, D(PC||PS) = 0 and we have a perfectly secure stegosystem; Eve cannot distinguish
the two distributions and has no information at all about the presence of an embedded message.
This parallels Shannon's notion of perfect secrecy for cryptosystems .
Note how the model differs from the scenario sometimes considered for steganography, where
Alice uses a covertext that is known to Eve and modifies it for embedding hidden information.
Such schemes can only o_er protection against adversaries with limited capability of comparing
the modi_ed stegotext to the covertext. For instance, this applies to the popular use of
steganography on visual images, where a stegoimage may be perceptually indistinguishable from
the coverimage for humans, but not for an algorithm with access to the coverimage.
Limitations. How well the information-theoretic model covers real-world steganographic applications depends crucially on the assumption that there is a probabilistic model of the covertext.
Moreover, the users of a stegosystem need at least some knowledge about the covertext
distribution, as will become clear in the description of the stegosystems.
Probabilistic modeling of information is the subject of information theory, originating with
Shannon's pioneering work. Information theory is today regarded as the \right" approach
to quantifying information and to reasoning about the performance of communication channels.
This confidence in the theory stems from many practical coding schemes that have been built
according to the theory and perform well in real applications.
But the situation in steganography is more involved, since even a perfectly secure stegosystem
requires that the users and the adversary share the same probabilistic model of the covertext.
For instance, if the covertext distribution consists of uniformly random bits, then encrypting
a message under a one-time pad results in a perfectly secure stegosystem according to the
notion of security. But no reasonable warden would allow the prisoners to exchange randomlooking messages in the Prisoners' Problem, since the use of encryption is clearly forbidden! Thus, the validity of a formal treatment of steganography is determined by the accuracy of a probabilistic model for the real data. Assuming knowledge of the covertext distribution seems to render the model somewhat unrealistic for the practical purposes of steganography. But what are the alternatives? Should we rather study the perception and detection capabilities of the human cognition since most coverdata (images, text, and sound) is ultimately addressed to humans? Viewed in this way, steganography could fall entirely into the realms of image, language, and audio processing. However, it seems that an information-theoretic model, or any other formal approach, is more useful for deriving statements about the security of steganography schemes and a formal security notion is one of the main reasons for introducing a mathematical model of steganography.
So far most formal models of information hiding address the case of active adversaries. This problem is different from the one considered here since the existence of a hidden message is typically known publicly, as for example in copyright protection schemes. Information hiding with active adversaries can be divided into watermarking and fingerprinting. Watermarking supplies digital objects with an identification of origin; all objects are marked in the same way. Fingerprinting, conversely, attempts to identify individual copies of an object by means of embedding a unique marker in every copy that is distributed to a user. It is  proposed a slightly different terminology and define watermarking in general as hiding covertext-dependent information, regardless of the adversary model. As most objects to be protected by watermarking consist of audio, image, or video data, these domains have received the most attention so far. A large number of hiding techniques and domain-specific models have been developed for robust, imperceptible information hiding .

Ettinger  models active adversaries with game-theoretic techniques: A general model for information hiding with active adversaries was formulated by Mittelholzer, but its hiding property also relies on the similarity of stegodata and coverdatain terms of a perceptually motivated distortion measure. Zollner use information theoretic methods to conclude that the embedding process in steganography must involve uncertainty. A complexity-theoretic model for steganography, which shares the focus on the indistinguishability of the stegotext from a given covertext distribution, has recently been proposed by Hopper, Langford, and van Ahn.
Another related work was by Maurer  on unconditionally secure authentication in
cryptography, which demonstrates the generality of the hypothesis testing approach.
A large number of techniques for undetectable communication originate in the military
domain, where they have found many applications. This includes radar, spread-spectrum communication, and covert channels. It is likely that the model is also applicable to those areas.

Model

Preliminaries.
We define the basic properties of a stegosystem using the notions of entropy,
mutual information, and relative entropy [2, 3].
The entropy of a probability distribution PX over an alphabet X is defined as
H(X) = -∑ PX(x) log PX(x):
xєX
When X denotes a random variable with distribution PX, the quantity H(X) is simply called
the entropy of the random variable X (with the standard convention 0 log 0 = 0 and logarithms
to the base 2). Similarly, the conditional entropy of a random variable X given a random
variable Y is
H(X|Y ) = ∑ PY (y)H(X|Y = y);
yєY
where H(X|Y = y) denotes the entropy of the conditional probability distribution P(X|Y) =y.
The entropy of any distribution satisfies 0 <= H(X)<= log |X |, where |X | denotes the cardinality
of X.
The mutual information between X and Y is defined as the reduction of entropy that
Y provides about X, i.e., I(X; Y ) = H(X)- H(X|Y ). It is symmetric in X and Y , i.e.,
I(X; Y ) = I(Y ;X), and always non-negative.
The relative entropy or discrimination between two probability distributions PQ0 and PQ1
is defined as
D(PQ0||PQ1) = ∑ PQ0(q) log PQ0(q)/PQ1(q)
qєQ
(with 0 log 0/0 = 0 and p log p/0 = ∞ if p > 0).
The conditional relative entropy between PQ0 and PQ1 given a random variable V de_ned
in both probability spaces is
D(PQ0||V ||PQ1|V ) =∑ PV (v) ∑ PQ0|V =v(q)  log( PQ0|V =v(q)  )/ PQ1|V =v(q)
vєV         qєQ

The relative entropy between two distributions is non-negative and it is equal to 0 if and
only if the distributions are equal. Although relative entropy is not a true distance measure in
the mathematical sense, because it is not symmetric and does not satisfy the triangle inequality,
it may be useful to think of it as a distance.

Stegosystems.

We adopt the standard terminology of information hiding  for the model of a stegosystem. There are two parties, Alice and Bob, who are the users of the stegosystem. Alice wishes to send an innocent-looking message with a hidden meaning over a public channel to Bob, such that the presence of hidden information goes unnoticed by a third party, the adversary Eve, who has perfect read-only access to the public channel.
Alice operates in one of two modes. In the first case, Alice is inactive and sends an innocent,
legitimate message containing no hidden information, called covertext and denoted by C; it is
generated according to a distribution PC known to Eve. One may imagine that the covertext
is generated by a sthece to which only Alice has access. In the second case, Alice is active

.             When Alice is active, the switch is in position 1 and she outputs stegotext S, which contains a hidden message E and was produced using knowledge of the secret key K shared by Alice and Bob. When Alice is inactive, the switch is in position 0, no embedding occurs, and Alice outputs covertext C sends stegotext S computed from an embedding function A, containing an embedded message E for Bob. The message is a random variable drawn from a message space E. Alice's embedding algorithm may access a private random sthece R; we assume that R is independent of E and C. The embedding function A and the distributions of C, E, and R are known to Eve. For the moment, we assume that Alice and Bob know the covertext and stegotext distributions; this will be relaxed later. In addition, Alice and Bob share a secret key K, which is unknown to Eve. The key has been chosen at random and communicated over a secure channel prior to the use of the stegosystem| in any case before the message E that Alice wants to communicate to Bob becomes known.Thus, we assume that K is independent of E, R, and C.
Figure 1 shows the operation of a stegosystem in more detail. The switch at Alice's end of
the public channel determines if Alice is active or not.
In the first case (switch in position 0), Alice is inactive and sends only legitimate covertext
C to Bob over the public channel. The covertext is generated by a covertext sthece;
no embedding takes place. The adversary Eve observes C.
In the second case (switch in position 1), Alice is active and is given a message E that she
\embeds" into the given covertext C using the embedding function A. More precisely, A is an algorithm that takes C, the shared key K, and private randomness R as inputs and
produces stegotext S. The stegotext is sent to Bob over the public channel. The adversary
Eve and the receiver Bob observe S. Using his extracting algorithm B, Bob extracts a
decision value ^ E from S and K, in the hope that this gives him some information about E.
The embedding algorithm may exploit knowledge about the covertext distribution. However,
we require that given a covertext distribution, the embedding function A is universal, i.e.,
works for any distribution of E over E. Thus, A must not depend on knowledge of the message
distribution PE. This makes the stegosystem robust in the sense that the legitimate users do
not have to worry about the adversary's knowledge of E. Such a precaution is often made in
cryptographic contexts.Furthermore, we assume that Bob has an oracle that tells him if Alice is active or not.This is a strong assumption, and we make it here in order to focus on the security properties of a stegosystem in general. Removing it does not hurt the security of a stegosystem with respect to Eve's detection capability|if Bob was trying to extract an embedded message from
the covertext when Alice is inactive, he would merely obtain garbage. As discussed below, the
oracle also does not open the way to trivial stegosystems. Later on in Example 2, we discuss
a more practical class of stegosystems, in which this assumption is not necessary, because Bob
may detect the presence of Alice's message from redundancy in the embedded information.
From the point of view of Eve, who does not know if Alice is active, the two cases above
look similar: she observes data that is sent from Alice to Bob over the public channel. Let
M denote the message on the channel. If Alice is active, then M = S and M = C if Alice is
inactive. Thus, M was generated either according to PC or according to PS; these are the two
explanations that Eve has for the observation. Eve, upon observing the message sent by Alice, has to decide whether it was covertext C or stegotext S, i.e., whether Alice is inactive or active. We quantify the security of a stegosystem in terms of the relative entropy between PC and PS.

Definition 1.
Fix a covertext distribution C and a message space E. A pair of algorithms
(A; B) is called a stegosystem if there exist random variables K and R as described above such
that for all random variables E over E with H(E) > 0, it holds I(E^;E) > 0.
Moreover, a stegosystem is called perfectly secure (against passive adversaries) if
D(PC||PS) = 0;
and a stegosystem is called є -secure (against passive adversaries) if
D(PC||PS)<=є

This model describes a stegosystem for one-time use, where Alice is always active or not.
If Alice sends multiple dependent messages to Bob and at least one of them contains hidden
information, she is considered to be active at all times and S consists of the concatenation of
all her messages.

Some remarks on the definition.

1. The condition in the definition of a stegosystem, I(E^;E) > 0, implies that a stegosystem
is “useful" in the sense that Bob obtains at least some information about E. We chose
not to model “useless" stegosystems.
2. It would be natural to require explicitly that a perfectly secure stegosystem provides also
perfect secrecy for E in the sense of Shannon by demanding that S and E are statistically
independent However, this is not necessary since we required Alice's embedding algorithm to work for any distribution PE, without depending on the distribution itself. This guarantees perfect secrecy for E against Eve as follows. Fix a covertext distribution and an embedding function A. For all possible distributions of E, A must produce S with the same distribution as C. Since a
concrete message value corresponds to a particular distribution of E but the distribution
of S is the same for all values, S is statistically independent from E.
Analogously, we do not impose a secrecy constraint on E for stegosystems with _ > 0. The
implications for the secrecy of E are more involved and not investigate here; however, it is
easy to construct stegosystems with perfect secrecy also in this case
3. In the definition of a stegosystem, Bob knows from an oracle if Alice is active or not.
Hence, one might be tempted to construct the following “perfect" stegosystem that exploits
this knowledge for transmitting hidden information without using a shared secret
key. W.l.o.g. consider a deterministic embedding algorithm A consisting of an ideal sthece
encoder that manages to compress some message E1 into stegotext S1, which consists of
independent and uniformly random bits. If the given C is a sequence of independent
and uniformly random bits of the same length, the two distributions are the same and
Eve cannot distinguish a compressed message from covertext. In this case, Bob obtains
E1 without any secret key. His advantage to distinguish stegotext from covertext stems
entirely from the oracle, and one might conclude that assuming such an oracle allows for
trivial stegosystems.
However, this conclusion does not hold because the described stegosystem is not perfectly
secure according to Definition 1. Recall that A is required to work for any message
distribution, so it must work also for some E2 with strictly less entropy than E1|for
instance, when Eve has partial knowledge of the message. Let S2 = A(E2). Then it is
intuitively clear that the deterministic A will not output enough random bits and the
distributions of C and S2 will differ.
Formally, this can be seen by expanding the mutual information between the message
and the stegotext in two different ways. Since the encoder is deterministic and perfect,
we have H(S1) = H(E1) from expanding I(E1; S1). The same encoder applied to E2
also uniquely determines S2, and therefore H(S2) = H(E2) -H(E2/S2) <=H(E2) from
expanding I(E2; S2). Hence, H(S2) / H(E2) < H(E1) = H(S1) by the assumption
on E2, which implies that the distributions of S1 and S2 differ and this contradicts the
assumption that the stegosystem is perfect.
Let all random variables in  the model above be extended to stochastic processes and let
n denote the number of repetitions. Assume that the covertext is generated by a stationary
information source. Hence, the normalized relative entropy between the covertext and stegotext
processes determines the security in cases where Eve is restricted to see a finite part of the
covertext sequence
.
Definition 2.
A stegosystem for stochastic processes with stationary covertext is called perfectly
secure on average (against passive adversaries) whenever
lim     1/n D(PC||PS) = 0:
n->∞

Analogously, a stegosystem for stochastic processes is called _-secure on average (against passive
lim      1/n D(PC||PS )    <=   є
n->∞

Notice that Alice is still either active or inactive during the entire experiment, and the
stegotext distribution will not be ergodic in general.

Detection Performance

This section analyzes Eve's capabilities of detecting an embedded message. Basic bounds on her
performance are obtained from the theory of hypothesis testing. A brief review of hypothesis
testing is given first.

Hypothesis Testing

Hypothesis testing is the task of deciding which one of two hypotheses H0 or H1 is the true
explanation for an observed measurement Q. In other words, there are two plausible probability
distributions, denoted by PQ0 and PQ1 , over the space Q of possible measurements. If H0 is
true, then Q was generated according to PQ0 , and if H1 is true, then Q was generated according
to PQ1 . A decision rule is a binary partition of Q that assigns one of the two hypotheses to each
possible measurement q 2 Q. The two errors that can be made in a decision are called a type I
error for accepting hypothesis H1 when H0 is actually true and a type II error for accepting H0
when H1 is true. The probability of a type I error is denoted by α , the probability of a type II
error by β .
A basic property in hypothesis testing is that deterministic processing cannot increase the
relative entropy between two distributions. For any function f : Q -> T , if T0 = f(Q0) and
T1 = f(Q1), then
D(Pt0||PT1) <=D(PQ0||PQ1)
Let d(α,β) denote the binary relative entropy of two distributions with parameters (α; 1- α)
and (1 - β; β), respectively,
d (α,β)  = α  log α /(1- β)+ (1 - α) log(1 - α)/ β

Because deciding between H0 and H1 is a special form of processing by a binary function,
the type I and type II error probabilities _ and _ satisfy
d( α,β) <= D(PQ0||PQ1):
This bound is typically used as follows: Suppose that D(PQ0||PQ1):< ∞ and that an upper
bound α on the type I error probability is given. Then this  yields a lower bound on the type II
error probability β. The first one connects entropy, relative entropy, and the size of the alphabet for any random variable X 2 X: If PU is the uniform distribution over X, then
H(X) + D(PX||PU) = log |X |
The second property states that conditioning on derived information (side information, which
has the same distribution in both cases) can only increase the discrimination: If there is a
deterministic function f : Q -> V such that the random variables f(Q0) and f(Q1) have the
same distribution PV , then
D(Pq0||Pq1) <=D(PQ0\v||PQ1\v)

Bounds for Secure Stegosystems

Consider Eve's decision process for detecting a hidden message in a stegosystem as a hypothesis
testing problem. Any particular decision rule is a binary partition (C0; C1) of the set C of possible
covertexts. She decides that Alice is active if and only if the observed message c is contained in
C1. Ideally, she would always detect a hidden message. (But this occurs only if Alice chooses
an encoding such that valid covertexts and stegotexts are disjoint.) If Eve fails to detect that
she observed stegotext S, she makes a type II error; its probability is denoted by β
The opposite error, which usually receives less attention, is the type I error: Eve decides
that Alice sent stegotext although it was a legitimate cover message C; its probability is denoted
by β. An important special case is that Eve makes no type I error and never accuses Alice of
sending hidden information when she is inactive (α = 0). Such a restriction might be imposed
on Eve by external mechanisms, justi_ed by the desire to protect innocent users.
The deterministic processing property (1) bounds the detection performance achievable by
Eve. From (2) we obtain the following result.
Theorem 1. In a stegosystem that is Є-secure against passive adversaries, the probability β that the adversary does not detect the presence of the embedded message and the probability β
that the adversary falsely announces the presence of an embedded message satisfy
d (α,β) <=Є
In particular, if α = 0, then β≥ 2:
In a perfectly secure system, we have D(PC||PS) = 0 and therefore PC = PS; thus, the
observed message does not give Eve any information about whether Alice is active or not.

Secure Stegosystems

According to the model, we obtain a secure stegosystem whenever the stegotext distribution
is close to the covertext distribution for an observer with no knowledge of the secret key. The
embedding function depends crucially on the covertext distribution. We assume in this section
that the covertext distribution is known to the users Alice and Bob, and describe how secure
stegosystems can be constructed.
secure stegosystem whenever the covertext consists of uniformly random bits. Assuming such
a covertext would be rather unrealistic, but in order to illustrate the model, we briey describe
this system formally.The one-time pad stegosystem is equivalent to the basic scheme of visual cryptography.This technique hides a monochrome picture by splitting it into two random
layers of dots. When these are superimposed, the picture appears. Using a slight modification
of the basic scheme, it is also possible to produce two innocent-looking pictures such that both
of them together reveal a hidden embedded message that is perfectly secure against an observer
who has only one picture.

General distributions.
For arbitrary covertext distributions, we now describe a system that embeds a one-bit message in the stegotext. The extension to larger message spaces is straightforward, but might require even more detailed knowledge of the covertext distribution.

Theorem 2.

The one-bit stegosystem has security
1/ln 2( Pr[C є C0] - Pr[C є C1] 2

In general, determining the optimal embedding function from a covertext distribution is an
NP-hard combinatorial optimization problem. We can also find efficient embedding
algorithmsľ for the above one-bit stegosystem that achieves perfect security whenever possible.

Universal Stegosystems

The stegosystems described above require that the covertext distribution is known to the users
Alice and Bob. This seems not realistic for many applications. In this section, we describe
a method for obtaining a universal stegosystem where such knowledge is not needed. The
idea is that Alice and Bob exploit a covertext signal that is produced by an infinite sequence
of independent repetitions of the same experiment. Alice applies a universal data compression
scheme to compute an approximation of the covertext distribution. She then produces stegotext
with the approximate distribution of the covertext from her own randomness and embeds a
message into the stegotext using the method of the one-time pad. Eve may have complete
knowledge of the covertext distribution, but as long as she is restricted to observe only a finite
part of the covertext sequence, this stegosystem achieves perfect average security asymptotically.
There are many practical universal data compression algorithms and most encoding
methods for perceptual data rely on them in some form. It is conceivable to combine them
with the universal stegosystem for embedding messages in perceptual coverdata such as audio
or video.

A universal stegosystem.

Suppose the covertext, which is given as input to Alice, consists
of n independent realizations of a random variable X. The universal stegosystem applies the
above data compression scheme to the covertext. If Alice is active, she generates stegotext
containing hidden information using the derived encoder and her private random sthece.
More precisely, given ľ > H(X) and n, A mapss the incoming covertext Xn to its encoding
Z = E(Xn). W.l.o.g. assume the output of the encoder is a binary m-bit string for m = [log |A|]  and the shared key K is a uniformly random l-bit string with l<=m.Furthermore, let the message E to be embedded be an -bit string and let Alice's random
sthece R generate uniformly random (m -l)-bit strings.
If E outputs Z = Є(Xn), Alice sends S = Xn and no message is embedded. Otherwise, she
computes the m-bit string
T = (E Ø K)||R;
where k denotes the concatenation of bit strings, and sends S = D(T).
Bob extracts the embedded message from the received stegotext S as follows. If E(S) = ð,
he declares a transmission failure and outputs a default value. Otherwise, he outputs
E^ = E(S)[1…,l`]Ø K;
where Z[1…,l] stands for the prefix of length l of a binary string Z.
Note that this stegosystem relies on Alice's private random source in a crucial way.

Conclusion:
The approach of this paper is to view steganography with a passive adversary as a problem of
hypothesis testing because the adversary succeeds if he merely detects the presence of hidden
information.Although these conditions may be necessary, they are not sufficient to guarantee undetectable communication, as can be seen from the following insecure stegosystem.