The second invited talk at TCC 2012 was given by Dr. Sergey Yekhanin from the MIcrosoft Research. The talk was on Locally Decodable Codes (LCD), an important research topic in Coding theory as well in Cryptography. In the talk the speaker brilliantly explained the concept of LCD and its relation with one of the fundamental cryptographic primitives, namely Private Information Retrieval (PIR). This was followed by some technical constructions of the LCD codes.

To see the motivation of LCD codes, consider the well known problem of data storage, where we want to reliably store some data and at the same time, we would like the data to be readily available for the users. One simple and common approach would be to replicate the data at several places. This approach provides us with local recovery, as the data can be locally accessed by the users from any one of the several places, where the data is stored. But this will be an overkill for large sized data. An alternative approach would be to use erasure coding, where we encode the data using an erasure correcting code and store the various components of the codeword at various places. Though this approach is cheaper than the idea of replicating the whole data, it no longer provides us the local recovery property. This is because now to access the data, the users have to retrieve several components of the codeword to perform the decoding. What we would like now is to have an erasure code which can provide us with local decoding.

Consider the following example: let x be the message, consisting of 3 bits x

Another interesting property of the above LCD is the following: each component of the codeword occurs uniformly in the decoding tuples, corresponding to each message bit. That is, each component of the codeword occurs same number of times in the decoding tuples of the message bits. For example, consider the fourth component of the codeword, namely x

The important parameters of the LCD codes are the codeword length n and the query complexity r. It is desirable to have both these parameters as small as possible. However, there is a trade-off between these parameters and determining the true shape of this trade-off is a major open problem in this area.

Now lets see how the LCD codes can be used to design information theoretically secure PIR scheme in the multiple server settings. We will again consider the above example. Before that, let me informally state the problem of PIR: there is a public database, replicated across several servers and available for the user access. A user can access any data in the database by querying the server. However, we would like to preserve the user privacy in that each server should

be completely oblivious of the data accessed by the user (we assume that the servers do not collude with each other). This problem can be considered as a weaker form of another important cryptographic primitive, namely Oblivious Transfer (OT). In OT, we also require that the user should not get any information about the other data items which he has not accessed; where as in a PIR scheme, the user can retrieve other items along with the item that he wants to access.

We can design a 2 server PIR scheme using the above example: let x be the data base, containing 3 bits and let S

There are several interesting open problems in the area of LCD. The interested readers can refer to the PhD thesis of Sergey (recently published as a book).

To see the motivation of LCD codes, consider the well known problem of data storage, where we want to reliably store some data and at the same time, we would like the data to be readily available for the users. One simple and common approach would be to replicate the data at several places. This approach provides us with local recovery, as the data can be locally accessed by the users from any one of the several places, where the data is stored. But this will be an overkill for large sized data. An alternative approach would be to use erasure coding, where we encode the data using an erasure correcting code and store the various components of the codeword at various places. Though this approach is cheaper than the idea of replicating the whole data, it no longer provides us the local recovery property. This is because now to access the data, the users have to retrieve several components of the codeword to perform the decoding. What we would like now is to have an erasure code which can provide us with local decoding.

Consider the following example: let x be the message, consisting of 3 bits x

_{1, }x_{2 }and x_{3}. Suppose we encode the message into the 7 bit codeword C = (x_{1}, x_{2}, x_{3}, x_{1}⊕ x_{2}, x_{1}⊕ x_{3}, x_{2}⊕ x_{3}, x_{1}⊕ x_{2}⊕ x_{3}). This is nothing but the well known Hadamard code. An interesting feature of this code is that any message bit can be recovered with locality 2. For example, suppose we want to recover the bit x_{1}. We can recover x_{1}from the codeword C if at least one of the following 4 pairs of components of the codeword is not erased:- {x
_{1}} - {x
_{1}⊕ x_{2 , }x_{2}}_{ } - {x
_{1}⊕ x_{3 , }x_{3}}_{ } - {x
_{2}⊕ x_{3 , }x_{1}⊕ x_{2}⊕ x_{3}}

_{1}. Notice that the same holds for every other message bit. For example, the decoding tuples for x_{2}are:- {x
_{2}} - {x
_{1}⊕ x_{2 , }x_{1}}_{ } - {x
_{3}⊕ x_{2 , }x_{3}} - {x
_{1}⊕ x_{3 , }x_{1}⊕ x_{2}⊕ x_{3}}

_{3}are:- {x
_{3}} - {x
_{1}⊕ x_{3 , }x_{1}} - {x
_{3}⊕ x_{2 , }x_{2}} - {x
_{1}⊕ x_{2 , }x_{1}⊕ x_{2}⊕ x_{3}}

Another interesting property of the above LCD is the following: each component of the codeword occurs uniformly in the decoding tuples, corresponding to each message bit. That is, each component of the codeword occurs same number of times in the decoding tuples of the message bits. For example, consider the fourth component of the codeword, namely x

_{1}⊕ x_{2 . }It occurs in the decoding tuples of x_{1}once, in the decoding tuples of x_{2 }once and in the decoding tuples of x_{3}once. The same holds for every other component of the codeword. This property of the LCD codes, called as the**smoothness**property makes it an important tool for cryptographic primitives like PIR.The important parameters of the LCD codes are the codeword length n and the query complexity r. It is desirable to have both these parameters as small as possible. However, there is a trade-off between these parameters and determining the true shape of this trade-off is a major open problem in this area.

Now lets see how the LCD codes can be used to design information theoretically secure PIR scheme in the multiple server settings. We will again consider the above example. Before that, let me informally state the problem of PIR: there is a public database, replicated across several servers and available for the user access. A user can access any data in the database by querying the server. However, we would like to preserve the user privacy in that each server should

be completely oblivious of the data accessed by the user (we assume that the servers do not collude with each other). This problem can be considered as a weaker form of another important cryptographic primitive, namely Oblivious Transfer (OT). In OT, we also require that the user should not get any information about the other data items which he has not accessed; where as in a PIR scheme, the user can retrieve other items along with the item that he wants to access.

We can design a 2 server PIR scheme using the above example: let x be the data base, containing 3 bits and let S

_{1}and S_{2}be the servers. Each server applies the LCD encoding on x and stores the codeword. Now suppose a user wants to access a particular data bit, say the bit x_{3 }. The user randomly selects one of the 4 decoding tuples for retrieving x_{3}. For example, suppose he selects the tuple {x_{3}⊕ x_{2 , }x_{2}}. Then he queries the server S_{1}for the component x_{3}⊕ x_{2}and the server S_{2}for the component x_{2}. Once he obtains them, he can obtain x_{3}(notice that in this process he also obtains x_{2}but that is allowed in PIR). On the other hand, each server has no information about the data bit that the user has retrieved. From the view point of S_{1}, the component x_{3}⊕ x_{2}might have been queried for x_{1}or x_{2}or x_{3}with the same probability. Similarly, from the view point of S_{2}, the component x_{2}might have been queried for x_{1}or x_{2}or x_{3}with the same probability.There are several interesting open problems in the area of LCD. The interested readers can refer to the PhD thesis of Sergey (recently published as a book).

## No comments:

## Post a Comment