Minimum cost content distribution using network coding: Replication vs. coding
Large scale content distribution over the internet has been the focus of numerous studies in recent years. In the traditional server-client model, the server may suffer from overload when a popular file stored at the server is frequently requested. In order to reduce the cost at servers and decrease the retrieval time for clients, distributed storage solutions that operate by dividing the file into pieces and placing copies of the pieces (replication) or coded versions of the pieces (coding) at multiple source nodes have been proposed.
Network coding has also been used in large content distribution. In this work, we consider multicasting a file that can be broken into small pieces to multiple clients over a network with network coding. The network contains a set of source nodes that can store either subsets or coded version of the pieces of the file. We are interested in finding the optimal storage capacity and flows over the edges for the subset case and the coded case, respectively, such that the joint cost of transmission at edges and storage at sources is minimized. We provide succinct formulations of the corresponding optimization problems by using information measures. By the insight gained from the two formulations, a gap linear program which can compute the cost gap between the subset case and the coded case is formulated. A greedy algorithm is developed to find a suboptimal solution of the gap LP. In particular, we show that when there are two source nodes, there is no loss in considering subset sources. Furthermore, in the case of three source nodes, we derive a tight upper bound on the cost gap between the two cases. Algorithms for determining the content of the source nodes are also provided.