Enhancing the performance of energy harvesting wireless communications using optimization and machine learning
The motivation behind this thesis is to provide efficient solutions for energy harvesting communications. Firstly, an energy harvesting underlay cognitive radio relaying network is investigated, in which the secondary network is an energy harvesting network. Closed-form expressions are derived for the transmission powers of the secondary source and relay that maximize the secondary network throughput. Secondly, a practical scenario is investigated in terms of how much information about the environment is available. We consider a communications system with a source capable of harvesting solar energy. Two cases are considered, depending on whether knowledge of the underlying processes is available. When this knowledge is available, an algorithm that exploits it is designed to maximize the expected throughput while reducing the complexity of traditional methods. When it is unavailable, reinforcement learning is used. Thirdly, three learning architectures for reinforcement learning are introduced: selector-actor-critic, tuner-actor-critic, and estimator-selector-actor-critic. The selector-actor-critic architecture aims to increase the speed and efficiency of learning an optimal policy by approximating the most promising action at the current state. The tuner-actor-critic architecture aims to improve the learning process by providing the actor with a more accurate estimate of the value function. The estimator-selector-actor-critic architecture is introduced to support intelligent agents; it mimics the way rational humans analyze available information and make decisions. An energy harvesting communications system operating in an unknown environment is then evaluated when supported by the proposed architectures. Fourthly, a realistic energy harvesting communications system is investigated, in which the state and action spaces of the underlying Markov decision process are continuous. An actor-critic method is used to optimize the system performance: the critic uses a neural network to approximate the action-value function, and the actor uses policy gradient to optimize the policy's parameters so as to maximize the throughput.
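As a rough illustration of the actor-critic arrangement described above, the following minimal sketch pairs a small neural-network critic for the action-value function with a Gaussian policy-gradient actor on a continuous state and action space. It is not the thesis's implementation. The toy environment (`env_step`), the feature map, the battery capacity `B_MAX`, the uniform harvesting and exponential channel-gain models, the SARSA-style critic target, and all step sizes are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
B_MAX, GAMMA = 1.0, 0.95  # hypothetical battery capacity and discount factor

def env_step(battery, gain, frac):
    """Toy dynamics (assumed): spend a fraction of the battery, then harvest."""
    power = frac * battery
    reward = np.log2(1.0 + gain * power)  # throughput-like reward
    next_battery = min(battery - power + rng.uniform(0.0, 0.3), B_MAX)
    return reward, next_battery, rng.exponential(1.0)

def features(battery, gain):
    """Bounded state features: bias, battery level, squashed channel gain."""
    return np.array([1.0, battery, np.tanh(gain)])

class Critic:
    """One-hidden-layer network approximating Q(s, a)."""
    def __init__(self, n_in, n_hidden=16, lr=0.01):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def value(self, x):
        return self.w2 @ np.tanh(self.W1 @ x + self.b1) + self.b2

    def update(self, x, target):
        """Semi-gradient step on 0.5 * (target - Q(x))**2."""
        h = np.tanh(self.W1 @ x + self.b1)
        err = target - (self.w2 @ h + self.b2)
        grad_h = err * self.w2 * (1.0 - h**2)  # backprop through tanh
        self.w2 += self.lr * err * h
        self.b2 += self.lr * err
        self.W1 += self.lr * np.outer(grad_h, x)
        self.b1 += self.lr * grad_h

def sample_action(theta, phi, sigma):
    """Gaussian policy on z; a sigmoid maps z to a power fraction in (0, 1)."""
    z = rng.normal(theta @ phi, sigma)
    return z, 1.0 / (1.0 + np.exp(-z))

def train(steps=2000, sigma=0.5, lr_actor=0.001):
    theta = np.zeros(3)          # actor parameters (policy mean is linear in features)
    critic = Critic(n_in=4)      # input: 3 state features + 1 action
    battery, gain = 0.5, 1.0
    phi = features(battery, gain)
    z, frac = sample_action(theta, phi, sigma)
    total = 0.0
    for _ in range(steps):
        reward, battery, gain = env_step(battery, gain, frac)
        next_phi = features(battery, gain)
        next_z, next_frac = sample_action(theta, next_phi, sigma)
        x = np.append(phi, frac)
        q = critic.value(x)
        # Critic: move Q(s, a) toward r + gamma * Q(s', a').
        critic.update(x, reward + GAMMA * critic.value(np.append(next_phi, next_frac)))
        # Actor: policy-gradient step, grad log pi(z | s) weighted by the critic's Q.
        theta += lr_actor * q * (z - theta @ phi) / sigma**2 * phi
        phi, z, frac = next_phi, next_z, next_frac
        total += reward
    return theta, total / steps
```

Calling `theta, avg_reward = train()` returns the learned actor parameters and the average per-step reward over the run. The sketch makes no claim about convergence or about the performance reported in the thesis.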