Reinforcement learning (RL), is a form of ML wherein a virtual agent tries to learn how to interact with its surroundings in such a way that it can achieve the maximum reward from it for a certain set of actions.
Let's try to understand this with a small example—say you build a robot that plays darts. Now, the robot will get a maximum reward only when it hits the center of the dartboard. It begins with a random throw of dart and lands on the outermost ring. It gets a certain amount of points, say x1. It now knows that throwing near that area will yield it an expected value of x1. So, in the next throw, it makes a very slight change of angle and luckily lands in the second outermost right, fetching it x2 points. Since x2 is greater than x1, the robot has achieved a better result and it will learn to throw nearby this area in the future. If the dart had landed even further out than the outermost ring, the robot would keep throwing it near the first throw that it made until it got a better result.
Over several such trials, the robot keeps learning the better places to throw and makes small detours from those positions until it gets the next better place to throw at. Eventually, it finds the bull's eye and meets the highest points every time.
In the preceding example, your robot is the agent who is trying to throw a dart at the dartboard, which is the environment. Throwing the dart is the action the agent performs on the environment. The points the agent gets act as the reward. The agent, over multiple trials, tries to maximize the reward that it gets by performing the actions.
Some well-known RL algorithms are Monte Carlo, Q-learning, and SARSA.