Whether we are training a horse to do a canter pirouette, a dog to sit, a child to be a civilized citizen, or a smoker to quit smoking, we employ the primary principles of classical and operant conditioning. Whether we are intentionally teaching our horses something or not, they are always learning. Most trainers believe that they train their horses primarily with positive reinforcement (Warren-Smith & McGreevy, 2008); in fact, negative reinforcement is the mainstay of equine training, and positive reinforcement is rarely used. We could be using positive reinforcement a great deal more than we do, and building a better relationship with our horses in the process.
Undoubtedly, negative reinforcement will always play a central role in equine training. Not to be confused with punishment (which seeks to decrease undesired behaviours), negative reinforcement makes a desired behaviour more likely to recur by removing an unpleasant stimulus (see sidebar). Horses, as prey animals, have evolved to escape aversive events, making them evolutionarily primed to avoid the discomfort of pressure from bit, leg, spur, or seat. Negative reinforcement, which rewards the desired behaviour by removing that pressure, forms the foundation of control pivotal to riding a horse under saddle.
However, research indicates that positive reinforcement may sometimes be more effective than negative reinforcement, and holds clear benefits for equine well-being. Across many research studies, most horses, whether trained with positive or negative reinforcement, learn tasks within the required time frame. Horses trained with positive reinforcement, however, learn more quickly, retain the learned tasks longer, experience less stress, react to humans more positively, and are able to generalize this training across trainers, novel tasks, and long periods of time (e.g., Sankey, 2010). Human research finds similar results in children, students, employees, and managers. People, like horses, respond more positively to rewards (such as praise, favourable performance evaluations, and raises) than they do to the termination of unpleasant events.
“Clicker Training,” so named for the small handheld device that makes a distinctive clicking sound and signals to the conditioned animal that a reward is coming, is simply the marketing term for positive reinforcement training. Clicker training is based on three basic premises from psychological learning theory – positive reinforcement, classical conditioning, and shaping.
I: Positive Reinforcement
The concept of positive reinforcement is so simple as to seem self-evident; if you want to see a behaviour again, reward it. For example, if we reward a horse for standing quietly while we get on, we make it more likely that he will stand quietly in the future.
II: Classical Conditioning
We can imbue a previously neutral stimulus (an event that is neither rewarding nor punishing, such as the sound of a clicker) with a positive value by consistently pairing it with something intrinsically rewarding (such as food). Establishing this association is always the first lesson in positive reinforcement training.
There exists a misconception that feeding horses treats by hand will make them aggressive. However, if the horse is never reinforced for nuzzling or grabbing for treats, positive reinforcement can eliminate nippiness rather than induce it.
Equine trainer Shawna Karrasch outlines a first session that establishes the click = reward association while training the horse to keep away from the treat bucket. This may begin with ignoring the horse's pushy behaviours while clicking and rewarding any small movement away from the treat bucket. By continuing to click and reward the desired behaviour and ignoring ‘mugging’ behaviours, we soon see the horse voluntarily turning away from the treat bucket to get his reward. Surprisingly quickly the horse learns that: 1) a click means good things will follow; and 2) maintaining a respectful distance gets you treats, and mugging behaviours do not.
The term “clicker training” places the emphasis on the clicker. However, any distinct audible stimulus can become a “secondary reinforcer” (something that has value because of a learned association) provided it is predictably paired with a “primary reinforcer” (something that has inherent value on its own). If we repeatedly said “Eric Lamaze” and reliably followed it with a food reward, “Eric Lamaze” would work as a secondary reinforcer to mark a desired behaviour.
Secondary reinforcers become particularly useful when training under saddle. Karrasch calls the click the ‘bridge signal’ as it bridges the gap between the exact behaviour we are rewarding and the administration of the reward. If we rewarded a particularly good jumping effort by stooping forward to offer a carrot after landing, the behaviour we would see repeated would be the horse stopping and turning his head, since this is the behaviour that immediately preceded the reward. However, with a horse previously conditioned to the click = reward association, a click at the peak of the jump marks the exact behaviour we want. As the association becomes stronger and the response more reliable, we can gradually lengthen the delay between click and reward. Eventually the secondary reinforcer becomes so strongly associated with something pleasurable that it becomes reinforcing in its own right – much like money does for us.
III: Shaping
Since most behaviours we want horses to perform (such as leaping over obstacles and dancing to music) do not occur spontaneously, we use ‘shaping’ to reward closer and closer approximations of the eventual behaviour we want. Karrasch worked with Beezie and John Madden to help the jumper Judgement overcome his fear of jumping water by training him to respond to the clicker, then systematically clicking and rewarding closer and closer approaches to, and eventually over, a water obstacle. Judgement went on to win over $1.5 million in his career.
Positive reinforcement trainers such as Karrasch and Pryor stress the importance of making it easy for the horse to succeed. This provides more opportunities for reinforcing the desired response, minimizes confusion by reducing or eliminating the wrong response, and reliably establishes the new behaviour more efficiently. Make shaping increments small, and relax the criteria when you change the context. For example, a horse that stands unattended for three minutes at home should not be held to the same criteria on day one of a horse show. If behaviour regresses, back up to where your horse can reliably succeed, and reshape. If it isn’t working, shift tactics. There are many shaping paths towards the desired behaviour.
Once the first clicker training lessons have established a strong click = reward association, positive reinforcement principles can be generalized to any behaviour we want to see more often. For example, we could reduce tranquilizer use and injuries (both horse and human) by training a rehabbing horse to lead quietly regardless of how explosive he may be feeling. After a few short training sessions, my hot, rehabbing dressage horse learned that the benefits of walking calmly were worth the emotion regulation it took to do it, and he was soon walking and trotting in hand in varied contexts without incident. I simply rewarded the behaviour I wanted to see, while incrementally upping the ante.
Positive reinforcement can be used to set up new reinforcement histories to rehabilitate horses with problem behaviours or phobias. Just as Karrasch worked with Judgement to overcome his fear of water, a horse can be gradually conditioned to tolerate a feared object (clippers, trailers, needles, spray bottles, even farriers and vets) by rewarding small shaped increments. Since the desired behaviour is a non-response, we need to begin the shaping process there. If the clippers need to be running 30 metres from the horse to elicit a non-response, that is where we start, gradually bringing the feared object closer and rewarding the horse for his calm behaviour. If he becomes alarmed, go back to the last point at which he stayed calm and reward the calm behaviour there. In small increments the horse can learn to tolerate the feared object and discover that good things happen when these monsters are nearby.
Sometimes I hear owners say, ‘You’ll never cure my horse. He is so bad he starts up when he hears the vet truck!’ However, horses with these long association histories provide equally long and rich opportunities to shape a desired behaviour. This long build strongly establishes the new desired behaviour and makes it particularly resistant to unravelling.
Teaching separation tolerance
When two horses become excessively dependent on each other, the typical solution is to separate them and wait out their whinnies until they eventually give up. However, this arguably cruel, cold-turkey isolation does not train the horse to handle separation. Now hyper-vigilant about the unpredictable and absolute disappearance of a friend, the anxious horse invariably makes an immediate new friend, the process is repeated, and his anxiety escalates with each new attachment.
A more effective and longer-term strategy is to train the horse to tolerate incrementally longer periods of separation. By starting the shaping process at the point before the horse becomes anxious (which may be when someone enters the adjacent horse’s stall), and by making increments sufficiently small so as not to elicit the undesired separation distress, we may build a new reinforcement history for the anxious horse. Eventually he gains greater confidence to tolerate the separation, reassured that he will see his friend again.
THE POSITIVES OF POSITIVE REINFORCEMENT
The greatest positive of positive reinforcement training will become immediately evident in the first few sessions. Your reward is seeing your horse anticipate his work eagerly and respond enthusiastically as you gain a richer relationship with him and connect at an even deeper level to his generous spirit.