PEP 622 proposes adding a pattern matching construct to Python. Pattern matching allows the programmer to destructure data with a syntax that mirrors the construction syntax. The proposal brings Python in-line with many other modern programming languages, like Haskell, OCaml, and Rust. However, two features included in the proposal (or-patterns and guards) interact in a perhaps surprising way—see this paper for an explanation of this interaction as it relates to OCaml, another language with both or-patterns and guards.
You can try out an in-progress implementation of this PEP at this fork of cpython: https://github.com/brandtbucher/cpython/tree/patma.
In the rest of this post, I’ll briefly summarize pattern matching, define or-patterns and guards, and show the surprising behavior in the context of Python.
Before the proposal, here’s how you might write a function that classifies a pair of animals according to their kind
class Dog:
def __init__(self, name):
self.name = name
class Cat:
def __init__(self, age):
self.age = age
def classify(pet):
if isinstance(pet, Cat):
print("A cat that's {} years old".format(pet.age))
elif isinstance(pet, Dog):
print("A dog named {}".format(pet.name))
classify(Dog("Fido"))
# prints: A dog named Fido
classify(Cat(3))
# prints: A cat that's 3 years old
Under the proposal, this is how you could write classify
:
def classify(pet):
match pet:
case Cat(age=age):
print("A cat that's {} years old".format(age))
case Dog(name=name):
print("A dog named {}".format(name))
In other words, pattern matching allows you to inspect a piece of data and simultaneously match its shape and bind the value of its fields to variables. In the above code, Cat(age=age)
is a pattern that matches Cat
values, and binds to the variable named age
.
The proposal gives an informal description of the semantics of this construct, but the informal description doesn’t explain the interaction between two of the introduced features: or-patterns, and pattern guards. As it turns out, as currently implemented, the interaction between these features is somewhat surprising, and will probably yield at least one high-scoring Stack Overflow question if this PEP ends up being accepted.
Here is an example of an or-pattern: Cat(age=3) | Dog(name="Fido")
. This pattern matches either three-year-old cats or dogs named Fido. In practice:
# I have a three-year-old cat and a dog named Fido.
# This function returns true if the pet could be mine.
def couldBeMyPet(pet):
match pet:
case Cat(age=3) | Dog(name="Fido"):
return True
case _:
return False
print(couldBeMyPet(Cat(3)))
# prints True
print(couldBeMyPet(Dog("Rover")))
# prints False
Pattern guards are boolean expressions that can be included in a case
branch; the case branch is only taken if the pattern guard evaluates to true.
def classify(pet):
match pet:
case Cat(age=age) if age % 2 == 0:
print("This cat has an even age!")
case Dog(name=name) if sorted(name) == list(name):
print("This dog's name is in alphabetical order!")
case _:
print("I have nothing interesting to say about this pet.")
classify(Cat(4))
# prints "This cat has an even age!"
classify(Dog("abe"))
# prints "This dog's name is in alphabetical order!"
classify(Dog("fido"))
# prints "I have nothing interesting to say about this pet."
In the above example, if age % 2 == 0
and if sorted(name) == list(name)
are pattern guards that allow that case to be taken only if that boolean expression evaluates to True.
What do you think this function does?
def doesEitherCatHaveAnEvenAge(pet1, pet2):
match (pet1, pet2):
case (Cat(age=age), _) | (_, Cat(age=age)) if age % 2 == 0:
return True
case _:
return False
print(doesEitherCatHaveAnEvenAge(Cat(2), Cat(4)))
# prints True
print(doesEitherCatHaveAnEvenAge(Cat(2), Cat(5)))
# prints True
print(doesEitherCatHaveAnEvenAge(Cat(5), Cat(4)))
# prints False (?????)
The last one (doesEitherCatHaveAnEvenAge(Cat(5), Cat(4))
) may seem like it should evaluate to True. After all, the one of the patterns in the or-pattern seems like it matches the second cat, which is a cat with an even age.
Does (Cat(5), Cat(4))
match against (Cat(age=age), _) | (_, Cat(age=age)) if age % 2 == 0
? To answer this question, the semantics in your head might look something like this:
(Cat(5), Cat(4))
does match agains the first pattern of the or-pattern, (Cat(age=age), _)
. This match binds the value 5 to the variable age
.if age % 2 == 0
. Because 5 % 2 == 0
evaluates to False, this check fails.(Cat(5), Cat(4))
also matches the second pattern of the or-pattern, (_, Cat(age=age)
. This binds the value 4 to the variable age
.if age % 2 == 0
succeeds, because 4 % 2 == 0
evaluates to True
.This is perfectly sensible, but this is not what the implementation does. The implementation’s semantics are more like this:
case _:
) always succeeds, so we return False.In other words, pattern matching does’t have backtracking semantics, even if a value can match an or-pattern in multiple different ways.
I love pattern matching, and I’m excited to see that it might be adopted by such a popular language as Python. But this particular interaction of features might confuse novices. Let’s compare the approaches of other languages.
OCaml has the same semantics as Python here, but the OCaml compiler can emit a warning (because the compiler has enough type information to figure out whether the constituent patterns of an or-pattern can possibly match the same value in different ways).
let () =
match (1, 2) with
| (x, _) | (_, x) when x = 2 -> print_endline ":)"
| _ -> print_endline ":("
;;
This prints :(
, but also emits the warning:
Warning 57: Ambiguous or-pattern variables under guard;
variable x may match different arguments. (See manual section 9.5)
Rust has pattern matching and guards, and has an experimental or_patterns
feature. It implements backtracking semantics, unlike Python. This is weird in its own right, especially in the presence of side effects. Take a look at this program:
#![feature(or_patterns)]
// Prompt the user whether they want to accept the match
pub fn check(x: i32) -> bool {
use std::io::{stdin};
println!("Checking this match: {}. Should we accept it? y/n", x);
let mut s = String::new();
stdin().read_line(&mut s).expect("Did not enter a correct string");
return s.trim() == "y"
}
pub fn main() {
match (1, 2) {
(x, _) | (_, x) if check(x) => {
println!(":)")
}
_ => {
println!(":(")
}
}
}
In particular, the guard can now run multiple times:
$ cargo +nightly run
Checking this match: 1. Should we accept it? y/n
n
Checking this match: 2. Should we accept it? y/n
y
:)
I’m not sure what the right decision is for Python. Or-patterns and guards may just not compose well.