ncik-roberts.github.io

Python pattern matching: Guards and or-patterns might not interact in the way you expect

PEP 622 proposes adding a pattern matching construct to Python. Pattern matching allows the programmer to destructure data with a syntax that mirrors the construction syntax. The proposal brings Python in-line with many other modern programming languages, like Haskell, OCaml, and Rust. However, two features included in the proposal (or-patterns and guards) interact in a perhaps surprising way—see this paper for an explanation of this interaction as it relates to OCaml, another language with both or-patterns and guards.

You can try out an in-progress implementation of this PEP at this fork of cpython: https://github.com/brandtbucher/cpython/tree/patma.

In the rest of this post, I’ll briefly summarize pattern matching, define or-patterns and guards, and show the surprising behavior in the context of Python.

Example of pattern matching

Before the proposal, here’s how you might write a function that classifies a pair of animals according to their kind

class Dog:
  def __init__(self, name):
    self.name = name

class Cat:
  def __init__(self, age):
    self.age = age

def classify(pet):
  if isinstance(pet, Cat):
    print("A cat that's {} years old".format(pet.age))
  elif isinstance(pet, Dog):
    print("A dog named {}".format(pet.name))

classify(Dog("Fido"))
# prints: A dog named Fido

classify(Cat(3))
# prints: A cat that's 3 years old

Under the proposal, this is how you could write classify:

def classify(pet):
  match pet:
    case Cat(age=age):
      print("A cat that's {} years old".format(age))
    case Dog(name=name):
      print("A dog named {}".format(name))

In other words, pattern matching allows you to inspect a piece of data and simultaneously match its shape and bind the value of its fields to variables. In the above code, Cat(age=age) is a pattern that matches Cat values, and binds to the variable named age.

The proposal gives an informal description of the semantics of this construct, but the informal description doesn’t explain the interaction between two of the introduced features: or-patterns, and pattern guards. As it turns out, as currently implemented, the interaction between these features is somewhat surprising, and will probably yield at least one high-scoring Stack Overflow question if this PEP ends up being accepted.

Or-patterns

Here is an example of an or-pattern: Cat(age=3) | Dog(name="Fido"). This pattern matches either three-year-old cats or dogs named Fido. In practice:

# I have a three-year-old cat and a dog named Fido.
# This function returns true if the pet could be mine.
def couldBeMyPet(pet):
  match pet:
    case Cat(age=3) | Dog(name="Fido"):
      return True
    case _:
      return False

print(couldBeMyPet(Cat(3)))
# prints True

print(couldBeMyPet(Dog("Rover")))
# prints False

Pattern guards

Pattern guards are boolean expressions that can be included in a case branch; the case branch is only taken if the pattern guard evaluates to true.

def classify(pet):
  match pet:
    case Cat(age=age) if age % 2 == 0:
      print("This cat has an even age!")
    case Dog(name=name) if sorted(name) == list(name):
      print("This dog's name is in alphabetical order!")
    case _:
      print("I have nothing interesting to say about this pet.")

classify(Cat(4))
# prints "This cat has an even age!"

classify(Dog("abe"))
# prints "This dog's name is in alphabetical order!"

classify(Dog("fido"))
# prints "I have nothing interesting to say about this pet."

In the above example, if age % 2 == 0 and if sorted(name) == list(name) are pattern guards that allow that case to be taken only if that boolean expression evaluates to True.

The surprising interaction

What do you think this function does?

def doesEitherCatHaveAnEvenAge(pet1, pet2):
  match (pet1, pet2):
    case (Cat(age=age), _) | (_, Cat(age=age)) if age % 2 == 0:
      return True
    case _:
      return False

print(doesEitherCatHaveAnEvenAge(Cat(2), Cat(4)))
# prints True

print(doesEitherCatHaveAnEvenAge(Cat(2), Cat(5)))
# prints True

print(doesEitherCatHaveAnEvenAge(Cat(5), Cat(4)))
# prints False (?????)

The last one (doesEitherCatHaveAnEvenAge(Cat(5), Cat(4))) may seem like it should evaluate to True. After all, the one of the patterns in the or-pattern seems like it matches the second cat, which is a cat with an even age.

Does (Cat(5), Cat(4)) match against (Cat(age=age), _) | (_, Cat(age=age)) if age % 2 == 0? To answer this question, the semantics in your head might look something like this:

  1. Well, (Cat(5), Cat(4)) does match agains the first pattern of the or-pattern, (Cat(age=age), _). This match binds the value 5 to the variable age.
  2. Now we check the guard if age % 2 == 0. Because 5 % 2 == 0 evaluates to False, this check fails.
  3. Go back to the or-pattern and see if there’s any other way to match it. In this case, (Cat(5), Cat(4)) also matches the second pattern of the or-pattern, (_, Cat(age=age). This binds the value 4 to the variable age.
  4. Now the guard if age % 2 == 0 succeeds, because 4 % 2 == 0 evaluates to True.
  5. Because this case matches the value, we return True.

This is perfectly sensible, but this is not what the implementation does. The implementation’s semantics are more like this:

  1. Same as step 1 from above.
  2. Same as step 2 from above.
  3. Because the guard failed, the case isn’t taken! Instead, the next case is tried. This case (case _:) always succeeds, so we return False.

In other words, pattern matching does’t have backtracking semantics, even if a value can match an or-pattern in multiple different ways.

I love pattern matching, and I’m excited to see that it might be adopted by such a popular language as Python. But this particular interaction of features might confuse novices. Let’s compare the approaches of other languages.

OCaml has the same semantics as Python here, but the OCaml compiler can emit a warning (because the compiler has enough type information to figure out whether the constituent patterns of an or-pattern can possibly match the same value in different ways).

let () =
  match (1, 2) with
  | (x, _) | (_, x) when x = 2 -> print_endline ":)"
  | _ -> print_endline ":("
;;

This prints :(, but also emits the warning:

Warning 57: Ambiguous or-pattern variables under guard;
variable x may match different arguments. (See manual section 9.5)

Rust has pattern matching and guards, and has an experimental or_patterns feature. It implements backtracking semantics, unlike Python. This is weird in its own right, especially in the presence of side effects. Take a look at this program:

#![feature(or_patterns)]

// Prompt the user whether they want to accept the match
pub fn check(x: i32) -> bool {
    use std::io::{stdin};
    println!("Checking this match: {}. Should we accept it? y/n", x);
    let mut s = String::new();
    stdin().read_line(&mut s).expect("Did not enter a correct string");
    return s.trim() == "y"
}

pub fn main() {
    match (1, 2) {
        (x, _) | (_, x) if check(x) => {
            println!(":)")
        }
        _ => {
            println!(":(")
        }
    }
}

In particular, the guard can now run multiple times:

$ cargo +nightly run
Checking this match: 1. Should we accept it? y/n
n
Checking this match: 2. Should we accept it? y/n
y
:)

I’m not sure what the right decision is for Python. Or-patterns and guards may just not compose well.