Development is hard. One of the challenges newcomers face when learning a programming language is the knowledge gap between them and the people they learn from.
This becomes particularly apparent when a beginner is introduced to a high-level programming language. Interpreted languages have the benefit of being (relatively) simple to understand, given that the lower-level boilerplate is largely abstracted away for the user. This abstraction allows beginners to focus on their logic rather than worrying about making sure their data structures are sized appropriately for their contents or how to dereference a pointer. On the other hand, it also makes it easy to overlook fundamental computer science concepts that anyone who's been studying this stuff for any length of time might take for granted.
Knowledge gaps like these, in my experience, are the direct cause of misunderstandings such as the titular difference between the "is" and "==" operators in Python. The easiest way to address this knowledge gap is by diving into the REPL.
>>> # It should be no surprise that 1 is indeed equal to 1
>>> 1 == 1
True
>>> # But is 1...1?
>>> 1 is 1
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
True
1 certainly is 1. But what is with the SyntaxWarning?
In computer science, a literal is a notation for representing a fixed value in source code.
It seems that when we use literals, we should use the double-equals (==) operator. Great. Why? What difference does it make?
The "is" operator evaluates whether the objects on both sides of the comparison are the same. The same, here, meaning the same object in memory. This is a test of identity, not equality: it is not the same as the two objects having the same value, which is what "==" checks.
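To make the distinction concrete, here is a minimal sketch using lists. The variable names are only illustrative; the point is that two objects can hold equal values while remaining separate objects.

```python
# Two separately built lists with the same contents.
first = [1, 2, 3]
second = [1, 2, 3]
# A second name bound to the very same list object as `first`.
alias = first

print(first == second)  # True: the contents are equal
print(first is second)  # False: two distinct objects
print(first is alias)   # True: both names refer to one object

# Mutating through one name is visible through the other,
# because there is only one underlying object.
alias.append(4)
print(first)            # [1, 2, 3, 4]
print(second)           # [1, 2, 3]
```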
So, as an aside, let's take a look at the memory location of the literal value 1:
# Evaluate the hexadecimal value of the memory id of the literal 1
>>> hex(id(1))
'0x958e20'
# Is the literal always in the same memory location?
>>> hex(id(1))
'0x958e20'
As we can see, the literal 1 is an object stored at 0x958e20 in memory on my machine. Thus, when we evaluate 1 is 1, it is, in fact, True.
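That shared address is a detail of the interpreter rather than a promise of the language. CPython, as an implementation detail, keeps one shared object for each small integer (currently -5 through 256), which is why every 1 above lands at the same address. The sketch below assumes CPython and builds the same integer twice at runtime. The values always compare equal, but the identity results depend on the interpreter, so they are described rather than shown as fixed output.

```python
# Build the same large value twice at runtime so the compiler
# cannot fold both into a single shared constant.
x = int("1000")
y = int("1000")

print(x == y)  # True: same value
print(x is y)  # implementation-dependent; typically False on CPython

# Small integers behave differently on CPython, which reuses
# one cached object per value in a small range.
small_a = int("1")
small_b = int("1")
print(small_a == small_b)  # True: same value
print(small_a is small_b)  # implementation-dependent; typically True on CPython
```

Because the identity of integers can change with the value, the interpreter, or its version, relying on it is fragile even when it happens to work.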
The SyntaxWarning, therefore, is warning us that while we receive the expected evaluation of 1 is 1 (because they are the same literal object in memory), "is" is likely not the correct operator for our evaluation. It is more likely that we intended to evaluate whether the value represented by the literal object 1 is equal to the value represented by the literal object 1, not whether the two share an address in memory!
So, when might we want to know if an object is the same object in memory as a constant value?
# We might want to check to see if a value is null
>>> None is None
True
We use the "is" operator here because, in Python, None is a singleton: there is exactly one None object in memory. When the value of an object is None, it is that same object.
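A short sketch of why identity is the safer check for None. Both AlwaysEqual and describe are hypothetical names invented for this illustration: the class overrides equality so that "==" answers True for anything, while "is" cannot be fooled this way.

```python
class AlwaysEqual:
    """Hypothetical class whose == claims equality with anything."""

    def __eq__(self, other):
        return True


def describe(value=None):
    """Hypothetical helper: report whether a value was supplied."""
    if value is None:  # identity check: only the real None matches
        return "nothing supplied"
    return f"got {value!r}"


odd = AlwaysEqual()
print(odd == None)   # True: __eq__ was overridden
print(odd is None)   # False: it is not the None object
print(describe())    # nothing supplied
print(describe(0))   # got 0
```

Note that describe(0) is not mistaken for a missing value, which a truthiness check such as "if not value" would get wrong.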
Okay, so we understand the difference between the "is" and "==" operators. We know what literals are. We understand that the value of an object is not the same as the object itself. So what?
Where this concept leads to trouble is when programmers use the "is" and "==" operators interchangeably. Often, the behavior appears to act as expected, and the mistake is particularly difficult to notice in complex logic. MITRE catalogs this class of mistake in the Common Weakness Enumeration (CWE) as CWE-595, "Comparison of Object References Instead of Object Contents".
For example:
>>> a = []
>>> a is []
False
Here, we initialized a to an empty list and then compared it to a new empty list object using the "is" operator.
# Which, if we examine their addresses in memory, is correct
>>> print(hex(id(a)), hex(id([])))
0x7fa32bcb6440 0x7fa32bcb6480
They are, in fact, different objects in memory, so the comparison evaluates as False. Suppose, however, the expected behavior was as follows:
>>> a = [1,2,3]
>>> a is []
False
If a is expected to be non-empty, we might return early from the method that evaluates this logic, or perhaps even raise an exception, when it turns out to be empty. Therein lies the issue: we expect this check to evaluate as False. Evaluating True would be an edge case. At a cursory glance, it would appear that the code is behaving as expected. Unfortunately, any code that might be intended to execute in the event of a True evaluation is effectively dead code, meaning there are no circumstances in which it will run.
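The sketch below shows how such dead code can hide. The two helper functions are hypothetical and exist only for this illustration: both return the first item of a list, and both are meant to return None for an empty one.

```python
def first_item_buggy(items):
    """Hypothetical helper with the identity bug."""
    if items is []:   # a fresh list is created here, so this is never True
        return None   # dead code
    return items[0]


def first_item(items):
    """Same helper, comparing by value instead."""
    if items == []:   # `if not items:` is a common spelling for sequences
        return None
    return items[0]


print(first_item([1, 2, 3]))        # 1
print(first_item_buggy([1, 2, 3]))  # 1, so the bug stays hidden
print(first_item([]))               # None

# Only the edge case exposes the difference: the guard is skipped
# and the buggy version falls through to items[0].
try:
    first_item_buggy([])
except IndexError as exc:
    print("buggy version fell through:", exc)
```

For every non-empty input the two versions agree, which is exactly why the mistake is easy to miss.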
So--how can we make sure to avoid this pitfall? In a word: testing. Testing is outside of the scope of this post, but the general idea is to write unit tests that test that a method operates as expected and does not operate unexpectedly. This provides a sanity check that we did use the correct operator and that we wrote our logical evaluations without the presence of a CWE.
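As a rough idea of what such a test might look like, here is a minimal sketch. The function is_empty_list and both test names are hypothetical; the tests are written as plain functions with assert statements, a style that a runner such as pytest can collect, though here they are simply called directly.

```python
def is_empty_list(items):
    """Hypothetical function under test: value comparison, not identity."""
    return items == []


def test_empty_list_is_detected():
    # This is the case an `is []` implementation can never satisfy.
    assert is_empty_list([]) is True


def test_non_empty_list_is_rejected():
    assert is_empty_list([1, 2, 3]) is False


# Calling the tests directly; a failing assert would raise AssertionError.
test_empty_list_is_detected()
test_non_empty_list_is_rejected()
print("all checks passed")
```

The first test is the important one: it exercises the edge case, so swapping "==" for "is" in the function would make it fail immediately.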