Your Python Code Arrows to Dispatch Jurassic Loop Holdouts

A tutorial by example:

  • List, set, and dictionary comprehensions, with conditions or filter-like behavior;
  • the zipping and unzipping of lists, dictionaries, and tuples in conjunction with comprehensions;
  • followed by some speed measurements that will compare the performance of “old-style” loops with comprehensions.

0. The Pythonic Character of Comprehensions

If you, like me, came to Python from other programming languages that do not offer similar constructs, you were probably puzzled the first time you encountered list comprehensions, and amazed once you grasped how they can make your code more concise and faster. Comprehensions, like loops, serve the purposes of filtering lists or dictionaries, extracting items, and transforming values. Let’s walk through a sequence of examples for both loops and comprehensions.

1. Zips and Comprehensions: Dealing with Lists, Tuples, and Dictionaries

1.1 Zipping

1.1.a Zipping Lists of Equal Length

Suppose we have to process an unstructured group of values we have received from separate methods in a Python script. In our case, we assume that we need to deal with a list of three prediction accuracy metrics: RMSE, MAPE, and R-squared. We want to avoid passing each of these variables individually to other methods. Rather, the three metrics should be processed in tandem. Therefore, we will collect them in a list or dictionary and then demonstrate how we can operate with them and on them.

We combine the numerical values in a list, acc_values.

To distinguish the metrics, we write down their names in a second list, acc_names, taking care that it matches the sort order of the first list.

# zipping two or more separate lists to generate a combined list of tuples
# example: assume we have multiple values the script has provided;
# for instance, different prediction accuracy metrics such as RMSE, MAPE, R-squared
# we want to assign each of the results to a specific variable
inputs = [1, 0.04, 0.9]
# clumsy: manual assignment of the list items to specific variables:
rmse = inputs[0]
mape = inputs[1]
rsq = inputs[2]
# better: unpacking of the list
rmse, mape, rsq = inputs
# collect multiple calculation results in a list
acc_values = [rmse, mape, rsq]
# define a list of names for these results
acc_names = ["RMSE", "MAPE", "R-sq"]
# alternative to writing a list literal:
# asterisk * in front of the list variable; and a comma , behind it
*acc_names, = "RMSE", "MAPE", "R-sq"
acc_names

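The list literal and the starred assignment at the end of the block are alternative formulations; they generate the same list of names.

Stealthily, we’ve introduced an operation on lists without using a keyword in that last assignment. Note the asterisk * in front of the list variable, and the comma behind it. The syntax

  • *listname, = list of comma-separated values

generates a list. The asterisk * is better known as an unpacking operator, as in zip(*mylist) further below. On the left-hand side of an assignment, however, it marks a starred target that collects all the values on the right into a list (extended iterable unpacking, PEP 3132). The trailing comma is required, because a starred target must be part of a tuple or list of targets; without it, Python raises a SyntaxError.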
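The starred target can also sit next to ordinary names, in which case it collects only the remaining items. A minimal sketch of this extended iterable unpacking, reusing the metric names from above:

```python
# extended iterable unpacking: a starred target collects items into a list
*names, = "RMSE", "MAPE", "R-sq"   # trailing comma: a one-element tuple of targets
first, *rest = names               # the first item, then everything else
*init, last = names                # everything but the last item, then the last one

print(names)          # ['RMSE', 'MAPE', 'R-sq']
print(first, rest)    # RMSE ['MAPE', 'R-sq']
print(init, last)     # ['RMSE', 'MAPE'] R-sq
```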

Next, we combine the two lists by zipping them. Syntax:

  • mylist = list(zip(list1, list2, …))
# combine the names of the metrics and their values via zipping to a list of tuples
acc_list = list(zip(acc_names, acc_values))
acc_list

Python’s zip function pairs the corresponding items of two (or more) lists; wrapping its result in list() yields a single list of tuples, without the need for a multi-line loop.
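In Python 3, zip() does not build the list itself; it returns a lazy iterator of tuples, which is why the code above wraps it in list(). A short sketch of a consequence worth knowing: the iterator can be consumed only once.

```python
acc_names = ["RMSE", "MAPE", "R-sq"]
acc_values = [1, 0.04, 0.9]

pairs = zip(acc_names, acc_values)   # a zip object, not yet a list
print(list(pairs))   # [('RMSE', 1), ('MAPE', 0.04), ('R-sq', 0.9)]
print(list(pairs))   # [] because the first list() call exhausted the iterator

# materialize the pairs once with list() if they are needed more than once
acc_list = list(zip(acc_names, acc_values))
```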

1.1.b Unzipping a List of Tuples

What about the opposite case? If we are confronted with a list of tuples, can we unzip it into separate sequences without writing a loop? Yes: we pass the list to zip() with an asterisk or star * in front of it. The asterisk unpacks the list, so that each tuple becomes a separate argument of zip(), which then regroups the items by position. Syntax:

  • var1, …, varN = zip(*mylist)

# short digression #1 on zip: unzipping a list of tuples: use the asterisk *

names, values = zip(*acc_list)
print(names)
print(values)
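Note that zip(*acc_list) returns tuples, not lists. If the separate sequences need to be mutable lists, one option is to map list() over the unzipped tuples, as in this sketch:

```python
acc_list = [("RMSE", 1), ("MAPE", 0.04), ("R-sq", 0.9)]

names, values = zip(*acc_list)            # unzip: two tuples
print(type(names).__name__)               # tuple

names, values = map(list, zip(*acc_list))   # unzip and convert each tuple to a list
print(names)    # ['RMSE', 'MAPE', 'R-sq']
print(values)   # [1, 0.04, 0.9]
```

One caveat: if acc_list were empty, zip(*acc_list) would yield nothing, and the two-name unpacking would fail with a ValueError.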

1.1.c Zipping Lists of Unequal Length, with Omissions

If we deal with multiple lists that differ in their item counts, we can still combine them into a list of tuples with zip() and the list() constructor. However, zip() stops as soon as the shortest list is exhausted and silently omits the surplus elements of the longer lists.

Syntax:

  • mylist = list(zip(list1, …, listN))

# short digression 2 on zip: zipping lists of unequal length
# zip takes the shorter list and skips the corresponding item of the longer partner list

acc_names2 = ["RMSE", "MAPE"]
acc_values2 = [rmse, mape, rsq]
acc_list2 = list(zip(acc_names2, acc_values2))
acc_list2

1.1.d Zipping Lists of Unequal Length, with Padding

Skipping the “excess” items which the longer lists contain will not be the preferred behavior in most cases. Luckily, the itertools library comes to the rescue and provides the zip_longest function. It inserts all the items of the longest list into the tuples and then writes None into those tuple items to which the shorter lists cannot contribute values. Here, for instance, the preceding methods do not provide a value for the fourth metric, ‘MSE,’ therefore zip_longest inserts a value of None to complete the last tuple.

Note again our mantra of conciseness: where possible, we want to create a list in a single line of code, like the acc_list3 assignment below, without a loop that would require several lines.

Syntax:

  • mylist = list(zip_longest(list1, …, listN))
# short digression 3 on zip: zipping lists of unequal length by using zip_longest
# zip_longest also shows list items that don't have a partner item in the other list

from itertools import zip_longest
acc_names3 = ["RMSE", "MAPE", "R-sq", "MSE"]
acc_values3 = [rmse, mape, rsq]

acc_list3 = list(zip_longest(acc_names3, acc_values3))
acc_list3
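The None placeholder is only the default: zip_longest() also accepts a fillvalue keyword argument. In the sketch below, the placeholder "n/a" is an arbitrary illustrative choice:

```python
from itertools import zip_longest

acc_names3 = ["RMSE", "MAPE", "R-sq", "MSE"]
acc_values3 = [1, 0.04, 0.9]

# default: missing values are filled with None
print(list(zip_longest(acc_names3, acc_values3)))
# [('RMSE', 1), ('MAPE', 0.04), ('R-sq', 0.9), ('MSE', None)]

# fillvalue: choose the placeholder yourself
acc_list3 = list(zip_longest(acc_names3, acc_values3, fillvalue="n/a"))
print(acc_list3)
# [('RMSE', 1), ('MAPE', 0.04), ('R-sq', 0.9), ('MSE', 'n/a')]
```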

1.2.a Converting a List of Tuples to a Dictionary

Let’s convert the list of tuples into a dictionary to get rid of the many brackets. Syntax:

  • mydict = dict(mylist)
# convert the list of zipped tuples to a dictionary

acc_dict = dict(acc_list)
acc_dict
# not ready for pretty-printing

1.2.b Zipping Lists to Create a Dictionary

To demonstrate how the zip functions can be applied to lists, we’ve made a detour and combined the original list of names and the list of values in a single list of tuples before we converted that one to a dictionary.

Now we skip the redundant step: we convert the two original lists directly into a dictionary, again with the help of the zip function, but without creating an intermediate list of tuples. The acc_dict assignment below is a dictionary comprehension.

Syntax:

  • mydict = {key:val for key, val in zip(listK, listV)}
# more direct: zip the names and values lists straight into a dictionary,
# without creating a list of tuples in between
acc_dict = {k: v for k, v in zip(acc_names, acc_values)}
acc_dict
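Since dict() accepts any iterable of key-value pairs, the zip object can also be passed to it directly. A sketch showing that both forms build the same dictionary:

```python
acc_names = ["RMSE", "MAPE", "R-sq"]
acc_values = [1, 0.04, 0.9]

acc_dict = {k: v for k, v in zip(acc_names, acc_values)}   # dictionary comprehension
acc_dict2 = dict(zip(acc_names, acc_values))               # dict() consumes the pairs directly

print(acc_dict2)               # {'RMSE': 1, 'MAPE': 0.04, 'R-sq': 0.9}
print(acc_dict == acc_dict2)   # True
```

The comprehension earns its extra characters when keys or values need to be transformed on the way, for instance {k.lower(): v for k, v in zip(acc_names, acc_values)}.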

The dictionary removed the many brackets of the tuples.

But how can we generate a tabular layout that will be more readable in a report?

1.3 Unpacking a Dictionary

Before we continue with our zip-generated dictionary — how do we implement the opposite operation and unpack an existing dictionary, preferably without loops?

We can assign the dictionary’s items, that is, pairs consisting of both key and value, to comma-separated variables on the left-hand side of an assignment. Or, instead of pairs, we can assign only the values.

Syntax:

  • var1, …, varN = mydict.items()

# unpack the keys and values of a dictionary to separate tuples

rmse, mape, rsq = acc_dict.items()
print(rmse)
print(mape)
print(rsq)

To unpack only the values, the syntax is:

  • var1, …, varN = mydict.values()

# unpack the values of a dictionary and assign to separate variables

rmse, mape, rsq = acc_dict.values()
print(rmse)
print(mape)
print(rsq)

# unpack the values or keys of a dictionary to a list
# use the .keys() and .values() methods, then convert each view to a list with list()

acc_keys = list(acc_dict.keys())
print(acc_keys)
acc_values = list(acc_dict.values())
print(acc_values)

Or we unpack the dictionary keys and values into two separate lists by calling the .keys() and .values() methods and converting the views they return with the list() constructor. Syntax:

  • listKeys = list(mydict.keys())
  • listValues = list(mydict.values())
# unpack a dictionary to a list of tuples by using zip() and list()

acc_tuples = list(zip(acc_dict.keys(), acc_dict.values()))
acc_tuples

Or, third variant, we pairwise unpack the keys and values to a list of tuples, by using both the zip and list functions in tandem.

A shorter way to the list of tuples:

# shorter: unpack a dictionary to a list of tuples by using .items() and list()

acc_tuples = list(acc_dict.items())
acc_tuples

Syntax:

  • listTuples = list(mydict.items())
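Why do zip(keys, values) and items() produce the same pairs? The Python documentation guarantees that the keys(), values(), and items() views of a dictionary iterate in corresponding order as long as the dictionary is not modified in between; since Python 3.7, that order is also the insertion order. A quick check:

```python
acc_dict = {"RMSE": 1, "MAPE": 0.04, "R-sq": 0.9}

via_zip = list(zip(acc_dict.keys(), acc_dict.values()))
via_items = list(acc_dict.items())

print(via_items)              # [('RMSE', 1), ('MAPE', 0.04), ('R-sq', 0.9)]
print(via_zip == via_items)   # True
```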

1.4 The Structure of List and Dictionary Comprehensions

The general structure of comprehensions follows this pattern:

  • an expression at the beginning; in the following printing examples, this is the print function, which takes the key and the value of a dictionary item and prints the pair;
  • next, the ‘for … in’ clause that names the items (key and value) which the comprehension will extract from the dictionary (in list comprehensions, there is usually a single item rather than a key-value pair); the speed tests further below will demonstrate that the ‘for … in’ construct within comprehensions tends to be faster than traditional for loops;
  • followed by the dictionary or list itself, which contains all the keys and/or values the expression is supposed to process;
  • optionally, an if clause appended at the end filters the dictionary or list; an if-else conditional expression, by contrast, is placed in the expression at the front; we will see examples of both further below.
  • Syntax: [expression for item in list]
  • example: mylist = [x**2 for x in numberslist]
  • if numberslist = [1, 2, 3], then mylist = [1, 4, 9]
  • Syntax: {key_expression: value_expression for key, value in dictionary.items()}, or {key_expression: value_expression for item in list}
  • example: mydict = {v: v**2 for v in numberslist}
  • The expression squares the values in the iterable, which is a list of numbers in the example, and pairs each input value (which serves as the key) with its squared value.
  • if numberslist = [1, 2, 3], then mydict = {1: 1, 2: 4, 3: 9}
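To make the pattern concrete, the sketch below runs the two examples from the list above, each next to the loop it replaces, plus a comprehension with a trailing if clause (the odd-number filter is an illustrative addition):

```python
numberslist = [1, 2, 3]

# list comprehension and its loop equivalent
mylist = [x**2 for x in numberslist]
mylist_loop = []
for x in numberslist:
    mylist_loop.append(x**2)

# dictionary comprehension and its loop equivalent
mydict = {v: v**2 for v in numberslist}
mydict_loop = {}
for v in numberslist:
    mydict_loop[v] = v**2

# trailing if clause acts as a filter: keep only the odd numbers
odd_squares = [x**2 for x in numberslist if x % 2 == 1]

print(mylist, mydict, odd_squares)   # [1, 4, 9] {1: 1, 2: 4, 3: 9} [1, 9]
```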

1.5 Examples: Comprehensions Used for Printing

To see practical examples, let’s create some dictionary and list comprehensions that will pretty-print multiple results. We want to obtain a report-ready output in a tabular, vertical layout, with one metric name and its value per line.

Python’s pprint module offers the function pprint(). But it does not return the layout we’d prefer: as long as a dictionary fits within the default width of 80 characters, pprint() prints it on a single line, braces and all.

We could use a for loop to print the dictionary items one by one; for a dictionary this small, the loop is even quite concise. But the speed tests further below will demonstrate that loops tend to be slower than comprehensions. That inefficiency would be felt when we need to cope with lists or dictionaries that contain many thousands of items.

We could convert the dictionary to a dataframe. The one-liner is a neat alternative to comprehensions.
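One possible pandas one-liner is sketched below; it assumes pandas is installed, and the detour via Series.to_frame() and the column name "value" are illustrative choices, not necessarily the variant timed in the comparison that follows:

```python
import pandas as pd

acc_dict = {"RMSE": 1, "MAPE": 0.04, "R-sq": 0.9}

# dictionary -> Series (keys become the index) -> one-column DataFrame
df = pd.Series(acc_dict).to_frame(name="value")
print(df)
```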

However, if we need to deal with a much larger dictionary that contains 10,000 strings as its values and their index numbers as its keys, a conversion to a dataframe turned out, in my test, to be a whopping 175 times slower than a list comprehension.

As an alternative to long-winded loops and slower dataframe conversions, let’s try out a list comprehension on our dictionary.

We enclose our dictionary of three prediction accuracy metrics in a list comprehension, which I nickname a print comprehension whenever I use it for pretty-printing. The print() function is the expression at the start of this comprehension.

# print the values and their names line by line: list comprehension

[print(k, ":", v) for k,v in acc_dict.items()]

The list comprehension prints the contents of the dictionary, one below the other.

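The only flaw: the list of three None values. Where do they come from? This is not a bug in the list comprehension itself. Rather, print() returns None, so the comprehension collects one None for each key/value pair it processes. Because the comprehension is the last expression in the cell, an interactive session such as a Jupyter notebook displays the resulting list [None, None, None] below the printout. We are going to suppress the None values.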

First approach: add another line beneath the print comprehension.

# list comprehension for printing, without list of Nones
[print(k, ":", v) for k,v in acc_dict.items()]
print("\n")

Second approach: the additional line can also consist of a pass statement.

# list comprehension for printing, without list of Nones
[print(k, ":", v) for k,v in acc_dict.items()]
pass

Third approach: assign the print comprehension to any variable.

# list comprehension for printing, without list of Nones
y = [print(k, ":", v) for k,v in acc_dict.items()]

Fourth approach: assigning the result to an underscore, Python’s conventional name for a throwaway variable, serves the same purpose.

# list comprehension for printing, without list of Nones
_ = [print(k, ":", v) for k,v in acc_dict.items()]
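All four approaches hide the list of None values, but the list is still built. If only the printout matters, one alternative, sketched below, lets a generator expression produce the lines and calls print() just once, so no list of None values is created:

```python
acc_dict = {"RMSE": 1, "MAPE": 0.04, "R-sq": 0.9}

# a generator expression yields one formatted line per item;
# str.join glues them with newlines, and print() runs only once
report = "\n".join(f"{k} : {v}" for k, v in acc_dict.items())
print(report)
```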

Finally, let’s complete our print comprehension exercise and pretty-print the dictionary of named metrics and their numerical values (v). The names of the metrics serve as the keys (k). Inside the print function, we format v with an f-string and the format specifier .2f, which shows two decimal places; with .1f, the MAPE of 0.04 would be rounded to 0.0.

# list comprehension for pretty-printing, with number format
c = [print(k, ":", f'{v:.2f}') for k,v in acc_dict.items()]

For comparison, let’s have a look at an alternative code construction I saw on several websites. To print a dictionary, the websites proposed to define a function that runs through a for-loop. It does the job, but I’d hesitate to call it Pythonic elegance.

# on the web, I've seen an alternative:
# the use of functions with a loop to unpack and print dictionary items
# the double asterisk ** in the function call unpacks the dictionary
# the **kwargs argument accepts an undetermined number of keyword arguments
# it needs 4 lines of code where the list comprehension needed 1 to print the dictionary
def print_dict(**kwargs):
    for key, value in kwargs.items():
        print(f'{key}:{value:.2f}')
# call function:
print_dict(**acc_dict)

By now, we are sensitive enough to loops that their occurrence in a Python script will give us some pause. Comprehensions cannot replace loops in all circumstances. But a task as simple as printing a dictionary should require no more than one line of code. Defining a function merely to call print(), itself a function, appears redundant. If not the function, then the loop in the function body should raise our hackles and motivate a glance at possible alternatives that need fewer lines.

1.6 Summary: Zip to Dict and Print

It’s time to summarize the steps we’ve taken and omit the alternative and intermediate solutions we’ve discussed in order to demonstrate the various ways in which zip functions and comprehensions can interact with one another. We are going to condense the script to four lines of code.

We wanted to deal with a group of prediction accuracy metrics and their values, which preceding calculations have generated and passed to our script.

  • We combined their separate values in a list, acc_values.
  • To correctly label the values, we created a list of names for the metrics, acc_names.
  • In the acc_dict line, zip pairs each name with its value, and the dictionary comprehension turns these pairs into a dictionary,
  • which we print in the last line via a list comprehension.
# summary:
# if you have multiple values with individual names, e.g. a list of metrics which
# you want to summarize, collect the values in a list
rmse = 1
mape = 0.04
rsq = 0.9
acc_values = [rmse, mape, rsq]
# define a list of names for the metrics
acc_names = ["RMSE", "MAPE", "R-sq"]
# combine the lists of names and values via zipping to a dictionary
acc_dict = {k: v for k, v in zip(acc_names, acc_values)}
# pretty-print the dictionary of metrics
_ = [print(k, ":", f'{v:.2f}') for k,v in acc_dict.items()]

Theoretically, we could merge the dictionary comprehension and the print comprehension into a single line that creates the dictionary and immediately prints it.

But readability should be prioritized over reducing the number of code lines. A single line that is just twice as long does not count as an improvement. The creation of the dictionary and its printout serve separate purposes; therefore each of them deserves its own line.

2. Performance: Speed of Comprehensions and Loops

Chapter 1 has focused on the length of the code that comes with loops, in contrast to the less verbose single-liners that most comprehensions require. Comprehensions, in general, are more concise.

In this chapter, let’s test whether the shorter code of comprehensions also results in faster execution.

To generate data on which a loop and a comprehension can compete with one another, we create a list and fill it with random integers, using a short list comprehension.

The comprehension c that will serve our purpose consists of a single short line of code, while the loop, as usual, needs multiple lines to process all elements of the list.

import random

# small list of 5 random integers between 1 and 99
rands = [random.randrange(1, 100, 1) for i in range(5)]
print(rands)
# the jurassic way to square them: a for loop
rands2 = []
for n in rands:
    n = n**2
    rands2.append(n)
print(rands2)
# list comprehension to square them
c = [n**2 for n in rands]
c

2.1 A List Comprehension to Operate on Numerical Values

After preparing the race track as shown above, we want to look for measurable speed differences. Therefore, using the same code as above, we create a much longer list of 100,000 random numbers and then unleash the loop and the comprehension on it.

import random
import time

# list comprehension to square each number in a LARGE list of 100,000 numbers
rands = [random.randrange(1, 100, 1) for i in range(100000)]
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>
rands2 = []
for n in rands:
    n = n**2
    rands2.append(n)
# >>>>>>>>>>>>>>>>>>>>>>>>
tLoop = time.perf_counter() - t
print(f'{tLoop:.3f} sec')
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
_ = [n**2 for n in rands]
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tComp = time.perf_counter() - t
# a negative percentage means the comprehension needed less time than the loop
print(f'{tComp:.3f} sec: comprehension vs loop: {100*(tComp/tLoop-1):.1f}%')

In this run, the list comprehension needed about 21% less time than the loop.

Note that the measured times will differ whenever you rerun the code. The effective time depends on how your computer allocates processor capacity between tasks that are in the waiting line at any given moment. But the outcomes can be expected to demonstrate that comprehensions are faster than loops with few exceptions.
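A single time.perf_counter() measurement is easily disturbed by other tasks. A sketch of a more robust setup with the standard timeit module: each variant runs several times per measurement, the measurement is repeated, and the minimum is kept, since the timeit documentation describes the lowest value as a lower bound for how fast the machine can run the code. The functions square_loop and square_comp are hypothetical helpers introduced here:

```python
import random
import timeit

# 100,000 random integers between 1 and 99, as in the benchmark above
rands = [random.randrange(1, 100, 1) for i in range(100000)]

def square_loop(data):
    # hypothetical helper: the loop variant
    result = []
    for n in data:
        result.append(n**2)
    return result

def square_comp(data):
    # hypothetical helper: the comprehension variant
    return [n**2 for n in data]

# each measurement runs the variant 10 times; 5 measurements are taken,
# and the minimum is kept as the least disturbed one
t_loop = min(timeit.repeat(lambda: square_loop(rands), number=10, repeat=5))
t_comp = min(timeit.repeat(lambda: square_comp(rands), number=10, repeat=5))
print(f"loop: {t_loop:.3f} sec, comprehension: {t_comp:.3f} sec, "
      f"comprehension vs loop: {100*(t_comp/t_loop-1):.1f}%")
```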

2.2 Conditional Expressions in List Comprehensions

Let’s add another layer to the loop and the comprehension — by subjecting them to a conditional expression they are to evaluate before applying the expression to the list items.

Any list value in our example should only be squared if it is larger than 90. This construction filters the source list by only passing those values which meet the condition to the list of results, after applying the expression to the selected values.

Syntax of the conditional comprehension:

  • mylist = [expression for item in list if (condition==True)]

A conditional expression with an else clause can also choose between two different expressions; it is placed in front of the for clause. Where there was a single expression at the beginning of the comprehension in the previous example, there are now two alternative expressions, but only one of them will be applied to each value the comprehension reads from the source list. We can again append a condition at the end, which acts as a filter, as in the previous example, and thus limits the evaluation of condition #2 to the items that meet condition #1. In other words, condition #1 selects the source values that will be passed to the two expressions at the beginning of the line; then condition #2 determines which of the two expressions, A or B, will transform each selected value.

Syntax:

  • mylist = [expressionA if (condition2==True) else expressionB for item in list if (condition1==True)]

Example: square an argument if it exceeds 90, but raise it to the third power if it is smaller than or equal to 90; return all the exponentiated results, whether they have been squared or cubed, as a list ‘mylist’; but only if the source value is an even number, as per the condition at the end of the line. Thus, odd numbers are omitted from the exponentiations and from the list of results:

  • mylist = [(x**2) if (x>90) else (x**3) for x in list if (x%2==0)]
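A runnable version of this example, with a small hypothetical input list nums in place of the placeholder list (a variable named list would shadow Python’s built-in list type), and the equivalent loop for comparison:

```python
nums = [3, 4, 91, 92, 95]   # hypothetical sample values

# condition1 (at the end) filters: only even numbers pass
# condition2 (in front) chooses: square if > 90, otherwise cube
mylist = [(x**2) if (x > 90) else (x**3) for x in nums if (x % 2 == 0)]

# the same logic as a loop
mylist_loop = []
for x in nums:
    if x % 2 == 0:
        mylist_loop.append(x**2 if x > 90 else x**3)

print(mylist)   # [64, 8464]
```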

The following example demonstrates the syntax with a single, filter-like condition #1 at the end. The comprehension only processes numbers greater than 90 and omits smaller arguments from the squared results, which therefore will form a shorter list than the source list of 100,000 small and large random numbers.

import random
import time

# expression with a condition (filter via if):
# square only those of the 100,000 numbers in the list which exceed 90
rands = [random.randrange(1, 100, 1) for i in range(100000)]
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>
rands2 = []
for n in rands:
    if n > 90:
        n = n**2
        rands2.append(n)
# >>>>>>>>>>>>>>>>>>>>>>>>
tLoop = time.perf_counter() - t
print(f'{tLoop:.3f} sec')
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
c = [n**2 for n in rands if n > 90]
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tComp = time.perf_counter() - t
print(f'{tComp:.3f} sec: comprehension vs loop: {100*(tComp/tLoop-1):.1f}%')
# both variants should find the same number of values
print("found: " + str(len(rands2)))
print("found: " + str(len(c)))

In this run, the comprehension raced through the list in about 45% less time than the loop.

We test another variant: we formulate a loop and a list comprehension that both contain a conditional expression: square a value only if it is an even number, and skip the odd-numbered arguments.

import random
import time

# expression with condition (filter via if): square only the even numbers among the 100,000 in the list
rands = [random.randrange(1, 100, 1) for i in range(100000)]
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>
rands2 = []
for n in rands:
    if n % 2 == 0:
        n = n**2
        rands2.append(n)
# >>>>>>>>>>>>>>>>>>>>>>>>
tLoop = time.perf_counter() - t
print(f'{tLoop:.3f} sec')
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
c = [n**2 for n in rands if n % 2 == 0]
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tComp = time.perf_counter() - t
print(f'{tComp:.3f} sec: comprehension vs loop: {100*(tComp/tLoop-1):.1f}%')
print("found: " + str(len(rands2)))
print("found: " + str(len(c)))

In this run, the conditional list comprehension needed 18.8% less time than the loop. That’s close to the low end. At times, you will see differences of 40 to 50%, depending on how your processor prioritizes concurrent tasks.

2.3 A List Comprehension to Operate on Strings

We have investigated the performance of list comprehensions for numerical variables. Let’s check how loops and comprehensions fare with lists of strings such as names, words, or date strings.

To quickly obtain a long list of strings we want to use for search and filter exercises, we create a date range of 10,000 consecutive days and then convert its elements from datetime to string type.

We execute the conversion by means of a list comprehension (not via a loop, of course, whenever we can avoid one). The expression in this comprehension is a call of the strftime method, whose format string spells out the day, month, year, and weekday name.

import datetime as dt
import pandas as pd

# list comprehension for text
# create a list of 10,000 dates
datlist = pd.date_range(dt.datetime.today(), periods=10000).tolist()
# convert the dates to strings via list comprehension
datstrlist = [d.strftime("Day %d in %B of year %Y is a %A") for d in datlist]
datstrlist[:4]

2.4 A Conditional List Comprehension To Operate on Strings

Next, we formulate a conditional expression to sieve through the list of date strings. Let’s search for Saturdays and Sundays in the month of October of all years. The loop and the comprehension are supposed to modify the date strings they find by appending the suffix “ = Oct weekend” to the dates.

We evaluate three conditions: the date string ends with “urday” (Saturday) or with “unday” (Sunday), and it contains “Oc” (October). These conditions determine whether or not a value will be subjected to the expression (here, by adding the suffix “ = Oct weekend”). The else term tells the comprehension to include the other date strings in the list of results without modifying them.

We could also use an additional condition to filter the values that will be subjected to the expression. For instance, only values in the year 2022 are to be evaluated. As in the example of the random numbers above, we would append such a filter condition (condition1) to the end of the list comprehension. Syntax:

  • mylist = [expressionA if (condition2==True) else expressionB for item in list if (condition1==True)]
import time
import pandas as pd

# filter the list of strings (datstrlist from the previous block):
# example: add "weekend" to Saturdays and Sundays in October of each year
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
strLoop = []
for d in datstrlist:
    if (d.endswith("urday") or d.endswith("unday")) and "Oc" in d:
        strLoop.append(d + " = Oct weekend")
    else:
        strLoop.append(d)
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tLoop = time.perf_counter() - t
print(f'{tLoop:.4f} sec')
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
c = [d + " = Oct weekend" if ((d.endswith("urday") or d.endswith("unday")) and "Oc" in d) else d for d in datstrlist]
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tComp = time.perf_counter() - t
print(f'{tComp:.4f} sec: comprehension vs loop: {100*(tComp/tLoop-1):.1f}%')
df = pd.DataFrame(list(zip(strLoop, c)), columns=["loop", "comprehension"])
df

In this run, the list comprehension, using the same conditional expression as the loop, needed 21.9% less time.

2.5 Set Comprehensions

After having seen list comprehensions outperform loops, let’s look at an example of set comprehensions.

We want to find the prime numbers among the first N = 100,000 integers.

We will use an improved variant of the sieve of Eratosthenes (see the Wikipedia article “Sieve of Eratosthenes”). First, the sieve algorithm creates a set of all the integers from 2 up to N − 1, that is, range(2, N) (by modern definition, 1 is not a prime). Then it paces through all the integers i up to the square root of N and discards from the set the multiples j of i, starting at i² (the smaller multiples of i have already been discarded as multiples of smaller numbers). The sieve is by no means the fastest algorithm to find primes. But the mathematics aside, we can apply the logic of the sieve both in a loop and in a set comprehension to compare their processing speeds.

The traditional style would use a nested loop: ‘for i’ as the outer loop and ‘for j’ as the inner loop. Let’s create a nested comprehension to act as its challenger.

Syntax of nested comprehensions:

  • myset = {expression(itemA, itemB) for itemA in iterableA for itemB in iterableB}, where the first for clause acts as the outer loop and the second as the inner loop
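The sketch below places the two for clauses of a set comprehension side by side with the nested loops they mirror, using N = 30 only to keep the output short:

```python
N = 30

# nested loops: 'for i' is the outer loop, 'for j' the inner loop
composites_loop = set()
for i in range(2, int(N**0.5) + 1):
    for j in range(i**2, N, i):
        composites_loop.add(j)

# the same two for clauses, in the same order, inside one set comprehension
composites_comp = {j for i in range(2, int(N**0.5) + 1) for j in range(i**2, N, i)}

primes = sorted(set(range(2, N)) - composites_comp)
print(primes)   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```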
import time
import pandas as pd

# loops vs set comprehensions
# find all prime numbers below N using the sieve of Eratosthenes
N = 100000
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
primes_loop = set(range(2, N))
for i in range(2, int(N**0.5)+1):
    for j in range(i**2, N, i):
        primes_loop.discard(j)
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tLoop = time.perf_counter() - t
print(f'{tLoop:.5f} sec')
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
primes_setcomp = set(range(2, N)) - {j for i in range(2, int(N**0.5)+1) for j in range(i**2, N, i)}
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tComp = time.perf_counter() - t
print(f'{tComp:.4f} sec: set comprehension vs loop: {100*(tComp/tLoop-1):.1f}%')
assert primes_loop == primes_setcomp
primes_list1 = [p for p in primes_loop if p < 100]
primes_list2 = [p for p in primes_setcomp if p < 100]
df = pd.DataFrame([primes_list1, primes_list2])
df

In this run, the set comprehension needed 57% less time than the loop.

2.6 Are Comprehensions the Panacea for All Iterables?

So … are we done yet? Have comprehensions been proven unbeatable under any circumstances? Well, not quite.

Let’s filter the list of date strings again, but this time by using a filter based on a lambda function.

import time
import pandas as pd

# comprehensions vs lambda-filters (datstrlist from section 2.3)
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# note: filter() returns a lazy iterator, so this line only creates it;
# the strings are tested later, when zip() consumes the iterator
strLamb = filter(lambda d: ((d.endswith("urday") or d.endswith("unday")) and "Oc" in d), datstrlist)
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tLamb = time.perf_counter() - t
print(f'{tLamb:.4f} sec')
t = time.perf_counter()
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
c = [d for d in datstrlist if ((d.endswith("urday") or d.endswith("unday")) and "Oc" in d)]
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tComp = time.perf_counter() - t
print(f'{tComp:.4f} sec: comprehension vs lambda-filter: {100*(tComp/tLamb-1):.1f}%')
df = pd.DataFrame(list(zip(strLamb, c)), columns=["lambda", "comprehension"])
df

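In my run, the lambda-based filter took 0.0001 seconds, while the list comprehension, at 0.0033 seconds, appeared to be 22 times slower. That gap, however, is largely an artifact of lazy evaluation: in Python 3, filter() returns an iterator that does no work until it is consumed, so the timed line only creates the filter object. The actual filtering happens later, outside the timed section, when zip() consumes the iterator while the dataframe is being built. Wrapping the call in list() forces the evaluation inside the timed section and makes the two measurements comparable.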
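Since filter() is lazy, a fair race has to force both contenders to produce a complete list inside the timed call. The sketch below does so with timeit; the sample strings are hypothetical stand-ins for datstrlist so that the block runs on its own, with_filter and with_comprehension are hypothetical helper names, and the outcome will depend on the machine:

```python
import timeit

# hypothetical stand-in for datstrlist: 9,000 short date strings
datstrlist = ["Day 01 in October of year 2022 is a Saturday",
              "Day 03 in October of year 2022 is a Monday",
              "Day 05 in November of year 2022 is a Saturday"] * 3000

def with_filter():
    # list() forces the lazy filter object to do all its work inside the timed call
    return list(filter(lambda d: (d.endswith("urday") or d.endswith("unday")) and "Oc" in d,
                       datstrlist))

def with_comprehension():
    return [d for d in datstrlist
            if (d.endswith("urday") or d.endswith("unday")) and "Oc" in d]

lazy = filter(lambda d: "Oc" in d, datstrlist)
print(type(lazy).__name__)   # filter: an iterator, no string has been tested yet

t_filter = min(timeit.repeat(with_filter, number=10, repeat=3))
t_comp = min(timeit.repeat(with_comprehension, number=10, repeat=3))
print(f"filter + lambda: {t_filter:.4f} sec, comprehension: {t_comp:.4f} sec")
```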

We will reserve horse races between list comprehensions and lambda-filters as a separate topic for another article.

3. Conclusions

Today’s article has focused on the comparison of traditional loops with list, set, and dictionary comprehensions; and on the zip, unzip, and unpack methods associated with comprehensions.

The examples demonstrated

  • how comprehensions and zip functions interact with one another on lists, tuples, and dictionaries;
  • how their interactions can be coded concisely, often expressible in a single line, whereas loops would extend over multiple lines — each of the examples showed an operation that in many other languages would typically require a loop;
  • and that we can expect comprehensions to offer superior speed over traditional loops.

Original Source

Author

Business intelligence consultant, striving to deep-teach dachshund neural nets that training & test datasets should yield consistent results.