Python Programming
for big Data Analysis and Visualisation

Session 1 : Python Basics¶

Welcome to the Python Programming for big Data Analysis and Visualisation course. In this notebook you will find the material covered during Session 1.

Exercises¶

Here are links to all exercises:

  • Exercise 1
  • Exercise 2
  • Exercise 3
  • Exercise 4
  • Exercise 5
  • Exercise 6
  • Exercise 7
  • Exercise 8
  • Exercise 9
  • Exercise 10
  • Exercise 11
  • Exercise 12
  • Exercise 13
  • Exercise 14
  • Exercise 15
  • Exercise 16
  • Exercise 17

Python as a Calculator¶

Python number types are int and float.

Operations on numbers include:

  • +, -, *, / (as in other languages; note that / always returns a float)
  • ** (power)
  • // (integer division, rounded down)
  • % (modulo, or remainder of integer division)

Try some operations!

  1. Create a new cell by pressing the Insert cell (+) button above.
  2. Type your commands in the new cell.
  3. Execute the cell by pressing the Execute (triangle) button above or pressing Shift-Enter.
In [11]:
# Type your commands here
7 % 2
Out[11]:
1
In [12]:
7 // 2
Out[12]:
3
In [15]:
2 * (3 + 3)
Out[15]:
12
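
The operators //, % and / are closely related. The cell below is a small sketch of that relationship; the variable names are chosen here for illustration only and are not part of the course material.

```python
a = 7
b = 2

quotient = a // b       # integer division: 3
remainder = a % b       # remainder of the integer division: 1
true_division = a / b   # / always gives a float in Python 3: 3.5

# The quotient and the remainder together rebuild the original number
rebuilt = quotient * b + remainder
print(quotient, remainder, true_division, rebuilt)
```

Running it should print 3 1 3.5 7.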

Exercise 1¶

Suppose you have $100\$$, which you can invest at 10\% interest each year. After one year, you have $100 \times 1.1 = 110\$$, after two years $100 \times 1.1 \times 1.1 = 121\$$, etc.

Calculate how much money you end up with after 7 years.

(N.B.: to solve an exercise, add a cell below this one using the (+) button above, fill it with the solution, then execute it using the triangle button above)

In [21]:
100 * 1.1 * 1.1 * 1.1 * 1.1
Out[21]:
146.41000000000008
In [23]:
100 * 1.1 ** 7
Out[23]:
194.87171000000012

Types¶

The type() function allows us to determine the type of a value.
Type names are also functions to perform type conversions.

In [24]:
type(2)
Out[24]:
int
In [25]:
type(1.1)
Out[25]:
float
In [26]:
int(1.1)
Out[26]:
1
In [27]:
float(2)
Out[27]:
2.0
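
Note that converting a float with int() drops the fractional part rather than rounding. A short sketch of the difference, using the built-in round() function for comparison (the variable names are illustrative):

```python
truncated_positive = int(1.9)    # 1: the fractional part is dropped
truncated_negative = int(-1.9)   # -1: truncation goes towards zero
rounded = round(1.9)             # 2: round() goes to the nearest integer

print(truncated_positive, truncated_negative, rounded)
```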

Variables¶

Variables are assigned using the assignment operator =.
Variable names should be as descriptive as possible.
Two conventions for variable naming are common:

  1. snake_case (words are separated by underscores), the style recommended for Python variables by the PEP 8 style guide;
  2. CamelCase (the beginning of each word is marked by an uppercase letter)
In [29]:
variable = 2
variable
Out[29]:
2
In [30]:
a_number = 100
another_number = 101
yet_another_number = a_number
yet_another_number
Out[30]:
100

Exercise 2¶

Redo the previous exercise by assigning values to correctly named variables and using them in the calculation, and assigning the result to the variable final_amount.

In [34]:
initial_amount = 200
interest = 0.1
period = 7

# 100 * 1.1 ** 7
final_amount = initial_amount * (interest + 1.0) ** period

An alternative solution :

In [36]:
initial_amount = 200
interest = 1.1
period = 7

final_amount = initial_amount * interest ** period

Or in CamelCase :)

In [53]:
initialAmount = 200
interest = 1.1
period = 7

finalAmount = initialAmount * interest ** period

When writing code, try to keep in mind:

K.I.S.S. = Keep It Simple, Stupid

Write code that is as clear and as simple as possible.

Print¶

The output of a Notebook cell is the value of the last expression evaluated.
The print() function outputs its argument.

In [39]:
final_amount
Out[39]:
389.74342000000024
In [41]:
print(interest)
print(period)
print(final_amount)
1.1
7
389.74342000000024

Note: behind the scenes, the display() function is used to display the cell output.

Comments¶

In Python, everything following a # character on the same line is considered a comment, and therefore not interpreted by Python:

In [42]:
# This is a comment
interest = 0.1  # 1 = 100%
# interest above is 100% when 1
In [43]:
# interest = 0.2
In [44]:
interest
Out[44]:
0.1

Errors¶

Whenever the Python interpreter fails to execute the code in the cell, it will output an error.
When Python doesn't understand what you have written, it will output a SyntaxError.

In [47]:
print(final_amount  # oops, forgot )
  File "<ipython-input-47-165c6b5c7c82>", line 1
    print(final_amount
                      ^
SyntaxError: unexpected EOF while parsing

Error messages indicate where Python believes the error is located, but please bear in mind that Python's ability to pinpoint the location is limited.
Another type of error is a NameError, which typically occurs when you have misspelled a variable name.

In [50]:
print(final_amout)  # oops, I missed the letter "n"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-50-524dfac17014> in <module>
----> 1 print(final_amout)

NameError: name 'final_amout' is not defined

Strings¶

Strings are sequences of characters.
They are expressed using quotes (either single ' or double ", there is no difference).
The type of strings is str; as before, str() is also a function to perform type conversion.

In [4]:
word = "Monty Python"
In [58]:
type(word)
Out[58]:
str
In [59]:
str(4)
Out[59]:
'4'
In [77]:
int('1')
Out[77]:
1
In [79]:
float('1.3321')
Out[79]:
1.3321
In [1]:
int('four')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-2f24743179af> in <module>
----> 1 int('four')

ValueError: invalid literal for int() with base 10: 'four'

Indexing and slicing¶

We can access elements of a string by using indexing (positive and negative indexes) as shown below.

The len() function returns the number of characters in a string.
We can access multiple elements of a string by using slicing, using the following syntax:

start:stop:step

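If any of these three values is omitted, Python uses a default value: start is 0, stop is the end of the string, and step is 1. The character at index stop is not included in the result. (When step is negative, as in word[::-1] below, the slice runs from the end to the beginning instead.)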

In [64]:
word[7]
word[-1]
Out[64]:
'n'
In [65]:
len(word)
Out[65]:
12
In [69]:
word[6:10]
word[-12:-7]
Out[69]:
'Monty'
In [68]:
word[6:10:2]
Out[68]:
'Pt'
In [73]:
word[6:]
word[:5]
word[::]
Out[73]:
'Monty Python'
In [76]:
word[::-1]
Out[76]:
'nohtyP ytnoM'
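
Because the stop index is excluded, two slices that share a boundary fit together without overlap, and the length of a slice is stop - start. A minimal sketch (the names first_word and rest are illustrative):

```python
word = "Monty Python"

first_word = word[:5]   # characters 0 to 4; index 5 (the space) is excluded
rest = word[5:]         # everything from index 5 onwards

print(first_word)
print(first_word + rest == word)   # the two halves rebuild the string
print(len(word[2:6]))              # 6 - 2 characters
```

This should print Monty, then True, then 4.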

Exercise 3¶

Given the variable greetings defined below, extract each word and assign it to variables first, second and third.

In [9]:
greetings = "How are you"
In [87]:
first = greetings[0:3]
second = greetings[4:7]
third = greetings[8:]

Operations¶

Strings are immutable: once created, a string cannot be modified in place.

In [6]:
word[0] = 'J'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-91a956888ca7> in <module>
----> 1 word[0] = 'J'

TypeError: 'str' object does not support item assignment

Strings support some operations. In particular:

  • + (concatenate strings)
  • * (repeat strings)
In [5]:
print('J' + word[1:])
print('a' + 'b' + 'c')
print('a' * 5)
Jonty Python
abc
aaaaa
In [100]:
print(first + second + third)
Howareyou
In [101]:
print(first + ' ' + second + ' ' + third)
How are you

Formatting¶

To combine strings and numbers, the recommended approach (since Python 3.6) is to use f-strings: f"..."
They allow us to embed Python expressions in strings using curly braces {...}, most notably variable names.
A NameError will be raised if you try to use a variable that doesn't exist.

In [1]:
a = 2
b = 3
valid_string = f"The value of a is {a}"
valid_string_ab = f"The value of a is {a}, and of b is {b}"
print(valid_string_ab)
The value of a is 2, and of b is 3
In [2]:
valid_string_abc = f"The value of a is {a}, and of b is {b}, and of c is {c}"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-76c831f56e7b> in <module>
----> 1 valid_string_abc = f"The value of a is {a}, and of b is {b}, and of c is {c}"

NameError: name 'c' is not defined

Exercise 4¶

Write out the result of your interest calculation using string formatting.
E.g.: "Starting with 100, with a 10% interest, after 7 years you will have 194.87".

In [8]:
initial_amount = 100
interest = 1.1
period = 7

final_amount = initial_amount * interest ** period
interest_percent = (interest - 1.0)*100

print(f"Starting with {initial_amount}, with a {interest_percent}% interest, after {period} years you will have {final_amount}")
Starting with 100, with a 10.000000000000009% interest, after 7 years you will have 194.87171000000012
In [9]:
print(f"Starting with {initial_amount}, with a {interest_percent:.2f}% interest, after {period} years you will have {final_amount}")
Starting with 100, with a 10.00% interest, after 7 years you will have 194.87171000000012
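
The same format specification can be applied to final_amount to match the target sentence of the exercise. The cell below is one possible sketch; the variable message and the split of the f-string over two lines are choices made here for readability.

```python
initial_amount = 100
interest = 1.1
period = 7

final_amount = initial_amount * interest ** period
interest_percent = (interest - 1.0) * 100

# :.0f keeps no decimals, :.2f keeps two decimals
message = (
    f"Starting with {initial_amount}, with a {interest_percent:.0f}% interest, "
    f"after {period} years you will have {final_amount:.2f}"
)
print(message)
```

This should print: Starting with 100, with a 10% interest, after 7 years you will have 194.87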

Functions¶

Functions are blocks of code that can be executed using a function call.
Functions are often created to group some code that has a common "function", for example "adding one side" as in the example below:

[Figure: add_one_side]

Definition¶

In Python, functions are defined using the def keyword followed by the function name.
The def line ends with a colon :, which in Python introduces a code block. This block is what is executed when the function is called.
All lines in the code block must be indented.
The first non-indented line after the code block marks its end. Easy, isn't it!?

In the example below, something else is NOT printed as part of the say_hello function:

In [121]:
def say_hello():
    print("hello")
    print("what's your name")
print("something else")
something else
In [122]:
say_hello()
hello
what's your name

Arguments and return¶

Functions can have arguments, whose names appear between the parentheses on the def line.

In [128]:
def say_hello(name):
    print(f"hello {name}")
In [131]:
say_hello("David")
say_hello("Marco")
say_hello("Arthur")
hello David
hello Marco
hello Arthur

Multiple arguments are separated by commas (,).

In [10]:
def say_hello(name, surname):
    print(f"hello {surname} {name}")
In [136]:
say_hello("David", "Bowman")
hello Bowman David

As well as receiving values via its arguments, a function can also return a value using the return keyword.

In [139]:
def square(n):
    value = n**2
    return value
In [152]:
square(2)
Out[152]:
4
In [142]:
square(2) + square(3)
Out[142]:
13

If no return statement is given, the function returns None. In Python, None means that the value is missing; note that None is not the same as 0, False, or an empty string "".

In [14]:
def no_return(n):
    value = n**2

print(no_return(2))
None
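
A frequent source of confusion is the difference between printing a value and returning it. The sketch below uses two hypothetical functions, print_square and return_square, written only for this comparison.

```python
def print_square(n):
    print(n**2)    # displays the value, but the function returns None


def return_square(n):
    return n**2    # hands the value back to the caller


printed = print_square(3)     # prints 9
returned = return_square(3)   # prints nothing

print(printed)    # None
print(returned)   # 9
```

Only the returned value can be reused in later calculations, for example return_square(3) + 1.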

Exercise 5¶

Write a function calculate_interest(starting_amount) that, given a starting amount, returns the final amount after 5 years with an interest of 10%. Test this function on starting amounts 100 and 200.

In [8]:
def calculate_interest(starting_amount):
    interest = 1.1
    period = 5
    final_amount = starting_amount * interest ** period
    return final_amount
In [9]:
print(calculate_interest(100))
print(calculate_interest(200))
161.05100000000004
322.1020000000001

It often happens that several values are required for the function to do its job (for example interest, period and starting_amount). These values are said to be hardcoded when they are defined within the function and can't be specified via the function call.
It is often useful to instead define them as function arguments so they can all be specified via the function call.

Exercise 5b¶

Write a function calculate_interest(starting_amount, interest, period) that, given a starting amount, an interest and a period, returns the final amount. Test this function on cases seen previously.

In [148]:
def calculate_interest(starting_amount, interest, period):
    final_amount = starting_amount * interest ** period
    return final_amount
In [151]:
calculate_interest(100, 1.1, 7)
Out[151]:
194.87171000000012

Lists¶

Lists are ordered sequences; they can be defined as comma-separated lists of values within square brackets [].
A list can contain any value (integers, floats, strings, even other lists): whatever it may contain, its type is always list.
A list can also be empty.

In [168]:
integers = [1, 2, 3]
integers
mixed = [1, 2, "three", 4, 3+2]
type(integers)
type(mixed)
Out[168]:
list
In [169]:
empty_list = []
empty_list
Out[169]:
[]

Indexing and slicing¶

Lists support indexing and slicing like strings.
The len() function returns the number of characters in a string, or of elements in a list.

In [175]:
print(integers)
integers[0]
integers[1:3]
integers[::2]
integers[::-1]
[1, 2, 3]
Out[175]:
[3, 2, 1]

Exercise 6¶

Create a list week containing the 7 days of the week. Then:

  1. Retrieve the first 5 days of the week on one hand, and the days of the week-end on the other;
  2. Obtain the same result using negative indices;
  3. Find two ways to access the last day of the week;
  4. Reverse the order of the week days with one single command.
In [189]:
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# question 1.
week[:5]  # get working days
week[5:]  # get weekend days
# question 2.
week[:-2]  # get working days
week[-2:]  # get weekend days
# question 3
week[-1]
week[6]
# question 4
week[::-1]
Out[189]:
['Sunday', 'Saturday', 'Friday', 'Thursday', 'Wednesday', 'Tuesday', 'Monday']

Operations¶

Lists support some operations. In particular:

  • + (concatenate lists)
  • * (repeat lists)

ATTENTION: these are not vector sums or products!

In [193]:
print([1, 2, 3] + [4, 5, 6])
print([1, 2] * 3)  #  NOT [3, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 1, 2, 1, 2]
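
If an element-wise sum is what you need, one possible approach with plain lists is sketched below. It relies on a for loop, range() and append(), which are all introduced later in this session; the names left, right and elementwise_sum are illustrative.

```python
left = [1, 2, 3]
right = [4, 5, 6]

elementwise_sum = []
for index in range(len(left)):
    # add the elements found at the same position in both lists
    elementwise_sum.append(left[index] + right[index])

print(elementwise_sum)
```

This should print [5, 7, 9]. The sketch assumes both lists have the same length.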

Mutating lists¶

Lists are mutable, i.e. we can mutate (change) them:

In [198]:
integers[0] = 'one'
print(integers)
['one', 2, 3]
In [200]:
integers[1] = 'two'
print(integers)
['one', 'two', 3]

The append() method allows us to add an element to the end of the list:

In [203]:
integers = [1, 2, 3]
print(integers)
integers.append(4)
[1, 2, 3]
In [204]:
integers  # Note that the list has been modified
Out[204]:
[1, 2, 3, 4]
In [205]:
integers.append(5)  # Note that append() returns None: the cell shows no output
In [206]:
print(integers)
[1, 2, 3, 4, 5]
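
Since append() modifies the list in place and returns None, assigning its result back to the list is a common mistake. A short sketch (the variable result is used here only to show the returned value):

```python
integers = [1, 2, 3]

result = integers.append(4)   # the list is changed in place...
print(result)                 # ...and append() itself returns None
print(integers)

# A common mistake would be: integers = integers.append(5)
# which replaces the list by None.
```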

Exercise 7¶

The list cubes is defined as follows:

In [1]:
cubes = [1, 8, 27, 66, 125]

but something's wrong...
Correct cubes, then add 3 more values.

In [2]:
cubes[3] = 64
cubes.append(6**3)
print(cubes)
cubes.append(7**3)
cubes.append(8**3)
cubes
[1, 8, 27, 64, 125, 216]
Out[2]:
[1, 8, 27, 64, 125, 216, 343, 512]
In [12]:
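# Alternatively (run this instead of the append() calls above; here it was run after them, hence the repeated values)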
cubes = cubes + [6**3, 7**3, 8**3]
cubes
Out[12]:
[1, 8, 27, 64, 125, 216, 343, 512, 216, 343, 512]

For loops¶

Syntax¶

Now that we've seen lists, it would be nice to be able to perform some action on each element of a list.
For loops serve that purpose, by allowing us to iterate over the elements of a list.
They are defined using the for keyword.

In [3]:
for cube in cubes:
    print(cube)
1
8
27
64
125
216
343
512

The code block is executed once for each element of cubes.
Each execution of the code block has a different value of the iterator variable cube (without the s!).

Exercise 8¶

Using a for loop and the cubes list (which contains values of $x^3$), print out the values of $x^3+1$ (i.e. 2, 9, 28, etc.).

In [4]:
for cube in cubes:
    print(cube + 1)
2
9
28
65
126
217
344
513

Iterating over integers¶

A typical use of for loops is to iterate over integers in a range.
To do so we can use the range() function, which returns integers in a given range.

In [230]:
for integer in range(4):
    print(integer)
0
1
2
3

The range() function can be called with one, two or three arguments (in all cases, stop itself is excluded):

  • range(stop): integers from 0 up to stop - 1;
  • range(start, stop): integers from start up to stop - 1;
  • range(start, stop, step): integers from start up to (but excluding) stop, in steps of step.
In [233]:
for i in range(3, 7):
    print(i)
3
4
5
6
In [232]:
for i in range(0, 6, 2):
    print(i)
0
2
4
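
To inspect the values of a range without writing a loop, it can be converted to a list with list(). The sketch below repeats the three forms above and adds a negative step; the variable names are illustrative.

```python
one_argument = list(range(4))            # [0, 1, 2, 3]
two_arguments = list(range(3, 7))        # [3, 4, 5, 6]
three_arguments = list(range(0, 6, 2))   # [0, 2, 4]
counting_down = list(range(10, 0, -3))   # [10, 7, 4, 1]

print(one_argument)
print(two_arguments)
print(three_arguments)
print(counting_down)
```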

Exercise 9¶

Using a for loop, create a list even_squares containing the squares of even numbers up to 20 (included).
(Hint: initialise the list to an empty list before the loop, then fill it using the .append() method).

In [236]:
even_squares = []
for even_integer in range(0, 21, 2):
    even_squares.append(even_integer ** 2)
In [237]:
print(even_squares)
[0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400]

Exercise 10¶

Using a for loop, draw a triangle like this (10 lines) :

*
**
***
****
*****
******
*******
********
*********
**********
In [245]:
for n_stars in range(1, 11):
    print('*' * n_stars)
*
**
***
****
*****
******
*******
********
*********
**********
In [244]:
# Also a valid solution:
for integer in range(10):
    n_stars = integer + 1
    print('*' * n_stars)
*
**
***
****
*****
******
*******
********
*********
**********

Iterating over lists with indices¶

Sometimes it is useful to keep track of the indices while iterating over a list.
To do so, we can use the function enumerate().
When using enumerate, we define two variables : the index as well as the iterator variable.
Both change value at each execution of the loop's code block.

In [5]:
print(cubes)
[1, 8, 27, 64, 125, 216, 343, 512]
In [6]:
for index, cube in enumerate(cubes):
    print(f"{cube} is the cube of {index+1}")
1 is the cube of 1
8 is the cube of 2
27 is the cube of 3
64 is the cube of 4
125 is the cube of 5
216 is the cube of 6
343 is the cube of 7
512 is the cube of 8
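
The enumerate() function also accepts an optional start argument, which avoids writing index+1 by hand. A minimal sketch, using a shortened copy of cubes and an illustrative list lines to collect the messages:

```python
cubes = [1, 8, 27, 64]  # shortened copy of the list used above

lines = []
for number, cube in enumerate(cubes, start=1):
    # number starts at 1 instead of 0
    lines.append(f"{cube} is the cube of {number}")

for line in lines:
    print(line)
```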

If clauses¶

It is of fundamental importance to be able to execute parts of code only when certain conditions are met.
To do so programming languages provide if clauses.
There are sometimes conditions of the execution environment that will change according to when the program is run, or that we cannot foresee when we write our program. It is by using if clauses that we define how our program will make decisions.

Comparisons¶

Before looking at if clauses we first need to understand how to compare things, in order to understand whether a condition of interest is met.
For example, let's compare two integers x and y:

In [13]:
x = 2
y = 3
print(x < y)
print(x > y)
True
False

The < comparison operator compares x and y and returns either True or False. These two values are of type bool (short for boolean).

In [257]:
type(x < y)
Out[257]:
bool

There exist 6 comparison operators :

  • < smaller
  • > greater
  • <= smaller or equal
  • >= greater or equal
  • == equal
  • != not equal

NOTE that = is for assignment and == is for comparison !!!

In [ ]:
x = y  # This is an assignment !!!
In [15]:
x == y
Out[15]:
True
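
A word of caution when comparing floats with ==: as the long decimal tails in earlier outputs suggest, floats are stored approximately, so two values that look equal on paper may differ slightly. One possible way around this, sketched below, is the isclose() function from the standard math module (the variable total is illustrative).

```python
import math

total = 0.1 + 0.2

print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True
```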

Syntax¶

If clauses are defined using the if keyword, followed by the condition and then :.
The indented lines following are the code block that is executed ONLY if the condition is True.
Try modifying x and y below to see what is printed.

In [8]:
x = 2
y = 3
if x < y:
    print(f"{x} is smaller than {y}")
2 is smaller than 3

Exercise 11¶

Write a function print_even(n) that given an integer as argument, prints its value only if it is an even number (a number is even when the remainder of its division by 2 is 0). Test the function on a few examples of your choice.

In [16]:
def print_even(n):
    if n % 2 == 0:
        print(n)
In [17]:
for integer in range(10):
    print_even(integer)
0
2
4
6
8

Multiple cases¶

In the previous example, it would be useful to print a message when the condition is False.
To do so, we can use the else keyword.
The else keyword, followed by :, introduces a second code block that is executed ONLY if the condition is False.

In [3]:
x = 3
y = 2
if x < y:
    print(f"{x} is smaller than {y}")
else:
    print(f"{x} is larger than {y}")
3 is larger than 2

NOTE: The example above is slightly wrong. Why? Try x = 2 and y = 2...
When there are more than 2 cases, we can use the elif keyword, followed by another condition.
There can be as many elif as necessary, but only one else.

In [19]:
x = 2
y = 2
if x < y:
    print(f"{x} is smaller than {y}")
elif x == y:
    print(f"{x} is equal to {y}")
else:
    print(f"{x} is larger than {y}")
print("This is printed whatever the value of x and y")  # not indented: outside the if/elif/else blocks
2 is equal to 2
This is printed whatever the value of x and y

Exercise 12¶

Write a function is_smaller(x, y) that given two integers x and y, returns True if x is smaller than y, and False otherwise. Test the function on a few examples of your choice.

In [19]:
def is_smaller(x, y):
    if x < y:
        return True
    else:  # The x == y needn't be treated separately (see exercise text)
        return False
In [284]:
is_smaller(9, 6)
Out[284]:
False
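
Since the comparison x < y already evaluates to True or False, the function can also return it directly. This is a possible shorter variant of the solution above, not a replacement for it:

```python
def is_smaller(x, y):
    # The comparison itself is a bool, so it can be returned as is
    return x < y


print(is_smaller(9, 6))
print(is_smaller(2, 3))
print(is_smaller(2, 2))
```

This should print False, True and False.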

Multiple tests¶

Sometimes it is useful to test multiple conditions at the same time.
Let's suppose we have two integers x and y, and we want to check both that x < y and that x is even.
To combine the two conditions we can use boolean operators.
There are 3 boolean operators:

  • not
  • and
  • or

The not operator simply negates the value of its argument.

In [21]:
x = 2
y = 3
if x < y and x % 2 == 0:
    print("x smaller than y, and x even")
x smaller than y, and x even
In [4]:
x = 2
y = 1
if x < y and not x % 2 == 0:
    print("x smaller than y, and x odd")

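The action of the and and or operators is best expressed using the following truth table:

a      b      a and b   a or b
True   True   True      True
True   False  False     True
False  True   False     True
False  False  False     False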

Exercise 13¶

Write a function is_triangle(x, y, z) that, given three values (x, y and z), returns True if they can form the sides of a triangle, and False otherwise.
Remember: the sum of the lengths of any two sides of a triangle is greater than the length of the third side.
Test the function on the following examples:

  • x=3, y=4, z=5
  • x=1, y=3, z=1
  • x=1, y=1, z=3
In [5]:
def is_triangle(x, y, z):
    return (x+y > z and x+z > y and y+z > x)

print(is_triangle(3, 4, 5))
print(is_triangle(1, 3, 1))
print(is_triangle(1, 1, 3))
True
False
False

Exercise 14¶

Write a function is_scalene(x, y, z) that, given the lengths of the three sides of a triangle (x, y and z), returns True if it is a scalene triangle, and False otherwise.
Remember: a scalene triangle has no equal sides.
Test the function on the following examples:

  • x=2, y=4, z=3
  • x=3, y=3, z=3
  • x=2, y=2, z=3
In [6]:
def is_scalene(x, y, z):
    return (is_triangle(x, y, z) and x != y and x != z and y != z)

print(is_scalene(2, 4, 3))
print(is_scalene(3, 3, 3))
print(is_scalene(2, 2, 3))
True
False
False

Extra Exercises¶

The following problems (in increasing order of difficulty) require all the concepts seen in this session.
Good luck!

Exercise 15¶

Write a function filter_even(numbers) that given a list of integers, returns another list containing only those elements that are even numbers (a number is even when the remainder of its division by 2 is 0).

Exercise 16¶

Write a function triangle_type(x, y, z) that, given the lengths of the three sides of a triangle (x, y and z), returns "Scalene", "Isosceles" or "Equilateral" according to the triangle type.
Remember: an isosceles triangle has 2 equal sides, while an equilateral triangle (3 equal sides) is a special case of isosceles triangle.

Exercise 17¶

The Collatz conjecture relates to a long-standing unsolved problem in mathematics. It concerns a sequence defined as follows : start with any positive integer $n$. To obtain the next term, proceed as follows:

  • if the previous term is even, divide it by 2;
  • if the previous term is odd, multiply it by 3 then add 1.

The conjecture is that, whatever the starting value $n$, the sequence will always reach 1 (sooner or later).

Write a function collatz(n, steps) that given a starting integer and a number of steps, returns the above defined sequence.
Use this function to answer the following questions:

  1. does the conjecture hold starting from 12? from 19? from 27?
  2. does the number of steps vary with starting value?
  3. what happens after the sequence has reached 1?

License and Acknowledgements¶

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Some exercises are adapted from https://python.sdv.univ-paris-diderot.fr/.