Python Programming for big Data Analysis and Visualisation
Session 1 : Python Basics¶
Welcome to the Python Programming for big Data Analysis and Visualisation course. In this notebook you will find the material covered during Session 1.
Python as a Calculator¶
Python number types are int and float.
Operations on numbers include:
- +, -, *, / (as in other languages)
- ** (power)
- // (integer division)
- % (modulo, or remainder of integer division)
Try some operations!
- Create a new cell by pressing the Insert cell (+) button above.
- Type your commands in the new cell.
- Execute the cell by pressing the Execute (triangle) button above or pressing Shift-Enter.
# Type your commands here
7 % 2
1
7 // 2
3
2 * (3 + 3)
12
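The link between integer division and modulo can be checked directly. The following is only a small sketch: the names dividend, divisor, quotient and remainder are chosen for illustration (assigning names with = is covered in the Variables section).

```python
# Illustrative names, not part of the course material
dividend = 7
divisor = 2

quotient = dividend // divisor      # integer division
remainder = dividend % divisor      # remainder of the integer division
true_division = dividend / divisor  # "/" gives a float, even for two integers

# The quotient and the remainder recombine into the original number
recombined = quotient * divisor + remainder
print(quotient, remainder, true_division, recombined)
```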
Exercise 1¶
Suppose you have $100\$$, which you can invest with a 10\% interest each year. After one year, it's $100 \times 1.1 = 110\$$, after two years $100 \times 1.1 \times 1.1 = 121\$$, etc.
Calculate how much money you end up with after 7 years.
(N.B.: to solve an exercise, add a cell below this one using the (+) button above, fill it with the solution, then execute it using the triangle button above)
100 * 1.1 * 1.1 * 1.1 * 1.1
146.41000000000008
100 * 1.1 ** 7
194.87171000000012
Types¶
The type() function allows us to determine the type of a value.
Type names are also functions that perform type conversions.
type(2)
int
type(1.1)
float
int(1.1)
1
float(2)
2.0
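One detail worth checking for yourself: converting a float with int() drops the fractional part rather than rounding. A short sketch, with variable names chosen purely for illustration:

```python
# int() truncates toward zero; it does not round
truncated_positive = int(1.9)
truncated_negative = int(-1.9)

# round() is the built-in function to use when rounding is wanted
rounded = round(1.9)

print(truncated_positive, truncated_negative, rounded)
```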
Variables¶
Variables are assigned using the assignment operator =.
Variable names should be as descriptive as possible.
Two conventions for variable naming are common :
- snake_case (words are separated by underscore)
- CamelCase (beginning of words are marked by uppercase letters)
variable = 2
variable
2
a_number = 100
another_number = 101
yet_another_number = a_number
yet_another_number
100
Exercise 2¶
Redo the previous exercise by assigning values to correctly named variables and using them in the calculation, and assigning the result to the variable final_amount.
initial_amount = 200
interest = 0.1
period = 7
# 100 * 1.1 ** 7
final_amount = initial_amount * (interest + 1.0) ** period
An alternative solution :
initial_amount = 200
interest = 1.1
period = 7
final_amount = initial_amount * interest ** period
Or in CamelCase :)
initialAmount = 200
interest = 1.1
period = 7
finalAmount = initialAmount * interest ** period
When writing code, try to keep in mind:
K.I.S.S. = Keep It Simple, Stupid
Write code that is as clear and as simple as possible.
Print¶
The output of a Notebook cell is the value of the last expression evaluated.
The print() function outputs its argument.
final_amount
389.74342000000024
print(interest)
print(period)
print(final_amount)
1.1
7
389.74342000000024
Note: behind the scenes, the display() function is used to display the cell output.
Comments¶
In Python, everything following a # character on the same line is considered a comment, and therefore not interpreted by Python:
# This is a comment
interest = 0.1 # 1 = 100%
# interest above is 100% when 1
# interest = 0.2
interest
0.1
Errors¶
Whenever the Python interpreter fails to execute the code in the cell, it will output an error.
When Python cannot parse what you have written, it outputs a SyntaxError.
print(final_amount # oops, forgot )
File "<ipython-input-47-165c6b5c7c82>", line 1
    print(final_amount
                      ^
SyntaxError: unexpected EOF while parsing
Errors indicate where Python believes the error is located, but please bear in mind that Python's ability to pinpoint this is limited.
Another type of error is a NameError, typically raised when you have misspelled a variable name.
print(final_amout) # oops, I missed the letter "n"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-50-524dfac17014> in <module>
----> 1 print(final_amout)

NameError: name 'final_amout' is not defined
Strings¶
Strings are sequences of characters.
They are expressed using quotes (either single ' or double ", there is no difference).
The type of strings is str; as before, str() is also a function to perform type conversion.
word = "Monty Python"
type(word)
str
str(4)
'4'
int('1')
1
float('1.3321')
1.3321
int('four')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-2f24743179af> in <module>
----> 1 int('four')

ValueError: invalid literal for int() with base 10: 'four'
Indexing and slicing¶
We can access elements of a string by using indexing (positive and negative indexes) as shown below.
The len() function returns the number of characters in a string.
We can access multiple elements of a string by using slicing, using the following syntax:
start:stop:step
If any of these three values is omitted, Python uses a default: it starts at 0, stops at the end of the string, and uses a step of 1. (With a negative step the default start and stop are swapped, which is why word[::-1] reverses the string.)
word[7]
word[-1]
'n'
len(word)
12
word[6:10]
word[-12:-7]
'Monty'
word[6:10:2]
'Pt'
word[6:]
word[:5]
word[::]
'Monty Python'
word[::-1]
'nohtyP ytnoM'
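To make the default values concrete, the sketch below writes a few slices both with and without the omitted values. The variable names are illustrative only.

```python
word = "Monty Python"

# Omitted values fall back to their defaults (for a positive step)
full_copy = word[::]       # same as word[0:len(word):1]
first_word = word[:5]      # same as word[0:5]
second_word = word[6:]     # same as word[6:len(word)]
every_other = word[::2]    # every second character, starting at index 0

print(full_copy)
print(first_word)
print(second_word)
print(every_other)
```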
Exercise 3¶
Given the variable greetings defined below, extract each word and assign it to variables first, second and third.
greetings = "How are you"
first = greetings[0:3]
second = greetings[4:7]
third = greetings[8:]
Operations¶
Strings are immutable: that means that we cannot modify strings.
word[0] = 'J'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-91a956888ca7> in <module>
----> 1 word[0] = 'J'

TypeError: 'str' object does not support item assignment
Strings support some operations. In particular:
- + (concatenate strings)
- * (repeat strings)
print('J' + word[1:])
print('a' + 'b' + 'c')
print('a' * 5)
Jonty Python
abc
aaaaa
print(first + second + third)
Howareyou
print(first + ' ' + second + ' ' + third)
How are you
Formatting¶
In order to combine strings and numbers, the ideal strategy (since Python v3.6) is to use f-strings: f"..."
They allow us to embed Python expressions in strings using curly braces {...}, most notably variable names.
A NameError will be raised if you try to use a variable that doesn't exist.
a = 2
b = 3
valid_string = f"The value of a is {a}"
valid_string_ab = f"The value of a is {a}, and of b is {b}"
print(valid_string_ab)
The value of a is 2, and of b is 3
valid_string_abc = f"The value of a is {a}, and of b is {b}, and of c is {c}"
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-76c831f56e7b> in <module>
----> 1 valid_string_abc = f"The value of a is {a}, and of b is {b}, and of c is {c}"

NameError: name 'c' is not defined
Exercise 4¶
Write out the result of your interest calculation using string formatting.
E.g.: "Starting with 100, with a 10% interest, after 7 years you will have 194.87".
initial_amount = 100
interest = 1.1
period = 7
final_amount = initial_amount * interest ** period
interest_percent = (interest - 1.0)*100
print(f"Starting with {initial_amount}, with a {interest_percent}% interest, after {period} years you will have {final_amount}")
Starting with 100, with a 10.000000000000009% interest, after 7 years you will have 194.87171000000012
print(f"Starting with {initial_amount}, with a {interest_percent:.2f}% interest, after {period} years you will have {final_amount}")
Starting with 100, with a 10.00% interest, after 7 years you will have 194.87171000000012
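The same :.2f format specifier can be applied to the final amount, which brings the output closer to the example sentence given in the exercise. A minimal sketch (the variable message is introduced here for illustration):

```python
initial_amount = 100
interest = 1.1
period = 7
final_amount = initial_amount * interest ** period

# ":.2f" only changes how the value is displayed (two decimal places);
# the variable final_amount itself is left unchanged
message = f"After {period} years you will have {final_amount:.2f}"
print(message)
```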
Functions¶
Functions are blocks of code that can be executed using a function call.
Functions are often created to group some code that has a common "function", for example "adding one side" as in the example below:
Definition¶
In Python, functions are defined using the def keyword followed by the function name.
The def line finishes with a colon :, which in Python introduces a code block; this block is what will be executed when the function is called.
All lines in the code block must be indented.
The first non-indented line after the code block indicates the end of the code block. Easy, isn't it!?
In the example below, something else is NOT printed as part of the say_hello function:
def say_hello():
    print("hello")
    print("what's your name")
print("something else")
something else
say_hello()
hello
what's your name
Arguments and return¶
Functions can have arguments, whose names appear between the parentheses on the def line.
def say_hello(name):
    print(f"hello {name}")
say_hello("David")
say_hello("Marco")
say_hello("Arthur")
hello David
hello Marco
hello Arthur
Multiple arguments are separated by a comma.
def say_hello(name, surname):
    print(f"hello {surname} {name}")
say_hello("David", "Bowman")
hello Bowman David
As well as receiving values via its arguments, a function can also return a value using the return keyword.
def square(n):
    value = n**2
    return value
square(2)
4
square(2) + square(3)
13
If no return statement is given, the function returns None. In Python, None means that the value is missing; note that None is not the same as 0, False, or an empty string "".
def no_return(n):
    value = n**2
print(no_return(2))
None
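To see that None really differs from 0 and from the empty string, the result can be tested explicitly. This sketch uses the is operator, which is not covered in this session but is the usual way to test for None; the variable result is illustrative.

```python
def no_return(n):
    value = n**2  # computed, but never returned

result = no_return(2)

print(result is None)  # the function returned None
print(result == 0)     # None is not the same as 0
print(result == "")    # None is not the same as an empty string
```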
Exercise 5¶
Write a function calculate_interest(starting_amount) that, given a starting amount, returns the final amount after 5 years with an interest of 10%. Test this function on starting amounts 100 and 200.
def calculate_interest(starting_amount):
    interest = 1.1
    period = 5
    final_amount = starting_amount * interest ** period
    return final_amount
print(calculate_interest(100))
print(calculate_interest(200))
161.05100000000004
322.1020000000001
It often happens that several values are required for the function to do its job (for example interest, period and starting_amount).
These values are said to be hardcoded when they are defined within the function and can't be specified via the function call.
It is often useful to instead define them as function arguments so they can all be specified via the function call.
Exercise 5b¶
Write a function calculate_interest(starting_amount, interest, period) that, given a starting amount, an interest and a period, returns the final amount. Test this function on cases seen previously.
def calculate_interest(starting_amount, interest, period):
    final_amount = starting_amount * interest ** period
    return final_amount
calculate_interest(100, 1.1, 7)
194.87171000000012
Lists¶
Lists are ordered sequences; they can be defined as comma-separated lists of values within square brackets [].
A list can contain any value (integers, floats, strings, even other lists): whatever it may contain, its type is always list.
A list can also be empty.
integers = [1, 2, 3]
integers
mixed = [1, 2, "three", 4, 3+2]
type(integers)
type(mixed)
list
empty_list = []
empty_list
[]
Indexing and slicing¶
Lists support indexing and slicing like strings.
The len() function returns the number of characters in a string, or of elements in a list.
print(integers)
integers[0]
integers[1:3]
integers[::2]
integers[::-1]
[1, 2, 3]
[3, 2, 1]
Exercise 6¶
Create a list week containing the 7 days of the week. Then:
- Retrieve the first 5 days of the week on one hand, and the days of the week-end on the other;
- Obtain the same result using negative indices;
- Find two ways to access the last day of the week;
- Reverse the order of the week days with one single command.
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# question 1.
week[:5] # get working days
week[5:] # get weekend days
# question 2.
week[:-2] # get working days
week[-2:] # get weekend days
# question 3
week[-1]
week[6]
# question 4
week[::-1]
['Sunday', 'Saturday', 'Friday', 'Thursday', 'Wednesday', 'Tuesday', 'Monday']
Operations¶
Lists support some operations. In particular:
- + (concatenate lists)
- * (repeat lists)
ATTENTION: these are not element-wise (vector) sums or products!
print([1, 2, 3] + [4, 5, 6])
print([1, 2] * 3) # NOT [3, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 1, 2, 1, 2]
Mutating lists¶
Lists are mutable, i.e. we can mutate (change) them:
integers[0] = 'one'
print(integers)
['one', 2, 3]
integers[1] = 'two'
print(integers)
['one', 'two', 3]
The append() method allows us to add an element to the end of the list:
integers = [1, 2, 3]
print(integers)
integers.append(4)
[1, 2, 3]
integers # Note that the list has been modified
[1, 2, 3, 4]
integers.append(5) # Note that append returns nothing
print(integers)
[1, 2, 3, 4, 5]
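Because append() modifies the list in place, its return value is None. A common slip is to write integers = integers.append(4), which replaces the list with None. The sketch below stores the return value in a separate, illustrative variable (returned) to make this visible.

```python
integers = [1, 2, 3]

# append() changes the list in place and returns None
returned = integers.append(4)

print(returned)
print(integers)
```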
Exercise 7¶
The list cubes is defined as follows:
cubes = [1, 8, 27, 66, 125]
but something's wrong...
Correct cubes, then add 3 more values.
cubes[3] = 64
cubes.append(6**3)
print(cubes)
cubes.append(7**3)
cubes.append(8**3)
cubes
[1, 8, 27, 64, 125, 216]
[1, 8, 27, 64, 125, 216, 343, 512]
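# Alternatively, starting again from the corrected list, concatenate in one step
# (running this after the append() calls above would add the values twice)
cubes = [1, 8, 27, 64, 125] + [6**3, 7**3, 8**3]
cubes
[1, 8, 27, 64, 125, 216, 343, 512]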
For loops¶
for cube in cubes:
    print(cube)
1
8
27
64
125
216
343
512
The code block is executed once for each element of cubes.
Each execution of the code block has a different value of the iterator variable cube (without the s!).
Exercise 8¶
Using a for loop and the cubes list (which contains values of $x^3$), print out the values of $x^3+1$ (i.e. 2, 9, 28, etc.).
for cube in cubes:
    print(cube + 1)
2
9
28
65
126
217
344
513
Iterating over integers¶
A typical use of for loops is to iterate over integers in a range.
To do so we can use the range() function, which returns integers in a given range.
for integer in range(4):
    print(integer)
0
1
2
3
The range() function can be called with one, two or three arguments:
- range(stop): integers from 0 up to, but not including, stop;
- range(start, stop): integers from start up to, but not including, stop;
- range(start, stop, step): integers from start up to, but not including, stop, in increments of step.
for i in range(3, 7):
    print(i)
3
4
5
6
for i in range(0, 6, 2):
    print(i)
0
2
4
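A quick way to inspect what a call to range() produces, without writing a loop, is to convert it to a list with list(). The sketch below also shows a negative step, which counts downwards; the variable names are illustrative.

```python
# list() turns the range into a list so that all values are visible at once
up_to_four = list(range(4))
three_to_six = list(range(3, 7))
even_below_six = list(range(0, 6, 2))
countdown = list(range(5, 0, -1))  # a negative step counts downwards

print(up_to_four)
print(three_to_six)
print(even_below_six)
print(countdown)
```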
Exercise 9¶
Using a for loop, create a list even_squares containing the squares of even numbers up to 20 (included).
(Hint: initialise the list to an empty list before the loop, then fill it using the .append() method).
even_squares = []
for even_integer in range(0, 21, 2):
    even_squares.append(even_integer ** 2)
print(even_squares)
[0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400]
Exercise 10¶
Using a for loop, draw a triangle like this (10 lines) :
*
**
***
****
*****
******
*******
********
*********
**********
for n_stars in range(1, 11):
    print('*' * n_stars)
*
**
***
****
*****
******
*******
********
*********
**********
# Also a valid solution:
for integer in range(10):
    n_stars = integer + 1
    print('*' * n_stars)
*
**
***
****
*****
******
*******
********
*********
**********
Iterating over lists with indices¶
Sometimes it is useful to keep track of the indices while iterating over a list.
To do so, we can use the function enumerate().
When using enumerate, we define two variables: the index as well as the iterator variable.
Both change value at each execution of the loop's code block.
print(cubes)
[1, 8, 27, 64, 125, 216, 343, 512]
for index, cube in enumerate(cubes):
    print(f"{cube} is the cube of {index+1}")
1 is the cube of 1
8 is the cube of 2
27 is the cube of 3
64 is the cube of 4
125 is the cube of 5
216 is the cube of 6
343 is the cube of 7
512 is the cube of 8
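The index+1 in the example above can be avoided: enumerate() accepts an optional start argument that sets the first value of the counter. A possible variant, where the names number and lines are chosen for illustration:

```python
cubes = [1, 8, 27, 64, 125, 216, 343, 512]

lines = []
# start=1 makes the counter begin at 1 instead of 0
for number, cube in enumerate(cubes, start=1):
    lines.append(f"{cube} is the cube of {number}")

print(lines[0])
print(lines[-1])
```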
If clauses¶
It is of fundamental importance to be able to execute parts of code only when certain conditions are met.
To do so programming languages provide if clauses.
There are sometimes conditions of the execution environment that will change according to when the program is run, or that we cannot foresee when we write our program. It is by using if clauses that we define how our program will make decisions.
Comparisons¶
Before looking at if clauses we first need to understand how to compare things, in order to understand whether a condition of interest is met.
For example, let's compare two integers x and y:
x = 2
y = 3
print(x < y)
print(x > y)
True
False
The < comparison operator compares x and y and returns either True or False. These last two values are of type bool (short for boolean).
type(x < y)
bool
There exist 6 comparison operators:
- < smaller
- > greater
- <= smaller or equal
- >= greater or equal
- == equal
- != not equal
NOTE that = is for assignment and == is for comparison!
x = y # This is an assignment !!!
x == y
True
Syntax¶
If clauses are defined using the if keyword, followed by the condition and then a colon :.
The indented lines that follow form the code block, which is executed ONLY if the condition is True.
Try modifying x and y below to see what is printed.
x = 2
y = 3
if x < y:
    print(f"{x} is smaller than {y}")
2 is smaller than 3
Exercise 11¶
Write a function print_even(n) that, given an integer as argument, prints its value only if it is an even number (a number is even when the remainder of its division by 2 is 0). Test the function on a few examples of your choice.
def print_even(n):
    if n % 2 == 0:
        print(n)
for integer in range(10):
    print_even(integer)
0
2
4
6
8
Multiple cases¶
In the previous example, it would be useful to print a message when the condition is False.
To do so, we can use the else keyword.
The else keyword, followed by :, introduces a second code block that is executed ONLY if the condition is False.
x = 3
y = 2
if x < y:
    print(f"{x} is smaller than {y}")
else:
    print(f"{x} is larger than {y}")
3 is larger than 2
NOTE: The example above is slightly wrong: why? Try x = 2 and y = 2...
When there are more than 2 cases, we can use the elif keyword, followed by another condition.
There can be as many elif clauses as necessary, but only one else.
x = 2
y = 2
if x < y:
    print(f"{x} is smaller than {y}")
elif x == y:
    print(f"{x} is equal to {y}")
else:
    print(f"{x} is larger than {y}")
print("This is printed whatever the value of x and y") # close code block
2 is equal to 2
This is printed whatever the value of x and y
Exercise 12¶
Write a function is_smaller(x, y) that, given two integers x and y, returns True if x is smaller than y, and False otherwise. Test the function on a few examples of your choice.
def is_smaller(x, y):
    if x < y:
        return True
    else: # The x == y case needn't be treated separately (see exercise text)
        return False
is_smaller(9, 6)
False
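Since a comparison already evaluates to a bool, the if/else in the solution above can be collapsed into a single return. This is only an alternative sketch; is_smaller_short is a hypothetical name, used so that the original function is not overwritten.

```python
def is_smaller_short(x, y):
    # x < y is already True or False, so it can be returned directly
    return x < y

print(is_smaller_short(9, 6))
print(is_smaller_short(2, 3))
print(is_smaller_short(2, 2))
```

Both versions behave identically; the explicit if/else spells out the two cases, while the short form relies on the comparison itself producing the boolean.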
Multiple tests¶
Sometimes it is useful to test multiple conditions at the same time.
Let's suppose we have two integers x and y, and we want to check both that x < y and that x is even.
To combine the two conditions we can use boolean operators.
There are 3 boolean operators:
- not
- and
- or
The not operator simply negates the value of its argument.
x = 2
y = 3
if x < y and x % 2 == 0:
    print("x smaller than y, and x even")
x smaller than y, and x even
x = 2
y = 1
if x < y and not x % 2 == 0:
    print("x smaller than y, and x odd")
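The action of the and and or operators is best expressed using the following truth table:
a      b      a and b   a or b
True   True   True      True
True   False  False     True
False  True   False     True
False  False  False     False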
Exercise 13¶
Write a function is_triangle(x, y, z) that, given three values (x, y and z), returns True if they can form the sides of a triangle, and False otherwise.
Remember: the sum of the lengths of any two sides of a triangle is greater than the length of the third side.
Test the function on the following examples:
- x=3, y=4, z=5
- x=1, y=3, z=1
- x=1, y=1, z=3
def is_triangle(x, y, z):
    return (x+y > z and x+z > y and y+z > x)
print(is_triangle(3, 4, 5))
print(is_triangle(1, 3, 1))
print(is_triangle(1, 1, 3))
True
False
False
Exercise 14¶
Write a function is_scalene(x, y, z) that, given the lengths of the three sides of a triangle (x, y and z), returns True if it is a scalene triangle, and False otherwise.
Remember: a scalene triangle has no equal sides.
Test the function on the following examples:
- x=2, y=4, z=3
- x=3, y=3, z=3
- x=2, y=2, z=3
def is_scalene(x, y, z):
    return (is_triangle(x, y, z) and x != y and x != z and y != z)
print(is_scalene(2, 4, 3))
print(is_scalene(3, 3, 3))
print(is_scalene(2, 2, 3))
True
False
False
Extra Exercises¶
The following problems (in increasing order of difficulty) require all the concepts seen in this session.
Good luck!
Exercise 15¶
Write a function filter_even(numbers) that, given a list of integers, returns another list containing only those elements that are even numbers (a number is even when the remainder of its division by 2 is 0).
Exercise 16¶
Write a function triangle_type(x, y, z) that, given the lengths of the three sides of a triangle (x, y and z), returns "Scalene", "Isosceles" or "Equilateral" according to the triangle type.
Remember: an isosceles triangle has 2 equal sides, while an equilateral triangle (3 equal sides) is a special case of isosceles triangle.
Exercise 17¶
The Collatz conjecture relates to a long-standing unsolved problem in mathematics. It concerns a sequence defined as follows : start with any positive integer $n$. To obtain the next term, proceed as follows:
- if the previous term is even, divide it by 2;
- if the previous term is odd, multiply it by 3 then add 1.
The conjecture is that no matter what value of $n$, the sequence will always reach 1 (sooner or later).
Write a function collatz(n, steps) that, given a starting integer and a number of steps, returns the sequence defined above.
Use this function to answer the following questions:
- does the conjecture hold starting from 12? from 19? from 27?
- does the number of steps vary with starting value?
- what happens after the sequence has reached 1?
License and Acknowledgements¶
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Some exercises are adapted from https://python.sdv.univ-paris-diderot.fr/.