Data Structures in Python
Note: This post is based on my old programming study notes when I taught myself.
Lists
Python doesn’t have array like C/C++ but it has lists, array module, and NumPy arrays.
- Dynamic: Python list can be automatically resizble when an element is added or removed.
- Mutable: Element can be changed.
- Heterogeneous: A single Python list can hold elements of different types at the same time.
arr = [1, "hello", 3.14, [2, 3], {"a": 1}]
Lists Methods
insert()
: Add value at desired positionappend()
: Add element to listpop()
: Remove value at desired position (or last element)clear()
: Remove all valuessort()
: Sort values in orderreverse()
: Reverse ordercopy()
: Copy listcount()
: Count how many times a value appearsindex()
: Find position of a valuelen()
: Count number of elements in list
Array module
Syntax is array.array(typecode, iterable)
from array import array
arr = array.array('i', [1, 2, 3]) # 'i' = integer
array.array(typecode, iterable)
stores elements as raw C values (compact binary representation).- typecode is data type of array that’s represented by single characters
- i for signed integer, f for float, and b for signed characters. This parameter is mandatory.
NumPy Array
- Homogeneous: All elements must be the same data type
- Fixed Size: Cannot change size after creation
- Performance: Much faster for numerical operations than lists
Reference: NumPy Array Objects Documentation
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
Dictionary
{key: value}
- Useful for storing key:value pair data where order doesn’t matter
- Keys or values can be any data type
- When all keys are strings, it’s called JSON
- For example, when retrieving specific values from JSON data through API communication, you can use
response.content["text"]
to get the value of the “text” key from the API response’s content dictionary.
Example)
fruits={"apple":0.99, "banana": 2.50, "blueberry":3.00}
Accessing specific key values in Dictionary
fruits["apple"]
fruits["banana"]
fruits["blueberry"]
How to add new values to Dictionary
Case1: Use assignment operator
fruit_price["watermelon"]=5.00
print(fruit_price) # {"watermelon": 5.00}
Case2: update()
method
fruit_price.update({"cherry":4.00})
print(fruit_price)
# {"cherry": 4.00}
Like Lists, you can use len()
with Dictionaries as well.
fruits = {"banana": 2, "apple": 3, "kiwi": 4}
print(len(fruit_price)) # 3
Accessing keys that doesn’t exist
If you try to access a key in the dictionary that isn’t there, you’ll get a KeyError and your program will crash.
You have two options to avoid doing that. You can first check if a key exists inside a dictionary using the in
keyword.
if "kiwi" in fruit_prices:
print(fruit_prices["kiwi"])
else:
print("Didn't find a kiwi!")
Or you can use the in-built get()
function like this.
kiwi_price = fruit_price.get("kiwi")
print(kiwi_price) # None
And if the value doesn’t exist, it’ll just return None
.
Or if you prefer you specify a default value for it to return instead.
kiwi_price = fruit_price.get("kiwi", 0)
print(kiwi_price) # 0
Dictionary built-in methods
fruits.clear() # Removes all items from the dictionary
fruits.get("apple") # Gets the value of a key
fruits.items() # Returns a list of key-value pairs
fruits.keys() # Returns a list of all keys
fruits.pop("apple") # Removes an item from the dictionary
fruits.update({"apple": 3}) # Adds or updates existing elements
fruits.values() # Returns a list of all values
Serialization
- Turning into a format that almost everybody understands. That process is called “serialization”
- With dictionaries, it can serialize nicely into something called JSON, which stands for Javascript Object Notation
- Python has easy ways to convert to and from JSON data.
import json
fruit_prices = {"apple": 90}
json_fruit_prices = json.dumps(fruit_prices)
print(json_fruit_prices)
In the code above, json.dumps()
converts JSON data to string format. This way, we can easily convert JSON data received through HTTP communication over the internet for use in our code.
Tuples
- Immutable sequences of elements
- The difference is that, after a tuple is created, its elements cannot be changed. You can’t add, remove, or update elements in tuple.
- Tuples are faster and use less memory than lists, making them more efficient for large data sets. They can be used as keys in dictionaries, while lists cannot.
my_dict = {(1,2): "value"}
Creating a tuple
my_tuple = (1, 2, 3)
Access an element of tuple
# Access an element of tuple
print(my_tuple[0])
print(my_tuple[1])
print(my_tuple[2])
Unpacking or Spreading
# Instead of this...
my_tuple = (1, 2, 3)
x = my_tuple[0]
y = my_tuple[1]
z = my_tuple[2]
# We can do this!
x, y, z = (1, 2, 3)
print(x, y, z)
Sets
- Unordered collections of items, and are mutable
- Use
{}
but unlike Dictionary there is not key, element. Only individual element exists. - Sets are useful for membership testing, removing duplicates, and mathematical operations like union, intersection, difference, and symmetric difference.
Create an empty set
favourite_vegetables = set()
Set Operations
add()
- Add single elementupdate()
- Add multiple elementsremove()
- Remove element (raises error if not found)discard()
- Remove element (no error if not found)union()
- Combine setsintersection()
- Common elementsdifference()
- Elements in first set but not secondsymmetric_difference()
- Elements in either set but not both
# Set operations examples
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
print(set1.union(set2)) # {1, 2, 3, 4, 5, 6}
print(set1.intersection(set2)) # {3, 4}
print(set1.difference(set2)) # {1, 2}
print(set1.symmetric_difference(set2)) # {1, 2, 5, 6}
Technical Interview Essentials
Time Complexity Quick Reference
Lists:
# Time complexities you MUST know for interviews
lst = [1, 2, 3, 4, 5]
# Access by index: O(1)
item = lst[2]
# Search for value: O(n)
found = 3 in lst
# Append to end: O(1) amortized
lst.append(6)
# Insert at beginning: O(n) - shifts all elements!
lst.insert(0, 0)
# Delete by index: O(n) for middle, O(1) for end
lst.pop(2) # O(n) - shifts elements
lst.pop() # O(1) - removes from end
# Sort: O(n log n)
lst.sort()
Dictionaries:
# All operations are O(1) average case, O(n) worst case
my_dict = {"key": "value"}
# Access, insert, delete: O(1) average
value = my_dict["key"] # O(1)
my_dict["new_key"] = "new" # O(1)
del my_dict["key"] # O(1)
Sets:
# All operations are O(1) average case
my_set = {1, 2, 3}
# Add, remove, lookup: O(1) average
my_set.add(4) # O(1)
my_set.remove(1) # O(1)
exists = 2 in my_set # O(1)
Common Interview Patterns
1. Two Sum Problem (Hash Map Pattern)
def two_sum(nums, target):
"""
Given array of integers, return indices of two numbers that add up to target.
Time: O(n), Space: O(n)
"""
seen = {} # value -> index mapping
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
# Test
print(two_sum([2, 7, 11, 15], 9)) # [0, 1]
2. Find Missing Number (XOR or Set Pattern)
def find_missing_number_set(nums):
"""
Find missing number from 0 to n using set.
Time: O(n), Space: O(n)
"""
n = len(nums)
expected = set(range(n + 1))
actual = set(nums)
return (expected - actual).pop()
def find_missing_number_math(nums):
"""
Find missing number using math formula.
Time: O(n), Space: O(1)
"""
n = len(nums)
expected_sum = n * (n + 1) // 2
actual_sum = sum(nums)
return expected_sum - actual_sum
# Test both approaches
nums = [0, 1, 3, 4, 5]
print(find_missing_number_set(nums)) # 2
print(find_missing_number_math(nums)) # 2
3. Group Anagrams (Dictionary + Sorting Pattern)
def group_anagrams(strs):
"""
Group strings that are anagrams of each other.
Time: O(n * m log m) where n = number of strings, m = average string length
"""
groups = {}
for s in strs:
# Sort characters to create key for anagrams
key = ''.join(sorted(s))
if key not in groups:
groups[key] = []
groups[key].append(s)
return list(groups.values())
# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
4. Top K Frequent Elements (Dictionary + Heap Pattern)
from collections import Counter
import heapq
def top_k_frequent(nums, k):
"""
Find k most frequent elements.
Time: O(n log k), Space: O(n)
"""
# Count frequencies
count = Counter(nums)
# Use heap to find top k
return heapq.nlargest(k, count.keys(), key=count.get)
# Alternative using bucket sort - O(n) time
def top_k_frequent_bucket(nums, k):
"""
Find k most frequent elements using bucket sort.
Time: O(n), Space: O(n)
"""
count = Counter(nums)
# Create buckets for each possible frequency
buckets = [[] for _ in range(len(nums) + 1)]
# Put numbers in buckets based on frequency
for num, freq in count.items():
buckets[freq].append(num)
# Collect top k from highest frequency buckets
result = []
for i in range(len(buckets) - 1, -1, -1):
for num in buckets[i]:
result.append(num)
if len(result) == k:
return result
return result
# Test both approaches
nums = [1, 1, 1, 2, 2, 3]
print(top_k_frequent(nums, 2)) # [1, 2]
print(top_k_frequent_bucket(nums, 2)) # [1, 2]
Memory Optimization Tips for Interviews
1. Use Generators for Large Data
# Memory intensive - creates entire list
def squares_list(n):
return [i**2 for i in range(n)]
# Memory efficient - generates on demand
def squares_generator(n):
return (i**2 for i in range(n))
# Usage
large_squares = squares_generator(1000000) # Uses minimal memory
2. Choose Right Data Structure
# For frequent lookups: use dict or set (O(1)) instead of list (O(n))
# For membership testing with duplicates
items = [1, 2, 3, 1, 2, 3, 1, 2, 3]
# Slow - O(n) lookup each time
def slow_lookup(target):
return target in items # O(n)
# Fast - O(1) lookup after O(n) setup
item_set = set(items) # Convert once
def fast_lookup(target):
return target in item_set # O(1)
Common Interview Gotchas
1. Mutable Default Arguments
# WRONG - dangerous!
def append_to_list(item, target_list=[]):
target_list.append(item)
return target_list
# This will behave unexpectedly:
list1 = append_to_list(1) # [1]
list2 = append_to_list(2) # [1, 2] - Oops!
# CORRECT
def append_to_list(item, target_list=None):
if target_list is None:
target_list = []
target_list.append(item)
return target_list
2. Shallow vs Deep Copy
import copy
original = [[1, 2], [3, 4]]
# Shallow copy - nested lists are still shared!
shallow = original.copy() # or list(original) or original[:]
shallow[0].append(3)
print(original) # [[1, 2, 3], [3, 4]] - Original modified!
# Deep copy - completely independent
deep = copy.deepcopy(original)
deep[0].append(4)
print(original) # [[1, 2, 3], [3, 4]] - Original unchanged
3. Dictionary Iteration During Modification
# WRONG - RuntimeError: dictionary changed size during iteration
def remove_negative_values_wrong(d):
for key, value in d.items():
if value < 0:
del d[key] # Error!
# CORRECT - iterate over copy of keys
def remove_negative_values_correct(d):
for key in list(d.keys()): # Create copy of keys
if d[key] < 0:
del d[key]
# Or use dictionary comprehension (creates new dict)
def filter_positive_values(d):
return {k: v for k, v in d.items() if v >= 0}