Data Structures in Python

· Seokhyeon Byun

Note: This post is based on my old programming study notes when I taught myself.

Lists

Python doesn’t have array like C/C++ but it has lists, array module, and NumPy arrays.

  • Dynamic: Python list can be automatically resizble when an element is added or removed.
  • Mutable: Element can be changed.
  • Heterogeneous: A single Python list can hold elements of different types at the same time.
arr = [1, "hello", 3.14, [2, 3], {"a": 1}]

Lists Methods

  • insert(): Add value at desired position
  • append(): Add element to list
  • pop(): Remove value at desired position (or last element)
  • clear(): Remove all values
  • sort(): Sort values in order
  • reverse(): Reverse order
  • copy(): Copy list
  • count(): Count how many times a value appears
  • index(): Find position of a value
  • len(): Count number of elements in list

Array module

Syntax is array.array(typecode, iterable)

from array import array
arr = array.array('i', [1, 2, 3])  # 'i' = integer
  • array.array(typecode, iterable) stores elements as raw C values (compact binary representation).
  • typecode is data type of array that’s represented by single characters
  • i for signed integer, f for float, and b for signed characters. This parameter is mandatory.

NumPy Array

  • Homogeneous: All elements must be the same data type
  • Fixed Size: Cannot change size after creation
  • Performance: Much faster for numerical operations than lists

Reference: NumPy Array Objects Documentation

import numpy as np
arr = np.array([1, 2, 3, 4, 5])

Dictionary

  • {key: value}
  • Useful for storing key:value pair data where order doesn’t matter
  • Keys or values can be any data type
  • When all keys are strings, it’s called JSON
  • For example, when retrieving specific values from JSON data through API communication, you can use response.content["text"] to get the value of the “text” key from the API response’s content dictionary.

Example)

fruits={"apple":0.99, "banana": 2.50, "blueberry":3.00}

Accessing specific key values in Dictionary

fruits["apple"]
fruits["banana"]
fruits["blueberry"]

How to add new values to Dictionary

Case1: Use assignment operator

fruit_price["watermelon"]=5.00
print(fruit_price) # {"watermelon": 5.00}

Case2: update() method

fruit_price.update({"cherry":4.00})
print(fruit_price)
# {"cherry": 4.00}

Like Lists, you can use len() with Dictionaries as well.

fruits = {"banana": 2, "apple": 3, "kiwi": 4}
print(len(fruit_price))  # 3

Accessing keys that doesn’t exist

If you try to access a key in the dictionary that isn’t there, you’ll get a KeyError and your program will crash.

You have two options to avoid doing that. You can first check if a key exists inside a dictionary using the in keyword.

if "kiwi" in fruit_prices:
	print(fruit_prices["kiwi"])
else:
	print("Didn't find a kiwi!")

Or you can use the in-built get() function like this.

kiwi_price = fruit_price.get("kiwi")
print(kiwi_price)  # None

And if the value doesn’t exist, it’ll just return None.

Or if you prefer you specify a default value for it to return instead.

kiwi_price = fruit_price.get("kiwi", 0)
print(kiwi_price)  # 0

Dictionary built-in methods

fruits.clear()  # Removes all items from the dictionary
fruits.get("apple")  # Gets the value of a key
fruits.items()  # Returns a list of key-value pairs
fruits.keys()  # Returns a list of all keys
fruits.pop("apple")  # Removes an item from the dictionary
fruits.update({"apple": 3})  # Adds or updates existing elements
fruits.values()  # Returns a list of all values

Serialization

  • Turning into a format that almost everybody understands. That process is called “serialization”
  • With dictionaries, it can serialize nicely into something called JSON, which stands for Javascript Object Notation
  • Python has easy ways to convert to and from JSON data.
import json

fruit_prices = {"apple": 90}
json_fruit_prices = json.dumps(fruit_prices)
print(json_fruit_prices)

In the code above, json.dumps() converts JSON data to string format. This way, we can easily convert JSON data received through HTTP communication over the internet for use in our code.


Tuples

  • Immutable sequences of elements
  • The difference is that, after a tuple is created, its elements cannot be changed. You can’t add, remove, or update elements in tuple.
  • Tuples are faster and use less memory than lists, making them more efficient for large data sets. They can be used as keys in dictionaries, while lists cannot.
my_dict = {(1,2): "value"}

Creating a tuple

my_tuple = (1, 2, 3)

Access an element of tuple

# Access an element of tuple
print(my_tuple[0])
print(my_tuple[1])
print(my_tuple[2])

Unpacking or Spreading

# Instead of this...
my_tuple = (1, 2, 3)
x = my_tuple[0]
y = my_tuple[1]
z = my_tuple[2]

 # We can do this!
x, y, z = (1, 2, 3)
print(x, y, z)

Sets

  • Unordered collections of items, and are mutable
  • Use {} but unlike Dictionary there is not key, element. Only individual element exists.
  • Sets are useful for membership testing, removing duplicates, and mathematical operations like union, intersection, difference, and symmetric difference.

Create an empty set

favourite_vegetables = set()

Set Operations

  • add() - Add single element
  • update() - Add multiple elements
  • remove() - Remove element (raises error if not found)
  • discard() - Remove element (no error if not found)
  • union() - Combine sets
  • intersection() - Common elements
  • difference() - Elements in first set but not second
  • symmetric_difference() - Elements in either set but not both
# Set operations examples
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

print(set1.union(set2))           # {1, 2, 3, 4, 5, 6}
print(set1.intersection(set2))    # {3, 4}
print(set1.difference(set2))      # {1, 2}
print(set1.symmetric_difference(set2))  # {1, 2, 5, 6}

Technical Interview Essentials

Time Complexity Quick Reference

Lists:

# Time complexities you MUST know for interviews
lst = [1, 2, 3, 4, 5]

# Access by index: O(1)
item = lst[2]

# Search for value: O(n)
found = 3 in lst

# Append to end: O(1) amortized
lst.append(6)

# Insert at beginning: O(n) - shifts all elements!
lst.insert(0, 0)

# Delete by index: O(n) for middle, O(1) for end
lst.pop(2)    # O(n) - shifts elements
lst.pop()     # O(1) - removes from end

# Sort: O(n log n)
lst.sort()

Dictionaries:

# All operations are O(1) average case, O(n) worst case
my_dict = {"key": "value"}

# Access, insert, delete: O(1) average
value = my_dict["key"]     # O(1)
my_dict["new_key"] = "new" # O(1)
del my_dict["key"]         # O(1)

Sets:

# All operations are O(1) average case
my_set = {1, 2, 3}

# Add, remove, lookup: O(1) average
my_set.add(4)      # O(1)
my_set.remove(1)   # O(1)
exists = 2 in my_set  # O(1)

Common Interview Patterns

1. Two Sum Problem (Hash Map Pattern)

def two_sum(nums, target):
    """
    Given array of integers, return indices of two numbers that add up to target.
    Time: O(n), Space: O(n)
    """
    seen = {}  # value -> index mapping

    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i

    return []

# Test
print(two_sum([2, 7, 11, 15], 9))  # [0, 1]

2. Find Missing Number (XOR or Set Pattern)

def find_missing_number_set(nums):
    """
    Find missing number from 0 to n using set.
    Time: O(n), Space: O(n)
    """
    n = len(nums)
    expected = set(range(n + 1))
    actual = set(nums)
    return (expected - actual).pop()

def find_missing_number_math(nums):
    """
    Find missing number using math formula.
    Time: O(n), Space: O(1)
    """
    n = len(nums)
    expected_sum = n * (n + 1) // 2
    actual_sum = sum(nums)
    return expected_sum - actual_sum

# Test both approaches
nums = [0, 1, 3, 4, 5]
print(find_missing_number_set(nums))   # 2
print(find_missing_number_math(nums))  # 2

3. Group Anagrams (Dictionary + Sorting Pattern)

def group_anagrams(strs):
    """
    Group strings that are anagrams of each other.
    Time: O(n * m log m) where n = number of strings, m = average string length
    """
    groups = {}

    for s in strs:
        # Sort characters to create key for anagrams
        key = ''.join(sorted(s))

        if key not in groups:
            groups[key] = []
        groups[key].append(s)

    return list(groups.values())

# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

4. Top K Frequent Elements (Dictionary + Heap Pattern)

from collections import Counter
import heapq

def top_k_frequent(nums, k):
    """
    Find k most frequent elements.
    Time: O(n log k), Space: O(n)
    """
    # Count frequencies
    count = Counter(nums)

    # Use heap to find top k
    return heapq.nlargest(k, count.keys(), key=count.get)

# Alternative using bucket sort - O(n) time
def top_k_frequent_bucket(nums, k):
    """
    Find k most frequent elements using bucket sort.
    Time: O(n), Space: O(n)
    """
    count = Counter(nums)

    # Create buckets for each possible frequency
    buckets = [[] for _ in range(len(nums) + 1)]

    # Put numbers in buckets based on frequency
    for num, freq in count.items():
        buckets[freq].append(num)

    # Collect top k from highest frequency buckets
    result = []
    for i in range(len(buckets) - 1, -1, -1):
        for num in buckets[i]:
            result.append(num)
            if len(result) == k:
                return result

    return result

# Test both approaches
nums = [1, 1, 1, 2, 2, 3]
print(top_k_frequent(nums, 2))        # [1, 2]
print(top_k_frequent_bucket(nums, 2)) # [1, 2]

Memory Optimization Tips for Interviews

1. Use Generators for Large Data

# Memory intensive - creates entire list
def squares_list(n):
    return [i**2 for i in range(n)]

# Memory efficient - generates on demand
def squares_generator(n):
    return (i**2 for i in range(n))

# Usage
large_squares = squares_generator(1000000)  # Uses minimal memory

2. Choose Right Data Structure

# For frequent lookups: use dict or set (O(1)) instead of list (O(n))
# For membership testing with duplicates
items = [1, 2, 3, 1, 2, 3, 1, 2, 3]

# Slow - O(n) lookup each time
def slow_lookup(target):
    return target in items  # O(n)

# Fast - O(1) lookup after O(n) setup
item_set = set(items)  # Convert once
def fast_lookup(target):
    return target in item_set  # O(1)

Common Interview Gotchas

1. Mutable Default Arguments

# WRONG - dangerous!
def append_to_list(item, target_list=[]):
    target_list.append(item)
    return target_list

# This will behave unexpectedly:
list1 = append_to_list(1)      # [1]
list2 = append_to_list(2)      # [1, 2] - Oops!

# CORRECT
def append_to_list(item, target_list=None):
    if target_list is None:
        target_list = []
    target_list.append(item)
    return target_list

2. Shallow vs Deep Copy

import copy

original = [[1, 2], [3, 4]]

# Shallow copy - nested lists are still shared!
shallow = original.copy()  # or list(original) or original[:]
shallow[0].append(3)
print(original)  # [[1, 2, 3], [3, 4]] - Original modified!

# Deep copy - completely independent
deep = copy.deepcopy(original)
deep[0].append(4)
print(original)  # [[1, 2, 3], [3, 4]] - Original unchanged

3. Dictionary Iteration During Modification

# WRONG - RuntimeError: dictionary changed size during iteration
def remove_negative_values_wrong(d):
    for key, value in d.items():
        if value < 0:
            del d[key]  # Error!

# CORRECT - iterate over copy of keys
def remove_negative_values_correct(d):
    for key in list(d.keys()):  # Create copy of keys
        if d[key] < 0:
            del d[key]

# Or use dictionary comprehension (creates new dict)
def filter_positive_values(d):
    return {k: v for k, v in d.items() if v >= 0}