Data structures for coding interview

Rayane de Araújo
11 min readJun 16, 2019

Data structures is one of the base knowledge that Big companies like Amazon, Facebook and Microsoft usually require their candidates to prove they have. If you are preparing for a coding interview in any of those companies, I strongly encourage you to read this post carefully.

If you are not preparing for an interview but you want to understand when you should use each one of those data structures and the performance consequences of it, you are reading the right post as well.

This post focuses on explaining how the main existing kinds of data structures work, their usage, what are their strengths and weaknesses and how they differ from one another. I will also explain the time complexity for the basic operations of each one of them. Hopefully, by the end of this post, you will be able to choose the right data structure for each situation you may face and you will also be prepared to crack the data structure interview!

Linked Lists

Linked List is a simple data structure that stores a sequence of elements where each element is linked to its neighbor as a chain. Linked Lists can contain pretty much all kinds of data types, they can be sorted or unsorted, they can have duplicates or not.

There are three types of Linked Lists:

  • Singly Linked List: each node has a pointer to the next node.
Singly liked list
  • Doubly Linked List: each node has a pointer to the next and to the previous node.
Doubly liked list
  • Circular Linked List: the last node points to the previous one.
Circular liked list

Liked List vs Array

Liked lists share some properties with Arrays, however the main difference between them is that liked lists are not based on indexing as Arrays are. In other words, liked lists are only accessible by the head node. If you want to get the fourth element of a liked list, you will have to traverse all elements until you get to the fourth one, what takes a linear time complexity and makes it slower compared to an Array where you could traverse the element in constant time by accessing its index like: Array[3].

Linked List element retrieval

Time complexity

  • Insert/delete from the head (beginning) is O(1)
  • Access/search in average is O(n)

Advantages

  • Linked Lists are relatively memory efficient, since they only require memory to hold each item and the reference(s) to the neighbor(s).
  • Very simple to create and maintain.

Disadvantages

  • Searching for a specific item in a Linked List is expensive, since you may have to end up retrieving all elements until you find the one you are searching for, having a time complexity of O(n).
  • Because it takes linear time to look up for an item, the longer the list, the longer it will take to find the element.
  • Sorting elements may also be very slow. For sorting a Linked List, you will have to compare an element against all the others and then place it in the right position. After that, you will have to do the same for the other elements and you may end up having to traverse the list many many times.

Usage

  • Lists can be used to implement Stacks, Queues and Hash Tables.
  • The OS uses a circular Linked List to manage tasks. All the running applications are kept in a circular Linked List and the OS gives a fixed time slot to all for running. The Operating System keeps on iterating over the Linked List until all the applications are completed.
  • Linked Lists are a good option when memory usage is a concern and searching and sorting are not necessary.
  • They are also usable when the data is conceptually connected linearly.

Arrays

An Array is an aggregate data structure that is designed to store a group of objects of the same or different types. Each element is accessed based on its index and there is no direct connection between items like in a Linked List.

Representation of an Array

Because Arrays are based on indexing, the great thing about them is that if you know the index of the element you want, you can retrieve it in constant time O(1). For example, in the Array above, if you want to find the fourth element of the Array, which value is 40, you would simply say: array[3]. It would not be necessary to retrieve all precious elements, 10, 20 and 30, like it would be if you were using a Linked List.

Time complexity

  • Direct access (when you know the index): O(1)
  • Sequential access (when you do not know the index): O(N)

Unsorted Arrays

  • Search/delete: O(n)
  • Insert (when inserting at the end): O(1)

Sorted Arrays

  • Search (using Binary Search): O(Log n)
  • Insert/delete: O(n) (in worst case all elements may have to be moved)
  • Sorting: O(NLogN)

Advantages

  • An element of an Array can be accessed very quickly if you know the index.
  • If you know the index, the speed at which you access any specific item remains the same regardless of how long the Array is.

Disadvantages

  • Can be slow and inefficient when you need to resize them.
  • In some languages you have to define the size of the Array beforehand.

Usage

  • Arrays are a good option if you will need immediate access to items and you know their indexes.
  • Can be used whey will be good ave a fairly stable number of items, or know a maximum number beforehand.
  • If you will not need to insert or remove data frequently maybe Arrays can be a good option.

Stacks and Queues

Stacks and Queues are very efficient linear data structures that have very specific purposes. Stacks and Queue sare very similar when it comes to time complexity and many other characteristics. The main difference between them is that Stacks are a last in, first out data structure (LIFO), while Queues are a first in, first out (FIFO) one.

In other words, the first element to be added to a stack will be the last element to be retrieved and the first element to be added to a queue will be the first one to be retrieved. You can imagine a stack as a stack of plates, where the last plate you put on top of the stack will be the first one you will remove. The same way, you can imagine a queue as a line of people to buy a ticket, where the first person to get on the line will be the first one to buy it and to go home.

Stack and queue

Because of those peculiar characteristics, Stacks and Queues are very efficient data structures for all basic operations (retrieve, insert and remove), however it is not possible to retrieve elements from other positions, which means you will also not be able to sort their elements.

Time complexity

  • Push/Add/Pop/Remove/Peek: O(1)

Advantages

  • Differently from Arrays, Stacks and Queues have flexible size: you can add element as you go, there is no need to define the size of it beforehand.
  • Stacks and Queues are very efficient.

Disadvantages

  • There is no way to get element from other positions.
  • It is not possible to sort elements inside a queue or a stack.

Usage

Stacks

  • A clear example of an stack could be the “undo” mechanism in a text editor.
  • When you go back or forward in your browser history it’s a direct example of push and pop.

Queues

  • The application of Queues can be seen when a resource is shared among multiple consumers. Examples include CPU scheduling and disk scheduling.
  • Stacks can be used to set priorities, in games for example.

Hash Tables (dictionaries)

Hashing is an important and efficient data structure which stores values based on a key-value pair. It is designed to use a special function called the Hash Function which is used to map a given key to a particular value for faster access of elements. The efficiency of mapping depends on the efficiency of the hash function used.

Collisions

Collisions occur when pairs of elements are mapped to the same hash value. These should be avoided and defining a good hashing function is the way to prevent it from happening. However, even when using a good hash function is, collisions are bound to occur. Therefore, to maintain the performance of a Hash Table, it is important to manage collisions.

There are many collision handling techniques. One of them is called Chaining. When there is a collision, the chaining technique says that the values should be stored into a Linked List. For example, if in a Hash Table there is a key “person” mapped to the value “John” and you are adding the value “Mary” to the same key. Now, when you search for “person”, the value retrieved will no longer be “John”, but it will be a list containing the values “John” and “Mary”.

Advantages

  • Even if the Hash Table gets larger, the amount of time to retrieve an element will be always the same.
  • When the Hash Table gets full, it will increase its size. And, when the number of filled buckets is much smaller than the size of the Hash Table, it will then decrease its size. Both operations take a complexity of O(N). That’s why insertion and deletion takes O(1) amortized.
  • Looking up, adding, and removing data is very fast for dictionaries, as long as you know the keys. In fact, dictionaries are as fast as Arrays because you don’t have to traverse them to find a value, you can access the value you want in constant time O(1).

Disadvantages

  • It requires a little more space in memory than Arrays. You can waste a lot of memory if you make a large one and do not use much.
  • If you don’t know the key of the value you want, you have to traverse the dictionary from the beginning to the end.
  • Dictionaries are typically not meant to store sorted data, and most do not provide any method of doing that. Retrieval of elements doesn’t guarantee a specific order.

Usage

  • Dictionaries are most helpful when you have some kind of association, like when you have a constant or symbol that can be associated with a value that may change or is not known. For example, to store configuration values.
  • Hash Tables are very efficient for look up values instantly if the keys are known.
  • Dictionaries avoid duplicate values

Binary Tree and Binary Search Trees

A Binary Tree is a hierarchical data structure that is composed of a root, children and leaf elements. In a Binary Tree, each node can have at most 2 children (left and right).

A Binary Search Tree is a Binary Tree which elements follow a specific order. The left nodes are smaller than the root node which is smaller than the right nodes. Because of that property, Binary Search Trees are very efficient for searching large amounts of data, because you only need to look at a fraction of the nodes, without comparing against all of the possible values. Even so, finding a value is not as fast as in an Array or dictionary when you know the index or key respectively. Whereas getting a value from an Array or dictionary is the same regardless of how much data is stored, getting values from trees takes longer as the tree grows.

When a tree is not a Binary Search Tree or is unbalanced it is not efficient for searching because it may end up working just like a list.

Balanced vs Unbalanced Binary Search Tree

Traversing

Traversing is basically walking through the elements of a tree. There are three types of traversing algorithms:

  • In order: first traverse the left node, then the root and finally the right node.
  • Pre order: first traverse the root, then the left node and finally the right node.
  • Post order: first traverse the left node, then the right node and finally the root.

Time complexity

balanced:

  • Insert/delete/find is O(LogN)

Unbalanced:

  • Insert/delete/find is O(N)

Advantages

  • Binary Search Trees are very efficient for searching.

Disadvantages

  • Their creation and management add some overhead.
  • Trees have to use memory to keep the references to children nodes. This is potentially higher than a Linked List memory usage, depending on how many children nodes there are per node.

Usage

  • Binary Search Tree is a good option when the retrieval of elements must be done in order.
  • Storing unknown amounts of data that needs to be sorted and searched quickly is a good application for Binary Search Trees.
  • If the data will change often and still remain easily searched, trees are a great option.

Tries

Tries are a kind of tree that are often used to store characters. By looking at the path, you will get the whole word. Tries are also efficient information retrieval data structures.

Trie data structure

Time complexity

  • Insert/search: O(key_length)

Advantages

  • Allow efficiently performing of prefix search or auto complete
  • Allow efficiently printing all words in alphabetical order.

Disadvantages

  • The main disadvantage of Tries is that they may require a lot of memory to store pointers to the related nodes. For each node we may end up having as many pointers as the size of the alphabet (one pointer to each one of the letters in the alphabet).

Usage

  • Tries are very used for validation of words and problems where you have to find words beginning with something.

Bonus: Java implementation

If you have followed until here, now you should be able to describe how each one of the presented data structures work and when to use each one of them. Maybe now you are already thinking about how to implement those data structures. That’s why I will share with you the following repository where you will find java implementations of some operations using Stacks, Queues, Binary Search Trees and Hash Tables.

https://github.com/Rayanearaujo/java-data-structures

Conclusion

This post presented an overview of the main data structures you need to know. We discussed about their principles, time complexity, advantages, disadvantages and use cases.

If you are about to take a coding interview you will probably need to choose the right data structure to solve the problems you will be given. If you are already a developer and you want to write scalable and efficient solution, it is also important to know those data structures in order to make better choices.

I hope by know you are already able to decide when to use each one of them and to understand their strengths and weaknesses.

--

--

Rayane de Araújo

Data engineer at tembici. Big Data and Machine Learning Lover. Always seeking for professional and personal improvement.