Data Structures

Mon, Jan 23, 2023 3-minute read

Computers are all about data. How should we store the data that we have, especially inside our programs. This is not about long-term storage of data on the hard-drive, but about how we hold on to the data while our program is running.

Are we keeping a big list of numbers {1, 2, 3, } or names `{“Tom”, “Steve”}

What if we want to keep a bit more data, like names and telephone numbers? What if we know the data is going to grow as we go on? What if we know we will need to search the data regularly or that we will need to add things often but never remove them?

This affects the data structures that we choose to use. We could just use ArrayLists for everything, but we would lose the benefits that some other structure would give us, and we should also think about how other programmers might use our code in the future. Do we want to have some controls on how data is accessed?

One feature that ArrayLists have, is that we can access items by their index and pull things out randomly. Arrays start counting at 0 instead of 1 so this ArrayList has a size() of 3 but John is at element 2. Confusing? Yes. {"Tony", "Mike", "John"} e.g. names.get(9);

What happens if you remove the first (zero-ith) item from the ArrayList?

names.remove(0);

Every element after this will have to shift down to the left so the item at 1 is now 0, 2 is now 1 etc.

If there are 10, or 100, or 1000 items, this might not be so bad, but what if there are 1,000,000 items or a billion? That’s a massive operation everytime that we want to pull something off the back. If only we had something similar to a queue, that is designed to have the first item be removed easily. Well, Java has a Queue data structure for exactly this purpose. There isn’t a data structure for every purpose but there are some that are much better than others and part of good design is choosing which one is the best for the job.

Main Structures in Java

  • Map (key-value store)
  • Tree (keep things in order)
  • Set (prevent duplicates)
  • List (ordered list)
  • Graph (links between items)
  • Queue (first in, first out)
  • Stack (first in, last out)

Each interface/abstract data type has a lot of sub types. We can’t actually use these ADTs (weirdly) but they give an interface or overall picture of what their children classes should have. For example, all Queue should have an add() and a poll() but how that get implemented is up the actual class you use.

TODO: here are some diagrams

Queue<Patient> waitingRoomQueue = new PriorityQueue<>();
  • Maps
    • HashMaps
  • Sets
    • HashSet
    • TreeSet
    • LinkedHashSet
  • Queue
    • LinkedList,
    • PriorityQueue
    • Deque (“dek”)
      • ArrayDeque (“dek”)
      • Queue<Patient> myPatientList = new ArrayDeque<>();
  • List
    • ArrayList
    • LinkedList
  • Stack
    • (stack is old-school, OG Java, before they started making it nicer)
    • Stack is a type of Stack. Weird, I know.
    • Stack<ExamPaper> pileOfExams = new Stack<>(ExamPaper);