11.3 Set: HashSet, LinkedHashSet, and TreeSet

Overview

Set is a collection that does not allow duplicate elements. If you try to add an element that already exists, the operation is silently ignored. Sets are ideal when you need to:

Quickly check if an element exists (O(1) average for HashSet)
Remove duplicates from a collection
Perform mathematical set operations (union, intersection, difference)

1. HashSet

HashSet is the most commonly used Set. It stores elements in a hash table, giving O(1) average performance for add, remove, and contains. It does not guarantee any order.

import java.util.HashSet;
import java.util.Set;

Set<String> set = new HashSet<>();

// Adding elements
set.add("apple");
set.add("banana");
set.add("cherry");
set.add("apple");  // duplicate — silently ignored

System.out.println(set);          // [banana, apple, cherry] (order varies)
System.out.println(set.size());   // 3 (not 4!)

// Searching
System.out.println(set.contains("apple"));   // true
System.out.println(set.contains("grape"));   // false

// Removing
set.remove("banana");
System.out.println(set); // [apple, cherry]

// Iteration (order not guaranteed)
for (String fruit : set) {
    System.out.println(fruit);
}

set.forEach(System.out::println);

2. LinkedHashSet

LinkedHashSet extends HashSet and maintains insertion order by using a doubly-linked list internally. Same O(1) performance, but iteration always follows the order elements were added.

import java.util.LinkedHashSet;
import java.util.Set;

Set<String> linked = new LinkedHashSet<>();
linked.add("cherry");
linked.add("apple");
linked.add("banana");
linked.add("apple");  // duplicate — ignored

System.out.println(linked); // [cherry, apple, banana] — insertion order preserved

// Use case: deduplicate while preserving order
List<String> withDups = List.of("a", "b", "a", "c", "b", "d");
Set<String> unique = new LinkedHashSet<>(withDups);
System.out.println(unique); // [a, b, c, d]

3. TreeSet

TreeSet stores elements in a sorted (natural) order, implemented as a Red-Black tree. All operations are O(log n). It implements the NavigableSet interface, which adds powerful range queries.

import java.util.TreeSet;
import java.util.NavigableSet;

TreeSet<Integer> tree = new TreeSet<>();
tree.add(5);
tree.add(2);
tree.add(8);
tree.add(1);
tree.add(9);
tree.add(3);

System.out.println(tree);        // [1, 2, 3, 5, 8, 9] — always sorted!

// NavigableSet methods
System.out.println(tree.first()); // 1 (smallest)
System.out.println(tree.last());  // 9 (largest)

System.out.println(tree.floor(4));   // 3 (largest <= 4)
System.out.println(tree.ceiling(4)); // 5 (smallest >= 4)
System.out.println(tree.lower(5));   // 3 (strictly less than 5)
System.out.println(tree.higher(5));  // 8 (strictly greater than 5)

// Range views (subSet, headSet, tailSet)
System.out.println(tree.subSet(2, true, 8, true)); // [2, 3, 5, 8] (inclusive both)
System.out.println(tree.headSet(5));               // [1, 2, 3] (less than 5)
System.out.println(tree.tailSet(5));               // [5, 8, 9] (>= 5)

// Descending order
NavigableSet<Integer> desc = tree.descendingSet();
System.out.println(desc); // [9, 8, 5, 3, 2, 1]

TreeSet with Custom Order

import java.util.TreeSet;
import java.util.Comparator;

// Sort strings by length, then alphabetically
TreeSet<String> byLength = new TreeSet<>(
    Comparator.comparingInt(String::length)
              .thenComparing(Comparator.naturalOrder())
);

byLength.add("banana");
byLength.add("apple");
byLength.add("fig");
byLength.add("kiwi");
byLength.add("cherry");

System.out.println(byLength); // [fig, kiwi, apple, banana, cherry]

4. Choosing the Right Set

Feature	`HashSet`	`LinkedHashSet`	`TreeSet`
Order	None (hash-based)	Insertion order	Sorted (natural/comparator)
Performance (add/remove/contains)	O(1) average	O(1) average	O(log n)
Null element	Yes (one null)	Yes (one null)	No (NullPointerException)
Range queries	No	No	Yes (`subSet`, `headSet`, etc.)
Best for	Fast lookup, deduplication	Order-preserving dedup	Sorted data, range queries

5. equals() and hashCode() for Custom Objects

HashSet and LinkedHashSet use hashCode() and equals() to determine if two objects are the same. For custom objects, you must override both.

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Point {
    int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point p)) return false;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public String toString() { return "(" + x + "," + y + ")"; }
}

public class PointSetDemo {
    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(3, 4));
        points.add(new Point(1, 2)); // "duplicate" — same x,y

        System.out.println(points.size()); // 2 (not 3!)
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}

Using record (Java 16+): Records automatically generate correct equals() and hashCode():

record Point(int x, int y) {}  // equals & hashCode included!

Set<Point> points = new HashSet<>();
points.add(new Point(1, 2));
points.add(new Point(1, 2)); // duplicate
System.out.println(points.size()); // 1

6. Set Operations (Union, Intersection, Difference)

import java.util.*;

Set<Integer> setA = new HashSet<>(Set.of(1, 2, 3, 4, 5));
Set<Integer> setB = new HashSet<>(Set.of(3, 4, 5, 6, 7));

// Union: A ∪ B (all elements from both)
Set<Integer> union = new HashSet<>(setA);
union.addAll(setB);
System.out.println("Union:        " + union); // [1, 2, 3, 4, 5, 6, 7]

// Intersection: A ∩ B (only elements in both)
Set<Integer> intersection = new HashSet<>(setA);
intersection.retainAll(setB);
System.out.println("Intersection: " + intersection); // [3, 4, 5]

// Difference: A \ B (in A but not in B)
Set<Integer> difference = new HashSet<>(setA);
difference.removeAll(setB);
System.out.println("Difference:   " + difference);   // [1, 2]

// Symmetric difference: (A ∪ B) \ (A ∩ B)
Set<Integer> symDiff = new HashSet<>(union);
symDiff.removeAll(intersection);
System.out.println("Sym. Diff:    " + symDiff);       // [1, 2, 6, 7]

// Subset check
Set<Integer> subset = Set.of(3, 4);
System.out.println("Is subset: " + setA.containsAll(subset)); // true

7. Practical Example: Duplicate Tag Removal

import java.util.*;
import java.util.stream.*;

public class TagManager {

    public static List<String> removeDuplicates(List<String> tags) {
        // LinkedHashSet preserves insertion order
        return new ArrayList<>(new LinkedHashSet<>(tags));
    }

    public static Set<String> commonTags(List<String> list1, List<String> list2) {
        Set<String> set1 = new HashSet<>(list1);
        Set<String> set2 = new HashSet<>(list2);
        set1.retainAll(set2);  // intersection
        return set1;
    }

    public static Set<String> allUniqueTags(List<List<String>> allTagLists) {
        return allTagLists.stream()
                .flatMap(Collection::stream)
                .collect(Collectors.toCollection(TreeSet::new)); // sorted
    }

    public static void main(String[] args) {
        List<String> rawTags = List.of("java", "spring", "java", "jpa", "spring", "redis");
        System.out.println("Original:   " + rawTags);
        System.out.println("Deduplicated: " + removeDuplicates(rawTags));

        List<String> post1Tags = List.of("java", "spring", "jpa");
        List<String> post2Tags = List.of("java", "python", "spring");
        System.out.println("Common tags: " + commonTags(post1Tags, post2Tags));

        List<List<String>> allPosts = List.of(
            List.of("java", "spring"),
            List.of("python", "java"),
            List.of("go", "rust", "spring")
        );
        System.out.println("All unique (sorted): " + allUniqueTags(allPosts));
        // [go, java, python, rust, spring]
    }
}

Pro Tips

HashSet vs TreeSet performance:

// For 1,000,000 elements:
// HashSet.contains() → ~50ns (O(1))
// TreeSet.contains() → ~700ns (O(log n))

// But TreeSet has unique powers:
TreeSet<Integer> ts = new TreeSet<>(/* millions of numbers */);
ts.subSet(100, 200);    // all numbers between 100-200
ts.headSet(500);        // all numbers < 500
// No equivalent in HashSet!

Null handling:

HashSet / LinkedHashSet: allow one null element
TreeSet: throws NullPointerException for null (can't compare null)

Rule of thumb: Use HashSet unless you need ordering. Use LinkedHashSet when insertion order matters. Use TreeSet when you need sorted data or range queries.

Overview​

1. HashSet​

2. LinkedHashSet​

3. TreeSet​

TreeSet with Custom Order​

4. Choosing the Right Set​

5. equals() and hashCode() for Custom Objects​

6. Set Operations (Union, Intersection, Difference)​

7. Practical Example: Duplicate Tag Removal​