Skip to main content

11.3 Set: HashSet, LinkedHashSet, and TreeSet

Overview

Set is a collection that does not allow duplicate elements. If you try to add an element that already exists, the operation is silently ignored. Sets are ideal when you need to:

  • Quickly check if an element exists (O(1) average for HashSet)
  • Remove duplicates from a collection
  • Perform mathematical set operations (union, intersection, difference)

1. HashSet

HashSet is the most commonly used Set. It stores elements in a hash table, giving O(1) average performance for add, remove, and contains. It does not guarantee any order.

import java.util.HashSet;
import java.util.Set;

Set<String> set = new HashSet<>();

// Adding elements
set.add("apple");
set.add("banana");
set.add("cherry");
set.add("apple"); // duplicate — silently ignored

System.out.println(set); // [banana, apple, cherry] (order varies)
System.out.println(set.size()); // 3 (not 4!)

// Searching
System.out.println(set.contains("apple")); // true
System.out.println(set.contains("grape")); // false

// Removing
set.remove("banana");
System.out.println(set); // [apple, cherry]

// Iteration (order not guaranteed)
for (String fruit : set) {
System.out.println(fruit);
}

set.forEach(System.out::println);

2. LinkedHashSet

LinkedHashSet extends HashSet and maintains insertion order by using a doubly-linked list internally. Same O(1) performance, but iteration always follows the order elements were added.

import java.util.LinkedHashSet;
import java.util.Set;

Set<String> linked = new LinkedHashSet<>();
linked.add("cherry");
linked.add("apple");
linked.add("banana");
linked.add("apple"); // duplicate — ignored

System.out.println(linked); // [cherry, apple, banana] — insertion order preserved

// Use case: deduplicate while preserving order
List<String> withDups = List.of("a", "b", "a", "c", "b", "d");
Set<String> unique = new LinkedHashSet<>(withDups);
System.out.println(unique); // [a, b, c, d]

3. TreeSet

TreeSet stores elements in a sorted (natural) order, implemented as a Red-Black tree. All operations are O(log n). It implements the NavigableSet interface, which adds powerful range queries.

import java.util.TreeSet;
import java.util.NavigableSet;

TreeSet<Integer> tree = new TreeSet<>();
tree.add(5);
tree.add(2);
tree.add(8);
tree.add(1);
tree.add(9);
tree.add(3);

System.out.println(tree); // [1, 2, 3, 5, 8, 9] — always sorted!

// NavigableSet methods
System.out.println(tree.first()); // 1 (smallest)
System.out.println(tree.last()); // 9 (largest)

System.out.println(tree.floor(4)); // 3 (largest <= 4)
System.out.println(tree.ceiling(4)); // 5 (smallest >= 4)
System.out.println(tree.lower(5)); // 3 (strictly less than 5)
System.out.println(tree.higher(5)); // 8 (strictly greater than 5)

// Range views (subSet, headSet, tailSet)
System.out.println(tree.subSet(2, true, 8, true)); // [2, 3, 5, 8] (inclusive both)
System.out.println(tree.headSet(5)); // [1, 2, 3] (less than 5)
System.out.println(tree.tailSet(5)); // [5, 8, 9] (>= 5)

// Descending order
NavigableSet<Integer> desc = tree.descendingSet();
System.out.println(desc); // [9, 8, 5, 3, 2, 1]

TreeSet with Custom Order

import java.util.TreeSet;
import java.util.Comparator;

// Sort strings by length, then alphabetically
TreeSet<String> byLength = new TreeSet<>(
Comparator.comparingInt(String::length)
.thenComparing(Comparator.naturalOrder())
);

byLength.add("banana");
byLength.add("apple");
byLength.add("fig");
byLength.add("kiwi");
byLength.add("cherry");

System.out.println(byLength); // [fig, kiwi, apple, banana, cherry]

4. Choosing the Right Set

FeatureHashSetLinkedHashSetTreeSet
OrderNone (hash-based)Insertion orderSorted (natural/comparator)
Performance (add/remove/contains)O(1) averageO(1) averageO(log n)
Null elementYes (one null)Yes (one null)No (NullPointerException)
Range queriesNoNoYes (subSet, headSet, etc.)
Best forFast lookup, deduplicationOrder-preserving dedupSorted data, range queries

5. equals() and hashCode() for Custom Objects

HashSet and LinkedHashSet use hashCode() and equals() to determine if two objects are the same. For custom objects, you must override both.

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Point {
int x, y;

Point(int x, int y) { this.x = x; this.y = y; }

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Point p)) return false;
return x == p.x && y == p.y;
}

@Override
public int hashCode() {
return Objects.hash(x, y);
}

@Override
public String toString() { return "(" + x + "," + y + ")"; }
}

public class PointSetDemo {
public static void main(String[] args) {
Set<Point> points = new HashSet<>();
points.add(new Point(1, 2));
points.add(new Point(3, 4));
points.add(new Point(1, 2)); // "duplicate" — same x,y

System.out.println(points.size()); // 2 (not 3!)
System.out.println(points.contains(new Point(1, 2))); // true
}
}

Using record (Java 16+): Records automatically generate correct equals() and hashCode():

record Point(int x, int y) {}  // equals & hashCode included!

Set<Point> points = new HashSet<>();
points.add(new Point(1, 2));
points.add(new Point(1, 2)); // duplicate
System.out.println(points.size()); // 1

6. Set Operations (Union, Intersection, Difference)

import java.util.*;

Set<Integer> setA = new HashSet<>(Set.of(1, 2, 3, 4, 5));
Set<Integer> setB = new HashSet<>(Set.of(3, 4, 5, 6, 7));

// Union: A ∪ B (all elements from both)
Set<Integer> union = new HashSet<>(setA);
union.addAll(setB);
System.out.println("Union: " + union); // [1, 2, 3, 4, 5, 6, 7]

// Intersection: A ∩ B (only elements in both)
Set<Integer> intersection = new HashSet<>(setA);
intersection.retainAll(setB);
System.out.println("Intersection: " + intersection); // [3, 4, 5]

// Difference: A \ B (in A but not in B)
Set<Integer> difference = new HashSet<>(setA);
difference.removeAll(setB);
System.out.println("Difference: " + difference); // [1, 2]

// Symmetric difference: (A ∪ B) \ (A ∩ B)
Set<Integer> symDiff = new HashSet<>(union);
symDiff.removeAll(intersection);
System.out.println("Sym. Diff: " + symDiff); // [1, 2, 6, 7]

// Subset check
Set<Integer> subset = Set.of(3, 4);
System.out.println("Is subset: " + setA.containsAll(subset)); // true

7. Practical Example: Duplicate Tag Removal

import java.util.*;
import java.util.stream.*;

public class TagManager {

public static List<String> removeDuplicates(List<String> tags) {
// LinkedHashSet preserves insertion order
return new ArrayList<>(new LinkedHashSet<>(tags));
}

public static Set<String> commonTags(List<String> list1, List<String> list2) {
Set<String> set1 = new HashSet<>(list1);
Set<String> set2 = new HashSet<>(list2);
set1.retainAll(set2); // intersection
return set1;
}

public static Set<String> allUniqueTags(List<List<String>> allTagLists) {
return allTagLists.stream()
.flatMap(Collection::stream)
.collect(Collectors.toCollection(TreeSet::new)); // sorted
}

public static void main(String[] args) {
List<String> rawTags = List.of("java", "spring", "java", "jpa", "spring", "redis");
System.out.println("Original: " + rawTags);
System.out.println("Deduplicated: " + removeDuplicates(rawTags));

List<String> post1Tags = List.of("java", "spring", "jpa");
List<String> post2Tags = List.of("java", "python", "spring");
System.out.println("Common tags: " + commonTags(post1Tags, post2Tags));

List<List<String>> allPosts = List.of(
List.of("java", "spring"),
List.of("python", "java"),
List.of("go", "rust", "spring")
);
System.out.println("All unique (sorted): " + allUniqueTags(allPosts));
// [go, java, python, rust, spring]
}
}

Pro Tips

HashSet vs TreeSet performance:

// For 1,000,000 elements:
// HashSet.contains() → ~50ns (O(1))
// TreeSet.contains() → ~700ns (O(log n))

// But TreeSet has unique powers:
TreeSet<Integer> ts = new TreeSet<>(/* millions of numbers */);
ts.subSet(100, 200); // all numbers between 100-200
ts.headSet(500); // all numbers < 500
// No equivalent in HashSet!

Null handling:

  • HashSet / LinkedHashSet: allow one null element
  • TreeSet: throws NullPointerException for null (can't compare null)

Rule of thumb: Use HashSet unless you need ordering. Use LinkedHashSet when insertion order matters. Use TreeSet when you need sorted data or range queries.