11.3 Set: HashSet, LinkedHashSet, and TreeSet
Overview
Set is a collection that does not allow duplicate elements. If you try to add an element that already exists, the operation is silently ignored. Sets are ideal when you need to:
- Quickly check if an element exists (
O(1)average forHashSet) - Remove duplicates from a collection
- Perform mathematical set operations (union, intersection, difference)
1. HashSet
HashSet is the most commonly used Set. It stores elements in a hash table, giving O(1) average performance for add, remove, and contains. It does not guarantee any order.
import java.util.HashSet;
import java.util.Set;
Set<String> set = new HashSet<>();
// Adding elements
set.add("apple");
set.add("banana");
set.add("cherry");
set.add("apple"); // duplicate — silently ignored
System.out.println(set); // [banana, apple, cherry] (order varies)
System.out.println(set.size()); // 3 (not 4!)
// Searching
System.out.println(set.contains("apple")); // true
System.out.println(set.contains("grape")); // false
// Removing
set.remove("banana");
System.out.println(set); // [apple, cherry]
// Iteration (order not guaranteed)
for (String fruit : set) {
System.out.println(fruit);
}
set.forEach(System.out::println);
2. LinkedHashSet
LinkedHashSet extends HashSet and maintains insertion order by using a doubly-linked list internally. Same O(1) performance, but iteration always follows the order elements were added.
import java.util.LinkedHashSet;
import java.util.Set;
Set<String> linked = new LinkedHashSet<>();
linked.add("cherry");
linked.add("apple");
linked.add("banana");
linked.add("apple"); // duplicate — ignored
System.out.println(linked); // [cherry, apple, banana] — insertion order preserved
// Use case: deduplicate while preserving order
List<String> withDups = List.of("a", "b", "a", "c", "b", "d");
Set<String> unique = new LinkedHashSet<>(withDups);
System.out.println(unique); // [a, b, c, d]
3. TreeSet
TreeSet stores elements in a sorted (natural) order, implemented as a Red-Black tree. All operations are O(log n). It implements the NavigableSet interface, which adds powerful range queries.
import java.util.TreeSet;
import java.util.NavigableSet;
TreeSet<Integer> tree = new TreeSet<>();
tree.add(5);
tree.add(2);
tree.add(8);
tree.add(1);
tree.add(9);
tree.add(3);
System.out.println(tree); // [1, 2, 3, 5, 8, 9] — always sorted!
// NavigableSet methods
System.out.println(tree.first()); // 1 (smallest)
System.out.println(tree.last()); // 9 (largest)
System.out.println(tree.floor(4)); // 3 (largest <= 4)
System.out.println(tree.ceiling(4)); // 5 (smallest >= 4)
System.out.println(tree.lower(5)); // 3 (strictly less than 5)
System.out.println(tree.higher(5)); // 8 (strictly greater than 5)
// Range views (subSet, headSet, tailSet)
System.out.println(tree.subSet(2, true, 8, true)); // [2, 3, 5, 8] (inclusive both)
System.out.println(tree.headSet(5)); // [1, 2, 3] (less than 5)
System.out.println(tree.tailSet(5)); // [5, 8, 9] (>= 5)
// Descending order
NavigableSet<Integer> desc = tree.descendingSet();
System.out.println(desc); // [9, 8, 5, 3, 2, 1]
TreeSet with Custom Order
import java.util.TreeSet;
import java.util.Comparator;
// Sort strings by length, then alphabetically
TreeSet<String> byLength = new TreeSet<>(
Comparator.comparingInt(String::length)
.thenComparing(Comparator.naturalOrder())
);
byLength.add("banana");
byLength.add("apple");
byLength.add("fig");
byLength.add("kiwi");
byLength.add("cherry");
System.out.println(byLength); // [fig, kiwi, apple, banana, cherry]
4. Choosing the Right Set
| Feature | HashSet | LinkedHashSet | TreeSet |
|---|---|---|---|
| Order | None (hash-based) | Insertion order | Sorted (natural/comparator) |
| Performance (add/remove/contains) | O(1) average | O(1) average | O(log n) |
| Null element | Yes (one null) | Yes (one null) | No (NullPointerException) |
| Range queries | No | No | Yes (subSet, headSet, etc.) |
| Best for | Fast lookup, deduplication | Order-preserving dedup | Sorted data, range queries |
5. equals() and hashCode() for Custom Objects
HashSet and LinkedHashSet use hashCode() and equals() to determine if two objects are the same. For custom objects, you must override both.
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
class Point {
int x, y;
Point(int x, int y) { this.x = x; this.y = y; }
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Point p)) return false;
return x == p.x && y == p.y;
}
@Override
public int hashCode() {
return Objects.hash(x, y);
}
@Override
public String toString() { return "(" + x + "," + y + ")"; }
}
public class PointSetDemo {
public static void main(String[] args) {
Set<Point> points = new HashSet<>();
points.add(new Point(1, 2));
points.add(new Point(3, 4));
points.add(new Point(1, 2)); // "duplicate" — same x,y
System.out.println(points.size()); // 2 (not 3!)
System.out.println(points.contains(new Point(1, 2))); // true
}
}
Using record (Java 16+): Records automatically generate correct equals() and hashCode():
record Point(int x, int y) {} // equals & hashCode included!
Set<Point> points = new HashSet<>();
points.add(new Point(1, 2));
points.add(new Point(1, 2)); // duplicate
System.out.println(points.size()); // 1
6. Set Operations (Union, Intersection, Difference)
import java.util.*;
Set<Integer> setA = new HashSet<>(Set.of(1, 2, 3, 4, 5));
Set<Integer> setB = new HashSet<>(Set.of(3, 4, 5, 6, 7));
// Union: A ∪ B (all elements from both)
Set<Integer> union = new HashSet<>(setA);
union.addAll(setB);
System.out.println("Union: " + union); // [1, 2, 3, 4, 5, 6, 7]
// Intersection: A ∩ B (only elements in both)
Set<Integer> intersection = new HashSet<>(setA);
intersection.retainAll(setB);
System.out.println("Intersection: " + intersection); // [3, 4, 5]
// Difference: A \ B (in A but not in B)
Set<Integer> difference = new HashSet<>(setA);
difference.removeAll(setB);
System.out.println("Difference: " + difference); // [1, 2]
// Symmetric difference: (A ∪ B) \ (A ∩ B)
Set<Integer> symDiff = new HashSet<>(union);
symDiff.removeAll(intersection);
System.out.println("Sym. Diff: " + symDiff); // [1, 2, 6, 7]
// Subset check
Set<Integer> subset = Set.of(3, 4);
System.out.println("Is subset: " + setA.containsAll(subset)); // true
7. Practical Example: Duplicate Tag Removal
import java.util.*;
import java.util.stream.*;
public class TagManager {
public static List<String> removeDuplicates(List<String> tags) {
// LinkedHashSet preserves insertion order
return new ArrayList<>(new LinkedHashSet<>(tags));
}
public static Set<String> commonTags(List<String> list1, List<String> list2) {
Set<String> set1 = new HashSet<>(list1);
Set<String> set2 = new HashSet<>(list2);
set1.retainAll(set2); // intersection
return set1;
}
public static Set<String> allUniqueTags(List<List<String>> allTagLists) {
return allTagLists.stream()
.flatMap(Collection::stream)
.collect(Collectors.toCollection(TreeSet::new)); // sorted
}
public static void main(String[] args) {
List<String> rawTags = List.of("java", "spring", "java", "jpa", "spring", "redis");
System.out.println("Original: " + rawTags);
System.out.println("Deduplicated: " + removeDuplicates(rawTags));
List<String> post1Tags = List.of("java", "spring", "jpa");
List<String> post2Tags = List.of("java", "python", "spring");
System.out.println("Common tags: " + commonTags(post1Tags, post2Tags));
List<List<String>> allPosts = List.of(
List.of("java", "spring"),
List.of("python", "java"),
List.of("go", "rust", "spring")
);
System.out.println("All unique (sorted): " + allUniqueTags(allPosts));
// [go, java, python, rust, spring]
}
}
HashSet vs TreeSet performance:
// For 1,000,000 elements:
// HashSet.contains() → ~50ns (O(1))
// TreeSet.contains() → ~700ns (O(log n))
// But TreeSet has unique powers:
TreeSet<Integer> ts = new TreeSet<>(/* millions of numbers */);
ts.subSet(100, 200); // all numbers between 100-200
ts.headSet(500); // all numbers < 500
// No equivalent in HashSet!
Null handling:
HashSet/LinkedHashSet: allow one null elementTreeSet: throwsNullPointerExceptionfor null (can't compare null)
Rule of thumb: Use HashSet unless you need ordering. Use LinkedHashSet when insertion order matters. Use TreeSet when you need sorted data or range queries.