Reduce Array Size to The Half

1338. Reduce Array Size to The Half

You are given an integer array arr. You can choose a set of integers and remove all the occurrences of these integers in the array.
Return the minimum size of the set so that at least half of the integers of the array are removed.

Example 1:

Input: arr = [3,3,3,3,5,5,5,2,2,7]

Output: 2

Explanation: Choosing {3,7} will make the new array [5,5,5,2,2] which has size 5 (i.e equal to half of the size of the old array).

Possible sets of size 2 are {3,5},{3,2},{5,2}.

Choosing set {2,7} is not possible as it will make the new array [3,3,3,3,5,5,5] which has a size greater than half of the size of the old array.

Example 2:

Input: arr = [7,7,7,7,7,7]

Output: 1

Explanation: The only possible set you can choose is {7}. This will make the new array empty.

Reduce Array Size to The Half - Detailed Explanation

In this problem, we are given an array of integers and asked to perform operations that remove elements from it, until the array becomes no more than half of its original size.

The catch? Each operation allows us to pick one number and remove all occurrences of it at once. The ultimate goal is to minimize the number of such operations. So, the fewer numbers we choose, the better.

✨ Key Insight

Not all numbers are equal in this scenario. Some numbers appear many times, while others barely show up. So logically, we want to remove numbers that occur the most. This way, we eliminate more elements per operation.

Example

Suppose we have this array:

[1, 1, 2, 2, 2, 3, 3, 4]

Here’s the count for each number:

1 appears 2 times
2 appears 3 times
3 appears 2 times
4 appears 1 time

To reduce the array to half (original length is 8, so we need ≤ 4 elements), we can remove the number with the highest frequency first (which is 2). That removes 3 elements in one go.

After that, we just need to remove one more number to make the total removed ≥ 4. We can choose either 1 or 3.

Approach 1: Sorting

🔍 Intuition

First, we count how many times each number occurs. To make this easier, we sort the array so that identical elements are next to each other. This helps in grouping them together.

🚀 Steps

Sort the array.
Count the frequency of each number.
Sort the frequencies in descending order.
Keep removing the most frequent numbers until we have removed at least half of the array.

💡 Algorithm

sort(arr)
counts = frequency of each number
sort(counts, descending)
removed = 0
operations = 0

for count in counts:
    removed += count
    operations += 1
    if removed >= n / 2:
        break
return operations


           class Solution {
            public int minSetSize(int[] arr) {
              Map<Integer, Integer> hmap = new HashMap<>();
              
              for (int elem : arr) {
                hmap.put(elem, hmap.getOrDefault(elem, 0) + 1);
              }

              PriorityQueue<Integer> q = new PriorityQueue<>((a, b) -> (b - a));
              for (int val : hmap.values()) {
                q.offer(val);
              }

              int sum = 0;
              int count = 0;
              
              while (sum < arr.length / 2) {
                sum += q.poll();
                count++;
              }
              return count;
            }
          }

⏱ Time Complexity

O(n log n): Sorting takes O(n log n), counting and iterating are O(n).

🧠 Space Complexity

O(n): We store the counts of each number, and possibly a copy for sorting.

Approach 2: Hashing / Counting

🔍 Intuition

Instead of sorting the array first, we can go straight to counting frequencies using a hashmap or dictionary. This way, we save time and keep things efficient.

🚀 Steps

Create a hashmap where key = number and value = its count.
Extract all frequency values.
Sort them in descending order.
Keep removing until removed elements ≥ n/2.

💡 Sample Code (Pseudocode)

map = new HashMap()
for num in arr:
    map[num] += 1

counts = values of map
sort(counts, descending)

removed = 0
steps = 0
for count in counts:
    removed += count
    steps += 1
    if removed >= n / 2:
        break
return steps

⏱ Time Complexity

O(n log n) in worst case, due to sorting frequencies.

🧠 Space Complexity

O(n): HashMap and frequency list use extra space.

✨ Advantage

This is often faster in practice than the sorting-based approach, especially when there are fewer unique numbers.

Approach 3: Hashing + Bucket Sort

🔍 Intuition

The previous approach sorts the counts array which can take O(n log n). But do we really need a comparison-based sort? Nope! Since the maximum count of any number cannot exceed n, we can use a bucket sort which runs in linear time.

🚀 Steps

Count frequencies using a HashMap.
Initialize a bucket array of size n+1.
For each frequency value, add +1 at index bucket[freq].
Start from highest bucket index (most frequent), removing elements until we reach n/2.

💡 Pseudocode

map = new HashMap()
for num in arr:
    map[num] += 1

bucket = new int[n+1]
for count in map.values():
    bucket[count] += 1

removed = 0
operations = 0
for i = n to 1:
    while bucket[i] > 0 and removed < n/2:
        removed += i
        operations += 1
        bucket[i] -= 1
return operations

⏱ Time Complexity

O(n): Both frequency counting and bucket sorting take linear time.

🧠 Space Complexity

O(n): HashMap + bucket array.

💥 Performance Boost

This method gives you the best of both worlds: fast execution and fewer lines of code if implemented well.

✨ Summary & Recommendations

Approach 1 is simple and intuitive but not the most efficient.
Approach 2 is a practical improvement using hashing — clean and fast.
Approach 3 is best for optimal time performance, especially in large inputs.

For interviews or time-sensitive platforms like LeetCode, **Approach 2** is a great balance of simplicity and performance. But if you’re going the extra mile for performance, **Approach 3** is the way to go!