How to Remove Duplicates from a Sorted Array

How to Remove Duplicates from a Sorted Array

Removing duplicates from a sorted array is a common problem asked in many coding interviews and programming challenges. This can be approached using different methods. In this article, we'll discuss two approaches to remove duplicates from a sorted array: one using set and another using the two-pointers technique.

Solution 1: Brute Force Approach (using Set)

This method uses a set to store unique elements, ensuring that duplicates are removed.

Implementation:

// Solution-1: Brute Force Approach (using Set)
// Time Complexity: O(nlogn)
// Space Complexity: O(n)
int removeDuplicates(vector<int> &arr, int n)
{
    set<int> uniqueElements;

    for (int i = 0; i < n; i++)
    {
        uniqueElements.insert(arr[i]);
    }

    int k = uniqueElements.size();

    // Copy unique elements from the set back into the input array
    int i = 0;
    for (auto x : uniqueElements)
    {
        arr[i] = x;
        i++;
    }

    // Return the number of unique elements
    return k;

}

Logic:

  1. Use a Set: Insert all elements of the array into a set to remove duplicates.

  2. Get Unique Count: The size of the set gives the number of unique elements.

  3. Copy Back to Array: Copy elements from the set back to the array.

Time Complexity: O(n log n)

  • Explanation: Inserting each element into the set takes O(log n) time, resulting in O(n log n) for n elements.

Space Complexity: O(n)

  • Explanation: The set stores all unique elements, potentially up to n elements.

Example:

  • Input: arr = [1, 1, 2, 2, 3, 4, 4], n = 7

  • Output: k = 4, arr = [1, 2, 3, 4, ...]

  • Explanation: The array contains 4 unique elements.


Solution 2: Optimal Approach (Two-Pointers Technique)

This method uses two pointers to overwrite duplicates in place, providing an efficient solution.

Implementation:

// Solution-2: Optimal Approach (Two-Pointers Technique)
// Time Complexity: O(n)
// Space Complexity: O(1)
int removeDuplicates(vector<int> &arr, int n)
{
    int i = 0; // Pointer to track the position of the next unique element
    for (int j = 1; j < n; j++)
    {
        if (arr[i] != arr[j])
        {
            i++;
            arr[i] = arr[j];
        }
    }

    // Return the number of unique elements
    return i + 1;
}

Logic:

  1. Initialize Pointers: Use i to track the position of the next unique element and j to traverse the array.

  2. Compare Elements: If arr[i] is not equal to arr[j], increment i and update arr[i] with arr[j].

  3. Return Count: The value i + 1 gives the number of unique elements.

Time Complexity: O(n)

  • Explanation: The array is traversed once.

Space Complexity: O(1)

  • Explanation: No additional space is used apart from variables.

Example:

  • Input: arr = [1, 1, 2, 2, 3, 4, 4], n = 7

  • Output: k = 4, arr = [1, 2, 3, 4, ...]

  • Explanation: The array contains 4 unique elements.


Comparison

  • Brute Force Method:

    • Pros: Simple and straightforward.

    • Cons: Inefficient for large arrays due to O(n log n) time complexity and additional space usage.

    • Use Case: Useful when simplicity is more important than efficiency.

  • Optimal Method:

    • Pros: Efficient with O(n) time complexity and O(1) space complexity.

    • Cons: Requires in-place modification of the array.

    • Use Case: Ideal for large arrays and when in-place operations are feasible.

Edge Cases

  • Empty Array: Returns 0 as there are no elements.

  • Single Element Array: Returns 1 as a single element is trivially unique.

  • All Identical Elements: Returns 1 as all elements are the same.

Additional Notes

  • Efficiency: The optimal approach is significantly more efficient for large datasets.

  • Simplicity: Despite its efficiency, the optimal approach is simple to implement.

  • Practicality: The optimal method is generally preferred due to its linear time complexity and constant space complexity.

Conclusion

Removing duplicates from a sorted array can be done efficiently using an in-place approach with two pointers. While the brute force method provides a straightforward solution, the optimal approach is both efficient and easy to implement, making it suitable for large datasets.