Find the duplicate number

Problem

Given an array nums containing \(n + 1\) integers where each integer is between \(1\) and \(n\) (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: [1,3,4,2,2]  
Output: 2  

Example 2:

Input: [3,1,3,4,2]  
Output: 3  

Note:

  • You must not modify the array (assume the array is read only).
  • You must use only constant, \(O(1)\) extra space.
  • Your runtime complexity should be less than \(O(n^2)\).
  • There is only one duplicate number in the array, but it could be repeated more than once.

Approach

This is a slightly tricky problem that has a lot of constraints. Since you can't modify the array, you are tempted to create some space. However, you can only make constant space, meaning that you can't copy the array, or create an array of \(n\) size. On top of that, your time complexity needs to be better than \(O(n^2)\). When it comes to sorting, we can expect to sort in \(O(n log(n))\) time, which is better than \(O(n^2)\), but since we can't modify the array or create another array of equivalent size, we cannot do a traditional sort to solve the problem.

The key intuition here is to read the description carefully, especially on the input. Our input consists of an array where each number is between \(1\) and \(n\).

If \(n = 1\), then that means we can have the following input: \([1, 1]\).

If \(n = 2\), then that means we can have the following input: \([1, 2, 2], [1, 1, 2], 1, 1, 1], [2, 2, 2]\).

If \(n = 3\), then that means we can have the following input: \([1, 2, 3, 3], [3, 1, 2, 3], [1, 3, 2, 2]\) and more.

The \(n = 3\) gives us the biggest tip here. In general, our array will never contain a number that is greater than the size of the array itself. What's more, is that we can only have one duplicate number, with frequency that is only limited by \(n\). This means that if \(n = 5\), we can have an array with six \(1\)'s, but not an array with four \(1\)'s and two \(2\)'s, because that is an array with two duplicate numbers (of frequency 4 and 2 respectively).

Since what we want to aim for is linear time complexity, with only constant space used, we can certainly expect to linearly traverse the array. The question is how? Since the input array can be unsorted, it doesn't really make sense to traverse the array in ascending index order. Instead, what if we use the number value of the index as an index itself?

For example, if \(A = [3, 1, 2, 1]\), first we'll look at \(i = 0\). At \(i = 0\), \(A[i] = 3\), so we look at index \(3\). At \(i = 3\), \(A[i] = 1\), so replace with this value with the former \(3\) and then search for index \(1\) now. At \(i = 1\), \(A[i] = 1\), which is a duplicate since \(i = A[i]\), so we return \(i\).

Fun fact: This is similar to detecting a cycle in a linked list. If you treat the integer values of the array as pointers, then running into the same pointer value implies that a cycle exists.

Solution

    def findDuplicate(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        i = 0
        temp = -1

        while nums[i] != i:
            temp = nums[nums[i]]
            nums[nums[i]] = nums[i]

            # Our first number can't be a duplicate
            if i != 0:
                nums[i] = i
            i = temp


        return i