# Randomly Draw k unique integers out of an array of N unique integers

Given an array of n unique integers (1 to n), write an algorithm to draw a random combination of k distinct numbers (n >= k). (This problem comes from Core Java Vol I as an example.)

Unknown: A way to draw k distinct integers out of n distinct integers.

Data: An array of integers 1 to n.

Constraint: k numbers must be distinct and randomly picked.

A straightforward solution would be:

1. Randomly pick one number out of the array.
2. If this number has not been picked before, add it to the result. Return the result once we have k numbers.
3. Otherwise, go back to step 1.

Q: So what is the time complexity of this solution?

A: If we are unlucky, in the worst case it is O(k^2), and if k is close to n, O(n^2).

Q: How so?

A: At some point, we will have trouble selecting a number that's not already in the result set. The first number is easy: one pick. The second, if unlucky, two picks. The third, if unlucky, three picks, since the first two picks hit numbers already in the result set… So for k numbers it can take 1+2+3+…+k picks, which is O(k^2). If k is close to n, we have an O(n^2) algorithm. Check the link at the bottom for the code.
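The straightforward solution can be sketched like this (class and method names are my own; a HashSet stands in for the "was it picked before?" check, and k is assumed to be at most n):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Naive approach: keep picking random numbers, rejecting duplicates.
public class NaiveDraw {
    public static List<Integer> draw(int[] numbers, int k) {
        Random rng = new Random();
        Set<Integer> seen = new HashSet<>();
        List<Integer> result = new ArrayList<>();
        while (result.size() < k) {
            int candidate = numbers[rng.nextInt(numbers.length)];
            if (seen.add(candidate)) {   // add() returns false for a duplicate
                result.add(candidate);
            }
        }
        return result;
    }
}
```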

Q: Alright, can we make it faster? Say, let's make it O(n) time, and you cannot use additional space except for the result set.

A: Hmm, the bottleneck of the previous solution is that every time we pick a number, we have to check if it exists in the result set. If it does, we have to go back and pick again. If we can skip this step it will be faster.

Q: What do you mean by skipping this step?

A: I mean that every time we pick a number, it is guaranteed not to have been picked before.

Q: How do we do that?

A: Hmm, we need to keep track of what has not been picked instead. Since we cannot use additional space, I assume we have to do something with the original array. I could replace a picked number with some special value like n+1, but that sounds useless: if I happened to pick that value, I would have to choose again, exactly like before. I don't know…

Q: OK, in what situation can we safely draw an unpicked number?

A: If the array only contained unpicked numbers, we could draw safely. But again, I don't think we can recreate a new array that excludes the picked ones on every pass. That's O(n^2) again.

Q: True. So why can’t we draw the numbers safely now? What’s the matter?

A: Because there are picked values between unpicked ones.

Q: Good. You mentioned excluding them. Is there a way to do that without creating a new array?

A: I suppose I can rearrange the array? For example, if I picked the number at index i, I could move the numbers from i+1 to n-1 forward by one. Then next time I should pick a random index between 0 (inclusive) and n-1 (exclusive). Wait, all that moving is still O(n^2)…

Q: Do we have to move all the elements after index i? Can we reduce this O(n) move to a O(1) move?

A: O(1)? So I should move only 1 element instead. But which one…

Q: Let’s use an example: 1,2,3,4,5. Say we picked 3. In your case, we change the array to 1,2,4,5,5 and then pick from indices 0 to 3 next time. We do the move because we want to make sure that next time we are choosing from 1,2,4,5. So is there another way to achieve that?

A: Yes! I can move the last element to that position to achieve the same effect! So every time after the pick, I move the last element within the current range to the picked position then reduce the range by 1.

Q: That’s right 🙂
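The final O(n) idea — after each pick, swap the last element of the active range into the picked slot and shrink the range — can be sketched as follows (names are illustrative; the input is cloned so the caller's array is untouched, which the interview version could skip):

```java
import java.util.Random;

// O(n)-time draw: after picking index i from the active range [0, limit),
// overwrite position i with the last element of the range and shrink it.
public class SwapDraw {
    public static int[] draw(int[] numbers, int k) {
        int[] pool = numbers.clone();    // avoid mutating the caller's array
        Random rng = new Random();
        int[] result = new int[k];
        int limit = pool.length;
        for (int j = 0; j < k; j++) {
            int i = rng.nextInt(limit);  // every index holds an unpicked number
            result[j] = pool[i];
            pool[i] = pool[limit - 1];   // move the last unpicked element in
            limit--;                     // shrink the active range
        }
        return result;
    }
}
```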

# MergeSort with practical improvement

MergeSort is a classic sorting algorithm with O(NlogN) time complexity and O(N) space complexity. The concept is simple, and it can be implemented both top-down and bottom-up.

There are some practical improvements according to the book Algorithms, 4th Edition from Princeton University.

• Overhead for sorting small arrays: for subarrays with size below a CUTOFF value (set to 7 here), use insertion sort instead.
• Before merging: if the largest value of the left sorted half is no greater than the smallest value of the right sorted half, the two halves are already in order and there is no need to merge.
• Can’t remember the last one… (the book also suggests eliminating the copy to the auxiliary array by switching the roles of the input and auxiliary arrays on each level).
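A minimal top-down sketch with the first two improvements (class name is my own; the bottom-up version applies the same ideas):

```java
// Top-down MergeSort with two practical improvements:
// a CUTOFF to insertion sort for tiny subarrays, and a test that
// skips the merge when the two halves are already in order.
public class ImprovedMergeSort {
    private static final int CUTOFF = 7;

    public static void sort(int[] a) {
        int[] aux = a.clone();
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (hi - lo < CUTOFF) { insertionSort(a, lo, hi); return; }
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        if (a[mid] <= a[mid + 1]) return;   // already in order: skip the merge
        merge(a, aux, lo, mid, hi);
    }

    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)         a[k] = aux[j++];
            else if (j > hi)          a[k] = aux[i++];
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else                      a[k] = aux[i++];
        }
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j] < a[j - 1]; j--) {
                int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;
            }
    }
}
```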

The improved code is on github:

The bottom-up version:

# Algorithm I, Part I — 3 SUM

Q: 3-SUM in quadratic time. Design an algorithm for the 3-SUM problem that takes time proportional to N^2 in the worst case. You may assume that you can sort the N integers in time proportional to N^2 or better.

Quoting from Wikipedia: “In computational complexity theory, the 3SUM problem asks if a given set of n integers, each with absolute value bounded by some polynomial in n, contains three elements that sum to zero.” Suppose we have the array [1, -9, 21, 2, 4, -5, -7, 10, 6, -8, -2, 0]; we need to find all sets of integers (a, b, c) from the array such that a + b + c = 0.

(Java Code on github at the bottom of the post. )

My thoughts and solutions:

For each pair a and b, we check whether c = -(a+b) exists in the array. Pairing a and b takes O(N^2), and finding c with linear search takes O(N), so in total it is O(N^3). This is the brute-force algorithm.

How can we improve? Or rather, where can we improve: either the pairing of a and b, or the search for c.

The search for c costs O(N) time. We can hope to improve it to O(logN) or even O(1) time.

• What search algorithm takes O(logN)? Binary search does, but it works on a sorted array. So we sort the array first with MergeSort, which takes O(NlogN) time. In total we get O(N^2) * O(logN) + O(NlogN) = O(N^2logN): the pairing of a and b times a binary search for c, plus the sorting. Better, but not enough.
• What search algorithm takes O(1)? Eh, none? Maybe we are constraining ourselves too much; it does not have to be a “famous algorithm” out there. Consider the following question: can we use some help, like an additional data structure? If we can, what data structure helps you find something in O(1) time? A hash table. So we put all the integers in a hash table and the problem is solved, at the cost of O(N) extra space.
• What if we cannot use an additional data structure? For this one I didn’t figure it out, so I peeked at the Wiki… They have a clever algorithm; I guess that part of my brain is just not wired to come up with the following idea. Sort the array first. For each a, put one pointer just after a and another at the last element, and look at the sum of a and the two pointed-to values b and c. There are 3 cases: a + b + c = 0 (record the triple and move both pointers inward), > 0 (move the right pointer left to shrink the sum), and < 0 (move the left pointer right to grow it).
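A sketch of that quadratic two-pointer algorithm (class name is illustrative; duplicate anchor values are skipped so each triple is reported once, though the post's input has unique integers anyway):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Quadratic 3-SUM: sort once, then for each anchor a walk two pointers
// inward from both ends of the remaining array.
public class ThreeSum {
    public static List<int[]> find(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a);
        List<int[]> triples = new ArrayList<>();
        for (int i = 0; i < a.length - 2; i++) {
            if (i > 0 && a[i] == a[i - 1]) continue;  // skip duplicate anchors
            int lo = i + 1, hi = a.length - 1;
            while (lo < hi) {
                int sum = a[i] + a[lo] + a[hi];
                if (sum == 0) {
                    triples.add(new int[]{a[i], a[lo], a[hi]});
                    lo++; hi--;
                } else if (sum > 0) {
                    hi--;   // sum too big: bring the right pointer in
                } else {
                    lo++;   // sum too small: bring the left pointer in
                }
            }
        }
        return triples;
    }
}
```

On the example array from the post this reports five triples, each summing to zero.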

Code on github:

# Algorithm I, Part I — Union By Height

Q: Union-by-height. Develop a union-find implementation that uses the same basic strategy as weighted quick-union but keeps track of tree height and always links the shorter tree to the taller one. Prove a lgN upper bound on the height of the trees for N sites with your algorithm.

(Java Code on github at the bottom of the post. )

My thoughts and solutions:

This one should be simple if you know how to union by size. Just change the array that holds the size of each tree to an array that holds the height of each tree. When a shorter tree is linked under a taller tree, the height of the new tree is the same as the taller one’s. So when does the tree height increase? The answer: when two trees of the same height are unioned together.

As for the upper bound on the height of the tree for N sites, consider the worst case: we union objects in pairs to create trees of the same height and keep doing so until we end up with a single tree. The tree height increases, but at most lgN times, since each increase requires merging two equal-height trees and therefore at least doubles the number of sites in the tree (a tree of height h contains at least 2^h sites). If at any level we break this pattern, the height increases even less.

Level 1: 1    2    3   4    5   6   7   8

Level 2: 1-2    3-4   5-6    7-8

Level 3: 1-2-3-4   5-6-7-8

Level 4: 1-2-3-4-5-6-7-8
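A minimal union-by-height sketch (class name is my own; no path compression, so the lgN height argument above applies directly):

```java
// Weighted quick-union, but weighting by tree height instead of size.
public class UnionByHeight {
    private final int[] parent;
    private final int[] height;   // height[i] is meaningful only for roots

    public UnionByHeight(int n) {
        parent = new int[n];
        height = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    public int root(int i) {
        while (parent[i] != i) i = parent[i];
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    public void union(int p, int q) {
        int rp = root(p), rq = root(q);
        if (rp == rq) return;
        if (height[rp] < height[rq]) {
            parent[rp] = rq;          // link the shorter tree under the taller
        } else if (height[rp] > height[rq]) {
            parent[rq] = rp;
        } else {
            parent[rq] = rp;          // equal heights: either way works,
            height[rp]++;             // and the height grows by one
        }
    }
}
```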

Code on github:

# Algorithm, Part I — Union-find with specific canonical element

Q: Add a method find() to the union-find data type so that find(i) returns the largest element in the connected component containing i. The operations union(), connected(), and find() should all take logarithmic time or better.

For example, if one of the connected components is {1,2,6,9}, then the find() method should return 9 for each of the four elements in that component.

(Java Code on github at the bottom of the post. )

My thoughts and solutions:

The approach is similar to the one for the previous post. Let’s first look at some naive solutions.

1. If we implement the naive union-find as the base algorithm, the find() method would require O(N) time, since we would have to linearly scan all elements to collect those in the same component. So this is not acceptable.

2. So how about using the weighted quick union-find as the base? Where do we go next? Well, a connected component is represented by a tree. We can find the root of an object in O(logN) time and then traverse the whole tree (say, with a breadth-first search) to find the max. Unfortunately, that traversal is still linear in the component size, which is not efficient enough.

3. As I mentioned, we use a similar approach: keep track of extra information along the way during the union operations. Can we keep track of the max element of each component and update it when doing a union? That way, find() only needs to locate the root (O(logN)) and then look up the stored max in constant time; the other methods keep the same cost as before.
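This can be sketched on top of weighted quick-union by size, with an extra max[] array updated during union (class and field names are mine):

```java
// Weighted quick-union (by size) that also tracks the largest element
// in each component, updated whenever two components are merged.
public class MaxUnionFind {
    private final int[] parent;
    private final int[] size;
    private final int[] max;   // max[root] = largest element in that component

    public MaxUnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        max = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; max[i] = i; }
    }

    private int root(int i) {
        while (parent[i] != i) i = parent[i];
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    // Returns the largest element in the component containing i.
    public int find(int i) { return max[root(i)]; }

    public void union(int p, int q) {
        int rp = root(p), rq = root(q);
        if (rp == rq) return;
        int m = Math.max(max[rp], max[rq]);   // new component's max
        if (size[rp] < size[rq]) { parent[rp] = rq; size[rq] += size[rp]; max[rq] = m; }
        else                     { parent[rq] = rp; size[rp] += size[rq]; max[rp] = m; }
    }
}
```

With the component {1,2,6,9} from the example, find() returns 9 for every member.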

Well, I guess we can. Code on github: