Tag Archives: Array data structure

Randomly Draw k unique integers out of an array of N unique integers

Given an array of n unique integers (1 to n), write an algorithm to draw a random combination of k distinct numbers (n >= k). (This problem comes from Core Java Vol I as an example.)

Unknown: A way to draw k distinct integers out of  n distinct integers.

Data: An array of integers 1 to n.

Constraint: k numbers must be distinct and randomly picked.

A straightforward solution would be:

  1. Randomly pick one number out of the an array
  2. If this number is not picked before, add it to the result. Return the result if we have k numbers.
  3. Otherwise, back to step 1.

Q: So what is the time complexity of this solution?

A: If we are unlucky, in the worst case, O(k^2) and if k close to n, O(n^2).

Q: How so?

A: At some point, we will have problem selecting a number that’s not in the result set.  The first number is easy, just once. The second, if unlucky, twice. The third, if unlucky, 3 times since the first 2 times picked something in the result set…so up to k numbers, it can take 1+2+3+…+k picks which is approximately O(k^2). If k is close to n, then we have a O(n^2) algorithm. check the link at the bottom for the code.

Q: Alright, can we make it faster? Say let’s make it O(n) time and you cannot use additional space except for the result set.

A: Hmm, the bottleneck of the previous solution is that every time we pick a number, we have to check if it exists in the result set. If it does, we have to go back and pick again. If we can skip this step it will be faster.

Q: What do you mean by skipping this step?

A: I mean that every time we pick a number, it is guaranteed not picked before.

Q: How do we do that?

A: Hmm. we need to keep track of what has not been picked instead. Since we cannot use additional space, I assume that we have to do something on the original array. I can replace the picked number with some special value like n+1, but this sounds useless since if I happened to pick this number, I would have to choose again, exactly like before. I don’t know…

Q: OK, in what situation can we safely draw an unpicked number?

A: If the array only contains unpicked numbers, we can do that safely. But again, I don’t think we can recreate a new array to exclude the picked one in every pass. That’s O(n^2) again.

Q: True. So why can’t we draw the numbers safely now? What’s the matter?

A: Because there are picked values between unpicked ones.

Q: Good. You mentioned about excluding them. Is there a way to do that without creating a new array?

A: I suppose I can re-arrange the array? For example, if I picked the number at index i, I can move the numbers from i+1 to n-1 forward. But then I should pick a random index between 0 to n-1 (exclusive). Wait, this is still O(n^2)…

Q: Do we have to move all the elements after index i? Can we reduce this O(n) move to a O(1) move?

A: O(1)? So I should move only 1 element instead. But which one…

Q: Let’s use an example: 1,2,3,4,5 Say we picked 3. In your case, we change the array to 1,2,4,5,5 and then we pick from index 0 to 3 next time. We do the move because we want to make sure next time we are choosing from 1,2,4,5. So is there another way to do it?

A: Yes! I can move the last element to that position to achieve the same effect! So every time after the pick, I move the last element within the current range to the picked position then reduce the range by 1.

Q: That’s right 🙂

Link to the code: https://gist.github.com/jingz8804/53955bbaf817a6c2e179

 

Search in a bitonic array

Q: Search in a bitonic array. An array is bitonic if it is comprised of an increasing sequence of integers followed immediately by a decreasing sequence of integers. Write a program that, given a bitonic array of N distinct integer values, determines whether a given integer is in the array.

  • Standard version: Use ∼3lgN compares in the worst case.
  • Signing bonus: Use ∼2lgN compares in the worst case (and prove that no algorithm can guarantee to perform fewer than ∼2lgN compares in the worst case).

(Java Code on github at the bottom of the post. )

My thoughts and solutions:

Well, we could just go for linear search but that takes O(N) time. The question requires O(lgN). The only search with O(N) time I know is binary search, which requires a sorted array. We have a bitonic array. Well in some sense it consists of two sorted parts with a turning point in between. For example:

1, 2, 4, 7, 10, 9, 5, 3, 0

So if I know where the turning point is, I could just apply binary search on both sorted parts. That takes 2lgN. How do we find the turning point? Again, we could use linear search with O(N) time, but then there’s no point of doing any of these things above. The standard version is 3lgN. We used 2lgN, so I guess finding the turning point uses the left lgN time, with a binary search.

But how do we apply binary search to find the turning point? Think about the characteristics of a turning point. Say we look at the element at index I:

  1. If  A[I-1] < A[I] < A[I+1], A[I] is not the turning point and we are stilling in the ascending sequence;
  2. If A[I-1] > A[I] > A[I+1], A[I] is not the turning point and we are in the descending sequence;
  3. If A[I-1] < A[I] > A[I+1], then A[I] is the turning point.

Then we can apply binary search on it and instead of looking for a specific element, we look for the element with the 3rd characteristic. In this way, we can find the turning point and the rest is easy.

As for the 2lgN, I’m still working on it…Probably I have to eliminate one binary search here, but which one and how?

Code on github:

Enhanced by Zemanta

Algorithms, Part I — Social Network Connectivity (With Union-Find)

Q: Given a social network containing N members and a log file containing M timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend … of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be MlogN or better and use extra space proportional to N.

(Java Code on github at the bottom of the post. )

My thoughts and solutions:

When solving dynamic connectivity problems (like whether there is a path between two nodes but we don’t care what the exact path is), we can try using the Union-Find algorithm. And this is a typical application of union-find in social network.

This question asks if we can determine the earliest time at which all members are connected, which means very time we union two objects, we have to check if all points are connected.

Well, if we use the naive Union-Find algorithm as a base, to check if all objects are connected is to check if they have the same group label in the array and that will take O(N) time for each union. So M unions take O(MN), which is unacceptable.

So we move on to a better version of Union-Find, the Weighted Union-Find algorithm. It uses a tree structure to represent the connected components. So if all members are connected, there should be only one root element left, which means there is only one array element at index i such that array[i] == i. OK, so we go and check how many elements satisfy this condition after each union. However, this takes O(N) time for each check too.

Well, all we care is the number of roots left, right? Remember initially there are N objects and each represents its own tree. So we have N roots at the beginning. Each time we union, we get one less root. So, why not use a variable count to track the number of roots and change its value along the way of doing the union operations? And to check if all members are connected, we just check if count == 1? This takes constant time and O(MlogN + M) = O(M(logN + 1)) = O(MlogN).

===============================================================

Some other thoughts:

Maybe we can work from this direction too: we know that Weighted Quick Union-Find takes O(MlogN) time for M union operations. Now we are doing something extra and we have to keep the time cost on the same level, then it has to be something constant. Now where can we fit in something constant and get the job done?

===============================================================

What else can we learn from Union-Find algorithm? Well at least one thing: to represent connected components, we can use an array. Elements in the same component share the same component id (same integer) or we represent them as a tree. This also suggests how to represent a tree in an array or when you have a tree, what data structure you can use to hold it.

Code  on github:

Enhanced by Zemanta