Find missing numbers in 4 billion unique numbers with 50MB RAM

Carvia Tech | December 08, 2019 | 2 min read | 12 views


4 billion numbers can not fit into 50MB RAM. Even if we take a bitset of 4 billion bits, it will need 4 GB RAM. So we need a multi-pass solution.

Solution

  1. To solve this limited memory problem, we will create buckets of integers, each bucket covering 10,000 integers.

  2. So first bucket will cover 0-9999, second bucket will cover 10,000-19,999 and so on.

  3. To cover 4 x 109 integers, we will need 4x109 / 104 = 4x105 buckets

  4. We will store count of numbers (not actual numbers) in each bucket, that way we will save lot of memory at the cost of extra passes. If no number is missing in a bucket, then the number of integers in that bucket would be 10, 000

  5. After processing all 4 billion integers, we will make a second pass to figure out the missing numbers in those buckets that have less than 10,000 items.

Time Complexity of approach: 3xO(n)

For simplicity purpose, i am providing code with much lower values, so that it can be easily tested and understood.

Find Missing Number with Limited RAM
import java.util.*;
import java.util.stream.LongStream;

public class Problem2 {

    public static final int SINGLE_BUCKET_SIZE = 10_000;
    public static final int TOTAL_BUCKETS = 400;

    final int[] buckets = new int[TOTAL_BUCKETS];
    Set<Integer> bucketsWithProblem = new HashSet<>();
    Map<Integer, BitSet> bucketData = new HashMap<>();

    public void countItemsInEachBucket(long input) {
        int bucketId = (int) (input / SINGLE_BUCKET_SIZE);
        buckets[bucketId]++;
    }

    public void findBucketsWithLesserItems() {
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] < SINGLE_BUCKET_SIZE) {
                bucketsWithProblem.add(i);
                System.out.println("Bucket has a problem: " + i + ", total items =" + buckets[i] + ", missing items = " + (SINGLE_BUCKET_SIZE - buckets[i]));
            }
        }
    }

    public void locateMissingNumbers(long input) {
        int bucketId = (int) (input / SINGLE_BUCKET_SIZE);
        if (bucketsWithProblem.contains(bucketId)) {
            bucketData.computeIfAbsent(bucketId, BitSet::new).set((int) (input % SINGLE_BUCKET_SIZE));
        }

    }

    public void printMissingNumbers() {
        bucketData.forEach((integer, bitSet) -> {
            int nextClearBit = 0, lastClearBit = 0;
            do {
                nextClearBit = bitSet.nextClearBit(lastClearBit);
                if (nextClearBit < SINGLE_BUCKET_SIZE) {
                    int missingNumber = integer * SINGLE_BUCKET_SIZE + nextClearBit;
                    System.out.println("missing number = " + missingNumber);
                } else break;
                lastClearBit = nextClearBit + 1;
            } while (lastClearBit < SINGLE_BUCKET_SIZE);
        });

    }

    public static void main(String[] args) {
        HashSet<Long> missingNumbers = new HashSet<>(10);
        missingNumbers.add(10L);
        missingNumbers.add(100L);
        missingNumbers.add(900L);
        missingNumbers.add(15L);
        missingNumbers.add(3999L);

        Problem2 problem2 = new Problem2();
        LongStream.range(0, 4_000_000L).filter(value -> !missingNumbers.contains(value)).forEach(problem2::countItemsInEachBucket);
        System.out.println("Pass 1 is complete now");
        problem2.findBucketsWithLesserItems();
        System.out.println("Pass 2 is complete now");
        LongStream.range(0, 4_000_000L).filter(value -> !missingNumbers.contains(value)).forEach(problem2::locateMissingNumbers);
        System.out.println("Pass 3 is complete now");
        problem2.printMissingNumbers();
        System.out.println("Pass 4 is complete now");
    }
}

Top articles in this category:
  1. can we write a java method that swaps two integers
  2. How will you calculate factorial of a large number in Java
  3. Set precision and scale for a double value in java
  4. How to implement Thread pool in Java without executor framework
  5. Given a collection of 1 million integers, All ranging between 1 to 9, how would you sort them in Big O(n) time
  6. Generate Random Numbers in a range using Java 8
  7. How to find if the Linked List contains any Cycle/Loop


Find more on this topic:
Core Java image
Core Java

Core Java - OOP Concepts, Garbage Collection, Multi-threading, Collections Framework, Java 8 Features, Lambda Functions, Streams.

Last updated 1 week ago


Recommended books for interview preparation:

This website uses cookies to ensure you get the best experience on our website. more info