What Do You Do if the Data Input Does Not Fit in Memory in Computer Programming?

Problem scenario
You have a massive amount of data to process. It is too large to fit in the memory of the server you have. How do you write a program or algorithm to handle a very large set of data?

Possible Solution #1
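Divide the data into portions that each fit in memory and write them to files. This is called "chunking" the data. Process the chunks serially, one batch at a time. If the data must be sorted and the data set is very large, use an external sort: sort each chunk in memory, write it to its own subfile, and then merge the sorted subfiles. Placing the subfiles on multiple hard disks allows them to be read in parallel.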
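
As an illustration of chunking combined with an external sort, here is a minimal Python sketch. It assumes the input is a text file with one record per line; the function names and the chunk_lines parameter are hypothetical choices made for this example, not part of any library.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_sorted_run(sorted_lines):
    """Write one sorted chunk to a temporary subfile and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as run:
        run.writelines(sorted_lines)
    return path


def external_sort(input_path, output_path, chunk_lines=100_000):
    """Sort the lines of a text file, reading at most chunk_lines lines at a time."""
    run_paths = []
    try:
        # Phase 1: read a chunk, sort it in memory, spill it to disk.
        with open(input_path) as src:
            while True:
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                # Make sure the final line of the file also ends with a newline.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort()
                run_paths.append(_write_sorted_run(chunk))

        # Phase 2: k-way merge; heapq.merge pulls lines from the subfiles lazily.
        runs = [open(p) for p in run_paths]
        try:
            with open(output_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary subfiles whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

This sketch opens every subfile at once during the merge, so a very large input with a small chunk_lines value could hit the operating system's limit on open files. In that case the merge would need to be done in several passes.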

Possible Solution #2
Consider a lower-level language such as C, where you control memory allocation directly and can avoid much of the per-object overhead of higher-level runtimes.

Possible Solution #3
Persist the data in a data store such as a SQL or NoSQL database, and then query only the rows or aggregates you need instead of loading everything into memory.
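
A minimal sketch of this idea, using the sqlite3 module from the Python standard library. The readings table, its columns, and the function names are hypothetical and only serve the example; a different database would work along the same lines.

```python
import sqlite3


def load_readings(conn, rows, batch_size=10_000):
    """Insert (sensor, value) pairs in batches so the input is never fully in memory."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        # Flush the final, partially filled batch.
        conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)
    conn.commit()


def average_per_sensor(conn):
    """Let the database aggregate; only one small row per sensor comes back."""
    query = "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    return list(conn.execute(query))
```

In practice you would pass a file path to sqlite3.connect() so the table lives on disk, and rows could be any iterator, such as a generator that reads a large file line by line.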

Possible Solution #4
Encode the data so that a smaller symbolic or nested representation stands in for the original. Sometimes ordering the data can obviate retrieving a payload. Try deduplicating the data (for example, with Hadoop MapReduce jobs).
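
One simple form of symbolic encoding is dictionary encoding, where each distinct value is stored once and every record holds only a small integer code. The sketch below is an illustration under that assumption; the DictionaryEncoder class is hypothetical, and the technique only pays off when values repeat often.

```python
from array import array


class DictionaryEncoder:
    """Store each distinct string once and keep a compact integer code per record."""

    def __init__(self):
        self._code_of = {}        # string -> integer code
        self._values = []         # integer code -> string
        self.codes = array("I")   # one unsigned int per record

    def add(self, value):
        """Append a record, reusing the existing code if the value was seen before."""
        code = self._code_of.get(value)
        if code is None:
            code = len(self._values)
            self._code_of[value] = code
            self._values.append(value)
        self.codes.append(code)

    def decode(self, index):
        """Return the original string for the record at position index."""
        return self._values[self.codes[index]]
```

For example, adding "Oslo", "Lima", "Oslo" stores the two city names once each, plus three integer codes.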

Possible Solution #5
Refactor the code's logic to use O(1) (constant) memory space, for example by keeping running totals instead of storing every value.

The following points are only indirectly related to the problem. If you can generate values dynamically rather than build up a set of data, this can help. Memoization avoids repeated computation, but it trades memory for time, so keep any cache bounded. Breadth-first searches can use less memory than depth-first searches on deep, narrow graphs (according to this external site), although on wide graphs a depth-first search usually needs less.
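
To illustrate both constant-memory logic and generating values dynamically, here is a minimal Python sketch. It assumes a text file with one number per line; read_numbers and running_stats are hypothetical names chosen for this example.

```python
def read_numbers(path):
    """Yield one number per line; only the current line is held in memory."""
    with open(path) as src:
        for line in src:
            line = line.strip()
            if line:
                yield float(line)


def running_stats(values):
    """Return (count, mean, maximum) in one pass using a fixed number of variables."""
    count = 0
    mean = 0.0
    maximum = None
    for x in values:
        count += 1
        mean += (x - mean) / count  # incremental mean, no list of values needed
        if maximum is None or x > maximum:
            maximum = x
    return count, mean, maximum
```

Calling running_stats(read_numbers("data.txt")) would process a file of any size, because the generator hands over one value at a time and the statistics are updated in place.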

Possible Solution #6
Normally memory constraints are dealt with at the software level -- not at the hardware level. One solution could be to add more memory; see this page for more information on how to do this (even if your server is in the public cloud).

Possible Solution #7 (an extension of #1)
If you are using PowerShell and want to write to disk instead of downloading a file to memory first, you can see this posting.
