Grid display a csv file of million rows



sj_tt6
31 Oct 2013, 6:03 PM
Hi,

I want to create a page that allows the user to load a csv file. My initial thought is that the back end reads the csv file, parses the data in each row (assuming it is delimited by , ) and puts the data into a json object. Then the back end sends the json object to the page, and a data grid on the page displays the data in the json object.
However, if the csv file is big, for example several million rows, the process would take a long time.

Can somebody give me advice on how to solve this problem?

Thank you very much,

existdissolve
2 Nov 2013, 4:49 AM
First things first: if you have CSV files of millions of records, it's *going* to take time for your server to process them...there's little you can do about that (other than having an efficient process to parse them), and that has nothing to do with Ext JS.
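
To make "an efficient process to parse it" a bit more concrete, here is a minimal sketch, assuming a Python back end (the thread doesn't say which server language is in use) and a CSV file whose first line is a header. The function name `iter_batches` and the sample columns are made up for illustration. The idea is just to stream the file in fixed-size batches instead of reading every row into memory at once:

```python
import csv
import io
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield lists of row dicts, at most batch_size rows per list.

    `lines` is any iterable of text lines, e.g. an open file object.
    Only one batch is held in memory at a time.
    """
    reader = csv.DictReader(lines)  # uses the first line as column names
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical sample data standing in for an uploaded file.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
batches = list(iter_batches(sample, batch_size=2))
```

With a real upload you would pass something like `open(path, newline="")` instead of the `StringIO` object and hand each batch to whatever storage you use.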

Once you have the CSV file parsed into a format that can be returned to the browser, I would *strongly* suggest adopting an approach that loads the records in batches, such as using an Infinite Grid: http://docs.sencha.com/extjs/4.2.1/#!/example/grid/infinite-scroll.html
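
On the server side, loading in batches means answering each grid request with only one slice of the records plus the overall count. A rough sketch, again assuming Python: as far as I know, Ext JS 4 store proxies send `start` and `limit` request parameters by default, but the response keys `rows` and `total` below are assumptions and have to match whatever `root` and `totalProperty` you configure on the store's reader. `page_response` is a hypothetical helper name:

```python
import json


def page_response(records, start, limit):
    """Build the JSON body for one page of records.

    `total` tells the grid how many records exist overall, so it can
    size the scrollbar without ever receiving all of them.
    """
    page = records[start:start + limit]
    return json.dumps({"total": len(records), "rows": page})


# Hypothetical in-memory data; a real app would query a database here.
records = [{"id": i} for i in range(10)]
body = page_response(records, start=4, limit=3)
decoded = json.loads(body)
```

Each scroll in the grid then triggers a small request like this one instead of a single giant response.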

If you try to return a million records as JSON in one response, you will wait a very long time for it to return, and then you'll have to wait even longer for Ext JS to try to render all that data into the grid. Even if it eventually works, it will be far too slow to be usable.
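
If you want a feel for the size of such a response, a quick back-of-the-envelope estimate helps. This is only a sketch: the sample row is invented, and your real rows may be much wider. It measures one row serialized as JSON and multiplies by a million:

```python
import json

# Hypothetical row; substitute a representative row from your own file.
sample_row = {"id": 123456, "name": "example name", "value": 12.34}

bytes_per_row = len(json.dumps(sample_row).encode("utf-8"))
row_count = 1_000_000
total_mb = bytes_per_row * row_count / (1024 * 1024)

print(f"{bytes_per_row} bytes per row, about {total_mb:.0f} MB for {row_count} rows")
```

That figure covers only the transfer; the browser still has to parse the JSON and build a record for every row on top of it.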

But if you think about it, there's no reason for this anyway. There's no way that a human can interpret millions of rows of data in one shot, so having millions of rows of data available all at once is not necessary. It would be better to develop good searching strategies that allow the user to pare down the result set into a meaningful collection.

As far as the server side goes, you'd also probably do well to figure out a good persistence model, even if it's only temporary. Parsing the same file over and over will be expensive and time-consuming, but storing the uploaded, parsed records in some format (DB, NoSQL, etc.) will definitely help with performance and also allow you to implement the searching strategies I mentioned earlier.
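
As one possible shape for this, here is a sketch that parses the CSV once into SQLite and then serves filtered pages from the table. Everything specific in it is an assumption for illustration: the Python back end, the `records` table, the `id,name,city` columns, and the helper names `load_csv` and `search_page`. Any database would do the same job:

```python
import csv
import io
import sqlite3


def load_csv(conn, lines):
    """Parse the CSV once and store its rows in a table."""
    reader = csv.reader(lines)
    next(reader)  # skip the header line (assumed to be: id,name,city)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", reader)
    conn.commit()


def search_page(conn, city, start, limit):
    """Return one page of the rows matching `city`, plus the match count."""
    total = conn.execute(
        "SELECT COUNT(*) FROM records WHERE city = ?", (city,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT id, name, city FROM records WHERE city = ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (city, limit, start),
    ).fetchall()
    return {"total": total, "rows": rows}


# Hypothetical sample data; an in-memory database keeps the demo self-contained.
conn = sqlite3.connect(":memory:")
load_csv(conn, io.StringIO("id,name,city\n1,Ann,Oslo\n2,Bob,Rome\n3,Cy,Oslo\n4,Di,Oslo\n"))
result = search_page(conn, "Oslo", start=1, limit=2)
```

For millions of rows you would want an index on the columns you search by, and a file-backed or server database rather than `:memory:`, but the request flow stays the same: filter first, then return one page.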

Good luck