Excel CSV file with more than 1,048,576 rows of data

Posted 2019-01-17 00:20

Question:

I have been given a CSV file with more rows than the maximum Excel can handle, and I really need to be able to see all the data. I understand the method of "splitting" it and have tried it, but it doesn't work.

Some background: the CSV file is an Excel CSV file, and the person who gave me the file has said there are about 2 million rows of data.

When I import it into Excel, I get data up to row 1,048,576. When I then re-import it into a new tab starting at row 1,048,577 of the data, I get only one row, and I know for a fact that there should be more (not only because "the person" said there are more than 2 million rows, but also because of the information in the last few sets of rows).

I thought that maybe this is happening because I was given the CSV file as an Excel CSV file, and so all the information past row 1,048,576 is lost (?).
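
One way to check whether the rows past 1,048,576 are really missing from the file is to count its records outside Excel. The following is a minimal sketch using only the Python standard library; the function name and the UTF-8 default encoding are assumptions for illustration, not something given in the question.

```python
import csv

# 2**20, the worksheet row limit in Excel 2007 and later
EXCEL_MAX_ROWS = 1_048_576


def count_csv_rows(path, encoding="utf-8"):
    """Count parsed CSV records (header included) without loading the file into memory."""
    with open(path, newline="", encoding=encoding) as f:
        # csv.reader treats a quoted field that contains line breaks as one record
        return sum(1 for _ in csv.reader(f))
```

If `count_csv_rows("data.csv")` (a hypothetical file name) returns a number close to `EXCEL_MAX_ROWS`, the file was probably already truncated when it was exported; if it returns about 2 million, the data is all there and only Excel's import is the bottleneck.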

Do I need to ask for a file in an SQL database format?

Answer 1:

You should try Delimit. It can open up to 2 billion rows and 2 million columns very quickly, and it has a free 15-day trial too. Does the job for me!



Answer 2:

I would suggest loading the .CSV file into MS Access.

With MS Excel you can then create a data connection to this source (without actually loading the records into a worksheet) and create a connected pivot table. You can then have a virtually unlimited number of lines in your table, depending on processor and memory: I currently have 15 million lines with 3 GB of memory.

An additional advantage is that you can now create an aggregate view in MS Access. In this way you can create overviews from hundreds of millions of lines and then view them in MS Excel (beware of the 2 GB file size limitation on a 32-bit OS).



Answer 3:

First, change the file format from csv to txt. That is simple to do: just edit the file name and change csv to txt. (Windows will warn you about possibly corrupting the data, but it is fine, just click OK.) Then make a copy of the txt file so that you have two files, both with 2 million rows of data. Open the first txt file, delete the second million rows, and save the file. Then open the second txt file, delete the first million rows, and save the file. Finally, change the two files back to csv the same way you changed them to txt originally.



Answer 4:

Excel 2007+ is limited to somewhat over 1 million rows (2^20 = 1,048,576 to be precise), so it will never load your 2M-line file. I think the technique you refer to as splitting is the built-in feature Excel has, but AFAIK that only works for width problems, not for length problems.

The easiest way I can see right away is to use a file-splitting tool (there are tons of them) and then load the resulting partial CSV files into multiple worksheets.
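
If no splitting tool is at hand, the splitting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first row is assumed to be a header that should be repeated in every part, and the function name, the `.partN.csv` naming scheme, and the default of 1,000,000 data rows per part are all made up for the example.

```python
import csv


def split_csv(path, rows_per_file=1_000_000, encoding="utf-8"):
    """Split a CSV into numbered part files, repeating the header row in each part.

    Returns the list of part-file paths that were written.
    """
    parts = []
    out = None
    writer = None
    with open(path, newline="", encoding=encoding) as src:
        reader = csv.reader(src)
        header = next(reader, None)  # assumption: the first row is a header
        if header is None:
            return parts  # empty input, nothing to write
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:
                    # Start a new part file every rows_per_file data rows
                    if out is not None:
                        out.close()
                    part_path = f"{path}.part{len(parts) + 1}.csv"
                    out = open(part_path, "w", newline="", encoding=encoding)
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(part_path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return parts
```

With the default setting each part holds at most 1,000,001 rows including the header, which stays under the 1,048,576-row limit, so each part can be opened in its own worksheet.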

PS: "Excel CSV files" don't exist; there are only files produced by Excel that use one of the formats commonly referred to as CSV.



Answer 5:

You can use PowerPivot to work with files of up to 2GB, which will be enough for your needs.



Answer 6:

Try using OpenRefine. It has been able to handle datasets that otherwise crashed Excel for me.



Answer 7:

I'm surprised no one mentioned Microsoft Query. You can simply request data from the large CSV file as you need it, querying only the part you need. (Querying is set up much like filtering a table in Excel.)

Better yet, if you are open to installing the Power Query add-in, it's super simple and quick. Note: Power Query is an add-in for Excel 2010 and 2013 but comes with Excel 2016.
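
The idea of pulling in only the rows you need can also be sketched outside Excel. The snippet below is not Microsoft Query or Power Query; it is a plain Python illustration of the same approach, assuming the file has a header row and well-formed records. The function name and its parameters are hypothetical.

```python
import csv


def filter_csv(src_path, dst_path, column, wanted, encoding="utf-8"):
    """Copy only the rows whose `column` equals `wanted`; return the number of rows kept."""
    kept = 0
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.DictReader(src)  # assumption: the first row holds the column names
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:  # streams one row at a time, so memory use stays small
            if row[column] == wanted:
                writer.writerow(row)
                kept += 1
    return kept
```

If the filtered result has fewer than 1,048,576 rows, the output file opens in Excel directly.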



Answer 8:

If you have MATLAB, you can open large CSV (or TXT) files via its import facility. The tool gives you various import format options, including tables, column vectors, numeric matrices, etc. However, since MATLAB is an interpreted environment, it does take its time to import such a large file; I was able to import one with more than 2 million rows in about 10 minutes.

The tool is accessible via MATLAB's Home tab by clicking the "Import Data" button. Once imported, the data appears in the Workspace on the right-hand side, where it can be double-clicked to open it in an Excel-like view and even plotted in different formats.



Answer 9:

I would strongly recommend you import the data into Access so you can then query it from inside Access. You could try to use R to query your file as well, which I'd be more than happy to help with. Otherwise, you could look at a free solution such as QueryStorm, which allows you to run SQL statements from within an Excel file: http://www.querystorm.com/Home/Guide



Answer 10:

Use MS Access. I have a file of 2,673,404 records. It will not open in Notepad++, and Excel will not load more than 1,048,576 records. It is tab-delimited, since I exported the data from a MySQL database, and I need it in CSV format. So I imported it into Access. Change the file extension to .txt so that MS Access will take you through the import wizard.

MS Access will link to your file, so keep the CSV file for the database to stay intact.



Answer 11:

I was able to edit a large 17GB csv file in Sublime Text without issue (line numbering makes it a lot easier to keep track of manual splitting), and then dump it into Excel in chunks smaller than 1,048,576 lines. Simple and quite quick - less faffy than researching into, installing and learning bespoke solutions. Quick and dirty, but it works.



Answer 12:

"DO I need to ask for a file in an SQL database format?" YES!!!

Using a database is the best option for this problem.

See the Excel 2010 specifications.
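
As a rough illustration of the database route, the sketch below loads a CSV into SQLite using Python's standard library. SQLite is swapped in here only because it needs no installation; the answer does not name a specific database. The function name, the table name `data`, and the choice to store every column as untyped text are assumptions for the example.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="data", encoding="utf-8"):
    """Load a CSV with a header row into a new SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: the first row holds the column names
        # Quote column names so that spaces or punctuation in headers are accepted
        cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        # executemany consumes the reader lazily, one row at a time
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()
    return conn
```

Once loaded, the data can be summarized with ordinary SQL, for example `SELECT COUNT(*) FROM "data"`, and only the much smaller summary needs to go into Excel. The function fails if the table already exists or if a row has a different number of fields than the header, which is acceptable for a one-off import.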



Answer 13:

Split the CSV into two files in Notepad. It's a pain, but you can just edit each of them individually in Excel after that.