I had to do a number of Google searches to figure out how to do this, so I thought I would bring together everything I learned into a single post in case it would help someone else with a similar scenario.
Experience: Moderate
Step 1: Split the file into multiple files, each 1,000,000 rows long.
split -l 1000000 filename.csv filename-
This will give you multiple files with filenames like filename-aa, filename-ab, filename-ac, etc.
Step 2: Move the large file (filename.csv) to another directory. You don't need it anymore, but you'll want to hang on to it until everything's as you want.
Step 3: Rename all the files as CSV files.
for i in *; do mv "$i" "$i.csv"; done
Step 4: Open filename-aa.csv with Excel. Copy the first row (the headers) into a new worksheet. Save as "header.csv" Close filename-aa.csv (don't save).
Step 5: Move filename-aa.csv to a different directory. We don't want to add the header to it a second time.
Step 6: Open header.csv with a text editor to make sure there's a "return" at the end of the row. (If you move to the last character in your header and press right-arrow, does it drop to a new line? If not, press return or enter and then save. If so, you're good.)
Step 7: Rename header.csv to header.txt - it will warn you, but it's ok.
Step 8: Add the header to all of the other files
for i in *.csv; do mv "$i" tmp; cat header.txt tmp > "$i"; done
Step 9: Move filename-aa.csv back into the directory
You should now have a bunch of files, each with a header and 1,000,000 rows. (And a file called "tmp" that you can safely delete.)
No comments:
Post a Comment