View the latest, best formatted and most complete version of this manual online at http://mindprod.com/application/csv.manual.html.
This package lets you read write, transform, align, change case, dedup, dump, entify, deentify, pack, patch, reshape, sort, template … comma, tab and semicolon- separated variable files, commonly known as CSV files.
The csv package consists of a 20 Java classes/utilities CSVReader, CSVWriter, CSVAlign, CSVChangeCase, CSVCondense, CSVDedup, CSVDeEntify, CSVDump, CSVEntify, CSVPack, CSVPatch, CSVReshape, CSVSort, CSVTemplate, CSVTabToComma, CSVToSRS, CSVToTable, CSVTuple, LinesToCSV and TableToCSV. It also handles tab-separated and semicolon-separated files. This is the format use by Microsoft Word and other Microsoft products. This version now supports # embedded comments.
Complete Java source is provided.
You can use them either as standalone utilities or as classes you can embed in your own Java programs.
Note that CSV files are perhaps 10 times slower to process than binary files . CSV files are for data interchange with other languages or when human-readability or editability is important. If you want speed, use binary format files, e.g. DataInputStream or possibly the convenient but slower ObjectInputStream.
I found that by converting data to CSV format, then I can often no longer need to write any programs to massage my data. I just use combinations of the CSV utility suite. In particular, I used it in hundreds of ways to extract the data from myriad web pages, Excel spreadsheets, pdfs etc to collect the information about state, county and city sales taxes used by the AmericanTax program.
## favourite, alts, quotation apple, "pear, peach", "he said ""ouch"", then stopped" # sample comment
, separates fields. Fields may contain spaces or 's.
"s enclose fields that contain ,s or "s.
"s are doubled that occur incidentally inside "s.
# starts a comment on the end of a line.
## is a special comment that provides names for the various fields separated by commas,
no quotes allowed.
I found that when composing CSV files by hand, it is best to put in the quotes even when they are not needed. This has two benefits: If I accidentally leave out a comma, the parser will detect the error. If I fail to notice a comma embedded in a field, it will still work.
Click on the name to get more information about each component. The package consists of:
Summary of CSV Utilities and Classes | |||
---|---|---|---|
Name | Use as Class? | Use as Utility? | Purpose |
CSVReader | reads a CSV file. Only of use to Java programmers. | ||
CSVWriter | writes a CSV file. Only of use to Java programmers. | ||
CSVAlign | aligns a CSV file in columns. | ||
CSVChangeCase | converts selected columns in a CSV file to lower, upper or book title case. | ||
CSVCondense | removes blank lines and condenses multiple spaces in fields to one space. | ||
CSVDedup | removes “duplicate’ records in a CSV file. | ||
CSVDeEntify | convert HTML entities back to UTF-8 characters. | ||
CSVDump | dumps out a CSV file on the console. | ||
CSVEntify | convert awkward characters to HTML entities. | ||
CSVPack | removes all excess spaces and quotes from a file. | ||
CSVPatch | patches/updates columns is a CSV file using a CSV file of from:to pairs. | ||
CSVReshape | add, remove, rearrange the order of fields in a CSV file. | ||
CSVSort | sort a CSV file alphabetically, case/case insensitively, or numerically on multiple columns. | ||
CSVTabToComma | converts a tab-separated file to a comma separated file. | ||
CSVToSRS | converts a CSV file to a Funduc Search and Replace script. | ||
CSVTemplate | Expands boilerplate with variable CSV data. | ||
CSVToTable | Converts a CSV file to an HTML table with awkward characters entified. | ||
CSVTuple | Breaks lines into fixed length tuples. | ||
LinesToCSV | converts a text file of lines into a CSV file blocking multiple lines into one CSV record. | ||
TableToCSV | Converts an HTML table to a UTF-8 CSV file with entities converted to Unicode and tags stripped. |
CSVReader reads a CSV file. CSVReader is of interest only to Java programmers. You don’t have to be a programmer to use the utilities in the csv package. Here is how you typically use CSVReader:
For more information, read the enclosed Javadoc or look at the documentation in the CSVReader and CSVWriter source on the constructors. Also look at the Java source and Javadoc for the various utilities that use CSVReader and CSVWriter.
CSVWriter writes a CSV file. CSVWriter is of interest only to Java programmers. You don’t have to be a programmer to use the utilities in the csv package. Here is how some code that shows off some of the things you can do with a CSVWriter:
CSVAlign aligns the fields in a CSV file in columns for easier proofreading. CSVAlign is done purely in RAM, so is not suitable for giant files. It does its job very quickly without any console output. It left aligns alpha data and right aligns numeric data. It does not align on decimal points.
java.exe com.mindprod.csv.CSVAlign somefile.csv
alternatively:
java.exe -jar csvalign.jar somefile.csv
or even:
csvalign.jar somefile.csv
and if you have Jet:
csvalign.exe somefile.csv
For this last technique to work, you must have the *.jar extension associated to invoke java.exe.
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
There is a companion program called Align for aligning text files that are not CSV files. It breaks columns on commas or spaces, (except quoted fields) and discards the commas in the output.
Change the case of selected fields in a CSV file. List the name of the file to be modified on the command line followed by the 0-based columns to change. After each column, insert either the letter l for lower case, u for upper case or t for Book Title Case. Don’t use CSVChangeCase on columns containing entities. That will just change the case of the entity, not the case of the character the entity represents. CSVDeEntify them first, then CSVChangeCase, then CSVEntify them again.
or alternativelyRemoves “duplicate’ records in a CSV file. In the simplest case you use it like this:
java.exe com.mindprod.csv.CSVDedup somefile.csv rem ignore 0-based cols 3 and 4 when comparing for duplicates. java.exe com.mindproc.csv.CSVDedup somefile.csv 3 4
alternatively:
java.exe -jar csvdedup.jar somefile.csv
If adjacent records in the file are identical, all but the first will be eliminated from the output file. If CSV file contains identical records, but they are not adjacent, they will be left intact. To eliminate them, you must sort the file first with CSVSort to arrange that duplicates are adjacent and the one you want to keep comes first. The comparison is case sensitive, i.e. CSVDedup will not eliminate records that differ only in case.
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
It is also possible to eliminate records that are almost identical. You specify a list of 0-based columns to ignore when comparing adjacent records. If the other columns match, then the record is eliminated. You could use this for example to collapse a list of countries, cities, populations to a list of just countries, by sorting by country, then doing a CSVDedup asking it to ignore columns 1 and 2 so that records with the same country but different cities or populations would be considered duplicated. You could then use CSVReshape to chop off the remaining city and population columns.
java.exe com.mindprod.csv.CSVDedup countrycitypop.csv 1 2
alternatively:
java.exe -jar csvdeup.jar countrycitypop.csv 1 2
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
Like CSVPack, CSVCondense removes all excess spaces and quotes from a CSV file to make it as compact as possible, but in addition it removes blank lines and collapses multiple spaces inside fields down to one.
Here is how you use it:
java.exe com.mindprod.csv.CSVCondense somefile.csv
alternatively:
java.exe -jar csvcondense.jar somefile.csv
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
CSVDeEntify converts HTML entities in a UTF-8 encoded CSV file to UTF-8 characters. You select which 0-based columns you want to be converted. The first column is column 0.
Here is how you use it:
alternatively:
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
CSVDump dumps out the contents of the CSV file to the console. You might just as easily dump it out with a word processor or text editor. This is primarily a debugging tool to check that all your punctuation is correctly placed, since it will complain if the CSV file is malformed.
Here is how you use it:
alternatively:
java.exe -jar csvdump.jar somefile.csv
It leaves the input file unchanged.
CSVEntify converts awkward characters in a UTF-8 encoded CSV file to HTML entities. You select which 0-based columns you want to be converted. The first column is column 0.
Here is how you use it:
rem entify columns 0 4 and 5 java.exe com.mindprod.csv.CSVEntify somefile.csv 0 4 5
alternatively:
rem strip entities from columns 0 4 and 5 converting the to UTF-8 characters java.exe -jar csventify.jar somefile.csv 0 4 5
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
CSVPack removes all excess spaces and quotes from a CSV file to make it as compact as possible.
Here is how you use it:
java.exe com.mindprod.csv.CSVPack somefile.csv
alternatively:
java.exe -jar csvpack.jar somefile.csv
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
CSVPatch patches/updates 0-based columns in a CSV file using a CSV file of from:to pairs. It is like a column-limited, case-sensitive, multiple source-target pair, search/replace that operates on a single CSV file.
Here is how you use it:
alternatively:
java.exe -jarl csvpatch somefiletopatch.csv somefileoffromtopairs.csv 0 3 2
You specify the columns you want to patch. The column numbers are 0-based (the first column in column 0 ). Typically this would be used to update a column of URLs with new values for some of the URLS, The from-to pairs of old and new values live in the second file. They are applied to each field in the column in the first file to bring those values up to date. There must be a perfect case-insensitive match of the complete field for any replacement to take place. Fields are stripped of lead and trailing spaces before comparing/replacing. This utility is useful in conjunction with the program BrokenLinks to automatically refresh URLs in CSV files that have recently been redirected. The patch file must have at least two complete columns. If there are duplicates in column 0, the first duplicate will be the one used.
CSVReshape adds, removes and rearranges the order of fields in a CSV file.
Here is how you use it:
java.exe com.mindprod.csv.CSVReshape somefile.csv 0 3 2
alternatively:
java.exe -jar csvreshape.jar somefile.csv 0 3 2
You specify the columns you want retained in the order you want them to appear. The column numbers are 0-based (the first column in column 0). Any columns you do not mention will be dropped. You can duplicate columns. You can create new empty columns simply by using a column number larger than any column you used.
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first.
CSVSort sorts a CSV file alphabetically, case/case insensitively or numerically on multiple columns. It works by reading all the data into memory, sorting it, then writing it out. The size of the file it can sort it thus limited by your virtual RAM capacity.
Use it like this:
java.exe com.mindprod.csv.CSVSort somefile.csv 0s+ 2i- 5n+ 3x+
alternatively:
java.exe -jar csvsort.jar somefile.csv 0s+ 2i- 5n+ 3x+
You give a list of the columns you want sort, 0-based.
s | case sensitive (lower case are sorted after upper case) |
i | case insensitive (lower and upper case are intermixed as equivalent) |
l | sort by the length of the field rather than the contents. Longer fields sort later. |
n | numeric with or without decimal point. Empty fields are treated as 0. |
x | hex (e.g. 08f6b or \ue7ff or 0x123456789abcdef up to 64-bits) |
+ | ascending |
- | descending |
99 | 0-based column number. First column in column 0. |
The output overwrites the input file, so if you have any doubts about your parameters, make a backup copy first. CSVSort preserves the position of # comments at the top and bottom of the file. # comments just prior to a data record are moved to stay just ahead of the data record in its new sorted position. Files need not be aligned prior to sorting.
The sort is done purely in RAM, son is not suitable for giant files. The sort produces packed, not aligned files. If you want aligned, you will need to use CSVAlign on the sorted result.
CSVTabToComma Converts a tab-separated file to a comma separated file.
Here is how you use it:
java.exe com.mindprod.csv.CSVTabToComma somefile.csv
alternatively:
java.exe -jar csvtabtocomma.jar somefile.csv
The output overwrites the input file, so if you have any doubts it will work, make a backup copy first.
Converts a CSV file to a Funduc Search and Replace script. It converts a CSV file of string pairs, source/target to produce a corresponding *.srs file that you can tweak to create Funduc Search Replace Script to create a script to do bulk search/replaces over many files. It is of no use if you don’t own Funduc Search/Replace, other than as a example for your own utilities to export CSV data to other formats.
java.exe com.mindprod.csv.CSVToSRS somepairs.csv
alternatively:
java.exe -jar csvtosrs.jar somepairs.csv
In this example, the output will be somepairs.srs in the same directory as somepairs.csv. You don’t explicitly specify the name of the output file. The input file is left unchanged.
Expands boilerplate with variable CSV data. The output appears in a file with the same name as the input file and the same extension as the template file. Be careful. If you called files script.csv and script.txt the output would go into script.txt, overwriting your template. The template file is just a text file with strings of the form %0 %1… emebbed in it. Each line expands the template replacing the %x string with the value of the corresponding column. The template is expanded once for each line in the CSV file, with all the expansions glued together in the output file.
or alternativelyConverts a CSV file to an HTML table with awkward characters entified. It takes a *.csv file and produces a correspondingly named *.html file.
java.exe com.mindprod.csv.CSVToTable somefile.csvor alternatively
java.exe -jar csvtotable.jar somefile.csv
Converts a CSV file into fixed length tuples. If your file looked like this, and you asked for tuples of length 4 you would see this transformation:
a,b,c,d,e,fjava.exe com.mindprod.csv.CSVTuple somefile.csv 4or alternatively
java.exe -jar tuple.jar somefile.csv 4
You can specify how many fields you want included in each tuple. Short lines are padded with empty fields.
converts a text file of lines into a CSV file blocking multiple lines into one CSV record.
java.exe com.mindprod.csv.LinesToCSV somefile.txt 3
alternatively:
java.exe -jar linestocsv.jar somefile.txt 3
You specify the number of lines that should be grouped together as fields on a single CSV file line. The text file should contain an even multiple of this number of lines.
Note that the input file is not a CSV file, but an ordinary *.txt text file. The output is a CSV file with a matching name, e.g. somefile.csv in this example.
Let us say the input file somefile.txt looked like this: The first line is the name of a species, the second is the number of legs, and the third, the colour. This pattern of three lines repeats for each animal.
lion 4 brown ostrich 2 black and white zebra 4 striped black and white
The output somefile.csv would look like this:
lion,4,brown ostrich,2,black and white zebra,4,striped black and white
Converts a an HTML file containing tables to a UTF-8 CSV with entities converted to their equivalent Unicode and tags stripped. It takes a *.html file and produces a correspondingly named *.csv file.
java.exe com.mindprod.csv.TableToCSV somefile.htmlor alternatively
java.exe -jar tabletocsv.jar somefile.html
CSVAlign will align any ## comments.
Eventually I would like to figure out a way to control all the utilities using field names rather than field numbers. It takes two passes though the input, but does not require holding all the data in RAM, thus it can handle arbitrarily large files.
Package | Version | Released | Licence | Language | Notes | ||
---|---|---|---|---|---|---|---|
![]() CSVReader/Writer |
5.9 | 2011-05-11 | free | Java |
![]() ![]() First install the most recent Java. To install, extract the zip download with WinZip, (or similar unzip utility) into any directory you please, often J:\ — ticking off the “user folder names” option. To check out the corresponding source from the Subversion repository, use the TortoiseSVN repo-browser to After you have installed the jar, you can run it as an application. Type: java -jar J:\com\mindprod\csv\csv.jar parms
adjusting as necessary to account for where the jar file is. download ASP PAD XML program description for the current version of CSVReader/Writer. CSVReader/Writer is free. Full source included.
You may even include the source code, modified or unmodified
in commercial programs that you write and distribute. Non-military use only. |
||
![]() ![]() |
You can get the freshest copy of this page from: | or possibly from your local J: drive (Java virtual drive/mindprod.com website mirror) |
http://mindprod.com/application/csv.manual.html | J:\mindprod\application\csv.manual.html | |
![]() | ![]() ![]() | |
Canadian Mind Products | ||
mindprod.com IP:[65.110.21.43] | ||
view Blog | Your face IP:[] | |
Feedback | You are visitor number . | The information on this page is for non-military use only. Military use includes use by defence contractors. |