I need to read a file of the following type with pandas in Python:
"column1","column2","column3","column4"
"value1","value,1","value2","value3"
"value5","value6","value7","value8"
"value32","value21","value,31","value,44"
I tried using
file1 = pd.read_csv('sample.txt',sep=',\s+',skipinitialspace=True,quoting=csv.QUOTE_ALL,engine=python)
It raises something like ValueError(Expected some lines, got something else); that is not the exact message.
I need to read a large CSV file of this type and load it into a DataFrame. What changes should I make to read it correctly?
I think you need to use sep=',\s*' instead of sep=',\s+'. As for the comma inside a quoted value (as is the case for "value,31"), it complies with RFC 4180 and shouldn't be an issue. – Alex – 2017-03-03T11:01:31.270
Earlier it was showing ValueError('Expected 1 fields in line 328, saw 4',) and after changing it to * it shows ValueError('Expected 1 fields in line 328, saw 6',) – Ajay K S – 2017-03-03T12:00:26.787
It looks like an issue with the source data. Check line 328 in the source data file. – Alex – 2017-03-03T12:27:50.210
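For anyone hitting the same "Expected N fields in line ..." error, one rough way to locate suspicious lines before calling pandas is to count fields with the standard csv module. This is only a sketch: find_bad_rows is a made-up helper name, and the sample text stands in for the real file.

```python
import csv
import io


def find_bad_rows(lines):
    """Return (line_number, field_count) for records whose field count
    differs from the header row. `lines` is any iterable of text lines."""
    reader = csv.reader(lines)
    expected = None
    bad = []
    for row in reader:
        if not row:
            # Skip completely blank lines.
            continue
        if expected is None:
            # The first non-empty record is treated as the header.
            expected = len(row)
            continue
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample: line 3 carries one field too many.
sample = (
    '"c1","c2","c3"\n'
    '"a","b,1","c"\n'
    '"d","e","f","g"\n'
)
bad = find_bad_rows(io.StringIO(sample))
print(bad)
```

For a real file, pass open('sample.txt', newline='') instead of the StringIO object. Note that a quoted comma such as "b,1" is not reported, because the csv module treats it as part of the value.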
I am sorry I didn't mention that. I checked it and found an extra comma inside double quotes. I removed it manually and the code works fine, but I cannot do this every time. How can I change the code to handle the situation? There is another problem: inside the double quotes of one value there was another " ", which also makes the program exit. – Ajay K S – 2017-03-03T12:43:35.267
A comma inside double quotes is OK. As for " ", you need to clean up the source file before processing. If the double quotes sit together as "", it shouldn't be an issue either, because it complies with the CSV standard; these are called escaped double quotes. If there is a space between the double quotes, then run sed -r 's/\"\s+\"/\"\"/g' src.csv >cleared.csv before feeding the CSV to pandas. It will remove the space between the quotes. Or run sed -r 's/\"\s+\"//g' src.csv >cleared.csv to remove the internal quotes completely. – Alex – 2017-03-03T13:09:13.557
Thanks @Alex, I cleaned the data and now it works well. Thanks for the sed. – Ajay K S – 2017-03-03T14:12:50.370
No problem, glad I was able to help. I summarized everything in an answer, so maybe someone else will find it helpful too. – Alex – 2017-03-03T14:41:19.907
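If sed is not available, roughly the same cleanup can be done in Python. This is a sketch of the first sed command only (collapse a quote, whitespace, quote run into an escaped ""), applied line by line; collapse_spaced_quotes is a hypothetical helper name. It uses [ \t]+ rather than \s+ so that a newline can never be matched.

```python
import csv
import re

# A double quote, one or more spaces/tabs, then another double quote.
_SPACED_QUOTES = re.compile(r'"[ \t]+"')


def collapse_spaced_quotes(line):
    """Turn '" "' into '""' so the pair reads as an escaped quote.

    Caveat: like the sed command, this also rewrites a field that
    contains only spaces (for example ," ",) into an empty field.
    """
    return _SPACED_QUOTES.sub('""', line)


# Hypothetical dirty line with a stray '" "' inside the second value.
dirty = '"a","b " " c","d"\n'
clean = collapse_spaced_quotes(dirty)
print(clean)

# The cleaned line now parses as three fields.
fields = next(csv.reader([clean]))
print(fields)
```

To clean a whole file, apply the function to each line of src.csv and write the result to cleared.csv, then read cleared.csv with pandas.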
The single quotes are missing: pd.read_csv( ...... engine='python') – Dipankar Nalui – 2018-11-23T10:01:13.597
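A possible alternative to the original call, assuming the file really looks like the sample in the question: drop the regex separator and let the default parser handle the quotes. The pandas documentation notes that regex delimiters are prone to ignoring quoted data, which may be why the quoted commas caused trouble. The sample below is the data from the question fed through io.StringIO; the quotechar and quoting values shown are the defaults, spelled out only for clarity.

```python
import csv
import io

import pandas as pd

# Sample data copied from the question; a file path works the same way.
sample = '''"column1","column2","column3","column4"
"value1","value,1","value2","value3"
"value5","value6","value7","value8"
"value32","value21","value,31","value,44"
'''

# A plain comma separator lets the parser honour the double quotes,
# so commas inside quoted values stay part of the value.
df = pd.read_csv(
    io.StringIO(sample),
    sep=",",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
)
print(df)
```

With a real file this would be pd.read_csv('sample.txt') plus whatever other options the data needs.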