Can the paste command correctly put two unicode files side by side without duplicating any unicode BOM?


This is the version of paste that I am using.

C:\cygwin\bin>.\paste.exe --version
paste (GNU coreutils) 8.26
Packaged by Cygwin (8.26-2)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.
This is free software: you are free to change and redis
There is NO WARRANTY, to the extent permitted by law.

Written by David M. Ihnat and David MacKenzie.


I am not sure whether it's the most up-to-date version, as I don't see paste listed here, which is where I guess I'd look to update.

But I have the paste command installed in cygwin.

But it's not working: it is inserting extra characters.

xxd -p is a command that shows hex.

file1.txt and file2.txt are two UTF-8 files

C:\cro\a>file file1.txt
file1.txt: UTF-8 Unicode (with BOM) text, with no line terminators

C:\cro\a>file file2.txt
file2.txt: UTF-8 Unicode (with BOM) text, with no line terminators

file1.txt contains the UTF-8 BOM (EFBBBF) followed by the hex for the letters 'aaa'. file2.txt likewise contains the BOM followed by the text 'bbb'.

C:\cro\a>xxd -p file1.txt

C:\cro\a>xxd -p file2.txt

We see that here. Don't worry about the ´╗┐; that's just cmd trying to display the UTF-8 BOM. That's not the issue I have.

C:\cro\a>type file1.txt
C:\cro\a>type file2.txt

The problem is that when I use paste to put file1.txt and file2.txt side by side, the hex of the output shows the Unicode BOM (efbbbf) twice: once at the start and once more before the second file's text. It shouldn't be duplicated.

C:\cro\a>paste file1 file2 >a.a
paste: file1: No such file or directory

C:\cro\a>paste file1.txt file2.txt >a.a

C:\cro\a>type a.a
´╗┐aaa  ´╗┐bbb

C:\cro\a>xxd -p a.a


Is there a later version of paste for Windows that doesn't do that? Or does this problem exist even in the latest Linux version of paste? And is there a way around it?

In the meantime I'll encode the UTF-8 files without the BOM before using paste.
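For what it's worth, the workaround above (dropping the BOM before pasting) could also be scripted. This is only a rough sketch in Python, not anything paste itself offers: the helper names `strip_bom` and `paste_bytes` are made up for illustration, and it assumes the inputs are UTF-8 and small enough to hold in memory. It joins corresponding lines with a tab, like paste does, and writes at most one BOM at the very start.

```python
import codecs
from itertools import zip_longest

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def paste_bytes(*contents: bytes, delimiter: bytes = b"\t") -> bytes:
    """Join corresponding lines of each input, keeping at most one leading BOM.

    Hypothetical helper: mimics the default tab-joined layout of paste,
    but removes the BOM from every input before merging.
    """
    had_bom = any(c.startswith(BOM) for c in contents)
    # One list of lines per input, with BOMs and line terminators removed.
    columns = [strip_bom(c).splitlines() for c in contents]
    # Shorter inputs are padded with empty fields.
    rows = zip_longest(*columns, fillvalue=b"")
    body = b"".join(delimiter.join(row) + b"\n" for row in rows)
    return (BOM if had_bom else b"") + body


if __name__ == "__main__":
    merged = paste_bytes(BOM + b"aaa", BOM + b"bbb")
    print(merged.hex())
```

With real files you would read them in binary mode, e.g. `open("file1.txt", "rb").read()`, and write the result back in binary mode as well so nothing gets re-encoded.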


Posted 2017-12-11T15:11:32.337




paste belongs to the coreutils package.

You can find that out with the package search on the Cygwin website

or cygcheck -p bin/paste

paste does not care about the encoding, of course, so if both input files start with a BOM, the output will contain it twice.
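To illustrate why, here is a small Python sketch of what an encoding-agnostic, byte-level join does with the two files from the question. It is not the paste source code, just an imitation of its default behaviour (fields joined by a tab, newline at the end), and `naive_paste_line` is a hypothetical name.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# Contents of the two files from the question: BOM + text, no line terminator.
file1 = BOM + b"aaa"
file2 = BOM + b"bbb"


def naive_paste_line(*fields: bytes) -> bytes:
    """Join raw byte fields with a tab and end the line, ignoring encoding."""
    return b"\t".join(fields) + b"\n"


merged = naive_paste_line(file1, file2)
print(merged.hex())        # hex of the merged line, both BOMs included
print(merged.count(BOM))   # number of BOM sequences in the output
```

The second BOM ends up in the middle of the line, where a UTF-8 decoder treats it as an ordinary U+FEFF character rather than a signature, so the only way to avoid it is to remove it from the inputs first.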


Posted 2017-12-11T15:11:32.337
