macos – How can I highlight or process Unicode characters when pasting to Terminal apps from PDFs?


FOREWORD:

The question above was deleted by the OP while I was working on the following answer. Not being keen on wasted effort, I managed to copy the OP's original question, and pasted it into the "new question" above. Yes… this is a bit odd 🙂


I think what you are looking for is a CLI utility called iconv. Inconveniently, iconv requires "from" and "to" argument declarations (ref man iconv) of the encoding type (e.g. UTF-8, ascii, unicode, etc)… and AFAIK, "shady" is not a recognized encoding type 🙂 However, the encoding type may be determined with another CLI utility called file. Still more inconveniently, both iconv and file specify that the input be contained in a file :/

Your question intrigued me because it seems a reasonable thing to do; i.e. C&P from PDF to CLI. So I spent a few minutes wrangling with iconv and file to get the following answer; an answer which doesn't require you to C&P your PDF strings into a file. <caveat>This works on my Ventura Mac under zsh, but it's been tested nowhere else.</caveat>

You haven't provided an example, and I was unable to find any malfunctioning PDF code strings in a brief search. So, instead, I found this string in a French-language PDF on Python programming:

print(“Numéro de boucle”, i)

So, first we'll need to run this string through file to determine the encoding (note the use of the "dash" -: a reference to stdin in lieu of a proper filename):

echo 'print("Numéro de boucle", i)' | file -
/dev/stdin: Unicode text, UTF-8 text

So, the string was encoded in UTF-8. Now let's convert the string from UTF-8 to ASCII using iconv:

NOTE: The //translit option is not documented in the macOS version of man iconv, but it still works (?!). It tells iconv to transliterate characters that have no direct equivalent in the target encoding. Another option is to ignore the non-ASCII character(s): //ignore

echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Num'ero de boucle", i)

And so you may be wondering, "Why did it add the extra ' character"??. That is a very good question, and perhaps the answer is here. Apple may be using utf-8-mac instead of utf-8. Which I suppose would be OK if they had bothered to reflect that in their implementation of iconv! In fact, there is a UTF8-MAC encoding listed in the output of iconv --list, but it doesn't improve the transliteration:

echo 'print("Numéro de boucle", i)' | iconv -f utf8-mac -t ascii//translit
print("Num'ero de boucle", i)  

echo 'print('Numéro de boucle', i)' | iconv -f utf-8-mac -t ascii//translit
print(Num'ero de boucle, i)
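
The file and iconv steps used so far can be chained so that nothing has to be saved to disk. What follows is only a sketch under a few assumptions: to_ascii is a made-up helper name, it relies on the -b and --mime-encoding options of file to print just the encoding name, and it falls back to UTF-8 when file is missing or cannot decide. I have not tested it beyond simple strings.

```shell
# to_ascii is a hypothetical helper name, not part of iconv or file.
# It reads text on stdin, asks file(1) for the encoding, then
# transliterates the text to ASCII with iconv.
to_ascii() {
    text=$(cat)    # slurp stdin so it can be fed to two commands

    # -b drops the "/dev/stdin:" prefix; --mime-encoding prints only the
    # encoding name (e.g. "utf-8" or "us-ascii").
    enc=$(printf '%s\n' "$text" | file -b --mime-encoding - 2>/dev/null) || enc=''

    # Assumption: treat "no answer" or an unusable answer as UTF-8.
    case "$enc" in
        ''|binary|unknown-8bit) enc=utf-8 ;;
    esac

    printf '%s\n' "$text" | iconv -f "$enc" -t ASCII//TRANSLIT
}
```

On macOS the clipboard can be piped straight in with pbpaste | to_ascii. The transliteration itself is still whatever the local iconv produces, so the stray ' character discussed above will appear here too.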

As written, the iconv utility for macOS Ventura cannot properly convert all utf-8 characters to ASCII. It converts the ones it can, and issues an error (or inserts inappropriate characters) for the others. To get a "best effort" from iconv you can add the -c option, causing iconv to simply drop the characters it cannot convert.
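
For illustration, here is a minimal sketch of the -c approach. strip_non_ascii is a made-up helper name, and ignoring the exit status is my own assumption, made because some iconv builds may still report failure after discarding characters.

```shell
# strip_non_ascii is a hypothetical helper name.
# Best-effort conversion: -c discards every character that has no
# representation in the target encoding (plain ASCII here).
strip_non_ascii() {
    # "|| :" ignores iconv's exit status (assumption: some builds return
    # non-zero even though -c let the conversion finish).
    iconv -c -f UTF-8 -t ASCII || :
}

printf '%s\n' 'print("Numéro de boucle", i)' | strip_non_ascii
```

Note that the é is removed outright rather than replaced, so "Numéro" comes out as "Numro". That is fine for stray typographic junk, less so for text you actually want to keep readable.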

As an experiment: If you have a reasonably current Linux box handy, you can try iconv on the example phrase here. When I tried this on my Linux systems (two versions of Debian; 'bookworm' & 'bullseye'), I found that iconv did a perfectly correct and accurate 'transliteration' (//TRANSLIT) of the example used in this answer (and several others); i.e. no extra ' character.

These results can be improved with a sed "filter":

echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit | sed 's/[^a-zA-Z 0-9 , ( )]//g'

But having to use sed to improve iconv strikes me as an ugly hack, one that should be unnecessary.
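
If the goal is only to spot the suspect characters before running a pasted command, rather than to convert them, grep may be enough. This is a sketch, not something I have tested widely: show_non_ascii is a made-up helper name, and it assumes a grep that accepts -a and --color=auto.

```shell
# show_non_ascii is a hypothetical helper name.
# Prints only the lines containing a byte that is neither printable ASCII
# nor whitespace, each prefixed with its line number. On a terminal the
# offending bytes are colourised by --color=auto.
show_non_ascii() {
    # LC_ALL=C makes grep work byte by byte, so each byte of a multi-byte
    # UTF-8 character falls outside [:print:] and is matched.
    # -a treats the input as text even if grep would consider it binary.
    LC_ALL=C grep -a -n --color=auto '[^[:print:][:space:]]'
}

printf '%s\n' 'print("ok")' 'print("Numéro de boucle", i)' | show_non_ascii
```

Only the second line is reported, because the first is pure ASCII. An empty result means nothing suspicious was found.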

And so, iconv seems to work at least some of the time in macOS… hope this helps.
