macos – Spotlight “shady” characters in scripts copied in Terminal from PDFs



FOREWORD:

The question above was deleted by the OP while I was working on the following answer. Not being keen on wasted effort, I managed to copy the OP’s original question and pasted it into the “new question” above. Yes… this is a bit odd 🙂


I think what you may be looking for is a CLI utility called iconv. Inconveniently, iconv requires “from” and “to” argument declarations (ref man iconv) for the encoding type (e.g. UTF-8, ASCII, Unicode, etc.)… and AFAIK, “shady” is not a recognized encoding type 🙂 However, the encoding type can be determined with another CLI utility called file. Still more inconveniently, both iconv and file specify that the input be contained in a file :/

Your question intrigued me because it seems a reasonable thing to do, i.e. C&P from PDF to CLI. So I spent a few minutes wrangling with iconv and file to arrive at the following answer, one which does not require you to C&P your PDF strings into a file. <caveat>This works on my Ventura Mac under zsh, but it has been tested nowhere else.</caveat>

You have not provided an example, and I was unable to find any malfunctioning PDF code strings in a brief search. So instead, I found this string in a French-language PDF on Python programming:

print(“Numéro de boucle”, i)

So, first we need to run this string through file to determine the encoding (note the use of the “dash” -, a reference to stdin in lieu of a proper filename):

echo 'print("Numéro de boucle", i)' | file -
/dev/stdin: Unicode text, UTF-8 text
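If you only want the encoding name, for example to hand it straight to iconv in a script, file’s -b (brief: omit the “/dev/stdin:” prefix) and --mime-encoding options should do it. A small sketch; the variable name enc is just for illustration:

```shell
# Ask file(1) for just the character encoding of text arriving on stdin.
# -b drops the "/dev/stdin:" prefix; --mime-encoding prints a name such as
# "utf-8" or "us-ascii" instead of the human-readable description.
enc=$(echo 'print("Numéro de boucle", i)' | file -b --mime-encoding -)
echo "$enc"
```

For this UTF-8 string it should print utf-8, a name iconv accepts directly as a -f argument.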

So, the string was encoded in UTF-8. Now let’s convert the string from UTF-8 to ASCII using iconv:

NOTE: The //translit option is not addressed in the macOS version of man iconv, but it nevertheless works (?!). Appended to the “to” encoding, it tells iconv to transliterate characters that have no exact equivalent in the target encoding into approximate ones. Another option is to ignore (silently drop) the non-ASCII character(s): //ignore

echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Num'ero de boucle", i)

And so you may be wondering, “Why did it add the extra ' character?” That is a good question, and I think the answer has already been supplied here: Apple may be using utf-8-mac instead of utf-8. Which I suppose would be OK if they had bothered to reflect that in their implementation of iconv! In fact, there is a UTF8-MAC encoding listed in the output of iconv --list, but it doesn’t improve the transliteration!

As written, the iconv utility cannot properly convert all utf-8-mac characters to ASCII. It converts the ones it can, and issues an error for the others. To get a “best effort” from iconv, you can add the -c option, which causes iconv to simply drop the characters it cannot convert. If you have a reasonably current Linux box handy, you can verify that iconv does a correct and proper ‘transliteration’ (//TRANSLIT) of the example used in this answer, i.e. no extra ' character.
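Putting the pieces together, here is a minimal sketch of a shell function that reads pasted text on stdin, asks file for the encoding, and hands that to iconv with //TRANSLIT plus -c for a best-effort conversion. The function name to_ascii is my own (hypothetical) choice, and like the rest of this answer it assumes file and iconv are on your PATH:

```shell
# Hypothetical helper: transliterate whatever arrives on stdin to plain ASCII.
to_ascii() {
  local input enc
  # Read stdin once into a variable so it can be fed to both file and iconv.
  input=$(cat)
  # Detect the encoding name, e.g. "utf-8" or "us-ascii".
  enc=$(printf '%s\n' "$input" | file -b --mime-encoding -)
  # //TRANSLIT approximates characters ASCII lacks; -c drops anything
  # iconv still cannot convert instead of stopping with an error.
  printf '%s\n' "$input" | iconv -c -f "$enc" -t ASCII//TRANSLIT
}

# Usage: paste the PDF text between the single quotes.
echo 'print("Numéro de boucle", i)' | to_ascii
```

Two caveats: if file reports the input as binary (an empty paste, for instance), iconv will reject that as an encoding name; and on Linux (glibc), what //TRANSLIT produces depends on the current locale, so the output may differ from the macOS result shown above.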

And so, iconv seems to work at least some of the time on macOS… hope this helps.
