FOREWORD:
The question above was deleted by the OP while I was working on the following answer. Not being keen on wasted effort, I managed to copy the OP's original question, and pasted it into the "new question" above. Yes… this is a bit odd 🙂
I think what you may be looking for is a CLI utility called iconv. Inconveniently, iconv requires "from" and "to" argument declarations (ref man iconv) of the encoding type (e.g. UTF-8, ASCII, Unicode, etc.)… and AFAIK, "shady" isn't a recognized encoding type 🙂 However, the encoding type may be determined from another CLI utility called file. Still more inconveniently, both iconv and file specify that the input be contained in a file :/
Your question intrigued me as it seems a reasonable thing to do; i.e. C&P from PDF to CLI. So I spent a few minutes wrangling with iconv and file to get the following answer; an answer which doesn't require you to C&P your PDF strings into a file. <caveat>This works on my Ventura Mac under zsh, but it's been tested nowhere else.</caveat>
You have not provided an example, and I was unable to find any malfunctioning PDF code strings in a brief search. So instead, I found this string in a French-language PDF on Python programming:
print(“Numéro de boucle”, i)
So, first we'll need to run this string through file to determine the encoding (note the use of the "dash" -, a reference to stdin in lieu of a proper filename):
echo "print("Numéro de boucle", i)" | file -
/dev/stdin: Unicode text, UTF-8 text
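If only the encoding name is wanted (for example, to pass it straight to iconv), file can be asked for just that. The following is a minimal sketch; the function name detect_encoding is made up for illustration, and it assumes a version of file that supports the -b and --mime-encoding options:

```shell
# Hypothetical helper: report only the encoding name of text arriving on stdin.
detect_encoding() {
  # -b drops the "/dev/stdin:" prefix; --mime-encoding prints just the charset.
  # The trailing "-" makes file read stdin instead of a named file.
  file -b --mime-encoding -
}

# Single quotes keep the inner double quotes intact.
printf '%s\n' 'print("Numéro de boucle", i)' | detect_encoding
```

For the example string this should report utf-8, which is a name iconv accepts for its -f argument.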
So, the string was encoded in UTF-8. Now let's convert the string to ASCII from UTF-8 using iconv:
NOTE: The //translit option isn't addressed in the macOS version of man iconv, but it still works (?!). It's used as a flag to tell iconv to transliterate, i.e. to substitute a similar ASCII character for one it cannot represent. Another option is to ignore the non-ASCII character(s): //ignore
echo "print("Numéro de boucle", i)" | iconv -f utf-8 -t ascii//translit
print(Num'ero de boucle, i)
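The two steps can be combined so that the "from" encoding does not have to be typed by hand. This is only a sketch: the function name to_ascii is hypothetical, it assumes file supports -b and --mime-encoding, and it will fail if file reports something iconv does not recognize as an encoding (such as binary).

```shell
# Hypothetical helper: detect the encoding of stdin, then transliterate to ASCII.
to_ascii() {
  local input enc
  # Buffer stdin so it can be read twice (once by file, once by iconv).
  input=$(cat)
  # Ask file for the bare encoding name, e.g. "utf-8" or "us-ascii".
  enc=$(printf '%s' "$input" | file -b --mime-encoding -)
  # Command substitution stripped the trailing newline, so add one back.
  printf '%s\n' "$input" | iconv -f "$enc" -t 'ASCII//TRANSLIT'
}

# Usage: paste the copied text between the single quotes.
printf '%s\n' 'print("Numéro de boucle", i)' | to_ascii
```

What the accented character turns into depends on the iconv implementation, so the exact output is not shown here.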
And so you may be wondering, "Why did it add the extra ' character??" (The missing double quotes are not iconv's doing: the shell consumed them before echo ever saw the string.) That is a very good question, and I assume the answer has already been supplied here. Apple may be using utf-8-mac instead of utf-8. Which I suppose would be OK if they had bothered to reflect that in their implementation of iconv! In fact, there is a UTF8-MAC encoding listed in the output of iconv --list, but it doesn't improve the transliteration!
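For background on what separates the two encodings: UTF-8-MAC is based on the decomposed form of accented characters (base letter plus a combining accent), whereas most UTF-8 text uses the precomposed form. Whether that is really what produces the stray ' here is a guess; the sketch below only shows the byte-level difference, using od to dump the bytes in hex.

```shell
# Precomposed "é" (U+00E9), the usual UTF-8 form: bytes c3 a9
printf '\303\251' | od -An -tx1

# Decomposed "é" ("e" followed by combining acute accent U+0301): bytes 65 cc 81
printf 'e\314\201' | od -An -tx1
```

Both sequences display as the same character in a terminal, so a hex dump is the easiest way to tell which form a copied string actually contains.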
As written, the iconv utility cannot properly convert all utf-8-mac characters to ASCII. It converts the ones it can, and issues an error for the others. To get a "best effort" from iconv you can add the -c option, causing iconv to simply drop the characters it cannot convert. If you have a reasonably current Linux box handy, you can verify that iconv does a correct and proper 'transliteration' (//TRANSLIT) of the example used in this answer; i.e. no extra ' character.
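As a sketch of the "best effort" route, here is the -c option applied to the example string without any transliteration. The function name strip_non_ascii is made up for illustration, and the input is assumed to be UTF-8:

```shell
# Hypothetical helper: drop every character that has no ASCII equivalent.
strip_non_ascii() {
  # -c silently discards characters that cannot be converted to the target.
  iconv -c -f UTF-8 -t ASCII
}

# Single quotes keep the inner double quotes intact.
printf '%s\n' 'print("Numéro de boucle", i)' | strip_non_ascii
# prints: print("Numro de boucle", i)
```

The trade-off is visible in the result: the é is gone entirely rather than replaced by e, so this suits cases where a clean ASCII string matters more than preserving the spelling.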
And so, iconv seems to work at least some of the time in macOS… hope this helps.