FOREWORD:
The question above was deleted by the OP while I was working on the following answer. Not being keen on wasted effort, I managed to copy the OP’s original question, and pasted it into the “new question” above. Yes… this is a bit odd 🙂
I think what you are looking for is a CLI utility called iconv. Inconveniently, iconv requires “from” and “to” arguments (ref man iconv) naming the encoding type (e.g. UTF-8, ascii, unicode, etc.)… and AFAIK, “shady” is not a recognized encoding type 🙂 However, the encoding type can be determined with another CLI utility called file. Still more inconveniently, both iconv and file expect their input to be contained in a file :/
Your question intrigued me, as it seems a reasonable thing to do; i.e. C&P from PDF to CLI. So I spent a few minutes wrangling with iconv and file to get the following answer; an answer which does not require you to C&P your PDF strings into a file. <caveat>This works on my Ventura Mac under zsh, but it has been tested nowhere else.</caveat>
You haven’t provided an example, and I was unable to find any malfunctioning PDF code strings in a brief search. So, instead, I found this string in a French-language PDF on Python programming:
print(“Numéro de boucle”, i)
So, first we’ll need to run this string through file to determine the encoding (note the use of the “dash” -, a reference to stdin in lieu of a proper filename):
echo "print("Numéro de boucle", i)" | file -
/dev/stdin: Unicode text, UTF-8 text
So, the string is encoded in UTF-8. Now let’s convert the string from UTF-8 to ASCII using iconv:
NOTE: The //translit option is not documented in the macOS version of man iconv, but it still works (?!). It is a suffix on the target encoding that tells iconv to transliterate, i.e. to substitute a similar-looking ASCII character for one that has no exact ASCII equivalent. Another option is //ignore, which drops the non-ascii character(s) instead.
echo "print("Numéro de boucle", i)" | iconv -f utf-8 -t ascii//translit
print(Num'ero de boucle, i)
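A side note on the quoting in the command above, since the double quotes are missing from its output: that part is the shell's doing rather than iconv's. A minimal sketch, using the same example string and nothing but echo:

```shell
# Nested double quotes: the shell pairs them up and removes them,
# so the command never sees the quotes around the French text.
echo "print("Numéro de boucle", i)"

# Single quotes on the outside: the inner double quotes are passed through intact.
echo 'print("Numéro de boucle", i)'
```

The first line prints the string without its inner quotes; the second prints it with them.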
And so you are wondering, “Why did it add the extra ' character”??. That’s a very good question, and perhaps the answer is here. Apple may be using utf-8-mac instead of utf-8. Which I suppose would be OK if they had bothered to reflect that in their implementation of iconv! In fact, there is a UTF8-MAC encoding listed in the output of iconv --list, but it doesn’t improve the transliteration:
echo 'print("Numéro de boucle", i)' | iconv -f utf8-mac -t ascii//translit
print("Num'ero de boucle", i)
echo 'print('Numéro de boucle', i)' | iconv -f utf-8-mac -t ascii//translit
print(Num'ero de boucle, i)
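To make the utf-8 versus utf-8-mac distinction a little more concrete: as far as I know, UTF-8-MAC refers to a decomposed form of UTF-8, in which é is stored as a plain e followed by a combining accent, rather than as a single precomposed character. The sketch below only shows that the two forms differ at the byte level; it does not claim to explain iconv's output. The helper name hexbytes is made up for this illustration.

```shell
# Hypothetical helper: print stdin as one continuous run of hex byte values.
hexbytes() {
  od -An -tx1 | tr -d ' \n'
}

# Precomposed form: é as the single code point U+00E9 (two bytes in UTF-8).
printf '\303\251' | hexbytes; echo

# Decomposed form: plain e followed by the combining acute accent U+0301 (three bytes).
printf 'e\314\201' | hexbytes; echo
```

Piping a string copied from a PDF through hexbytes is one way to check which of the two forms you actually have.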
As written, the iconv utility for macOS Ventura cannot properly convert all utf-8 characters to ASCII. It converts the ones it can, and issues an error (or inserts inappropriate characters) for the others. To get a “best effort” from iconv you can add the -c option, causing iconv to simply drop the characters it cannot convert.
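A minimal sketch of the -c variant, wrapped in a function so it can be reused in a pipeline. The function name is hypothetical; the flags are the ones already used in this answer.

```shell
# Hypothetical helper name; wraps the -c option described above.
# Reads UTF-8 on stdin and writes ASCII, dropping whatever cannot be converted
# instead of transliterating it.
drop_unconvertible() {
  iconv -c -f utf-8 -t ascii
}

# Usage:
#   echo 'print("Numéro de boucle", i)' | drop_unconvertible
```

Note that this loses the é entirely rather than replacing it with e, so it is only a “best effort” in the sense that iconv no longer stops on the character.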
As an experiment: if you have a fairly current Linux box handy, you can try iconv on the example phrase here. When I tried this on my Linux systems (two versions of Debian; ‘bookworm’ & ‘bullseye’), I found that iconv did a perfectly correct and accurate ‘transliteration’ (//TRANSLIT) of the example used in this answer (and several others); i.e. no extra ' character.
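These results can be improved with a sed “filter” that keeps only letters, digits, spaces, commas, parentheses and double quotes, and so removes the stray ' character:
echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit | sed 's/[^a-zA-Z0-9 ,()"]//g'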
But having to use sed to improve iconv strikes me as an ugly hack, one that should be unnecessary.
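If the set of accented characters you expect is small, another possibility is to skip iconv's transliteration and use an explicit substitution table. This is a different technique from the one above, not a fix for iconv. The function name and the list of characters are illustrative assumptions, and the sketch assumes the input uses the precomposed form of each letter.

```shell
# Hypothetical helper: replace a few accented letters using a fixed table.
# The list is illustrative only; extend it for the characters you expect.
# Each letter gets its own s command (no bracket expression) so the
# substitution does not depend on the locale's handling of multibyte characters.
fold_accents() {
  sed -e 's/é/e/g' -e 's/è/e/g' -e 's/ê/e/g' \
      -e 's/à/a/g' -e 's/ç/c/g' -e 's/ù/u/g'
}

echo 'print("Numéro de boucle", i)' | fold_accents
```

Anything not in the table passes through unchanged, so a final pass through iconv with -c can still be used to drop whatever is left.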
And so, iconv seems to work at least some of the time in macOS… hope this helps.