text-scrambler

Using Unicode confusable characters and other tricks, we can transform a text into one that looks identical to a human reader but is a completely different string to a machine.

Examples

Randomly replacing Latin characters with Greek or Cyrillic look-alikes and inserting zero-width joiners (ZWJ) and zero-width non-joiners (ZWNJ).

Original text:

Herman Melville (August 1, 1819 – September 28, 1891) was an American novelist, short story writer, and poet of the American Renaissance period. Among his best-known works are Moby-Dick (1851), Typee (1846), a romanticized account of his experiences in Polynesia, and Billy Budd, Sailor, a posthumously published novella. Although his reputation was not high at the time of his death, the centennial of his birth in 1919 was the starting point of a Melville revival and Moby-Dick grew to be considered one of the great American novels.

Scrambled text with ZW(N)J added (it looks the same but is a completely different string):

H‍e‌r‌m‍a‍n‌ ‌M‍e‌l‌v‌i‍l‍l‍e‍ ‌(‍A‍u‍g‌u‌s‍t‌ ‌1‌,‌ ‍1‍8‌1‍9‌ ‌–‌ ‌S‍e‌p‌t‌e‍m‍b‍e‌r‌ ‍2‌8‍,‍ ‍1‍8‌9‌1‍)‌ ‌w‍a‌s‍ ‌a‍n‌ ‍A‌m‌e‌r‌i‌c‍a‌n‍ ‍n‌o‍v‌e‍l‌i‌s‍t‍,‍ ‌s‍h‌o‍r‍t‌ ‍s‌t‌o‌r‍y‍ ‌w‌r‍i‌t‍e‌r‌,‌ ‍a‍n‍d‍ ‌p‌o‌e‌t‌ ‍o‌f‍ ‌t‌h‍e‌ ‌A‍m‌e‍r‌i‍c‌a‌n‍ ‍R‍e‌n‍a‍i‍s‍s‌a‌n‍c‌e‍ ‌p‍e‌r‍i‌o‌d‌.‍ ‍A‌m‍o‌n‍g‍ ‍h‍i‌s‍ ‌b‍e‌s‍t‍-‌k‍n‌o‍w‌n‍ ‌w‍o‌r‍k‍s‌ ‍a‌r‍e‍ ‍M‍o‌b‍y‌-‍D‌i‍c‍k‍ ‍(‌1‍8‌5‍1‍)‍,‌ ‌T‍y‌p‍e‌e‌ ‍(‌1‍8‌4‌6‍)‌,‍ ‌a‌ ‍r‌o‌m‍a‌n‍t‌i‍c‍i‌z‌e‍d‌ ‍a‍c‌c‍o‌u‍n‌t‌ ‍o‌f‌ ‍h‌i‍s‌ ‍e‍x‍p‍e‌r‌i‌e‌n‍c‌e‍s‌ ‌i‍n‍ ‍P‍o‌l‌y‌n‍e‍s‌i‍a‌,‍ ‍a‍n‍d‍ ‍B‍i‍l‍l‌y‌ ‌B‌u‌d‍d‌,‍ ‍S‍a‌i‌l‌o‍r‌,‍ ‍a‌ ‍p‌o‌s‍t‌h‍u‍m‌o‍u‌s‍l‍y‌ ‌p‍u‌b‍l‍i‌s‌h‍e‌d‍ ‍n‌o‌v‌e‌l‍l‍a‍.‌ ‍A‍l‍t‍h‌o‍u‍g‍h‍ ‍h‌i‍s‌ ‌r‍e‌p‍u‍t‌a‍t‍i‌o‌n‍ ‌w‍a‌s‌ ‍n‌o‌t‌ ‍h‌i‌g‍h‌ ‍a‌t‌ ‌t‌h‌e‌ ‌t‍i‍m‍e‍ ‍o‍f‌ ‌h‌i‍s‍ ‌d‌e‍a‍t‍h‌,‌ ‌t‍h‌e‍ ‌c‍e‌n‍t‌e‍n‌n‌i‍a‍l‌ ‍o‍f‌ ‍h‍i‌s‍ ‍b‍i‌r‌t‍h‍ ‌i‌n‌ ‍1‌9‍1‌9‌ ‌w‍a‍s‍ ‌t‌h‌e‍ ‌s‌t‍a‍r‌t‍i‍n‍g‍ ‍p‍o‌i‌n‌t‌ ‍o‌f‌ ‍a‌ ‍M‍e‌l‌v‌i‍l‌l‌e‍ ‌r‍e‍v‌i‌v‍a‍l‍ ‌a‌n‍d‌ ‍M‍o‍b‌y‍-‌D‌i‌c‌k‍ ‍g‍r‌e‌w‍ ‌t‌o‌ ‌b‍e‍ ‌c‌o‌n‍s‌i‌d‌e‍r‌e‌d‌ ‌o‍n‍e‌ ‍o‌f‌ ‍t‌h‌e‍ ‍g‍r‌e‍a‌t‌ ‌A‌m‍e‌r‌i‍c‍a‍n‍ ‍n‌o‌v‌e‍l‌s‍.
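
The effect above can be reproduced with the standard library alone. The following is a minimal sketch, not the package's actual implementation: the helper names `insert_zero_width` and `strip_zero_width` are hypothetical, and it assumes that one randomly chosen joiner is placed between each pair of adjacent characters.

```python
import random

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def insert_zero_width(text, rng=random):
    """Put a randomly chosen ZWJ or ZWNJ between each pair of characters.

    Hypothetical helper for illustration only.
    """
    if not text:
        return text
    pieces = [text[0]]
    for char in text[1:]:
        pieces.append(rng.choice((ZWNJ, ZWJ)))
        pieces.append(char)
    return "".join(pieces)


def strip_zero_width(text):
    """Remove both joiners again, recovering the visible text."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


original = "Herman Melville"
scrambled = insert_zero_width(original)
print(len(original), len(scrambled))            # lengths differ
print(scrambled == original)                    # the strings are not equal
print(strip_zero_width(scrambled) == original)  # stripping undoes the change
```

With this scheme a string of n characters grows to 2n - 1 code points, and removing the two joiners is enough to recover the original.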

Scrambled text with Latin letters replaced by their Cyrillic/Greek equivalents:

Неrman Melvіllе (Αuguѕt 1, 1819 – Septеmber 28, 1891) wаѕ an Аmеrісаn nοvеlist, shοrt story writеr, and poеt оf the Americаn Rеnaіssanсe pеriоd. Amоng his bеst-known works arе Μoby-Dісk (1851), Τyреe (1846), a rоmаnticizеd accоunt оf hіs eхрerіencеs in Ρоlynеѕiа, аnd Вilly Budd, Ѕаіlοr, а pοsthumously рublіshed nоvеllа. Although hiѕ reputation was nοt hіgh at thе tіme οf hіѕ dеаth, the сentennіаl оf hіs bіrth in 1919 waѕ thе stаrting point οf a Μelville revival and Μοby-Dick grew tο bе соnѕidеrеd οne оf the great American novels.
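
A substitution of this kind can be sketched with a small lookup table. The mapping below is hand-picked for illustration and is not taken from the package; the function name `swap_homoglyphs` and its `probability` parameter are assumptions as well.

```python
import random

# A few Latin letters and visually similar Cyrillic letters
# (hand-picked for this illustration, not the package's table).
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "i": "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "s": "\u0455",  # CYRILLIC SMALL LETTER DZE
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def swap_homoglyphs(text, probability=0.5, rng=random):
    """Replace each mappable letter with its look-alike with the given probability."""
    out = []
    for char in text:
        if char in LATIN_TO_CYRILLIC and rng.random() < probability:
            out.append(LATIN_TO_CYRILLIC[char])
        else:
            out.append(char)
    return "".join(out)


original = "This is an example"
scrambled = swap_homoglyphs(original, probability=1.0)
print(scrambled)
print(len(original) == len(scrambled))  # same length
print(original == scrambled)            # but not the same string
```

Unlike the zero-width approach, the length of the text does not change here; only the code points behind individual letters do.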

Scrambled text with both changes:

H‍e‌r‌m‍a‍n‌ ‌Μ‍e‍l‍v‌і‍l‍l‌е‍ ‌(‍А‌u‌g‍u‍ѕ‌t‍ ‌1‍,‍ ‌1‌8‍1‍9‌ ‌–‍ ‍S‍e‌p‌t‌e‌m‍b‍e‍r‍ ‌2‍8‌,‌ ‍1‍8‍9‍1‍)‍ ‍w‍a‍ѕ‌ ‌a‍n‌ ‌Α‍m‍e‌r‌i‍с‌a‌n‍ ‌n‌o‍v‍e‌l‍i‍ѕ‌t‌,‌ ‌s‍h‌ο‍r‍t‍ ‍ѕ‌t‌ο‌r‍y‍ ‍w‌r‍i‍t‌е‌r‌,‍ ‌а‌n‌d‍ ‌p‌о‌е‌t‍ ‌ο‍f‌ ‍t‌h‍e‍ ‍А‍m‌e‌r‍і‌c‍а‍n‍ ‍R‍е‍n‍a‍i‌s‍s‍a‍n‍с‌е‌ ‌p‍е‍r‍i‍о‍d‌.‌ ‌A‍m‍ο‍n‌g‌ ‌h‌i‌ѕ‍ ‍b‍е‌s‍t‌-‍k‌n‌ο‍w‍n‍ ‌w‌о‌r‌k‌ѕ‌ ‍a‌r‌е‌ ‌M‍о‍b‍y‌-‍D‌i‍c‌k‍ ‌(‌1‍8‍5‍1‍)‌,‌ ‌T‍y‍p‌е‍е‍ ‍(‌1‌8‌4‌6‌)‍,‍ ‌a‍ ‍r‍ο‍m‌а‌n‌t‌і‍с‍і‍z‍e‌d‌ ‍a‌с‍c‍о‌u‍n‍t‍ ‌ο‌f‌ ‍h‍і‍s‍ ‍e‌x‌р‍e‍r‌і‌е‍n‍c‌e‍s‌ ‌і‌n‍ ‍Р‍о‍l‌y‌n‍е‍s‍і‍а‌,‌ ‍a‍n‍d‌ ‍В‌i‍l‍l‌y‍ ‌Β‍u‌d‍d‍,‍ ‌Ѕ‌а‍i‌l‌ο‍r‌,‍ ‌a‌ ‌p‍ο‍ѕ‌t‌h‍u‍m‍о‍u‌ѕ‍l‌y‍ ‌p‌u‍b‌l‍i‌ѕ‌h‍е‌d‍ ‌n‌о‌v‍е‍l‌l‍a‍.‍ ‍A‍l‍t‍h‌о‌u‌g‍h‍ ‍h‌i‍s‌ ‌r‌e‌р‌u‌t‍a‍t‌і‌o‍n‌ ‍w‍а‌ѕ‌ ‌n‌о‍t‌ ‍h‍і‌g‌h‍ ‍а‍t‌ ‍t‍h‌е‍ ‌t‍і‍m‍e‍ ‌o‍f‍ ‌h‌і‍s‍ ‌d‍e‍а‍t‍h‍,‍ ‍t‌h‌е‍ ‌с‌e‍n‍t‍e‌n‌n‍і‍a‌l‌ ‍o‍f‌ ‍h‍і‍ѕ‍ ‍b‌i‍r‍t‌h‌ ‌і‌n‌ ‍1‍9‍1‍9‌ ‌w‌а‌s‌ ‌t‌h‍e‍ ‌s‌t‌а‌r‌t‌і‍n‍g‍ ‌р‍ο‍і‍n‌t‍ ‌ο‌f‌ ‍a‌ ‌Μ‍e‌l‍v‍i‍l‌l‍е‌ ‍r‌е‍v‍i‌v‍а‍l‍ ‍a‌n‌d‍ ‍М‍o‌b‍y‌-‍D‌і‌с‌k‍ ‌g‍r‌е‌w‌ ‍t‍ο‍ ‌b‌e‍ ‍с‍o‍n‍s‍i‌d‌e‌r‌e‌d‍ ‍о‍n‍e‌ ‌o‍f‍ ‌t‌h‍е‍ ‌g‍r‌e‌а‍t‍ ‍А‍m‌е‌r‌i‌с‌а‍n‌ ‌n‍o‌v‍e‍l‌s‍.

Note that search engines cannot match the scrambled text to the original web page, and neither can free online plagiarism checkers. Searching Google for Μelvillе (with Cyrillic letters; copy and paste it) returns no match, whereas the original word Melville does.
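
The mismatch is easy to make visible from Python. The sketch below is only a rough heuristic: it takes the first word of each character's Unicode name as its script, and the spoofed spelling is written with escapes on the assumption that it uses a Greek capital mu and a Cyrillic small ie. The function name `scripts_used` is hypothetical.

```python
import unicodedata


def scripts_used(word):
    """Return the first word of the Unicode name of every letter in `word`.

    This approximates the script of each letter; it is not a real
    script-property lookup.
    """
    scripts = set()
    for char in word:
        if char.isalpha():
            name = unicodedata.name(char, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts


plain = "Melville"
# Assumed spelling: Greek capital mu, Latin "elvill", Cyrillic small ie.
spoofed = "\u039celvill\u0435"

print(plain == spoofed)
print(sorted(scripts_used(plain)))
print(sorted(scripts_used(spoofed)))
```

A word that mixes several scripts in this way is a strong hint that confusable characters have been substituted.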

Using all of the Unicode confusable characters (see the Unicode confusable characters below), we can generate weird-looking text worthy of old spam messages:

𝚮‍𝒆‌𝕣‍m‍𝓪‍n‍ ‍𝝡‍ҽ‌𝟙‍∨‍𝘪‍𝘐‌𞺀‍𝓮‍ ‍﴾‍𝓐‍𝞄‍𝓰‍ꞟ‌𑣁‍t‌ ‌1‌,‌ ‍1‍8‌1‍Ⳋ‌ ‍–‍ ‌Ꮥ‌𝖊‍𝞺‌𝐭‍𝖾‌m‍Ƅ‌𝔢‌𝔯‌ ‍Ƨ‍𐌚‌ꓹ‌ ‍1‍ଃ‌𝟿‍1‍]‌ ‍𝘸‍𝐚‍𝚜‍ ‍𝖺‌𝔫‍ ‍Α‍m‌ℯ‌𝔯‌𝓲‌ꮯ‌𝒶‌𝓷‌ ‍n‌ം‍𝝼‍𝔢‍𝙸‌i‌s‌𝖙‍؍‍ ‍𐑈‌𝖍‌ꬽ‍ꭇ‍𝓽‍ ‌𝓼‌𝖙‍ⲟ‌r‌𑣜‍ ‍𝐰‌𝓻‌і‍𝒕‍е‍𝕣‍٫‍ ‍α‌𝒏‌𝕕‍ ‍𝙥‌𝜊‍e‍𝕥‍ ‍ﮨ‍f‌ ‌𝘵‍h‍𝗲‌ ‌Α‌m‍𝐞‍𝐫‌ꙇ‌𝒸‍a‍n‌ ‍𖼵‍𝘦‍𝑛‌𝐚‌𝒾‌𝑠‌𑣁‌𝜶‌𝕟‌𝗰‌𝒆‍ ‌𝟈‍𝖾‌r‍⍳‌ﮫ‌ᑯ‌𐩐‌ ‍Α‌m‍o‍𝓃‌𝖌‍ ‌𝓱‌Ꭵ‌𝐬‍ ‌Ꮟ‍𝙚‌𝗌‍𝕥‌۔‍𝖐‌𝖓‌o‌𝑤‍𝐧‍ ‌𑜎‌о‌ꮁ‍𝐤‌𝗌‍ ‌𝜶‍𝗿‍𝖾‌ ‌𝕸‍໐‍Ꮟ‍𝙮‍Ⲻ‍𝖣‍𝑖‍𝔠‌𝒌‌ ‍〔‍1‌𝟪‌5‍1‍〕‌ꓹ‌ ‌𝖳‍𝗒‌𝓹‍𝘦‌𝚎‌ ‌〔‍1‍🯸‌𝟜‌6‍❳‍ꓹ‌ ‍𝖆‍ ‌𝕣‌ꬽ‍m‍⍺‌𝘯‌𝘵‌і‌ꮯ‌𝛊‍𝐳‍ⅇ‍𝙙‍ ‍𝕒‌c‍ᴄ‌ჿ‌𝚞‍𝚗‌𝐭‍ ‍𞹤‍𝔣‍ ‍𝚑‌ӏ‌𝓈‌ ‍𝕖‍𝑥‌𝙥‍𝔢‍𝗿‍ꙇ‌e‌𝓷‍c‌℮‍ꮪ‌ ‌𝖎‍𝚗‍ ‌𝙋‍𝘰‌Ӏ‍γ‌𝓷‍𝖾‍𝔰‍𝚒‌𝗮‌؍‍ ‌𝛼‍𝔫‍𝖉‌ ‍𝔅‌Ꭵ‌𝖑‌l‌𝔂‌ ‌𝓑‍𝐮‌𝖉‌𝒹‌‚‌ ‍Ꮥ‌а‌ꙇ‌𝘭‍𝝈‍𝗋‌,‍ ‌α‍ ‍𝑝‍ꬽ‍𐑈‍𝓽‌һ‍𝛖‍m‍𞺄‌ᴜ‍𝔰‍𝗹‌𝑦‍ ‌𝖕‍ᴜ‍Ꮟ‍𝝞‌𝜄‌s‍h‍𝗲‍ꓒ‌ ‌𝓃‍𝗈‌𝓋‍𝒆‌𐌉‌ו‌𝞪‍꘎‍ ‍𖽀‍𝜤‍𝑡‍һ‍𝙤‍𝑢‌ց‍𝘩‌ ‌𝒉‌ι‍ѕ‌ ‌𝖗‌𝒆‌𝛠‍𝚞‍𝐭‌𝓪‌𝙩‌ɪ‍ﮨ‍𝓷‍ ‌𑜊‍𝖺‍s‌ ‍𝘯‍𞹤‍𝚝‌ ‌𝐡‌𝜄‌ᶃ‍𝕙‍ ‍𝖆‍𝘁‍ ‌𝙩‍h‍ꬲ‌ ‍𝓉‌𝔦‍m‍е‍ ‌𝞼‍ẝ‍ ‍ℎ‌ı‍ƽ‍ ‌𝐝‌𝕖‍𝖆‍𝚝‌𝔥‌ꓹ‌ ‍𝙩‌Ꮒ‌ꬲ‍ ‌𝗰‌ⅇ‌𝗻‌𝔱‍𝖊‌𝖓‌n‍𝛊‍𝙖‌𐌠‌ ‍ﻫ‍𝘧‌ ‌𝒽‍𝖎‍𝘴‍ ‍b‍ı‌𝚛‌𝓽‌𝘩‌ ‌i‌𝐧‍ ‍1‍𑣖‌1‍𝟵‌ ‍𑜏‌α‌𝗌‌ ‌𝗍‌𝐡‌ҽ‍ ‍𝕤‍𝑡‍𝛂‌r‍𝓉‍Ꭵ‌𝚗‍ᶃ‍ ‌𝛒‍ס‌𝜾‍𝗻‌𝖙‌ ‌𝜊‌𝖋‌ ‍𝙖‌ ‍ꓟ‍𝙚‌ⵏ‌𝛎‍˛‍І‍𝘭‍ҽ‌ ‌𝔯‍𝐞‌v‌𝞲‌𝚟‌𝖆‍l‍ ‍ɑ‍𝘯‍𝖽‍ ‍𝑀‌ං‌𝒃‍𝚢‌‐‍𝐷‍ͺ‌𝚌‌𝗸‍ ‌𝓰‌ꭈ‌е‌ᴡ‌ ‍𝓉‌ﮭ‌ ‌ᑲ‍ℯ‍ ‌c‍ℴ‍𝙣‌𝔰‌𑣃‍d‍ⅇ‍𝔯‌℮‌ⅾ‍ ‍ﻬ‌𝓃‌℮‍ ‌੦‌𝙛‌ ‍𝙩‌𝔥‍𝔢‍ ‌𝚐‍ꮁ‌ℯ‍𝜶‍𝙩‍ ‍𝞐‍m‍𝘦‍ᴦ‌𝜾‌𝙘‌𝕒‍𝐧‍ ‍𝓃‌o‌𝓿‌ⅇ‍|‍𝒔‍ꓸ

Full documentation at https://text-scrambler.readthedocs.io

Installation

pip install text-scrambler

Quickstart

Python

>>> from text_scrambler import Scrambler
>>> scr = Scrambler()
>>> text = "This is an example"
>>> text_1 = scr.scramble(text, level=1)
>>> #############
>>> # adding only zwj/zwnj characters
>>> print(text, text_1, sep="\n")
This is an example
T‌h‍i‍s‌ ‍i‌s‍ ‌a‌n‌ ‍e‍x‌a‍m‍p‌l‌e
>>> assert text != text_1
>>> print(len(text), len(text_1))
18 35
>>> # though the texts look similar, the second one has more characters
>>> #############
>>> text_2 = scr.scramble(text, level=2)
>>> # replacing some Latin letters with their Cyrillic/Greek equivalents
>>> print(text_2)
Тhiѕ iѕ an ехаmple
>>> for char, char_2 in zip(text, text_2):
...     if char != char_2:
...         print(char, char_2)
...
T Т
s ѕ
s ѕ
e е
x х
a а
>>> #############
>>> text_3 = scr.scramble(text, level=3)
>>> # adding ZWJ/ZWNJ characters and replacing Latin letters
>>> print(text_3)
T‌h‍і‍s‌ ‍i‌ѕ‍ ‍а‌n‍ ‌e‌х‍а‌m‍p‌l‌e
>>> print(text, text_3, sep="\n")
This is an example
T‌h‍і‍s‌ ‍i‌ѕ‍ ‍а‌n‍ ‌e‌х‍а‌m‍p‌l‌e
>>> assert text_3 != text
>>> #############
>>> text_4 = scr.scramble(text, level=4)
>>> # replacing every character with a randomly chosen Unicode look-alike
>>> print(text_4)
⊤‌𝒽‍𝐢‌𝘴‌ ‌𝘪‍𝙨‌ ‍𝞪‍ռ‍ ‌𝙚‍⨯‍𝚊‍m‌ρ‍𝟙‌ҽ
>>> #
>>> # generating several versions
>>> versions = scr.generate(text, 10, level=4)
>>> for txt in versions:
...     print(txt)
...
𝕋‌𝗵‌𝕚‍𝔰‍ ‍𝙞‌ѕ‌ ‌ɑ‍𝗇‌ ‌ꬲ‍𝗑‍𝒂‍m‌𝛠‍Ⲓ‍𝚎
𝔗‌һ‌𑣃‍ƽ‌ ‌˛‍ꜱ‍ ‍𝛼‍𝐧‌ ‌𝐞‍𝖝‍𝛼‌m‌𝜌‌𝟏‌ℯ
T‌h‌𝓲‌𝔰‌ ‌ⅈ‌𝔰‍ ‌α‌n‌ ‍ꬲ‌⤬‌α‌m‌⍴‍𞸀‌e
𝗧‍𝗵‍i‍𝑠‍ ‌i‌𝖘‌ ‍⍺‍𝘯‌ ‌𝗲‌𝔁‍а‌m‍𝘱‍𝙸‍𝔢
⊤‌𝚑‍𝑖‌s‌ ‍ɪ‌𝚜‌ ‍𝜶‍𝑛‌ ‍𝖾‍𝘅‍𝒶‍m‍𝛒‍𝑙‌𝓮
𝘛‌h‍𝙞‍ꮪ‍ ‌ⅈ‌𝗌‍ ‍𝗮‌𝐧‍ ‍ꬲ‌ᕽ‍𝓪‌m‌𝜌‌⏽‍𝓮
𝙏‌𝕙‍і‍𝓈‌ ‌ı‍ꜱ‍ ‌𝔞‍𝕟‍ ‍𝗲‍𝕩‍𝛂‍m‍р‍𐌉‌𝚎
𝕿‌Ꮒ‌ℹ‌𝐬‌ ‍𝗶‌𝗌‌ ‍𝛼‍𝔫‌ ‍𝗲‍𝐱‍𝓪‌m‍𝞎‌𝙡‌𝖊
⟙‌h‍𝜾‍ꮪ‍ ‌i‍𝘴‍ ‌𝝰‍𝒏‌ ‌𝙚‍ᕽ‍𝗮‍m‌𝗽‌𝗜‍𝗲
𝖳‌հ‌𝒊‌s‌ ‍𝕚‌𝙨‌ ‌𝖆‌𝑛‌ ‌𝘦‌𝔁‌а‌m‌𝜌‌𝐈‍𝗲
>>> versions = scr.generate(text, 1000, level=1)
>>> assert len(versions) == len(set(versions))
>>> # all unique

Command line interface (CLI)

To scramble the contents of a file from the command line, run

$ python -m text_scrambler
usage: Usage : python -m text_scrambler file

Replace/insert the charaters of the file using the unicode confusable characters

positional arguments:
  file                  encoded in UTF-8

optional arguments:
  -h, --help            show this help message and exit
  -l LEVEL, --level LEVEL

                                1: insert non printable characters within the text
                                2: replace some latin letters to their Greek or Cyrillic equivalent
                                3: insert non printable characters and change the some latin  to their Greek or Cyrillic equivalent
                                4: insert non printable chraracters change all possible letter to a randomly picked unicode letter equivalent
                                default=1
  -n N, --generate N
                                Scramble n times the string
                                default=1
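
The same file-based workflow could also be driven from Python. The helper below is a hypothetical sketch, not part of the package: `scramble_file` is an invented name, and it only assumes that the object passed in offers `scramble(text, level=...)` and `generate(text, n, level=...)` as shown in the Quickstart session.

```python
from pathlib import Path


def scramble_file(path, scrambler, level=1, n=1):
    """Read a UTF-8 file and return a list of n scrambled versions of it.

    `scrambler` is any object with `scramble(text, level=...)` and
    `generate(text, n, level=...)` methods (hypothetical helper).
    """
    text = Path(path).read_text(encoding="utf-8")
    if n == 1:
        return [scrambler.scramble(text, level=level)]
    return list(scrambler.generate(text, n, level=level))
```

With the package installed, a call such as `scramble_file("input.txt", Scrambler(), level=3)` would be the intended use, where `input.txt` is a placeholder file name.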