What language does Google think this obscure natlang is?

protondonor
cuneiform
Posts: 146
Joined: 07 Mar 2015 03:59

What language does Google think this obscure natlang is?

Post by protondonor »

Inspired by this thread over on the conlangs board, I became curious how Google Translate decides what language an unknown text is in, so I started feeding it random samples of obscure (or obscure-ish, or sometimes just misleadingly transcribed) natlangs from the Guess The Language thread to see what it would guess. The results generally fell into one of four categories:
  • Google Translate didn't even try: it simply assumed the text was English and translated none of it. Languages in this group include but are not limited to Potawatomi, Seneca, Chemehuevi, Judeo-Berber, Tuareg, Tigrinya, Gaulish, Scandoromani, Svan, Burushaski, Miyako, and Fijian (all samples of these were romanized).
  • Google Translate recognized a couple of words in an unrelated language and translated those, but left most of the sample unchanged. Examples include Dungan (GT thought Bulgarian), Ho-Chunk (GT thought Hindi), Adyghe (GT thought Russian on one sample, Belarusian on another), Cherokee (romanized; GT thought Bulgarian... for some reason...), Tzeltal (GT thought Uzbek), Ch'ol (GT thought Spanish because "Jesucristo" was in the sample), Khoekhoe (GT thought Vietnamese), and Silesian (with the standard romanization GT thought Polish, but with a slightly modified romanization GT thought... Arabic?!?).
  • Google Translate recognized it as a related language and translated it semi-faithfully, e.g.
    • Gagauz "Onnara verilmiş akıl hem üz da läazım.", translated as Turkish "I have been given the prince of mind, alas"; the actual translation should be something like "They are endowed with reason and conscience"
    • Urdu "Rekhta ke tumhiⁿ ustād tu nahīⁿ ho Ghālib / Kahte haiⁿ agle zamāne meⁿ ko'ī Mīr bhī thā", translated as Hindi "Reports are not available at Ghālib / I am sorry to say that I am sorry to say"
    • French Guiana Creole "Sa ki la pou to dlo pa ka charyié'l", translated to Haitian Creole as "Those for water rates can not charyié'l"
    • Upper Sorbian "Mojich zbožnych sonow raj!", translated as Slovak "My pious snow paradise!" (Wikipedia translates the corresponding line of Rjana Luzica as "Land of dreams, resplendent soil", but GT may be more literally accurate here)
  • (by far my favorite category) Google Translate recognized enough words to translate something, and gamely produced complete nonsense.
    • Trumai "haits chɨn inatlek atlat mapa ka", translated as Hindi "Map of the map"
    • Rade "Phung buon sang nao dru pu atao kma hlam bong", translated as Vietnamese "Go to the top of the page"
    • Juba Arabic "Ita ma bi derisu ana", translated as Japanese "Lifestyles Happy New Year"
    • Palawa kani "Varanta takara milajthina nara takara!", translated as Japanese "Because I was a victim!"
    • Truku "Yaasa mrana bi puyun ka masu siida ni mttuku dha bi uqun. Siida, asi lu maa qqbhni ka idaw masu dni.", translated as Japanese "Hello, I do not want to m behind. I am sorry to have you to qbh."
    • Warembori "E-kue, emuni orive kombe inai. Nana ipayave, make matinna. Emamiekepayave.", translated as Japanese "Well, I do not have time to eat. Nanase way, we misbehave. Ehmiyagaya yeah."
    • Magahi "Maiya ge sankar jee ke ajbi rahaniya / Ho, Maiya ge Sankar jee ke ajbi rahaniya", translated as Hindi "I have been living in the hybrid of the hybrid / Yes, I am a stranger living hybrid."
    • Nùng "Tú cá bạt hảhn tú nohc tang-na. / Sláo bô càng pehn tế. / Mưhn hủ hủ hủ. / Sãu náh dạ chán tửhn ma.", translated as Vietnamese "Pangasius canal is very hard-na. / Slightly more potent. / Tofu / I'm sorry to death."
    • Monguor "Bi zou huashini mieshi xiku duoghuolang diaolia bai. Huashi nangda zhua kaikeku, ta ghuluo tini jiashi beila ri bai.", translated as Chinese "You take a lot of money out of the panties, and it's about the nudity."
Frislander
mayan
Posts: 2088
Joined: 14 May 2016 18:47
Location: The North

Re: What language does Google think this obscure natlang is?

Post by Frislander »

I love this! I'm trying this with languages on Omniglot, using their translations of Article 1 of the UDHR.

Those that worked:
  • Dagaare: Nengsaala zaa ba nang dɔge so la o menga, ka o ne o taaba zaa sengtaa noba emmo ane yɛlɛsoobo sobic poɔ. Ba dɔgɛɛ ba zaa ne yɛng ane yɛlɛ-iruu k'a da seng ka ba erɛ yɛlɛ korɔ taa a nga yɔɔmine. Came out as Hausa:
    "Nengsaala incorporate codeine dɔge want la o Meng o o taaba incorporate sengtaa plagues emmo ane yɛlɛsoobo sobic poɔ. Dɔgɛɛ incorporate yɛng ane yɛlɛ iruu k'a Seng erɛ yɛlɛ korɔ her NGA yɔɔmine."
  • Asháninka: Aquempetavacaajeita maaroni atiri. Timatsi aquenqueshirejeitantari maaroni, timatsi amejeitari, ayojeiti paitarica ocameetsati antajeitiri: te oncameetsateji intsaneapitsajeiteero itsipapee. Te oncameetsateji imperanajeitee, te oncameetsateji iroashinoncaajeitee, irointi ocameetsati aacameetsatavacaajeitea. Came out as Chichewa/Nyanja: "
    The qu e viral him after he jeita of the Aaron atiri. Timatsi the qu e nqu e sh a German of tantari of Aaron, timatsi of a German itari and the jeiti the tarica are of a etsati the Internet German Ith'rite: turn the too is the tsateji the ntsa that's the German's teero the firm papee. Security too is the e tsateji of him in the German itee, take too is the e o tsateji of the sh too small to German itee, it is of the rointi etsati of the following is true of the viral after German itea."
  • Tuvaluan: E fā'nau mai a tino katoa i te saolotoga kae e 'pau telotou tūlaga fakaaloalogina mo telotou aiā. Ne tuku atu ki a lātou a te mafaufau mo te loto lagona, tēlā lā, e 'tau o gā'lue fakatasi lātou e pēlā me ne taina. Came out as Samoan: "
    It fā'nau's body katoa on freedom but to fall telotou quality fakaaloalogina for telotou right. Ne tuku keys they will think for you will feel, big sun, 'price gā'lue fakatasi user Pella May ne play."
  • Tuvan: Бүгү кижилер хостуг база мөзүзү болгаш эргелери дең кылдыр төрүттүнер. Оларга угаан-сарыыл болгаш арын-нүүр бердинген болур болгаш олар бот-боттарынга акы-дуңмалышкы хамаарылганы көргүзер ужурлуг. Came out as Kyrgyz:
    "And all murmured, free, base mözüzü native lead and the shock of divorce. ugaan- them Yîn and the gift-nüür berdingen they may have and the bot-bottarınga duŋmalışkı hamaarılganı Palms also."
Among the notable failures: Ainu was detected as English while East Cree was detected as Japanese (why not the other way round?); Cantonese was detected as English with both the Yale and Jyutping romanisations; and Inari Saami was detected as Arabic.
qwed117
mongolian
Posts: 4094
Joined: 20 Nov 2014 02:27

Re: What language does Google think this obscure natlang is?

Post by qwed117 »

protondonor wrote: Urdu Rekhta ke tumhiⁿ ustād tu nahīⁿ ho Ghālib / Kahte haiⁿ agle zamāne meⁿ ko'ī Mīr bhī thā, translated as Hindi "Reports are not available at Ghālib / I am sorry to say that I am sorry to say"
Wait...
Imralu
roman
Posts: 960
Joined: 17 Nov 2013 22:32

Re: What language does Google think this obscure natlang is?

Post by Imralu »

A friend of mine posted "meep" on FB and I commented with "moop". Then a guy took screen captures ... Bing determined that "meep" means "mip". OK. Helpful. But it then also determined that "moop" means "recarbonization" ... [o.O] In what language??