Rust + Toki Pona
Any language is accepted, so I wrote a program in Rust that generates some sentences in Toki Pona.
Toki Pona is an attempt to create a minimal natural language, and it has a super simple and regular grammar. That's a very useful property for this contest!
use std::rand;
#[deriving(Rand)]
struct Phrase { a: Option<~GNominal>, b: ~Sujet, c: ~Predicat }
#[deriving(Rand)]
enum Sujet { A(~GNominal), B(~SCompose) }
#[deriving(Rand)]
enum Predicat { C(~GVerbal), D(~PCompose) }
#[deriving(Rand)]
struct SCompose { a: ~Sujet, b: ~Sujet }
#[deriving(Rand)]
struct PCompose { a: ~Predicat, b: ~Predicat }
#[deriving(Rand)]
struct GNominal { a: ~nom::Nom, b: Multi<~adjectif::Adjectif> }
#[deriving(Rand)]
struct GVerbal { a: ~verbe::Verbe, b: Multi<~adjectif::Adjectif>, c: Multi<~ODirect> }
#[deriving(Rand)]
struct ODirect { a: ~GNominal}
#[deriving(Rand)]
enum Multi<T> { Zero, One(T), Two((T,T)) }
mod nom {
#[deriving(Rand)]
#[deriving(ToStr)]
pub enum Nom {akesi,ala,ale,anpa,ante,ijo,ike,ilo,insa,jaki,jan,jo,kala,kalama,kama,kasi,ken,kili,kiwen,ko,kon,kule,kulupu,lape,lawa,len,lete,linja,lipu,luka,lupa,ma,mama,mani,meli,mi,mije,moku,moli,monsi,mun,musi,mute,nanpa,nasin,nena,nimi,noka,oko,olin,ona,pakala,pali,palisa,pana,pilin,pimeja,pini,pipi,poka,poki,pona,seli,selo,sewi,sijelo,sike,sina,sinpin,sitelen,sona,soweli,suli,suno,supa,suwi,tan,tawa,telo,tenpo,toki,tomo,tu,unpa,uta,utala,walo,wan,waso,wawa,weka,wile}
}
mod verbe {
#[deriving(Rand)]
#[deriving(ToStr)]
pub enum Verbe {ante,awen,ijo,ike,jaki,jan,jo,kalama,kama,ken,kepeken,kule,kute,lape,lawa,lete,lili,lon,lukin,moku,moli,musi,mute,nasa,olin,open,pakala,pali,pana,pilin,pimeja,pini,pona,seli,sin,sitelen,sona,suli,suwi,tawa,telo,toki,tu,unpa,utala,wan,wawa,weka,wile,}
}
mod adjectif {
#[deriving(Rand)]
#[deriving(ToStr)]
pub enum Adjectif {ala,ale,anpa,ante,awen,ike,insa,jaki,jan,jelo,kama,kin,kiwen,kon,kule,kute,kulupu,lape,laso,lawa,lete,lili,linja,loje,luka,lukin,mama,meli,mi,mije,moli,monsi,mun,musi,mute,nasa,ni,olin,ona,pali,pimeja,pini,poka,pona,sama,seli,sewi,sike,sin,sina,suli,suwi,taso,tawa,toki,tomo,unpa,uta,walo,wan,wawa,weka,wile,}
}
impl ToStr for Phrase {
fn to_str(&self) -> ~str {
self.a.as_ref().map_or(~"", |g| format!("{:s} la ", g.to_str()))
+ format!("{:s} li {:s}", self.b.to_str(), self.c.to_str())
}
}
impl ToStr for Sujet {
fn to_str(&self) -> ~str {
match *self {
A(ref v) => v.to_str(),
B(ref v) => v.to_str(),
}
}
}
impl ToStr for Predicat {
fn to_str(&self) -> ~str {
match *self {
C(ref v) => v.to_str(),
D(ref v) => v.to_str(),
}
}
}
impl ToStr for SCompose {
fn to_str(&self) -> ~str {
format!("{:s} en {:s}", self.a.to_str(), self.b.to_str())
}
}
impl ToStr for PCompose {
fn to_str(&self) -> ~str {
format!("{:s} li {:s}", self.a.to_str(), self.b.to_str())
}
}
impl ToStr for GNominal {
fn to_str(&self) -> ~str {
format!("{:s} {:s}", self.a.to_str(), self.b.to_str())
}
}
impl ToStr for GVerbal {
fn to_str(&self) -> ~str {
format!("{:s} {:s} {:s}", self.a.to_str(), self.b.to_str(), self.c.to_str())
}
}
impl ToStr for ODirect {
fn to_str(&self) -> ~str {
format!("e {:s}", self.a.to_str())
}
}
impl<T: ToStr> ToStr for Multi<~T> {
fn to_str(&self) -> ~str {
match *self {
Zero => ~"",
One(ref v) => v.to_str(),
Two((ref v,ref w)) => format!("{:s} {:s}", v.to_str(), w.to_str()),
}
}
}
fn main() {
let phrase = rand::random::<Phrase>();
println!("{:s}\n{:?}", phrase.to_str(), phrase);
}
I don't speak Toki Pona, but I found the syntax of Toki Pona as a set of BNF rules on Wikipedia. I created one struct or enum for each BNF rule and annotated each of them with deriving(Rand), which gives me a way to generate a random Phrase struct for free! Then I implemented ToStr for each of these structs to convert them to a string.
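For readers on current Rust: the listing above uses pre-1.0 syntax (~T owned pointers, the ToStr trait, deriving(Rand) from std::rand), which no longer compiles. A minimal sketch of the same idea (one type per grammar rule, a random constructor, and a string conversion) might look like the following. It rests on several assumptions that are not in the original: a tiny hand-rolled xorshift generator stands in for a real random-number library, the word lists are abbreviated, the grammar is reduced to a noun group plus a verb group, and the names Rng, below, pick, NOMS, ADJECTIFS and VERBES are purely illustrative.

```rust
use std::fmt;

// Dependency-free xorshift64 generator (an assumption of this sketch,
// standing in for a real RNG library). The seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Index in 0..n; the slight modulo bias is ignored here.
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

// Abbreviated word lists; the original enums contain many more words.
const NOMS: [&str; 4] = ["jan", "kili", "telo", "tomo"];
const ADJECTIFS: [&str; 4] = ["pona", "ike", "suli", "lili"];
const VERBES: [&str; 3] = ["moku", "lukin", "pali"];

fn pick(rng: &mut Rng, words: &[&'static str]) -> &'static str {
    words[rng.below(words.len())]
}

// One type per grammar rule, mirroring the original design.
struct GNominal { nom: &'static str, adjectifs: Vec<&'static str> }
struct GVerbal { verbe: &'static str, objets: Vec<GNominal> }
struct Phrase { sujet: GNominal, predicat: GVerbal }

impl GNominal {
    fn random(rng: &mut Rng) -> GNominal {
        let nom = pick(rng, &NOMS);
        let count = rng.below(3); // 0, 1 or 2 adjectives, like Multi
        let adjectifs = (0..count).map(|_| pick(rng, &ADJECTIFS)).collect();
        GNominal { nom, adjectifs }
    }
}

impl GVerbal {
    fn random(rng: &mut Rng) -> GVerbal {
        let verbe = pick(rng, &VERBES);
        let count = rng.below(3); // 0, 1 or 2 direct objects
        let objets = (0..count).map(|_| GNominal::random(rng)).collect();
        GVerbal { verbe, objets }
    }
}

impl Phrase {
    fn random(rng: &mut Rng) -> Phrase {
        let sujet = GNominal::random(rng);
        let predicat = GVerbal::random(rng);
        Phrase { sujet, predicat }
    }
}

// Display plays the role that ToStr played in the original listing.
impl fmt::Display for GNominal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.nom)?;
        for adjectif in &self.adjectifs {
            write!(f, " {}", adjectif)?;
        }
        Ok(())
    }
}

impl fmt::Display for GVerbal {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.verbe)?;
        for objet in &self.objets {
            write!(f, " e {}", objet)?; // "e" marks a direct object
        }
        Ok(())
    }
}

impl fmt::Display for Phrase {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} li {}", self.sujet, self.predicat)
    }
}

fn main() {
    let mut rng = Rng(0x2545_F491_4F6C_DD1D);
    println!("{}", Phrase::random(&mut rng));
}
```

Writing each separator only when an element is actually present also avoids the stray spaces that a fixed "{} {}" pattern produces when an optional part is empty.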
I intentionally left the struct names in French, because the BNF rules I found are in French, and also because it reinforces the multilingual nature of my submission!
Sample outputs
Here are some outputs with my translations, which I based on the BNF rules and a Toki Pona dictionary. I'm sure these translations are mostly wrong, but Toki Pona actually leaves a lot of room for interpretation.
nasin mi tawa la jan li jaki
While on my trip, someone polluted
monsi li jaki li jan ike musi
The butt is dirty and is a funny bad person
sina li tawa ale jelo e kili tawa e insa
You moved the fruit and the center to the yellow universe
Issues
- I don't check if a verb is transitive or not, thus some sentences are grammatically incorrect.
- Some structs are recursive, and when a rule can be repeated I randomly choose to output 0, 1 or 2 elements. This can lead to veeeeeery long generated sentences, containing thousands of words...
- I cannot really verify the validity of the output; I rely entirely on the BNF syntax, the dictionary, and my own wild guesses :)
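One possible way to tame the runaway recursion described in the second issue is to give the generator a recursion budget: once the budget is used up, only the non-recursive alternative of a rule may be chosen. The sketch below is not what the original program does. It assumes a simplified subject rule (a single noun, or two subjects joined by "en"), a hypothetical depth parameter, illustrative variant names Simple and Compose, and the same kind of hand-rolled xorshift generator in place of a real random-number library.

```rust
use std::fmt;

// Dependency-free xorshift64 generator (an assumption of this sketch).
// The seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

// Abbreviated noun list; the original enum contains many more words.
const NOMS: [&str; 4] = ["jan", "kili", "telo", "tomo"];

// Simplified rule: Sujet ::= nom | Sujet "en" Sujet
enum Sujet {
    Simple(&'static str),
    Compose(Box<Sujet>, Box<Sujet>),
}

impl Sujet {
    // `depth` is the remaining recursion budget (hypothetical parameter).
    // At depth 0 the recursive alternative is never chosen.
    fn random(rng: &mut Rng, depth: u32) -> Sujet {
        if depth == 0 || rng.below(2) == 0 {
            Sujet::Simple(NOMS[rng.below(NOMS.len())])
        } else {
            let a = Sujet::random(rng, depth - 1);
            let b = Sujet::random(rng, depth - 1);
            Sujet::Compose(Box::new(a), Box::new(b))
        }
    }

    // Number of words in the rendered subject, counting each "en".
    fn word_count(&self) -> usize {
        match self {
            Sujet::Simple(_) => 1,
            Sujet::Compose(a, b) => a.word_count() + 1 + b.word_count(),
        }
    }
}

impl fmt::Display for Sujet {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Sujet::Simple(nom) => write!(f, "{}", nom),
            Sujet::Compose(a, b) => write!(f, "{} en {}", a, b),
        }
    }
}

fn main() {
    let mut rng = Rng(2014);
    let sujet = Sujet::random(&mut rng, 3);
    println!("{} ({} words)", sujet, sujet.word_count());
}
```

With a budget of d, a subject holds at most 2^d nouns, so at most 2^(d+1) - 1 words including the "en" separators (15 words for d = 3). The same budget could be threaded through every recursive rule of the full grammar.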
I think it's clear from some of the answers (MatLab I'm looking at you) that you should modify the rules such that data-mining is not allowed to pull consecutive words from any source. – Carl Witthoft – 2014-02-21T13:50:54.937
While I'm being a smartass: since it's purely a popularity contest, someone should just post a HotModelBikini jpg. That'll get more votes than anything. – Carl Witthoft – 2014-02-21T15:03:41.360
And if somebody happens to have a list of phrases on the internet somewhere, can I http for them? – Cruncher – 2014-02-21T15:11:59.417
@Cruncher - Yes, but you have to generate several unique sentences to use them. – TheDoctor – 2014-02-21T15:12:54.723
I'll upvote anyone who uses repetitions of "buffalo" or "fish" as sample sentences! – None – 2014-02-21T16:19:12.807
@YiminRong Don't you mean "correct horse battery stapler" ? – Carl Witthoft – 2014-02-21T16:28:30.147
Most answers here either mine valid, full sentences from text sources, or generate output that does not meet the criteria. To me, both approaches seem against the spirit of the question! If someone really wants to impress, might I suggest a program that starts with a set of valid sentence structures like
[Adjective] [pl. noun] [verb] [adjective] [pl. noun]
and pulls from a real dictionary (maybe using one of the Dictionary APIs available out there) to fill in the blanks? I'd write it myself if I had a few minutes to spare! :( After all...Lazy Developers Write Lousy Programs.
– Brian Lacy – 2014-02-21T17:21:11.410
Of all the answers below (including my own) I think the one by squeamish ossifrage is the only one so far that's truly in the spirit of this challenge (@BrianLacy I came up here to post that and saw your comment; there you go!). Well, and Yimin Rong's creative interpretation of it too! :) – Jason C – 2014-02-21T23:27:28.617
@BrianLacy I've made my attempt. – Pureferret – 2014-02-22T01:40:28.520
"Understandable" is a bit subjective. Would something like "May Axes Labour Police Beat Pledge" or "Foot heads arms body" count?
– Mechanical snail – 2014-02-22T21:41:56.363
@Mechanicalsnail LOL. "Foot heads arms body" is so awesome in context. Definitely added to my list of favorite headlines, right next to basically anything the NY Post prints. – Jason C – 2014-02-23T17:51:11.027
@BrianLacy I think my submission will please you, then :) – barjak – 2014-02-23T20:59:04.970
@BrianLacy, my lazy program tries to use the form (pronoun | [article] [modifier]* [noun conjunction] noun) [adverb]* verb-ish [article] [modifier] noun [preposition noun]. It is, however, lousy at correctly classifying parts of speech given random wordlists. – AShelly – 2014-02-24T11:49:47.227
What differentiates 'word-lists' from 'hard-coding'? – Pureferret – 2014-02-25T06:58:19.327
@Pureferret Word lists would be lists of individual words. Hard coding would be a list of complete sentences. With word lists, you would typically need some logic in the program to piece together a complete sentence. With hard coded sentences, you basically just need a print statement. – 8bittree – 2014-02-27T15:03:45.373