Friday, 20 June 2014

A simple trial of an Indonesian grammar relationship extractor

Alhamdulillah, the Indonesian grammar relationship extractor relex-id now works for a very simple case :-)


For now relex-id can only recognize the form Subject (pronoun) - Predicate (verb) - Object (noun/pronoun),
but that is exactly what the currently loaded grammar relation rules cover, and there are only 2 rules so far.

Because the grammar relationship extractor module is split into two parts, a pattern matcher and a set of rules, the hope is that more rules can be added later to cover the many different clause structure patterns.
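One way to picture that split is to keep each rule as plain data that a matcher can read. The sketch below parses the rule notation that appears in the debug log at the end of this post (pronoun verb => _subj(2, 1) || _obj(2, 2/1)) into a small Python structure. The names Rule, RelationTemplate and parse_rule are made up for illustration and are not relex-id's actual classes (relex-id itself is Java), and reading 2/1 as "first child of the second matched part" is my assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTemplate:
    name: str       # relation name, e.g. "_subj"
    head: str       # path to the first argument, e.g. "2"
    dependent: str  # path to the second argument, e.g. "2/1" (assumed meaning)


@dataclass(frozen=True)
class Rule:
    matchers: tuple   # matcher names, in the order they must appear
    templates: tuple  # RelationTemplate objects to emit when the rule matches


# One relation template such as "_obj(2, 2/1)".
_TEMPLATE = re.compile(r"\s*(\w+)\(\s*([\d/]+)\s*,\s*([\d/]+)\s*\)\s*")


def parse_rule(text):
    """Parse 'matcher matcher => rel(a, b) || rel(c, d)' into a Rule."""
    left, right = text.split("=>")
    templates = []
    for part in right.split("||"):
        m = _TEMPLATE.fullmatch(part)
        if m is None:
            raise ValueError(f"cannot parse relation template: {part!r}")
        templates.append(RelationTemplate(*m.groups()))
    return Rule(tuple(left.split()), tuple(templates))


rule = parse_rule("pronoun verb => _subj(2, 1) || _obj(2, 2/1)")
print(rule.matchers)
for t in rule.templates:
    print(t.name, t.head, t.dependent)
```

With rules held as data like this, adding a new clause pattern would mean adding one more rule string rather than touching the matcher code.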

For the input sentence "aku cinta kamu" ("I love you"), the sentence structure is:

(S (PP i) (VP dbpedia:Love (PP you_o)) . )

That structure is the input, and extracting from it yields:

_subj(dbpedia:Love, I)
_obj(dbpedia:Love, you)

For the input sentence "aku suka gajah." ("I like elephants."), the sentence structure is:

(S (PP i) (VP dbpedia:Like (NP dbpedia:Elephant)) . )

That structure is the input, and extracting from it yields:

_subj(dbpedia:Like, I)
_obj(dbpedia:Like, dbpedia:Elephant)
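The two examples above can be reproduced with a small sketch: parse the bracketed structure into a nested list, then walk adjacent parts looking for a pronoun followed by a verb phrase. This is a hard-coded Python illustration, not relex-id's real matcher. The helper names, and the little DISPLAY table that turns i into I and you_o into you, are assumptions made only to mirror the output shown.

```python
# Assumed lexical mapping from internal tokens to the names seen in the output.
DISPLAY = {"i": "I", "you_o": "you"}


def parse(text):
    """Parse a bracketed structure like '(S (PP i) ...)' into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read():
        tok = tokens.pop(0)
        if tok != "(":
            return tok          # a leaf such as 'i' or 'dbpedia:Love'
        node = []
        while tokens[0] != ")":
            node.append(read())
        tokens.pop(0)           # drop the closing ")"
        return node

    return read()


def has_label(node, *labels):
    return isinstance(node, list) and node[0] in labels


def extract(tree):
    """Emit _subj/_obj for a pronoun (PP) directly followed by a verb phrase (VP)."""
    parts = tree[1:]            # skip the 'S' label
    relations = []
    for first, second in zip(parts, parts[1:]):
        if has_label(first, "PP") and has_label(second, "VP"):
            verb = second[1]
            subj = DISPLAY.get(first[1], first[1])
            relations.append(f"_subj({verb}, {subj})")
            for child in second[2:]:
                if has_label(child, "PP", "NP"):   # pronoun or noun object
                    obj = DISPLAY.get(child[1], child[1])
                    relations.append(f"_obj({verb}, {obj})")
    return relations


for structure in ["(S (PP i) (VP dbpedia:Love (PP you_o)) . )",
                  "(S (PP i) (VP dbpedia:Like (NP dbpedia:Elephant)) . )"]:
    for relation in extract(parse(structure)):
        print(relation)
```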

The pattern matcher already supports partial subtree matching (though the algorithm is pretty rough...), so it can match part-of-speech patterns at any hierarchy depth (1 level, 2 levels, 3 levels, and so on, although I think 3 levels is about as deep as grammar analysis will ever need).
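To make "partial subtree matching" concrete, here is a minimal Python sketch of one possible approach: a pattern matches a node when the labels agree and every child named in the pattern matches some child of the node, in order, while children the pattern does not mention are ignored. This greedy in-order matcher is my own simplification, not necessarily the algorithm relex-id uses, and the nested-list tree format and the function name matches are assumptions.

```python
def matches(pattern, node):
    """Return True if pattern is a partial subtree of node, rooted at node.

    Trees and patterns are nested lists: [label, child, child, ...].
    A bare string matches a leaf with the same text, or a node with that label.
    """
    if isinstance(pattern, str):
        label = node[0] if isinstance(node, list) else node
        return pattern == label
    if not isinstance(node, list) or pattern[0] != node[0]:
        return False
    # Each pattern child must match a later node child than the previous one.
    # Sharing one iterator means a consumed child is never looked at again.
    children = iter(node[1:])
    return all(any(matches(p, c) for c in children) for p in pattern[1:])


tree = ["S",
        ["PP", "i"],
        ["VP", "dbpedia:Like", ["NP", "dbpedia:Elephant"]],
        "."]

print(matches(["S", ["PP"], ["VP"]], tree))     # 2 levels: pronoun then verb
print(matches(["S", ["VP", ["NP"]]], tree))     # 3 levels: verb with a noun object
print(matches(["S", ["VP", ["PP"]]], tree))     # verb with a pronoun object
```

Because the recursion simply follows the nesting of the pattern, the same function covers 1, 2, 3 or more levels without special cases.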

I modeled the output on the results of OpenCog RelEx, to make comparison and interoperability easier. Although I use semantic RDF technology (resource URIs/QNames) to mark concepts, I think converting from RDF to another format (for example OpenCog AtomSpace) should not be much of a burden. From an external point of view, the use of RDF here can be treated as nothing more than a unique ID.
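As a rough illustration of "RDF as just a unique ID", the Python sketch below splits a relation string into its parts and expands each QName into a full URI, which an external consumer could then treat as an opaque identifier. The prefix bindings are the usual DBpedia namespaces, but whether relex-id binds them exactly this way is an assumption, and the function names expand and to_triple are hypothetical.

```python
import re

# Common DBpedia namespace bindings (assumed to be what relex-id uses).
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}

# A relation as printed in the output, e.g. "_obj(dbpedia:Like, dbpedia:Elephant)".
RELATION = re.compile(r"(\w+)\(([^,]+),\s*([^)]+)\)")


def expand(term):
    """Expand a QName with a known prefix to a full URI; leave anything else as is."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def to_triple(relation):
    """Turn 'name(head, dependent)' into a (name, head_id, dependent_id) tuple."""
    m = RELATION.fullmatch(relation.strip())
    if m is None:
        raise ValueError(f"not a relation: {relation!r}")
    name, head, dependent = m.groups()
    return (name, expand(head), expand(dependent))


print(to_triple("_subj(dbpedia:Like, I)"))
print(to_triple("_obj(dbpedia:Like, dbpedia:Elephant)"))
```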

Internally, though, relex-id uses a subset of RDF's ontology features (OWL Lite), mainly for classifying resources, for example knowing that dbpedia:Elephant is a kind of dbpedia-owl:Animal.
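The kind of classification check meant here can be sketched in a few lines of Python. The facts below are toy data written by hand for this example (only the Elephant/Animal pair comes from this post, the superclasses are illustrative), and the names classes_of and is_a are hypothetical. A real setup would query an RDF store or reasoner instead of two dictionaries.

```python
# Toy facts, written by hand; not loaded from DBpedia.
RDF_TYPE = {
    "dbpedia:Elephant": {"dbpedia-owl:Animal"},
}
SUBCLASS_OF = {
    "dbpedia-owl:Animal": {"dbpedia-owl:Eukaryote"},     # illustrative
    "dbpedia-owl:Eukaryote": {"dbpedia-owl:Species"},    # illustrative
}


def classes_of(resource):
    """All classes of a resource: its direct types plus every superclass."""
    seen = set()
    pending = list(RDF_TYPE.get(resource, ()))
    while pending:
        cls = pending.pop()
        if cls not in seen:
            seen.add(cls)
            pending.extend(SUBCLASS_OF.get(cls, ()))
    return seen


def is_a(resource, cls):
    return cls in classes_of(resource)


print(is_a("dbpedia:Elephant", "dbpedia-owl:Animal"))
print(is_a("dbpedia:Elephant", "dbpedia-owl:Species"))
print(is_a("dbpedia:Love", "dbpedia-owl:Animal"))
```

A noun matcher could use a check like this to decide, for instance, whether the object of a verb is an animal.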

The next step is semantic relationship extraction, yay! The meat of NLP :-) And probably also the hardest of all the NLP stages so far.

For antecedent analysis I'm still unsure whether to build a proof-of-concept first, or to postpone it and go straight to a proof-of-concept for semantic reasoning. We'll see :)

Log:

19:57:48.490 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher pronoun matches (PP i)
19:57:48.490 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher pronoun matches (PP you_o)
19:57:48.490 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 1 for matchers [pronoun] against [(PP you_o)]
19:57:48.490 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher verb matches (VP dbpedia:Love (PP you_o))
19:57:48.490 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 2 for matchers [pronoun, verb] against [(PP i), (VP dbpedia:Love (PP you_o))]
19:57:48.490 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Relation rule [pronoun verb => _subj(2, 1) || _obj(2, 2/1)] matches 0..1 [(PP i), (VP dbpedia:Love (PP you_o))]
19:57:48.491 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 0 for matchers [pronoun, verb] against [(VP dbpedia:Love (PP you_o)), .]
19:57:48.491 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher pronoun matches (PP i)
19:57:48.491 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 0 for matchers [noun] against [(PP you_o)]
19:57:48.491 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 1 for matchers [pronoun, verb] against [(PP i), (VP dbpedia:Love (PP you_o))]
19:57:48.491 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 0 for matchers [pronoun, verb] against [(VP dbpedia:Love (PP you_o)), .]
19:57:48.491 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Deduced 2 relations from 3 parts [(PP i), (VP dbpedia:Love (PP you_o)), .] >> [_subj(dbpedia:Love, I), _obj(dbpedia:Love, you)]
19:57:48.492 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Deduced 2 relations for sentence 'null': [_subj(dbpedia:Love, I), _obj(dbpedia:Love, you)]
19:57:48.492 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Sentence structure: (S (PP i) (VP dbpedia:Love (PP you_o)) . )
19:57:48.492 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Sentence in English: I love you.
19:57:48.492 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Sentence in Indonesian: Aku cinta kamu.
19:57:48.492 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Relations: [_subj(dbpedia:Love, I), _obj(dbpedia:Love, you)]
_subj(dbpedia:Love, I)
_obj(dbpedia:Love, you)

19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher pronoun matches (PP i)
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 0 for matchers [pronoun] against [(NP dbpedia:Elephant)]
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 1 for matchers [pronoun, verb] against [(PP i), (VP dbpedia:Like (NP dbpedia:Elephant))]
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 0 for matchers [pronoun, verb] against [(VP dbpedia:Like (NP dbpedia:Elephant)), .]
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher pronoun matches (PP i)
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher noun matches (NP dbpedia:Elephant)
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 1 for matchers [noun] against [(NP dbpedia:Elephant)]
19:59:30.905 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcher verb matches (VP dbpedia:Like (NP dbpedia:Elephant))
19:59:30.906 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 2 for matchers [pronoun, verb] against [(PP i), (VP dbpedia:Like (NP dbpedia:Elephant))]
19:59:30.906 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Relation rule [pronoun verb => _subj(2, 1) || _obj(2, 2/1)] matches 0..1 [(PP i), (VP dbpedia:Like (NP dbpedia:Elephant))]
19:59:30.906 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Matcheds 0 for matchers [pronoun, verb] against [(VP dbpedia:Like (NP dbpedia:Elephant)), .]
19:59:30.907 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Deduced 2 relations from 3 parts [(PP i), (VP dbpedia:Like (NP dbpedia:Elephant)), .] >> [_subj(dbpedia:Like, I), _obj(dbpedia:Like, dbpedia:Elephant)]
19:59:30.907 [main] DEBUG id.ac.itb.ee.lskk.relexid.core.RelEx - Deduced 2 relations for sentence 'null': [_subj(dbpedia:Like, I), _obj(dbpedia:Like, dbpedia:Elephant)]
19:59:30.907 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Sentence structure: (S (PP i) (VP dbpedia:Like (NP dbpedia:Elephant)) . )
19:59:30.907 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Sentence in English: I like elephant.
19:59:30.907 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Sentence in Indonesian: Aku suka gajah.
19:59:30.907 [main] INFO  i.a.i.ee.lskk.relexid.core.RelExTest - Relations: [_subj(dbpedia:Like, I), _obj(dbpedia:Like, dbpedia:Elephant)]
_subj(dbpedia:Like, I)
_obj(dbpedia:Like, dbpedia:Elephant)