Monday, 07 July 2014

Experimental Performance Test using GridGain for Distributed Natural Language Processing

I ran an experimental performance test using GridGain to simulate AtomSpace processing. It relates to the discussion in the OpenCog group about the AtomSpace architecture.

Disclaimer: This is not a benchmark, so please don't treat it as one!

First I loaded 212,351 YAGO labels (from MongoDB, but the actual backend doesn't matter here) for resources starting with the letter M:

13:13:43.178 [main] INFO  i.a.i.e.l.l.yago.YagoLabelCacheStore - Loading 212351 labels...
13:13:45.595 [main] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore -   [23%] 50000 labels loaded, 162351 more to go...
13:13:47.571 [main] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore -   [47%] 100000 labels loaded, 112351 more to go...
13:13:49.139 [main] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore -   [70%] 150000 labels loaded, 62351 more to go...
13:13:50.608 [main] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore -   [94%] 200000 labels loaded, 12351 more to go...
13:13:50.914 [main] INFO  i.a.i.e.l.l.yago.YagoLabelCacheStore - Loaded 212351 labels...
13:13:50.917 [main] INFO - For yagoLabel, I have 100000 primary out of 100000 entries + 112351 swap

To make it somewhat more realistic, grid data per node is capped at 100,000 entries. The cache is partitioned, so once all 3 nodes are up the entire dataset should fit in memory. I then started two more nodes, and the last node to join searches for the resource IDs whose label is "Muhammad". This is basically a reverse hashmap lookup, which could perfectly well be done using an index. But I'm treating the entries as atoms, just for the sake of doing distributed-parallel computation on them.
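For comparison, the index-based alternative mentioned above can be sketched in plain Java, without GridGain. This is only an illustration: the class name `LabelIndex` and the sample entries are made up for the example, and lower-casing with `Locale.ROOT` is an approximation of a case-insensitive match.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LabelIndex {
    // Builds a reverse index: lower-cased label -> sorted set of resource IDs.
    static Map<String, Set<String>> build(Map<String, String> labelsById) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> e : labelsById.entrySet()) {
            String key = e.getValue().toLowerCase(Locale.ROOT);
            index.computeIfAbsent(key, k -> new TreeSet<>()).add(e.getKey());
        }
        return index;
    }

    // One hash lookup instead of a scan over every entry.
    static Set<String> lookup(Map<String, Set<String>> index, String label) {
        return index.getOrDefault(label.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        // Illustrative sample data, not the real YAGO set.
        Map<String, String> labelsById = new HashMap<>();
        labelsById.put("Muhammad_Qutb", "Muhammad");
        labelsById.put("Muhammad_Salman", "muhammad");
        labelsById.put("Mecca", "Mecca");

        Map<String, Set<String>> index = build(labelsById);
        System.out.println(lookup(index, "Muhammad")); // [Muhammad_Qutb, Muhammad_Salman]
    }
}
```

The test itself deliberately skips such an index and scans every entry instead, since the point is to exercise distributed computation over the entries.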

Collection<Set<String>> founds = labelCache.queries().createScanQuery(null).execute(
        new GridReducer<Entry<String, String>, Set<String>>() {
    // IDs of the matching resources found on this node
    Set<String> ids = new HashSet<>();
    @Override
    public boolean collect(Entry<String, String> e) {
        if (e.getValue().equalsIgnoreCase(upLabel)) {
            ids.add(e.getKey());
        }
        return true; // keep scanning
    }
    @Override
    public Set<String> reduce() {
        return ids;
    }
}).get();
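To show what the reducer does without needing a grid, here is a minimal single-JVM sketch. It is not GridGain code: the `Reducer` interface, the `scan` helper and the sample entries are hypothetical stand-ins, and each "node" is just a `Map`. It shows why the result is a collection of sets: every partition gets its own reducer and contributes one set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ScanReduceSketch {
    // Hypothetical stand-in for a grid reducer.
    interface Reducer<E, R> {
        boolean collect(E e); // return false to stop scanning early
        R reduce();           // result for the partition that was scanned
    }

    // Reducer that gathers the keys whose value equals the label, ignoring case.
    static Reducer<Map.Entry<String, String>, Set<String>> labelMatcher(final String label) {
        return new Reducer<Map.Entry<String, String>, Set<String>>() {
            private final Set<String> ids = new TreeSet<>();
            @Override
            public boolean collect(Map.Entry<String, String> e) {
                if (e.getValue().equalsIgnoreCase(label)) {
                    ids.add(e.getKey());
                }
                return true;
            }
            @Override
            public Set<String> reduce() {
                return ids;
            }
        };
    }

    // Runs a fresh reducer over each partition and returns one result per partition.
    static List<Set<String>> scan(List<Map<String, String>> partitions, String label) {
        List<Set<String>> results = new ArrayList<>();
        for (Map<String, String> partition : partitions) {
            Reducer<Map.Entry<String, String>, Set<String>> reducer = labelMatcher(label);
            for (Map.Entry<String, String> e : partition.entrySet()) {
                if (!reducer.collect(e)) {
                    break;
                }
            }
            results.add(reducer.reduce());
        }
        return results;
    }

    public static void main(String[] args) {
        // Illustrative sample data spread over two "nodes".
        Map<String, String> node1 = new HashMap<>();
        node1.put("Muhammad_Qutb", "Muhammad");
        node1.put("Mecca", "Mecca");
        Map<String, String> node2 = new HashMap<>();
        node2.put("Muhammad_Salman", "muhammad");

        System.out.println(scan(Arrays.asList(node1, node2), "Muhammad"));
        // [[Muhammad_Qutb], [Muhammad_Salman]]
    }
}
```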

The results on my workstation (i5-3570K, 4 cores @ 3.40 GHz), with 3 nodes at 1 GB heap each:

[13:29:40] GridGain node started OK (id=03a07172)
[13:29:40] Topology snapshot [ver=5, nodes=3, CPUs=4, heap=3.0GB]
13:29:40.043 [main] INFO  i.a.i.e.l.l.yago.YagoLabelLookupCli2 - Finding resource for label 'Muhammad'...
13:29:43.131 [main] INFO  i.a.i.e.l.l.yago.YagoLabelLookupCli2 - Found for Muhammad: [[Muhammad_Khalil_al-Hukaymah, Muhammad_S._Eissa, Muhammad_Musa, Muhammad_Okil_Musalman, Muhammad_Loutfi_Goumah, Muhammad_Sadiq, Muhammad_Salih, Muhammad_Ismail_Agha, Muhammad_Yusuf_Hashmi, Mustafah_Muhammad, Muhammad_Mahbubur_Rahman, Muhammad_Ahmad_Said_Khan_Chhatari, Muhammad_Jamiruddin_Sarkar, Muhammad_Ibrahim_Joyo, Muhammad_bin_Tughluq, Muhammad_Sohail_Anwar_Choudhry, Muhammad_Tariq_Tarar], [Muhammad_Salman, Muhammad_Jailani_Abu_Talib, Muhammad_Qutb], [Muhammad_Ibrahim_Kamel, Muhammad_Amin_Khan_Turani, Muhammad_Ali_Pate, Muhammad_Rafi_Usmani, Muhammad_Faisal, Muhammad, Muhammad_Ilham, Muhammad_Kurd_Ali, Muhammad_Umar, Muhammad_Shahidullah, Muhammad_Anwar_Khan, Muhammad_Saifullah, Muhammad_Saqlain]]
[13:29:43] GridGain node stopped OK [uptime=00:00:03:865]

Searched 212,351 entries in 3,088 ms, using 3 nodes × 4 threads = 12 threads in total on a single host. So the rate is ~68,766 entries/second.
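The rate follows directly from the two log timestamps (13:29:40.043 to 13:29:43.131 is 3,088 ms). A small sketch of the arithmetic, with a hypothetical helper `entriesPerSecond` that truncates to a whole number; the per-node figure simply assumes the work was split evenly across the 3 nodes:

```java
class ScanRate {
    // Entries per second, truncated to a whole number.
    static long entriesPerSecond(long entries, long elapsedMillis) {
        return entries * 1000 / elapsedMillis;
    }

    public static void main(String[] args) {
        long entries = 212_351; // labels scanned
        long elapsedMs = 3_088; // 13:29:40.043 -> 13:29:43.131
        int nodes = 3;          // assumes an even split across nodes

        long total = entriesPerSecond(entries, elapsedMs);
        System.out.println("total:    " + total + " entries/s");         // total:    68766 entries/s
        System.out.println("per node: " + total / nodes + " entries/s"); // per node: 22922 entries/s
    }
}
```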

To be fair, GridGain gives these performance hints, which should be applied for a serious benchmark:

[13:29:40]   ^-- Decrease number of backups (set 'keyBackups' to 0)
[13:29:40]   ^-- Disable fully synchronous writes (set 'writeSynchronizationMode' to PRIMARY_SYNC or FULL_ASYNC)
[13:29:40]   ^-- Enable write-behind to persistent store (set 'writeBehindEnabled' to true)
[13:29:40]   ^-- Disable query index (set 'queryIndexEnabled' to false)
[13:29:40]   ^-- Disable peer class loading (set 'peerClassLoadingEnabled' to false)
[13:29:40]   ^-- Disable grid events (remove 'includeEventTypes' from configuration)

Of course, 12 threads running on a single host isn't optimal, and there are no network saturation effects since all nodes are on the same host.

From how GridGain works, the performance should be (much?) better when there are 3 actual nodes/processors to work on. The key thing is that the calculation (map/reduce) is done on each node, so the "Mind Agent" (node 3) here only does ~33% of the job. The other 2 "AtomSpace" nodes aren't just serving data; they're also processing the data they already hold, so there is no need to move those bits around the network.
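The data-locality idea can be imitated inside one JVM. The sketch below is not GridGain: each "node" is a `Map` scanned by its own thread, and the names `LocalScanSketch`, `NodeResult` and the synthetic `res_`/`label_` entries are invented for illustration. The point is that every node scans only its own third of the data and sends back just the handful of matching IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LocalScanSketch {
    // What a node sends back: how many entries it scanned and which IDs matched.
    static final class NodeResult {
        final int scanned;
        final Set<String> matches;
        NodeResult(int scanned, Set<String> matches) {
            this.scanned = scanned;
            this.matches = matches;
        }
    }

    // Scans one partition in place; the entries themselves never leave the "node".
    static NodeResult scanLocal(Map<String, String> partition, String label) {
        Set<String> ids = new TreeSet<>();
        for (Map.Entry<String, String> e : partition.entrySet()) {
            if (e.getValue().equalsIgnoreCase(label)) {
                ids.add(e.getKey());
            }
        }
        return new NodeResult(partition.size(), ids);
    }

    // One task per partition, run in parallel; only the small results are gathered.
    static List<NodeResult> scanAll(List<Map<String, String>> partitions, String label) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<CompletableFuture<NodeResult>> futures = new ArrayList<>();
            for (Map<String, String> partition : partitions) {
                futures.add(CompletableFuture.supplyAsync(() -> scanLocal(partition, label), pool));
            }
            List<NodeResult> results = new ArrayList<>();
            for (CompletableFuture<NodeResult> f : futures) {
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int nodes = 3;
        int total = 9000;
        List<Map<String, String>> partitions = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            partitions.add(new HashMap<String, String>());
        }
        for (int i = 0; i < total; i++) {
            // Round-robin placement for simplicity; a real grid would hash keys to partitions.
            String label = (i % 1000 == 0) ? "Muhammad" : "label_" + i;
            partitions.get(i % nodes).put("res_" + i, label);
        }
        for (NodeResult r : scanAll(partitions, "muhammad")) {
            System.out.println("scanned " + r.scanned + " entries, returned " + r.matches.size() + " IDs");
        }
    }
}
```

With this synthetic data each of the three partitions scans 3,000 entries and returns 3 IDs, so the traffic back to the caller is tiny compared to the data scanned.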

Since the closure is code (Java code), it's possible to use OpenCL/GPU for certain tasks, which should increase performance for math-intensive processing.

Fault tolerance also works very well: you can kill and rearrange nodes at will, and the grid stays available as long as at least 1 node is up.
