Sunday, May 20, 2007

Japanese Spammers got it going on

So I've been up doing some *looong* overdue fixing on handling international character sets in Mailinator. Seeing some international spam, I must say, English-language spammers got nothing on Japanese spammers.

These guys take spamming as an art form!

I cut and pasted one below - actually, I'm not exactly "sure" its a spam, but it sure is pretty :)

-----------------------------------------------------------

Inbox: azz
From: eaglebomber@one-crest.com
Subject: ワケあって欲求不満なんで
Charset: ISO-2022-JP


"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#"#
■ 今月のイチオシ! ■■■ 中出し・顔写・緊縛・エトセトラ… ■■■■■
■■■■■■■■■■■■■ あなたの求める全てのエロが    ■■■■■
■■■■■■■■■■■■■   ここでなら、体感できます!!!!!■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■ ┏━━┳┓ ┏┓  ┏━━┳━━┓┏━━┳━━┳━━┳━━┓ ■■
■■ ┃┏┓┃┃ ┃┃  ┗┓┏┫┏┓┃┃━━┫┏┓┃━━┫━━┫ ■■
■■ ┃┏┓┃┗━┫┗━┓┏┛┗┫┗┛┃┃┏━┫┏┓┫━━┫━━┫ ■■
■■ ┗┛┗┻━━┻━━┛┗━━┻━━┛┗┛ ┗┛┗┻━━┻━━┛ ■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■ ┏┓  ┏━━━┓┏┓ ┏┓┏┓┏━┓  ┏━━┓┏━━━━━━┓
■┏┛┗━━┓━━┓┃┃┃ ┃┃┃┃┗━┛┏┓┃┏┓┃┃┏━━━━┓┃
■┗┓┏━┓┃  ┃┃┃┃ ┃┃┃┃   ┃┃┃┃┃┃┃┃┏━━┓┃┃
■ ┃┃┏┛┃  ┃┃┃┃ ┃┃┃┃   ┃┃┗┛┃┃┃┃┃┏┓┃┃┃
■ ┃┃┗━┛  ┃┃┃┃ ┗┛┃┃   ┃┃  ┃┃┃┃┃┃┃┃┃┃
■ ┃┗━━┓ ┏┛┃┃┗━┓┏┛┃┏━━┛┃ ┏┛┃┃┃┃┗┃┃┃┃
■ ┗━━━┛ ┗━┛┗━━┛┗━┛┗━━━┛ ┗━┛┃┃┗━┗┛┃┃
■     直  メ  即  会  い  掲  示  板   ┃┗━━━━┛┃
■ ・・・………━━━━━━━━━━━━━━━━━━━┗━━━━━━┛
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■                           ■■■■
■■■■     http://adap.jp/main.php?adv=LP18325      ■■■■
■■■■                           ■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■                              ■■■■
■┌───────────┐                  ■■■
■│   セフレリンクが  │┏━━━━━━━━━━━━━━━━━━━┓
■│必ずヤレル!?つのワケ│┃                   ┃
■└───┐┌──────┘┃ ?アドレス・電話番号公開掲示板だから ┃
■    ││       ┃  自由に連絡がトレル!!        ┃
■    ││   │\  ┃                   ┃
■    ││   │ \ ┃ ?目的が決まっている人ばかりなので、 ┃
■    │└───┘  \┃  スグ待ち合わせデキル!!       ┃
■    └────┐  /┃                   ┃
■         │ / ┃ ?検索機能も充実で、初心者でも簡単に ┃
■■        │/  ┃  利用デキル!!            ┃
■■■           ┃                   ┃
■■■■          ┃ まずは無料で、即会いセックスを   ┃
■■■■■          ┃             お試し下さい!!┃
■■■■■■        ┗━━━━━━━━━━━━━━━━━━━┛
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


┏━━┓                本┃日┃即┃ハ┃メ┃!!┃
┃\/┃ アドレス公開BBS      ━┛━┛━┛━┛━┛━┛
┗━━┛━━━━━━━━━━━━━━━━━━━━━━━━………・・・
                     TEL & アドレス公開   
 名前:Dr.SHIHO さん   ┏━┫   ┏━━┳━┳┓┏┳┓
                 ┃□┃━━┓┃┏┓┃┃┃┃┃┃┃
 年齢:29 歳          ┃==┃\/┃┃┗┛┃┃┃┃┣╋┫
                 ┗━┛━━┛┗━━┻┻━┛┗┻┛

                                  
 はじめまして、しほって言います。1人しか経験がないので色々教えてく
 れるセフレを探してます。


  ┏━━┓    ┏━━━━━━━━━━━━━━━━━━━━━┓
 ┏┛ ━┻━━┓ ┃       直接メール!!        ┃
 ┛  ━┳━━┛ ┃                     ┃
    ━┫    ┃  http://adap.jp/main.php?adv=LP18325  ┃
 ━┓ ━┫    ┗━━━━━━━━━━━━━━━━━━━━━┛
  ┗━━┛      
・・・………━━━━━━━━━━━━━━━━━━━━━━━━━━━━




      実際のサイトはこんな感じ!! レッツ・ヌプヌプ♪      

 簡┃単┃?┃S┃T┃E┃P┃
 ━┛━┛━┛━┛━┛━┛━┛────────────────────
     あなたの選んだタイプにマッチする女性を紹介します。     
       簡単?ステップで、好みの女性を今すぐ!!         

              ▼▲▼▲▼▲▼              
               ▼▲▼▲▼
                ▼▲▼                
                 ▼                 
                                   
             ┏━┓
┏━━┳━━┳━━┳━━┓┗┓┃
┃┏━┻┓┏┫━━┫ □┃ ┃┃    好みのタイプを選んで下さい!! 
┗━━┃┃┃┃━━┫┏━┛┏┛┗┓┏┓     ※欲張り過ぎに注意!! 
┗━━┛┗┛┗━━┻┛  ┗━━┛┗┛     
 ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄
 ┌┐      ┌┐      ┌/      ┌┐
 └┘ ★人妻  └┘ ★巨乳  〆┘ ★SM  └┘ ★セレブ   

 ┌/      ┌┐      ┌┐      ┌┐
 〆┘ ★20代  └┘ ★30代  └┘ ★40代  └┘ ★中出し   

           ※チェックは例です。ここでは、20代とSM♪♪♪♪

 
             ┏━━┓
┏━━┳━━┳━━┳━━┓┗━┓┃
┃┏━┻┓┏┫━━┫ □┃┏━┛┃   あなたのことを教えて下さい!! 
┗━━┃┃┃┃━━┫┏━┛┃━━┓┏┓ 
┗━━┛┗┛┗━━┻┛  ┗━━┛┗┛                
 ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄

 あなたの名前と年齢を教えて下さい。───────────────  
   │\               │\             
  ┌┘ \ ニックネーム      ┌┘ \ 年齢         
  └┐ / ┌───────┐   └┐ /  ┌──┐      
   │/  │       │さん  │/   │  │歳     
       └───────┘         └──┘      

 お住まいと、ログインに使用するパスワードを決めて下さい。────  
   │\               │\             
  ┌┘ \ お住まい        ┌┘ \ パスワード      
  └┐ / ┌───────┐   └┐ /  ┌───┐     
   │/  │       │    │/   │   │     
       └───────┘         └───┘     

             ┏━━┓
             ┗━┓┃
┏━━┳━━┳━━┳━━┓┏━┛┃
┃┏━┻┓┏┫━━┫ □┃┗━┓┃                  
┗━━┃┃┃┃━━┫┏━┛┏━┛┃┏┓ ボタンを押すだけ!!!     
┗━━┛┗┛┗━━┻┛  ┗━━┛┗┛                
 ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄
   ┏━━━━                          
   ┃┏━━━━━━━━━━━━━━━━━━━━━━━━━┓   
   ┃┃                         ┃   
   ┃┃   あなたの好みの女性からメールを受け取る!!  ┃    
   ┃┃                         ┃   
     ┃   http://adap.jp/main.php?adv=LP18325     ┃   
    ┃          ┏┓             ┃┃  
     ┃          ┃┃             ┃┃  
    ┃         ┏┫┣┳┳┓          ┃┃   
    ┗━━━━━━━  ┃┫┃┃┃┃ ━━━━━━━━━┛┃  
              ┃    ┃        ━━━━┛  
               ┗┓  ┏┛ ポチッ!! 

     あとは、2人が決める待ち合わせの場所まで一直線です!     
     あなたも人生が変わるような出会いを体験してください!!     

Saturday, May 5, 2007

Nonblocking-Reader/Blocking-Writer in Java

So it's Saturday. I should be off like chasing women or motorcycle racing or something, but instead I'm at home obsessing over non-blocking threadsafe algorithms.

(yeah. again.)

In particular, I had an email exchange with (my ex-Ph.D.advisor) Doug Lea regarding the java.util.concurrent.lock's ReentrantReadWriteLock (which Doug also happen to be the author of). The beauty of this class is that it provides a solution to the classic Reader/Writer problem. That is, it allows many threads interesting reading something to operate simultaneously (as reading does not conflict with reading). However, it also allows you a writer-lock that when switched on, blocks all readers and other writers. This follows the assumption that writing could corrupt simultaneous readers or writers.

This wasn't very easy in Java prior to JDK 1.5.

Our discussion was over a microbenchmark I wrote to prove to myself how great it was that I could have a flock of readers all banging on a map (or something) simultaneously. I wrote (or borrowed) a standard regularly synchronized map wrapper:

public synchronized V put(K key, V value) {
  return map.put(key, value);
}

public synchronized V get(Object key) {
  return map.get(key);
}


That version will have readers waiting for both writers and other readers. And one that took advantage of the new Reader/Writer lock:


ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
Lock readLock = lock.readLock();
Lock writeLock = lock.writeLock();

public V put(K key, V value) {
  writeLock.lock();
  try {
    return map.put(key, value);
  } finally {
    writeLock.unlock();
  }
}

public V get(Object key) {
  readLock.lock();
  try {
    return map.get(key);
  } finally {
    readLock.unlock();
  }
}


Here, readers will not wait for other readers, however they will wait for writers (and writers wait for everyone).

This example does do better, but maybe not as "better" as I would think. Pounding the code with 50 simultaneous readers runs about 6.5 times faster (i.e., I got 6.5 times as many operations in a 1 second interval) than the fully synchronized version.

Interestingly, with 5 readers and 20 writers the synched version does about 10000 writes and 1200 reads. The reader/writer version does 300 writes and 10000 reads (sort of the opposite). In other words, the writers (probably expectedly) really get hurt in the reader/writer configuration. (If you're complaining microbenchmarks suck, you can stop. Highly contended parallel access to a map is pretty real - and if nothing else, I was really just trying to make threads fight, and this seemed like a reasonable way to do it).

Doug quickly pointed out that the reader/writer lock has a fair bit of overhead as compared to a simple synchronized method and that the map operations are so fast, they simply don't hold the lock long enough to cause much contention.

I probably should have quit there and played some more desktop tower defense, but I tried something else. Of course it seems that (in true double-checked locking form) if a reader can know that no writers are about, then there's no need for it to do any locking at all. Thus I played around awhile with a non-blocking threadsafe reader. This didn't go so well as I found counterexamples to every version of my code right as I typed the final semi-colon.

Then last nite I woke up at 3am (yeah, I know) and wrote this:


AtomicInteger writeCounter = new AtomicInteger();

public V get(Object key) {
  int save = 0;
  V value = null;
  do {
    while (((save = writeCounter.get()) & 1) == 1);
    value = map.get(key);
  } while (save != writeCounter.get());
  return value;
}

Lock lock = new ReentrantLock();

public V put(K key, V value) {
  lock.lock();
  try {
    writeCounter.getAndIncrement();
    map.put(key, value);
    writeCounter.getAndIncrement();
  } finally {
    lock.unlock();
  }
  return value;
}


The write lock used in the put method is pretty standard. Only one writer can be active at any time. The reader code (i.e., the get method) is the tricky part. Effectively, a reader won't start while it knows that a writer is active. And, once its done it checks again to see if a writer is now active or was while it was running. If so, it throws away its result and tries again. In theory, a reader might reread the same value 1000 (or infinite) times before its satisfied that the data is good. In practice, it only happens a very few times.

In threadsafe (blocking) code we're used to coding such that all important code can only happen when its safe. Nonblocking code works more like a "teenager model" - it says "Let's try it and then when we're done, we'll see if what we did was ok. If not, we'll try it again until it is ok."

If the very busy looking do or while loops seem odd to you, they're quite normal in non-blocking threadsafe code. See Brian Goetz's great introduction to nonblocking algorithms. In the right circumstance (as Brian is happy to point out), nonblocking code can smoke blocking code even if it ends up busy looping awhile. And of course, still be 100% correct. Of course, the latter statement is often difficult to make happen.

I love this kind of all hell's breaking loose code. I created some tests to try to expose a race condition and so far it looks good. I ran this idea by Jeremy Manson ("Dr. Memory Model") and he confirmed for me that the use of atomic ints above will cause memory consistency (keep in mind, Jeremy is not a big fan of lock-free algorithms it seems). My empirical tests verified it so far - I can't make this code race (I can easily make races by changing just tiny little bits). I'm going to play with some benchmarks and see what I can drum up.

(Edit: Honestly, if you're looking for a non-blocking hashtable the above is thought-provoking but not terribly clever. You'd be wise to check out Cliff Click's Nonblocking Hashtable implementation. Its more comprehensive, proven, and is very likely faster.)