Tuesday, April 16, 2013

Code Review for a String Lock Map

I recently needed a mechanism to lock on given Strings in a particular piece of code. As a (maybe weak) example, say I was doing updates to disk files, each with a unique filename (i.e., read, update, write back). I could have synchronized the methods that modified the files, but I preferred not to lock at that level - that would mean only one file update could occur at a time.

I'd prefer to simply lock on that file - or more specifically, that filename. It's a dubious proposition to lock on String objects in Java because different objects can represent the same sequence of characters. You can intern() all your strings (which guarantees the object you have is the one system-wide representation of that String), but that is rather global. I wanted to do things locally - where I can control the scoping of the lock.

(The filename example is just that - an example. I've needed this structure in many different scenarios, so don't look to solve just that problem.)

I might have missed it, but I didn't find any facilities in java.util.concurrent to help me (and to be fair, I haven't kept up with the happenings there). Long story short, I wrote my own.

Of course, the best advice when writing concurrent locking data structures is - DON'T.

But I did. And I probably plan on using it eventually, so I put it here for review. I'll update it as comments come in (so if you see a comment below that makes little sense it's probably because I heeded it and changed the code).

The use case I want:

public static final StringLock stringLock = new StringLock();

...

try {
     stringLock.lock(filename);

     // load disk file
     // change its contents a bit
     // rewrite it
     
} finally {
     stringLock.unlock(filename);
}

So... find below the code I've come up with so far. It's a bit heavy under contention, but in the normal case (no contention) it syncs three times (writes on a ConcurrentHashMap being a sync). I could also use Cliff Click's NonBlockingHashMap to remove two of the syncs, and I'd be happy to be convinced that would be superior for some reason.

public class StringLock {

 private final ConcurrentHashMap<String, Object> lockedStrings = new ConcurrentHashMap<String, Object>();

 public void lock(String s) { 
   Object lockObject = null; 
   while (true) { 
     lockObject = lockedStrings.get(s); 

     if (lockObject != null) { 
       synchronized (lockObject) { 
          Object lockObject2 = lockedStrings.get(s); 
          if (lockObject2 == lockObject) { 
            try { lockObject2.wait(); } catch (Exception e) { } 
          } 
        } 
     } else { 
        lockObject = new Object(); 
        Object lockObject1 = lockedStrings.putIfAbsent(s, lockObject); 
        if (lockObject1 == null) { 
          break; 
        } 
     } 
   } 
 } 

 public void unlock(String s) { 
    Object lockObject = lockedStrings.get(s); 
    synchronized (lockObject) { 
      lockedStrings.remove(s); 
      lockObject.notifyAll(); 
    } 
  }
}
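
For reference, a minimal harness exercising the class under contention - the class body is condensed inline so the snippet stands alone, and the thread and iteration counts are arbitrary:

```java
import java.util.concurrent.ConcurrentHashMap;

public class StringLockDemo {
    // Condensed copy of the StringLock class above, for a self-contained demo.
    static class StringLock {
        private final ConcurrentHashMap<String, Object> lockedStrings = new ConcurrentHashMap<String, Object>();

        public void lock(String s) {
            while (true) {
                Object lockObject = lockedStrings.get(s);
                if (lockObject != null) {
                    synchronized (lockObject) {
                        // Re-check: only wait if this object is still the live map entry.
                        if (lockedStrings.get(s) == lockObject) {
                            try { lockObject.wait(); } catch (InterruptedException e) { }
                        }
                    }
                } else if (lockedStrings.putIfAbsent(s, new Object()) == null) {
                    break; // we won the race; the lock is ours
                }
            }
        }

        public void unlock(String s) {
            Object lockObject = lockedStrings.get(s);
            synchronized (lockObject) {
                lockedStrings.remove(s);
                lockObject.notifyAll();
            }
        }
    }

    static int counter = 0;  // deliberately unsynchronized; guarded only by StringLock

    public static void main(String[] args) throws InterruptedException {
        final StringLock stringLock = new StringLock();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        stringLock.lock("some-file.txt");
                        try { counter++; } finally { stringLock.unlock("some-file.txt"); }
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter);  // 4000 if mutual exclusion held
    }
}
```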

14 comments:

Alex Silverstein said...

Hi Paul,

Is it essential that the map be inside the Lock? Or is it to avoid having to do the lazy creation of locks whenever you use this class? I think it's most likely the former, in which case there are a few approaches you could use, the first leading to the second.

They both use structures found in the excellent Google Guava (https://code.google.com/p/guava-libraries/ if you're not familiar).

Basically, I think you want a LoadingCache where the CacheLoader simply returns a new Lock. Then any further calls with the same String will get the same Lock. To use it then is simply:

stringLocks.get(filename).lock();

(obviously with try/catch, etc.)


It turns out that this ends up being a common enough pattern that Guava has a class called Striped, which is essentially a Map that handles the lazy loading for you and comes with a few other nice optional features (e.g. you can specify the locks to be weakly referenced so they get GC'd when no one is holding them). Check out https://code.google.com/p/guava-libraries/wiki/StripedExplained for more.

That's the highish level spiel. I think the links should provide enough context, otherwise I can follow up.


Also, I'm sure you've thought of it, but watch out for non-canonical filenames. You don't want "./foo/bar/.." and "./foo" to get separate locks.
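
A quick sketch of the normalization this warning implies, using java.nio.file (Java 7+); the paths are just illustrative:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class NormalizeDemo {
    public static void main(String[] args) {
        // "./foo/bar/.." and "./foo" denote the same file but are unequal
        // as strings - normalize before using them as lock keys.
        Path a = Paths.get("./foo/bar/..").normalize();
        Path b = Paths.get("./foo").normalize();
        System.out.println(a.equals(b));  // true - both normalize to "foo"
    }
}
```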


And thanks for the blog :).

-Alex

Paul Tyma said...

LoadingCache is an interesting idea (and I use it tons, so I'm quite familiar).

But I assume in the general case we can't let the cache ever expire entries (via cache maxsize or time). Otherwise, a different lock for the same key could arise before the first was unlocked.

Overall I think the idea is the same except LoadingCache buries the lock/lockObject creation (and synchronization) inside the build method.
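
(In later JDKs the same burying can be done without Guava: ConcurrentHashMap.computeIfAbsent, added in Java 8, creates the per-key lock lazily and atomically. A sketch with illustrative names - and, like a cache with no eviction, entries are never removed:)

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: per-key lock creation hidden behind computeIfAbsent (Java 8+).
// The map only ever grows - fine if the key space is bounded.
public class LazyKeyLocks {
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS =
            new ConcurrentHashMap<>();

    public static ReentrantLock forKey(String key) {
        return LOCKS.computeIfAbsent(key, k -> new ReentrantLock());
    }

    public static void main(String[] args) {
        ReentrantLock a = forKey("file.txt");
        ReentrantLock b = forKey("file.txt");
        System.out.println(a == b);  // same key -> same lock instance
    }
}
```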

I'm very interested in Striped - off to read up on that now!

eyrieowl said...

I'm not entirely certain why you wouldn't just create a lock object to synchronize on, and just keep it mapped. I guess you're expecting to have enough files that memory is a concern? Worth noting that you're trading off memory consumption for a lot more GC b/c of the temporary locks as well as the Node objects. Biggest issue I see is some other thread could unlock the lock, which you don't want. It doesn't enforce that the locking thread also unlocks.

I think I'd probably do something more like this:

WriteContainer wc = WriteManager.get(s);

try {
wc.lock();
//do file operations via wc
}
finally {
wc.unlock();
}

where the unlock method would both take care of notifying any waiting writer *as well as* unregistering the write container if no one is waiting. That would save a lot on extraneous garbage objects, since you'd potentially get reuse of the container across multiple gross operations.

That said, how's this look?

private final ConcurrentHashMap<String, MyLock> lockedStrings = new ConcurrentHashMap<String, MyLock>();

public void lock(String s) {
  while (true) {
    MyLock lock = lockedStrings.get(s);
    if (lock == null) {
      lock = new MyLock(s);
      MyLock prevLock = lockedStrings.putIfAbsent(s, lock);
      if (prevLock != null)
        lock = prevLock;
    }

    if (lock.doLock())
      break;
  }
}

public void unlock(String s) {
  MyLock lock = lockedStrings.get(s);
  if (lock != null) {
    lock.unlock();
  }
}

private final class MyLock extends ReentrantLock {
  private int count = 0;
  private final String s;

  public MyLock(String s) {
    this.s = s;
  }

  public boolean doLock() {
    synchronized (this) {
      if (count < 0)
        return false;
      else
        count++;
    }

    super.lock();
    return true;
  }

  public void unlock() {
    super.unlock();
    synchronized (this) {
      if (--count <= 0) {
        count = -1;
        lockedStrings.remove(s, this);
      }
    }
  }
}

Hit me up on gchat if you want to talk through it, perhaps with more discussion about what you're trying to do a better path would present itself.

Michael D said...

While not really feedback on your underlying idea (which seems reasonable to me) your code would be somewhat cleaner if your StringLock class implemented AutoCloseable and therefore could take advantage of try-with-resources.
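
A sketch of what that could look like - the class and method names here are illustrative, not from the post, and like the simplest map-based versions this one never removes entries:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical AutoCloseable variant: acquire() returns a handle whose
// close() releases the lock, so callers can use try-with-resources.
public class CloseableStringLock {
    private final ConcurrentHashMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<>();

    public AutoCloseable acquire(String key) {
        ReentrantLock l = locks.computeIfAbsent(key, k -> new ReentrantLock());
        l.lock();
        return l::unlock;  // close() unlocks
    }

    public static void main(String[] args) throws Exception {
        CloseableStringLock sl = new CloseableStringLock();
        try (AutoCloseable ignored = sl.acquire("file.txt")) {
            System.out.println("locked");
        }  // lock released here, even on exception
        System.out.println("unlocked");
    }
}
```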

Alex Silverstein said...

The default Cache you get doesn't expire entries, so you'd be fine there.

Yup, it's pretty much the same, except that it falls under the domain of not writing your own synchronization utilities, which is a much more comfortable domain for me.

Let us know what you think!

Gnarly Buttons said...

For this particular use case, I think you could probably improve performance by using an array instead of a map. No matter what architecture you're on, the number of threads that might be executing at once is rather tiny. As such, I feel like you might be able to just create an array of, say, 64 lock objects, and then, to obtain the lock for a given string, you just hash the string, modulo 64, and lookup that index in the array. Preallocate all the locks before the concurrency begins, and you've removed all fancy concurrency logic other than the actual locks you're trying to use.
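
A sketch of that idea (the class name and stripe count are arbitrary; distinct keys may share a lock, which is the false-contention tradeoff discussed below):

```java
// Fixed array of preallocated locks, chosen by key hash - no per-key
// allocation and no concurrency logic beyond the locks themselves.
public class StripedLocks {
    private final Object[] locks;

    public StripedLocks(int n) {
        locks = new Object[n];
        for (int i = 0; i < n; i++) locks[i] = new Object();
    }

    public Object forKey(String key) {
        // hashCode() can be negative; mask off the sign bit before the modulo
        // (Math.abs fails on Integer.MIN_VALUE).
        return locks[(key.hashCode() & 0x7fffffff) % locks.length];
    }

    public static void main(String[] args) {
        StripedLocks s = new StripedLocks(64);
        System.out.println(s.forKey("a") == s.forKey("a"));  // stable per key
    }
}
```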

Darren said...

I've depended on Guava's Interner for this

static Interner<String> locks = Interners.newWeakInterner();

synchronized (locks.get(name)) {
  // do work
}

It's not as cool as Striped, but it gets the job done.

Clint said...

My implementation was intended to use java.util.concurrent.locks.


/* Use new String("Literal") instead of "Literal" as keys, because String
   interning may keep references to locks that are no longer used. */
public class ReadWriteMultiLock {

private final WeakHashMap<Object, WeakReference<ReadWriteLock>> locks = new WeakHashMap<Object, WeakReference<ReadWriteLock>>();

public ReadWriteMultiLock() {}

private enum LockType {
READ,
WRITE
}

public int count() {
synchronized (locks) {
return locks.size();
}
}

public Lock readLock(Object t) {
return getLock(LockType.READ, t);
}

public Lock writeLock(Object t) {
return getLock(LockType.WRITE, t);
}

protected Lock getLock(LockType lt, Object t) {
ReadWriteLock rwl = null;
synchronized (locks) {
WeakReference<ReadWriteLock> wrwrl = locks.get(t);
if (wrwrl != null) {
rwl = wrwrl.get();
}
if (rwl == null) {
rwl = new ReferringReentrantReadWriteLock(t);
locks.put(t, new WeakReference<ReadWriteLock>(rwl));
}
}
return lt == LockType.READ ? rwl.readLock() : rwl.writeLock();
}

// referring
private static class ReferringReentrantReadWriteLock implements ReadWriteLock {

private final ReentrantReadWriteLock rwlock = new ReentrantReadWriteLock();

// we are holding this reference to prevent gc of the key in the weakhashmap
private final Object t;

public ReferringReentrantReadWriteLock(Object t) {
if (t == null) { throw new NullPointerException(); }
this.t = t;
}

/* (non-Javadoc)
* @see java.util.concurrent.locks.ReadWriteLock#readLock()
*/
@Override
public Lock readLock() {
return new InternalLock(rwlock.readLock());
}

/* (non-Javadoc)
* @see java.util.concurrent.locks.ReadWriteLock#writeLock()
*/
@Override
public Lock writeLock() {
return new InternalLock(rwlock.writeLock());
}

private class InternalLock implements Lock {

private final Lock lock;

protected InternalLock(Lock l) {
this.lock = l;
}

@Override
public void lock() {
lock.lock();
}

/**
* @throws InterruptedException
* @see java.util.concurrent.locks.Lock#lockInterruptibly()
*/
@Override
public void lockInterruptibly() throws InterruptedException {
lock.lockInterruptibly();
}

@Override
public Condition newCondition() {
return lock.newCondition();
}

@Override
public boolean tryLock() {
return lock.tryLock();
}

@Override
public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
return lock.tryLock(time, unit);
}

@Override
public void unlock() {
lock.unlock();
}

}
}

}

Alex Silverstein said...

@Gnarly Looking deeper, this is in fact the only case that Striped supports. There doesn't seem to be a way to create an unbounded Striped; the size is required at creation.

So if having a guaranteed unique lock per file is essential, I think Striped is currently a non-starter. But it would surprise me if that was a requirement.

Paul Tyma said...

Wow... great responses.. I learned a lot already :)

@eyrieowl Your solution looks good but doesn't fundamentally strike me as different from mine. You replace Object with MyLock but still create just as many?
Am I missing something?

@GnarlyButtons That's a solid solution and I have a feeling you're finding a middle-ground between my solution and Alex's (guava's) Striped class.

You're trading GC for occasional unnecessary lock contention - i.e., two filenames hashing to the same value. It's a pretty reasonable tradeoff, however.

@Darren - wow - I had no idea that existed. (which pretty much describes Guava's secrets in general).


@Clint - A few years ago I had a lot of exchange with Doug Lea (author of much of java.util.concurrent) and I couldn't create a use case where ReadWriteLock outperformed simpler methods. It might have changed, but I've veered away from it ever since. Nice solution though! And definitely more featured than what I did.



eyrieowl said...

So, there's a few things I pointed to. What I noted as my "biggest issue" was that a different thread could unlock the lock without having acquired it first. So, using a derivative of Lock was intended to address that.

Another aspect to using the Lock impl is that the scheduling in case of contention is more...'normal'. You could set it to be fair, if you wanted, but even if not the scheduling is based on the jvm implementation of which waiting thread should get the lock rather than potentially multiple threads racing back around through the loop to see who wins. The only reason mine has the loop is to address the race condition where the lock could get removed after another thread has referenced it but before locking it, which should be a rarer occurrence. And, if, for some strange reason, you have MANY threads waiting for a file, it would perform MUCH better than having that notifyAll in there...

Separate from addressing the malicious unlocking issue, the solution does some other things. In the contended situation (even light contention), it avoids creating a new lock but reuses the existing lock. The lock is only removed when no one is using it/waiting on it. Obv not much of a win if the expected scenario is only occasional contention.

My comments on trading off memory vs GC were addressing the possibility of keeping the lock around from one use to the next, perhaps in a canonical cache which you could bound. Why I suggested chatting is that what I don't know about the problem you're solving is how likely it is that you'll need to update a file you've previously updated. If the system has many thousands of files, and you'll rarely be updating the same file, then it's probably not helpful; but if you are realistically just looking at hundreds of files, I'd think you'd benefit from just keeping the lock around. It wouldn't take up much memory, and you would simplify the design a lot: you'd be able to do everything with the ConcurrentMap and just synchronize on the bound object directly, instead of needing logic to determine if someone is waiting, etc.

Something like this:


public Object getLock(String s) {
  Object lock = lockedStrings.get(s);
  if (lock == null) {
    Object newLock = new Object();
    lock = lockedStrings.putIfAbsent(s, newLock);
    if (lock == null)
      lock = newLock;
  }
  return lock;
}

synchronized (getLock(fileName)) {
  // load disk file
  // change its contents a bit
  // rewrite it
}

That being the simplest solution, have you truly validated that it wouldn't work for your use?

Of course, we're also avoiding another solution...to lock the file in the filesystem. Is there a reason why you don't want to do that?


RandomAccessFile raf = null;
FileLock fl = null;
try {
  raf = new RandomAccessFile(fileName, "rw");
  FileChannel fileChannel = raf.getChannel();

  fl = fileChannel.lock();

  // load disk file
  // change its contents a bit
  // rewrite it

} finally {
  if (fl != null) {
    fl.release();
  }
}

Juan Cruz Nores said...

You could make it general and use a simpler implementation. Also, by using AutoCloseables you can reduce the risk of forgetting the finally clause. See here: http://pastebin.com/PP6MfWeQ