Terracotta is Network Attached Memory (NAM) for Java VMs. It provides up to a terabyte of virtual heap for Java applications that spans hundreds of connected JVMs.
NAM is best suited for storing what Terracotta calls scratch data. Scratch data is defined as object-oriented data that is critical to the execution of a series of Java operations inside the JVM, but may not be critical once a business transaction is complete.
The Terracotta Architecture has three components:
JVM-level clustering can turn single-node, multi-threaded apps into distributed, multi-node apps, often with no code changes. This is possible by plugging in to the Java Memory Model in order to maintain key Java semantics of pass-by-reference, thread coordination and garbage collection across the cluster. Terracotta enables this using only declarative configuration with minimal impact to existing code and provides fine-grained field-level replication which means your objects no longer need to implement Java serialization.
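To make concrete what kind of code this refers to, here is a minimal sketch of a single-node, multi-threaded class that relies on a shared object reference and `synchronized` for thread coordination. It is plain Java and uses no Terracotta API; the class and method names are hypothetical, and which fields would be shared across the cluster would be decided by the declarative configuration, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: ordinary single-JVM code with no clustering API in it.
class VisitCounter {
    // Shared state; every thread works on the same map by reference.
    private final Map<String, Integer> visits = new HashMap<>();

    void record(String page) {
        // Thread coordination through a normal Java monitor.
        synchronized (visits) {
            visits.merge(page, 1, Integer::sum);
        }
    }

    int count(String page) {
        synchronized (visits) {
            return visits.getOrDefault(page, 0);
        }
    }

    // Two threads each record 1000 visits to the same page.
    static int demo() {
        VisitCounter counter = new VisitCounter();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.record("/home");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.count("/home");
    }
}
```

The claim in the paragraph above is that the two threads could live in different JVMs while the `synchronized` block and the shared map keep their meaning; the sketch itself only demonstrates the single-JVM behavior.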
Ari Zilka, the founder and CTO of Terracotta, gave a video session organized by Skills Matter. In it he shows how Terracotta works and how you can start clustering your POJO-based Web applications (based on Spring, Struts, Wicket, RIFE, EHCache, Quartz, Lucene, DWR, Tomcat, JBoss, Jetty, Geronimo, etc.).
Comments
Re: Product: Terracotta - Open Source Network-Attached Memory
This looks pretty impressive. Has anyone actually tried implementing this into their own, existing, system? Definitely seems like this could be very powerful for large scale java apps (is anyone doing that? :o) )
http://www.samalamadingdong.com
Re: Product: Terracotta - Open Source Network-Attached Memory
"Definitely seems like this could be very powerful for large scale java apps (is anyone doing that? :o) )"
LinkedIn is a big Java shop.
Cheers
Terracotta isn't highly scalable
I am particularly unimpressed by Terracotta. Take for instance their justifications for avoiding a peer system:
http://www.ddj.com/java/199703478
"The peer-to-peer approach bottlenecks on the network, since every node needs to know everything that the other nodes know."
Also, http://www.theserverside.com/tt/articles/article.tss?l=TerracottaScalabi... :
"If an application is distributed across four nodes as opposed to two, should that cluster not take twice as many operations to update objects on all four nodes? If a clustering architecture were to send updates one-by-one, then four nodes would achieve half the clustered throughput of two. And, if we were to use a multicast approach, then we would lower the elapsed time for the update by going parallel in our cluster updates. But, if we confirm (ACK) those updates in all clustered JVMs—and for the case of correctness we really ought to acknowledge all updates on all nodes where the object lies resident—we still have to wait for 3 acknowledgements to each update in a four-node cluster and our design is, thus, O(n)."
The idea that a distributed (peer) system requires an O(n) routing algorithm is patently false. Take any modern research on DHTs, which provide the same basic functionality as Terracotta; these systems have, at worst, an O(log(n)) routing algorithm for object lookup. Or take a look at memcached, which has an O(1) lookup algorithm. My point is that consistent hashing solves this problem.
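For readers unfamiliar with the technique, here is a minimal sketch of a consistent-hash ring. It is not code from Terracotta, memcached, or any DHT; the `HashRing` class, the use of CRC32 as the hash, and the virtual-node scheme are all assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch of a consistent-hash ring with virtual nodes.
class HashRing {
    // Ring position -> owning node, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas; // virtual points per node

    HashRing(int replicas) {
        this.replicas = replicas;
    }

    // CRC32 is an assumption made for brevity; real systems usually
    // pick a hash with better distribution.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            // On a (rare) position collision the earlier owner keeps the point.
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            // Only remove points that this node actually owns.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // The owner is the first point at or after the key's position,
    // wrapping around to the start of the ring if necessary.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Each client computes the owner locally, so a lookup needs no broadcast to the other nodes, and removing a node only remaps the keys that node owned.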
Due to this erroneous reasoning on the part of the Terracotta developers, they have chosen to create an inherently unscalable system. Look closely at how Terracotta actually scales -- they essentially use a master-slave architecture in order to distribute reads. They scale fine for a read-heavy application, but there is a single-node bottleneck for writes. That's right, Terracotta is a hierarchical system with a single node as the bottleneck.
Yes, it can be made faster with better hardware. And yes, you can use your own manual sharding strategies. But no, this is not a truly (horizontally) scalable product. When I read high scalability, I'm looking for technologies that avoid the problem of having a single bottleneck.
Disclosure: My only interaction with their product has been to read a couple of hyped-up articles about it and look closely at their "cutting edge" architecture diagrams. I have never worked (and currently don't work) on any products/projects in the same realm.
Re: Product: Terracotta - Open Source Network-Attached Memory
I'm also skeptical of the "no code change" benefits. Most applications likely assume inter-thread latency is lower than inter-host latency. Likewise, failures in clusters are a lot more complicated, with things like asymmetric partitions. Using spindles for locks sounds slow as well.
Re: Product: Terracotta - Open Source Network-Attached Memory
I want to address the concerns raised here:
1. "No code changes" is more accurately characterized as pure Java and direct JDK support. Our users say that they end up with cleaner code, fewer bugs, and faster apps once Terracotta has been introduced. So yes, apps are not always prepared to cluster, but a clustered app runs w/o Terracotta present with zero changes. As a specific example, I just spoke to a user who wanted to build a data structure made up of ConcurrentHashMaps, with TreeMaps inside those, and LinkedLists inside those. Worked fine with us. He wanted to benchmark against a distributed cache and this proved quite hard. His LinkedList has a fixed 180 time-based elements. Every so often he pops the oldest item off one end and adds the newest item to the other. I wouldn't want to model this use case with only a map. I would have to do some sort of indexing trick like writing an entry keyed "head of list 7" to the map, which would contain as its value the key of the entry that currently represents the head of that list. And each entry would have to maintain the ID of the next value in the chain. Why? Because linked structures cannot be clustered w/o Terracotta. They get serialized with other frameworks.
2. TC is O(1) like Memcached. And it has 10X higher throughput than anything else out there because of the runtime data it can leverage to route and batch--much like GC or HotSpot can improve code, so can we. Our customers find that a typical TC installation of 2-4 JVMs (plus our server) would be replaced by 20-50 servers of the nearest competitor's solution. While the competitors scale linearly to hundreds of nodes, they seem to need lots of nodes. And our server has consistent hashing built in to stripe data across Terracotta servers for linear scale. Why quote our blog entries and posts from 2-4 years ago and ignore the ones from this year where we announced the existence of active/active server striping with Terracotta?
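The linked-list use case in point 1 can be sketched roughly as follows. This is an illustration only, not the user's code: the class, the method names, and the key naming ("head of ...", "tail of ...") are hypothetical, and a plain `HashMap` stands in for whatever map-only distributed cache one might compare against.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical sketch: a fixed window of 180 time-based samples, kept two ways.
class SlidingWindows {
    static final int CAPACITY = 180;

    // Direct version: an ordinary linked list.
    static void pushDirect(LinkedList<Long> window, long sample) {
        window.addLast(sample);
        if (window.size() > CAPACITY) {
            window.removeFirst(); // drop the oldest sample
        }
    }

    // Map-only version: every list node becomes its own cache entry.
    static final class Entry {
        final long value;
        String nextKey; // key of the next-newer entry, or null for the tail
        Entry(long value) { this.value = value; }
    }

    static void pushViaMap(Map<String, Object> cache, String list, long sample) {
        String headKey = "head of " + list;
        String tailKey = "tail of " + list;
        String sizeKey = "size of " + list;
        String seqKey = "seq of " + list;

        long seq = (Long) cache.getOrDefault(seqKey, 0L);
        int size = (Integer) cache.getOrDefault(sizeKey, 0);

        // Store the new node under a fresh key.
        String newKey = list + "/" + seq;
        cache.put(newKey, new Entry(sample));
        cache.put(seqKey, seq + 1);

        // Link it behind the current tail, or make it the head if the list is empty.
        String oldTailKey = (String) cache.get(tailKey);
        if (oldTailKey == null) {
            cache.put(headKey, newKey);
        } else {
            Entry oldTail = (Entry) cache.get(oldTailKey);
            oldTail.nextKey = newKey;
            cache.put(oldTailKey, oldTail); // write back, as a remote cache would require
        }
        cache.put(tailKey, newKey);
        size++;

        // Evict the oldest node by following the head pointer.
        if (size > CAPACITY) {
            String oldHeadKey = (String) cache.get(headKey);
            Entry oldHead = (Entry) cache.remove(oldHeadKey);
            cache.put(headKey, oldHead.nextKey);
            size--;
        }
        cache.put(sizeKey, size);
    }

    static long oldestViaMap(Map<String, Object> cache, String list) {
        String headEntryKey = (String) cache.get("head of " + list);
        return ((Entry) cache.get(headEntryKey)).value;
    }
}
```

Both versions keep the same 180 samples; the map-only one needs head, tail, size, and sequence bookkeeping entries on top of the data, which is the indexing trick described in point 1.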
Anyways, try it. You will definitely like the programming model. You may have to tune a bit to get it to go fast, but we have all these visualization tools to help with that so that you aren't blind to what's going on like you would be with many other frameworks. (Our cluster profiling tools show lock hopping, object locality and load balancing visualization, etc.)
Cheers,
--Ari
Re: Product: Terracotta - Open Source Network-Attached Memory
Ari,
Sorry for referencing material that is out of date. I only read material from the front page of a Google search -- that aside, could you explain how your recent improvements to Terracotta keep the system from becoming write-constrained? As I understand it, there is a master node that all writes need to pass through, or is this no longer the case?
-Michael Carter