Free To Feel

Heading to entrepreneur.

Joshua Chi

    What RabbitMQ Cluster Can Do

    What RabbitMQ cluster can do - Scaling and Reliability

    • Example 1:

    All data/state required for the operation of a RabbitMQ broker is replicated across all nodes.

    Let's take an example to help you understand better.

    There is a publisher which publish 10 messages to RabbitMQ broker(*1).

    |Publisher| --> |10 Messages| --> |Broker( = Node_A + Node_B)|

    So Node_A will receive Message 1, Message 3, 5, 7, 9; Node_B will receive Message 2, Message 4, 6, 8, 10.

    • Example 2:

    We still use the above example. If the Node_A crashed when it is consuming the messages, what happens?

    Before Node_A crashed (I simply powered off this machine):

    Node_A finished Message 1, Message 3, and it is starting to consume Message 5;

    Node_B finished Message 2, Message 4 and it is starting to consume Message 6.

    After Node_A crashed:

    Node_A stopped working;

    Node_B finished Message 6, Message 7, Message 8, Message 9, Message 10 and Message 5.

    Yes, as you can see, even Node_A dies unexpected. Broker still can get the status of Message 5 and send it to Node_B. And you can find following explanation:

      If a node goes down or becomes unreachable what effects can this have on the cluster? Do things 'hang' for a bit?
      Topology change operations (see above) could potentially pause operation for a brief time but they will complete eventually. We use the net_kernel erlang module to do monitoring between nodes. The default "tick" time there is 60 seconds but this can be reduced. Further, in the event of a failure, any communication between the nodes will likely result in an error being generated and detected immediately: i.e. the only time at which you would not know about a node failure for 60 seconds is if there was no communication between the nodes for that amount of time.

    What RabbitMQ cluster can not do - High Availability

    So the 60 seconds is what RabbitMQ can not do. We call it high availability.

    Whilst RabbitMQ also supports clustering, clustering is intended to facilitate scalability, not availability. Thus in a cluster, if a node fails, queues which were on the failed node are lost. With the high availability setup described in this guide, when a node fails, the durable queues and the persistent messages within them can be recovered by a different node.

    To build a high availability and scalable application, you can take a look RabbitMQ Placemaker


    1. RabbitMQ broker - a logical grouping of one or several Erlang nodes, each running the RabbitMQ application and sharing users, virtual hosts, queues, exchanges, etc. Sometimes we refer to the collection of nodes as a cluster.

    Do we still need scrum if ...

    Just learned an interesting point from 'balance software' post:

    When Scrum has died

      In the journey of dramatic improvements to bring your code base under control, there are few things that you should take notice off.
      * An architecture will emerge that supports the design of the resident parts.  Things fit together sweetly and each part supports the other part in symbiotic relationships.
      * The code base will get smaller and the team will shrink to, perhaps,  just 2 or 3 people.
      * Each developer will take on greater responsibility and will find it difficult to break core principles.  The act of living those principles will result in a values that don't need to be listed on a poster on the wall.
      * The scrum master will become redundant.
      * The product owner will do more by working directly with the developers.
      * The developers will represent the interest of their customers directly.
      * The bottleneck will shift from development to the product owner and eventually the customer.

    In my opinion, Scrum is used for company to delivery features to users quickly and interactively. As a team, with scrum you can easily prove your works after a short time. You don't have to wait for one year or longer. And big feature can be split into small ones. Everyone know what need to be done in following weeks. You don't have to write so many documents to trace what's going on. Ideally, as the authors said, if the code base can be made smaller and smaller. And the new feature will be just a plugin to your platform. Of course you don't need scrum. Should we call it as 'factory model'? As workers(developers) just need to collect the feature request and compose it by some existing functions. And plug it into the platform(codebase). I have not seen a company who has such a powerful platform. Will the bottleneck be the platform itself then?

    Web Server Memory Monitor


    free-m: to see how much memory you are currently using

    Output:           total    used   free    shared buffers cached
          Mem: 90      85       4      0       3       34
    -/+ buffers/cache:  46      43
        Swap:   9        0       9
    The top row 'used' (85) value will almost always nearly match the top row mem value (90). Since Linux likes to use any spare memory to cache disk blocks (34).

    The key used figure to look at is the buffers/cache row used value (46). This is how much space your applications are currently using. For best performance, this number should be less than your total (90) memory. To prevent out of memory errors, it needs to be less than the total memory (90) and swap space (9).

    If you wish to quickly see how much memory is free look at the buffers/cache row free value (43). This is the total memory (90)- the actual used (46). (90 - 46 = 44, not 43, this will just be a rounding issue)

    ps aux

    ps aux: to see where all your memory is going.That will show the percentage of memory each process is using.  You can use it to identify the top memory users (usually Apache, MySQL and Java processes).

    root 854 0.5  39.2 239372  36208 pts/0 S     22:50 0:05 /usr/local/jdk/bi
    n/java -Xms16m -Xmx64m -Djava.awt.headless=true -Djetty.home=/opt/jetty -cp /opt
    We can see that java is using up 39.2% of the available memory.


    vmstat:helps you to see, among other things, if your server is swapping.

    - vmstat 1 2 procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 0 0 0 39132 2416 804 15668 4 3 9 6 104 13 0 0 100 0 0 0 39132 2416 804 15668 0 0 0 0 53 8 0 0 100 0 0 0 39132 2416 804 15668 0 0 0 0 54 6 0 0 100 The first row shows your server averages. The si (swap in) and so (swap out) columns show if you have been swapping (i.e. needing to dip into 'virtual' memory) in order to run your server's applications. The si/so numbers should be 0 (or close to it). Numbers in the hundreds or thousands indicate your server is swapping heavily. This consumes a lot of CPU and other server resources and you would get a very (!) significant benefit from adding more memory to your server.

    Some other columns of interest: The r (runnable) b (blocked) and w (waiting) columns help see your server load. Waiting processes are swapped out. Blocked processes are typically waiting on I/O. The runnable column is the number of processes trying to something. These numbers combine to form the 'load' value on your server. Typically you want the load value to be one or less per CPU in your server.

    The bi (bytes in) and bo (bytes out) column show disk I/O (including swapping memory to/from disk) on your server.

    The us (user), sy (system) and id (idle) show the amount of CPU your server is using. The higher the idle value, the better.

    Resolving: High Java Memory Usage

    Java processes can often consume more memory than any other application running on a server.

    Java processes can be passed a -Xmx option. This controls the maximum Java memory heap size. It is important to set a limit on the heap size, otherwise the heap will keep increasing until you get out of memory errors on your VPS (resulting in the Java process - or even some other, random, process - dying.

    Usually the setting can be found in your /usr/local/jboss/bin/run.conf or /usr/local/tomcat/bin/ config files. And your RimuHosting default install should have a reasonable value in there already.

    If you are running a custom Java application, check there is a -XmxNNm (where NN is a number of megabytes) option on the Java command line.

    The optimal -Xmx setting value will depend on what you are running. And how much memory is available on your server.

    From experience we have found that Tomcat often runs well with an -Xmx between 48m and 64m. JBoss will need a -Xmx of at least 96m to 128m. You can set the value higher. However, you should ensure that there is memory available on your server.

    To determine how much memory you can spare for Java, try this: stop your Java process; run free -m; subtract the 'used' value from the "-/+ cache" row from the total memory allocated to your server and then subtract another 'just in case' margin of about 10% of your total server memory. The number you come up with is a rough indicator of the largest -Xmx setting you can use on your server.

    Resolving: High Apache Memory Usage

    Apache can be a big memory user. Apache runs a number of 'servers' and shares incoming requests among them. The memory used by each server grows, especially when the web page being returned by that server includes PHP or Perl that needs to load in new libraries. It is common for each server process to use as much as 10% of a server's memory.

    To reduce the number of servers, you can edit your httpd.conf file. There are three settings to tweak: StartServers, MinSpareServers, and MaxSpareServers. Each can be reduced to a value of 1 or 2 and your server will still respond promptly, even on quite busy sites. Some distros have multiple versions of these settings depending on which process model Apache is using. In this case, the 'prefork' values are the ones that would need to change.

    To get a rough idea of how to set the MaxClients directive, it is best to find out how much memory the largest apache thread is using. Then stop apache, check the free memory and divide that amount by the size of the apache thread found earlier. The result will be a rough guideline that can be used to further tune (up/down) the MaxClients directive. The following script can be used to get a general idea of how to set MaxClients for a particular server:

    echo "This is intended as a guideline only!"
    if [ -e /etc/debian_version ]; then
    elif [ -e /etc/redhat-release ]; then
    RSS=`ps -aylC $APACHE |grep "$APACHE" |awk '{print $8'} |sort -n |tail -n 1`
    RSS=`expr $RSS / 1024`
    echo "Stopping $APACHE to calculate free memory"
    /etc/init.d/$APACHE stop &ampamp> /dev/null
    MEM=`free -m |head -n 2 |tail -n 1 |awk '{free=($4); print free}'`
    echo "Starting $APACHE again"
    /etc/init.d/$APACHE start &> /dev/null
    echo "MaxClients should be around" `expr $MEM / $RSS`

    Note: httpd.conf should be tuned correctly on our newer WBEL3 and FC2 distros. Apache is not installed by default on our Debian distros (since some people opt for Apache 2 and others prefer Apache 1.3). So this change should only be necessary if you have a Debian distro.

    from "Setting MaxRequestsPerChild to a non-zero limit solves some memory-leakage problems caused by sloppy programming practices and bugs, whereby a child process consumes a little more memory after each request. In such cases, and where the directive is left unbounded, after a certain number of requests the children will use up all the available memory and the server will die from memory starvation."

    Resolving: High MySQL Memory Usage

    -- if your are not using the innodb table manager, then just skip it to save some memory -- skip-innodb innodb_buffer_pool_size = 16k key_buffer_size = 16k myisam_sort_buffer_size = 16k query_cache_size = 1M

    Troubleshooting Irregular Out Of Memory Errors

    Sometimes a server's regular memory usage is fine. But it will intermittently run out of memory. And when that happens you may lose trace of what caused the server to run out of memory.

    In this case you can setup a script (see below) that will regularly log your server's memory usage. And if there is a problem you can check the logs to see what was running.

    wget -O /root/ chmod +x /root/

    -- create a cronjob that runs every few minutes to log the memory usage
    echo '0-59/10 * * * * root /root/ >> /root/memmon.txt' > /etc/cron.d/memmon
    /etc/init.d/cron* restart 
    -- create a logrotate entry so the log file does not get too large
    echo '/root/memmon.txt {}' > /etc/logrotate.d/memmon

    This article was copied from


    To deal with this error, there are a lot of suggestions. In my case, it is because of:

      Make sure that the Charset and Collate options are the same both at the table level as well as individual field level for the key columns. (Thanks to FRR for this tip)

    Same update query only update once

    Just learned more about the mysql update statement. I am very curious with this query:

    UPDATE `table_name` SET `column3` = (`column1` | (`column2` * 2) ;

    The first time, there are 10,000 items were updated. When I execute it again, 0 affected.

    So there is no where condition, how can this happen? Learn it from mysql doc: Speed of UPDATE Statements.

    Yes, it is because of the index.

      ...The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated. 

    Deal with mysql ERROR 1005 (HY000)

    How to deal with 'mysql ERROR 1005 (HY000): Can't create table '' (errno: 150)'? Usuall it was caused by mysql foreign key. But how you know which foreign key is wrong? Or why it is incorrect? To get more information, you can execute:

    . In my case:

    CREATE TABLE `table3`
      `name` VARCHAR(255),
      `author_id` INTEGER  NOT NULL,
      `is_published` TINYINT default 0,
      `identifier` VARCHAR(100),
      `path` VARCHAR(255),
      `description` TEXT,
      PRIMARY KEY (`id`),
      KEY `books_I_1`(`is_published`),
      INDEX `books_FI_1` (`author_id`),
      CONSTRAINT `books_FK_1`
        FOREIGN KEY (`author_id`)
        REFERENCES `users` (`id`)

    I get mysql 1005 error with above sql, if I execute show innodb status, I will get:

    110121 16:11:12 INNODB MONITOR OUTPUT
    Per second averages calculated from the last 13 seconds
    OS WAIT ARRAY INFO: reservation count 28, signal count 28
    Mutex spin waits 0, rounds 81, OS waits 4
    RW-shared spins 38, OS waits 19; RW-excl spins 5, OS waits 5
    110121 16:05:56 Error in foreign key constraint of table table3:
       FOREIGN KEY (`author_id`)
       REFERENCES `users` (`id`)
    You have defined a SET NULL condition though some of the
    columns are defined as NOT NULL.

    Yes, as you can see, the last sentence is what I want. Actually, I define author_id can not be null. Good luck to you.

    Cross Lanaguage Service Development

    Thrift is a software framework for scalable cross-language services development. Now it supports C++, C#, Erlang, Haskell, Java, Objective C/Cocoa, OCaml, Perl, PHP, Python, Ruby, and Squeak.

    So basically, what you need to do is define a .thrift definition file. And Thrift will help you generate the RPC client and RPC server. There are a lot of examples in this wiki page.

    You can use this demo to understand how it works.

    There is a similar project called Protocol Buffers which provided by google.

    This is what I have done in todays' Innovation Day. And I totall agree:

    Individuals and interactions over processes and tools

    We know people is the kernel in a technical company. Anything is beneficial for people should be encouraged. But of course, it also should be beneficial for company firstly. Any rule that restrict people's mind should be removed.