Free To Feel

Heading to entrepreneurship.


Joshua Chi

    PROXYPASS Evil

    You are forced to invent new solutions for your business; otherwise you are going to kick your own ass. :-)

    Although the post is titled "proxypass evil", that is not really true. I like proxypass: it hides (protects) backend services from the outside, especially now that we are practicing microservices.

    This post explains a new way to implement a heavy, experience-rich website with a light backend service and Aliyun services.

    Background

    Aliyun (the Chinese cloud) is becoming more and more popular as more and more non-technical companies move their websites from their own hosting to Aliyun, just as happens with Amazon.

    OSS can be used to store static resources.

    CDN can be used to speed up resource loading, by binding your domain with a CNAME and uploading the website files to OSS.

    Context

    1. You are given an Amazon EC2 T2 medium node. The website is full of images and videos, around 1GB in total.
    2. You are told to make the site load within a duration a human can accept, let's say 30s[1], and to expect high traffic in the following days.
    3. There are some small backend services that the frontend needs to talk to.

    Let's compare two solutions:

    A. The EC2 node hosts the site, and all static resource requests are proxypassed to Aliyun OSS/CDN;

    B. Aliyun OSS/CDN hosts the site, and the site talks to the backend services hosted on the EC2 node;

    If you choose A, you have to prepare a powerful EC2 node, in terms of both bandwidth and CPU, considering your node needs to accept all traffic and pass it back and forth to OSS/CDN.

    If you go with solution B, you are binding your website to the Aliyun service. By binding, I mean your website experience is actually provided by Aliyun; only if Aliyun dies does your website die. The only tricky part is that if you have backend services to integrate, you have to enable CORS on them, considering www.example.com already has a CNAME record bound to the CDN.
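
    A quick way to sanity-check the CORS setup is to hit the backend with an Origin header and inspect the response. This is only a sketch; api.example.com, the endpoint path, and the header values are placeholders for whatever your backend actually exposes.

    # Hypothetical check: does the EC2-hosted backend answer pages served from the CDN with CORS headers?
    curl -is -H "Origin: https://www.example.com" https://api.example.com/some/endpoint | grep -i "access-control-allow-origin"

    # Hypothetical preflight check for non-simple requests (e.g. POST with a JSON body)
    curl -is -X OPTIONS \
      -H "Origin: https://www.example.com" \
      -H "Access-Control-Request-Method: POST" \
      https://api.example.com/some/endpoint | grep -i "access-control-"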

    Is this a cool idea? For me, yes! I can sleep well without worrying about our marketing team's traffic plan.

    Pains from EC2 or your own hosting.

    On my first day implementing solution A on an EC2 node (T2 medium), the site was really fast, let's say 10s. One day later, with nothing changed, I ran the load speed test again and the site took around 1 minute. It seemed unbelievable that the Aliyun CDN was "NOT STABLE". After comparing with solution B, it turned out not to be the CDN's fault.

    Amazon EC2 T2 instances have a feature named "CPU Credits":

    When a T2 instance uses fewer CPU resources than its base performance level allows (such as when it is idle), the unused CPU credits (or the difference between what was earned and what was spent) are stored in the credit balance for up to 24 hours, building CPU credits for bursting. When your T2 instance requires more CPU resources than its base performance level allows, it uses credits from the CPU credit balance to burst up to 100% utilization. The more credits your T2 instance has for CPU resources, the more time it can burst beyond its base performance level when more performance is needed.

    Something to prove it: the "EC2 T2 Left CPU Credits" and "EC2 T2 CPU Credits" screenshots.

    2016.09.12 3:00 UTC was the time the website was really fast; compared with the following days (e.g. 09.13, 09.14), the CPU credit usage never exceeded 0.2.
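
    If you prefer numbers over screenshots, the credit balance and usage are also exposed as CloudWatch metrics. A minimal sketch with the AWS CLI, assuming a configured CLI; the instance id and the time window below are placeholders:

    # Hypothetical query of CPUCreditBalance for one T2 instance (CPUCreditUsage is the companion metric)
    aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 \
      --metric-name CPUCreditBalance \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --start-time 2016-09-12T00:00:00Z \
      --end-time 2016-09-14T00:00:00Z \
      --period 3600 \
      --statistics Average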

    The same story applies if you host on a weak server yourself, unless you can provide a powerful CPU and enough memory.

    REFs

    1. 30s - yes, since I joined this new company, this number has been exciting all the creative guys. Sadly, we are using more and more cool technologies (e.g. WebGL) nowadays, but with slower and slower website load speed.

    NPM NodeJS Environment Setup Best Practice with macOS

    NPM Installation

    Do not install NPM with homebrew

    Install it from the official site

    • List all node modules installed globally:
        npm ls -g --depth=0
    
    • Delete global node_modules folder:
        sudo rm -rf /usr/local/lib/node_modules
    
    • Uninstall Node:
    brew uninstall node
    or
    sudo rm /usr/local/bin/node
    
    • Clean up the symlinks to global node modules:
        cd  /usr/local/bin && ls -l | grep "../lib/node_modules/" | awk '{print $9}'| xargs rm
    
    • Install NPM
    curl -L https://www.npmjs.com/install.sh | sh
    

    Node Installation

    Install nvm

    curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.29.0/install.sh | bash
    

    Switch version

    nvm install stable #Install latest stable node
    nvm install 4.2.2 #Install 4.2.2 version
    nvm install 0.12.7 #Install 0.12.7 version
    nvm use 0 #Will switch to use v0.12.7
    

    Use a smart .npmrc

    By default, npm doesn't save installed dependencies to package.json (and you should always track your dependencies!).

    If you use the --save flag to auto-update package.json, npm installs the packages with a leading caret (^), putting your modules at risk of drifting to different versions.

    One solution is installing packages like this:

    $ npm install foobar --save --save-exact
    

    Even better, you can set these options in ~/.npmrc to update your defaults:

    $ npm config set save=true
    $ npm config set save-exact=true
    $ cat ~/.npmrc
    

    Now, npm install foobar will automatically add foobar to package.json and your dependencies won't drift between installs!

    If you prefer to keep flexible dependencies in package.json, but still need to lock down dependencies for production, you can alternatively build npm's shrinkwrap into your workflow. This takes a little more effort, but has the added benefit of preserving exact versions of nested dependencies.
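
    A minimal sketch of that shrinkwrap workflow (the commit message is just an example):

    $ npm install        # install dependencies from package.json
    $ npm shrinkwrap     # write npm-shrinkwrap.json with the exact resolved versions
    $ git add npm-shrinkwrap.json && git commit -m "Lock down dependency tree"
    # On the production box, npm install will then honor npm-shrinkwrap.json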

    Dependencies version management

    *: Match any version
    1.1.0: Exactly match the version
    ~1.1.0: >=1.1.0 && < 1.2.0
    ^1.1.0: >=1.1.0 && < 2.0.0
    

    nrm: switch NPM source

    http://www.tuicool.com/articles/nYjqeu

    Install
    $ npm install -g nrm
    
    Usage

    List all available sources

    nrm ls                                                                                                                                    
    
    * npm ---- https://registry.npmjs.org/
    cnpm --- http://r.cnpmjs.org/
    taobao - http://registry.npm.taobao.org/
    eu ----- http://registry.npmjs.eu/
    au ----- http://registry.npmjs.org.au/
    sl ----- http://npm.strongloop.com/
    nj ----- https://registry.nodejitsu.com/
    

    * means source you are currently using

    Switch

    Switch to use the one from taobao

    nrm use taobao                                                                                                 
    

    Registry has been set to: http://registry.npm.taobao.org/


    MongoDB Read-only Performance Production Tuning

    MongoDB vs TokuMX

    First check this post, Tokumx VS mongodb read-only performance @stackoverflow, where I have already described the context of comparing MongoDB and TokuMX read-only performance.

    TokuMX 2.0.0 Community Edition for MongoDB is still built on MongoDB 2.4, which doesn't have the GEO 2dsphere index yet. So the post @stackoverflow might not be fair to TokuMX. Personally, I like TokuMX:

    • storage compression
    • document-level write locks
    • concurrent write performance
    • no need to worry about embedded document size changes
    • ...

    You can find a lot of those cool features in TOKUMX™ BENCHMARK VS. MONGODB – HDD and the TokuMX documentation.

    2d index vs 2dsphere index

    2d indexes support:

    • Calculations using flat geometry
    • Legacy coordinate pairs (i.e., geospatial points on a flat coordinate system)
    • Compound indexes with only one additional field, as a suffix of the 2d index field

    2dsphere indexes support:

    • Calculations on a sphere
    • GeoJSON objects and include backwards compatibility for legacy coordinate pairs
    • Compound indexes with scalar index fields (i.e. ascending or descending) as a prefix or suffix of the 2dsphere index field

    So basically, if you have multiple fields that need to be compounded together, you need to use a 2dsphere index.
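
    For example, a compound 2dsphere index over the geo and gender fields used below could be created like this; "mydb" and "collection" are placeholders for your real database and collection names:

    # A minimal sketch, run through the mongo shell client from the command line
    mongo mydb --eval 'db.collection.ensureIndex({ geo: "2dsphere", gender: 1 })'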

    Nothing is perfect: if you are also sorting on some fields, like id desc, you will hit the geo query with sort performance issue (http://stackoverflow.com/questions/12908871/mongodb-geospatial-query-with-sort-performance-issues).

    geoWithin vs near

    So we chose the 2dsphere index for our case. But performance matters when choosing between geoWithin and near.

    db.collection.find(
       {
       $query: {
         geo:
         {
           $near :
            {
              $geometry: img.geo,
              $maxDistance: 100000
            }
         },
         gender: 3
       },
       $orderby: { '_pid' : -1 },
       $limit: 3,
       $skip: 1
      }
    );
    
     "cursor" : "S2NearCursor",
     "isMultiKey" : false,
     "n" : 54,
     "nscannedObjects" : 502,
     "nscanned" : 502,
     "nscannedObjectsAllPlans" : 502,
     "nscannedAllPlans" : 502,
     "scanAndOrder" : true,
     "indexOnly" : false,
     "nYields" : 4,
     "nChunkSkips" : 0,
     "millis" : 3,
     "indexBounds" : {
     },
    
    
    db.collection.find(
      {
        $query: {
          geo : {
            $geoWithin : {
              $centerSphere : [ img.geo.coordinates , 100/3959 ]
            }
          },
          gender:3
        },
        $orderby: { '_pid' : -1 },
        $limit: 3,
        $skip: 1
      }
    );
    
     "cursor" : "BtreeCursor geo_2dsphere_gender_1",
     "isMultiKey" : false,
     "n" : 159,
     "nscannedObjects" : 249,
     "nscanned" : 337,
     "nscannedObjectsAllPlans" : 249,
     "nscannedAllPlans" : 337,
     "scanAndOrder" : true,
     "indexOnly" : false,
     "nYields" : 3,
     "nChunkSkips" : 0,
     "millis" : 3,
     "indexBounds" : {
          "geo" : [],
          "gender" : [
               [
                    3,
                    3
               ]
          ]
    

    You can see they use two different cursors, S2NearCursor and BtreeCursor. In our case S2NearCursor works better than BtreeCursor.

    MongoDB and NUMA Hardware

    NUMA and interleaved memory affect MongoDB performance; you can find the following in the Production Notes.

    • Running MongoDB on a system with Non-Uniform Access Memory (NUMA) can cause a number of operational problems, including slow performance for periods of time and high system process usage.

    • When running MongoDB servers and clients on NUMA hardware, you should configure a memory interleave policy so that the host behaves in a non-NUMA fashion. MongoDB checks NUMA settings on start up when deployed on Linux (since version 2.0) and Windows (since version 2.6) machines, and prints a warning if the NUMA configuration may degrade performance.

    • See The MySQL “swap insanity” problem and the effects of NUMA post, which describes the effects of NUMA on databases. This blog post addresses the impact of NUMA for MySQL, but the issues for MongoDB are similar. The post introduces NUMA and its goals, and illustrates how these goals are not compatible with production databases.
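
    In practice, the interleave policy mentioned above is applied by starting mongod through numactl and disabling zone reclaim, roughly as the production notes describe; the binary and config paths below are placeholders:

    # Disable zone reclaim on NUMA hosts
    echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode

    # Start mongod with an interleaved memory policy
    sudo numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf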

    Tuning client performance

    In the /etc/sysctl.conf file:

    • net.ipv4.tcp_tw_recycle = 1
    • net.ipv4.tcp_tw_reuse = 1
    • TCP_TW_RECYCLE Description: Enables fast recycling of TIME_WAIT sockets. Use with caution and ONLY in internal network where network connectivity speeds are “faster”.

    • TCP_TW_REUSE Description: Allows for reuse of sockets in TIME_WAIT state for new connections, only when it is safe from the network stack’s perspective.
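
    A sketch of how these could be applied (keeping in mind the caution about tcp_tw_recycle above):

    # Append the settings to /etc/sysctl.conf and reload without a reboot
    echo "net.ipv4.tcp_tw_recycle = 1" | sudo tee -a /etc/sysctl.conf
    echo "net.ipv4.tcp_tw_reuse = 1"   | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p

    # Or set them temporarily for the current boot only
    sudo sysctl -w net.ipv4.tcp_tw_reuse=1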

    Some suggestions when you work with Solr

    Background

    Since we deployed Solr to production, it ran fine for the first few days, and then one day it became slow to respond. This happened at least several times. We can start by analyzing this Graphite screenshot.

    Solr Graphite screenshot

    The whole server structure:

    • We have three Solr instances: one master and two slaves, managed by ZooKeeper
    • All client requests are queued first (the queue size is reported as queue.size.solr)

    Notice: we will not discuss whether this structure was designed correctly in this post. This blog will just focus on how to use Solr itself.

    More info about this Graphite screenshot

    • queue.size.solr increased because solr.request.time.avg increased, which is obvious;
    • The increase of solr.request.time.avg was not caused by solr.requests, which was quite stable from Fri 12PM to Fri 8PM;
    • Please ignore the drop of system.memory.Memfree, which was caused by a restart of the Solr instance on the master (BTW, we store the index in RAM);

    A warning from the production Solr log

    PERFORMANCE WARNING: Overlapping onDeckSearchers=X

    You will find an explanation on the Solr wiki page:

    This warning means that at least one searcher hadn't yet finished warming in the background, when a commit was issued and another searcher started warming. This can not only eat up a lot of ram (as multiple on deck searchers warm caches simultaneously) but it can create a feedback cycle, since more searchers warming in parallel means each searcher might take longer to warm.

    Typically the way to avoid this error is to either reduce the frequency of commits, or reduce the amount of warming a searcher does while it's on deck (by reducing the work in newSearcher listeners, and/or reducing the autowarmCount on your caches)

    See also the maxWarmingSearchers option in solrconfig.xml.

    I want to add additional information before we start analyzing. The warning had always been there in the production log, but we saw around double the number of these warning entries compared with the day before.

    Too early conclusion

    From the Graphite screenshot I thought there must be something wrong with Solr itself. The only things we could control were the Solr configuration and the JVM. After playing with those two factors and several rounds of Tsung stress testing, I kept getting this warning. I failed to get rid of it, which pushed me back to find more articles about this issue.

    Conclusion

    It was very possible that we were using it wrong, which would have been obvious if I had paid more attention to "...to avoid this error is to either reduce the frequency of commits, or reduce the amount of warming a searcher does...".

    • Solr is not for realtime search;
    • Solr is not a replacement for an RDB (e.g. MySQL);
    • Solr relies heavily on RAM if you have a big index;
    • Reads (search) and writes (update) should be treated differently;

    So the solution could be to batch the write requests.

    The SolrJ client provides ConcurrentUpdateSolrServer, which contains an internal queue, so we can queue all update requests first with commitWithin enabled.
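
    The same commitWithin idea can be illustrated over plain HTTP as well; a sketch against a hypothetical core (host, core name, and documents are placeholders):

    # Push a small batch of documents and let Solr commit them within 10 seconds,
    # instead of issuing an explicit commit for every request
    curl "http://localhost:8983/solr/mycore/update?commitWithin=10000" \
      -H "Content-Type: application/json" \
      -d '[{"id": "doc-1", "title": "first doc"}, {"id": "doc-2", "title": "second doc"}]'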

    More tips about how to optimize your index performance if you also have the write issue:

    Visit Appspot.com From China

    The reason behind this blog is to show you how to make GAppProxy work again. In China, most people rely on GAppProxy to access the 'outside world', but appspot.com is blocked in China. So here is a simple tutorial to show you how to make it work.

    The idea is very simple: find a working Google IP and add it to the hosts file.


    Step1:

    nslookup www.google.com
    

    You can find something like:

    nslookup www.google.com
    Server:   192.168.1.1
    Address:  192.168.1.1#53
    
    Non-authoritative answer:
    Name: www.google.com
    Address: 74.125.128.99
    Name: www.google.com
    Address: 74.125.128.105
    Name: www.google.com
    Address: 74.125.128.103
    Name: www.google.com
    Address: 74.125.128.147
    Name: www.google.com
    Address: 74.125.128.106
    Name: www.google.com
    Address: 74.125.128.104
    

    Basically you just need to try https://74.125.128.x/ to check which one is working, where 'x' is one of the last octets returned above. If you can visit https://74.125.128.x/, continue to Step2. A quick way to test the candidates from the shell is sketched below.
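
    This is only a sketch; the IP list is the example output above, and -k is used because we connect by IP rather than by hostname:

    # Try each resolved IP over HTTPS and print the HTTP status code (000 means no answer)
    for ip in 74.125.128.99 74.125.128.103 74.125.128.104 74.125.128.105 74.125.128.106 74.125.128.147; do
      curl -sk -o /dev/null -m 5 -w "$ip -> %{http_code}\n" "https://$ip/"
    done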


    Step2:

    Modify the /etc/hosts file and add one line:

    74.125.128.x $your_app_engine_id.appspot.com
    

    Replace $your_app_engine_id with your registered google app engine ID.

    Hope this will help you.

    Move From Godaddy DNS to Dnspod

    It was the first time my domain could not be resolved for around 6 hours. I learned that GoDaddy DNS is sometimes blocked in China.

    This is what I hate about working on a Chinese site. After several minutes of struggle, I decided to move my DNS from GoDaddy to DNSPod.

    DNSPod did a really nice job; both the user interface and the services meet my requirements.

    I don't want to repeat how to set up DNSPod; a Google search will tell you more about this.

    Now I will just wait for the new DNS to take effect. Actually the DNS resolved quickly elsewhere in China, but not in Jiangsu province in my testing. Good luck to my www.dachebang.com.

    Linode Facility Migration to Fremont

    After using Linode's Tokyo facility for a while, I decided to migrate my node to Fremont.

    Linode provides you with a speed test page.

    Here is my speed test result. The two downloads were started at the same time.

    100MB-tokyo.bin 1.2/100MB, 47 min left
    100MB-fremont.bin 66.5/100MB, 22 secs left
    

    But if I ping the two speed test domains:

    $ ping speedtest.tokyo.linode.com
    PING speedtest.tokyo.linode.com (106.187.96.148): 56 data bytes
    64 bytes from 106.187.96.148: icmp_seq=0 ttl=51 time=82.885 ms
    64 bytes from 106.187.96.148: icmp_seq=1 ttl=51 time=77.757 ms
    64 bytes from 106.187.96.148: icmp_seq=2 ttl=51 time=77.919 ms
    
    $ ping speedtest.fremont.linode.com
    PING speedtest.fremont.linode.com (50.116.14.9): 56 data bytes
    64 bytes from 50.116.14.9: icmp_seq=0 ttl=51 time=146.268 ms
    Request timeout for icmp_seq 1
    64 bytes from 50.116.14.9: icmp_seq=2 ttl=51 time=146.403 ms
    64 bytes from 50.116.14.9: icmp_seq=3 ttl=51 time=145.409 ms
    64 bytes from 50.116.14.9: icmp_seq=4 ttl=51 time=146.396 ms
    64 bytes from 50.116.14.9: icmp_seq=5 ttl=51 time=148.467 ms
    

    As you can see, the results are totally different. So which one can you trust?

    The response from linode support:

    Individual packets take less time to travel between our Tokyo datacenter and your location than our Fremont datacenter, but that our Fremont datacenter is able to provide you with faster download speeds to your location. Which location is better for you depends on whether your use case likes decreased latency, or better download speeds.
    

    In the end I decided to switch to Fremont anyway, so Linode provided me a new IP address. Then the interesting thing came: my ISP had blocked this IP address for some reason. You can find the failing point by using either "traceroute" or MTR, as shown below.
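
    A sketch of looking for the failing hop; the address below is the Fremont speed test IP from above, standing in for your node's new IP:

    # Show each hop towards the node; the route stops answering around the point of the block
    traceroute 50.116.14.9

    # Or get a summarized report with packet loss per hop
    mtr --report --report-cycles 10 50.116.14.9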

    So I asked for another new IP address and rebooted my node.

    Don't forget to update your hosts file with the new IP address to make your application work again. Good luck!