

Setting up a distributed web app.




Posted by twikamltd, 02-03-2010, 02:37 PM
I'm setting up what is basically a web app, and initially need to start with 3 servers. Let's say the app resides in /var/www/html; I need this replicated across the other 2 servers (and more as they are added). I figured the best way to do this, short of going for a SAN-type setup, is simply to share the folder and mount it on the other 2 servers. Is there any way of caching here? The load balancing will be simple round robin; it's not a critical app, so that should be fine. Are there any alternatives, and does this have any implications for speed? The files in question are small (images and CSS), but the main PHP script, which receives over 99% of the hits, will be cached in memory by eAccelerator on each server.

The other thing is, I'm planning on running memcached on each server, with the first server also acting as the database server. That should cut down on most of the traffic to the DB server, and I'll just invalidate the cache when anything is updated. Files and database values won't change very often. Anything I've missed here?
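The read-through / invalidate-on-write pattern described here might be sketched as follows (in Python rather than the app's PHP, for brevity). `DictCache` is a stand-in for a real memcached client, and all names (`fetch_template`, the `tpl:` key prefix, the lookup/write callbacks) are illustrative, not from the thread:

```python
class DictCache:
    """Stub with memcached-style get/set/delete semantics.

    In the real setup this would be a memcached client pointed at the
    local memcached instance on each web server.
    """
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value):
        self._store[key] = value
    def delete(self, key):
        self._store.pop(key, None)

def fetch_template(cache, db_lookup, site_id):
    """Read-through: serve from cache, fall back to the DB on a miss."""
    key = f"tpl:{site_id}"
    value = cache.get(key)
    if value is None:
        value = db_lookup(site_id)   # hit the DB server only on a miss
        cache.set(key, value)
    return value

def update_template(cache, db_write, site_id, new_value):
    """Write to the DB, then invalidate so the next read refills the cache."""
    db_write(site_id, new_value)
    cache.delete(f"tpl:{site_id}")
```

Since files and DB values rarely change, most requests are served from the local cache and only writes (plus the first read after an invalidation) touch the DB server.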

Posted by Steve_Arm, 02-03-2010, 03:40 PM
Keep in mind that user sessions will not be consistent with DNS round robin.

Posted by cygnusd, 02-03-2010, 04:02 PM
What are your requirements and/or goals? Is it high availability? Scalability? Performance? These are 3 different things. How about you try deploying your app first, then measure where the bottlenecks are? That way you're in a better position to anticipate and deal with your performance / scalability / availability requirements. Premature optimization is the root of all evil.

As for the architecture, I'd recommend a share-nothing architecture if possible, e.g.:

1. Store session information in signed cookies instead of the usual file-based / database-backed store. That way, session info can be handled by any of your app servers.
2. Spread your IO: spread reads, spread writes, if possible.

Again, what usage / performance are you targeting? E.g. are you anticipating thousands of hits per second? Or give us an idea of your architecture / topology / component structure.
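Signed cookies along the lines of point 1 can be done with nothing more than an HMAC, so any app server holding the shared secret can verify a session no matter which server issued it. A minimal sketch (the secret value and payload format are illustrative):

```python
import base64
import hashlib
import hmac

# Shared by all app servers; in practice load this from config, never
# hard-code it. The value here is purely illustrative.
SECRET = b"change-me"

def sign_session(payload: bytes) -> str:
    """Encode session data plus an HMAC so any server can verify it."""
    body = base64.urlsafe_b64encode(payload).decode()
    mac = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{mac}"

def verify_session(cookie: str):
    """Return the payload if the signature checks out, else None."""
    body, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return None  # tampered with, or signed under a different key
    return base64.urlsafe_b64decode(body.encode())
```

Note this only prevents tampering, not reading: anything in the cookie is visible to the client, so keep secrets out of it.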

Posted by twikamltd, 02-03-2010, 07:16 PM
Thank you for your replies. It's a very simple web app, actually; the only session data it stores is in cookies client-side, so no worries there. It's basically a hosting-type platform, and we're mainly looking for scalability and ease of maintenance. It's mostly PHP script output, but we need each server to have its own ever-changing copy of the files as users edit their templates, so our only real concern is how to do that reliably and cheaply without taking too great a performance hit. Hence thinking either NFS (does it cache?) or an rsync every 10 minutes, which is fine in terms of the app.

Posted by cygnusd, 02-04-2010, 02:29 AM
Tell me if I understood the problem right. You are trying to serve files (e.g. the changing templates) to users who might access these files via many servers. In short, you'd want some sort of "master copy for every file", OR a distributed filesystem where any app server can access any file without conflicts.

As for distributed filesystem usage, I would recommend you abstract the file storage in your app. That is, do not let your app know that the filesystem could be spread across many servers. Your app should essentially just manage a namespace, and perhaps directories and filenames. Let this filesystem/filestore be another layer in your design.

Now, IMO, I'd evaluate:

1.) NFS - if you have the experience and have found it reliable (reliability should come first), then go for it; otherwise, note that I don't have any experience with this tech.
2.) XtreemFS - a userspace FUSE-based filesystem à la NFS (xtreemfs.org); check out their demo on YouTube. Seems reliable, auto-replicating and easily mounted, which would fit your use case.
3.) MogileFS - another non-traditional filesystem (danga.com/mogilefs): replicated, distributed, self-healing.
4.) Rsync - perhaps the easiest to get started with, though one issue would be a potential race condition during the period when not all servers have synced.
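The "make the filestore another layer" advice above amounts to coding the app against one small interface and plugging backends in behind it, so swapping NFS for MogileFS later doesn't touch app code. A rough sketch, with an in-memory backend standing in for a real disk/NFS/MogileFS one (all names here are illustrative):

```python
from abc import ABC, abstractmethod

class Store(ABC):
    """The only storage API the app sees: names in, bytes out."""
    @abstractmethod
    def read(self, name: str) -> bytes: ...
    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...

class MemoryStore(Store):
    """In-memory backend; a local-disk, NFS-mount or MogileFS backend
    would implement the same two methods."""
    def __init__(self):
        self._files = {}
    def read(self, name: str) -> bytes:
        return self._files[name]
    def write(self, name: str, data: bytes) -> None:
        self._files[name] = data
```

The app only ever holds a `Store`, so which filesystem actually backs it becomes a deployment decision rather than a code change.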

Posted by twikamltd, 02-04-2010, 08:49 AM
Basically, imagine lots of domains pointing to the servers. Each site works in exactly the same way using the same files, but my 'customers' can change their sites' templates, text, etc. The text is fine, as it's all stored in SQL; it's the custom templates that users sometimes upload that are the problem. Apart from those, nothing changes, and not that often. Hence I'm thinking an rsync might be the one to start with: if all servers aren't synced, it just means that for a few minutes site users will see an old copy, which is fine for this.

I think I'll go with rsync, as it seems more secure than the others and won't result in any performance hit in terms of the filesystem. I'll look into the other options you posted, cygnusd, for a future upgrade. Cheers, and thanks for all your help guys!
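The periodic push could be driven from cron on the master by a small wrapper like the one below. The hostnames and paths are placeholders, not from the thread; `-az --delete` and the trailing slash on the source (copy the directory's contents, and mirror deletions) are standard rsync usage, but adjust flags to taste:

```python
import subprocess

def build_rsync_cmd(src, host, dest):
    """Assemble the argv list for one mirror push."""
    return [
        "rsync", "-az", "--delete",  # archive mode, compress, mirror deletions
        src.rstrip("/") + "/",       # trailing slash: sync contents, not the dir itself
        f"{host}:{dest}",
    ]

def push_to_mirrors(src="/var/www/html",
                    hosts=("web2", "web3"),   # placeholder hostnames
                    dest="/var/www/html"):
    """Push the master copy to each web server; run this from cron."""
    for host in hosts:
        subprocess.run(build_rsync_cmd(src, host, dest), check=True)
```

Scheduled every 10 minutes, the worst case matches what's described above: a mirror serves a copy that is at most one sync interval stale.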

Posted by jjk2, 02-04-2010, 08:42 PM
Amazon S3...

Posted by mattle, 02-05-2010, 01:49 PM
You may not even have that much difficulty with the periods between syncs. When the user first requests your domain, the nameserver will respond and point them to the IP address of one of your machines. From that point forward (usually for the duration of the user's session), their OS should cache that IP address and not perform further lookups. It's worth testing, but it's been my experience that within the context of one browsing session with round-robin DNS, you will continuously make requests to the same IP.


