7

I often use SQLite to build simple internal applications for companies, with the database file placed on a file server. This works fine as long as no more than roughly 50 users work against the database concurrently (the exact number depends on the mix of reads and writes). Beyond that, users notice a slowdown when there is a lot of concurrent writing, because much of the time is spent waiting on file locks, and there is no cache, since there is no database server process.
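
To make the setup concrete, here is a minimal sketch of how such a client typically opens the shared file, assuming a hypothetical UNC path and a hypothetical pages table. The busy timeout makes concurrent writers wait on the file lock rather than fail immediately, which is exactly where the slowdown described above comes from.

```python
import sqlite3

# Hypothetical UNC path to the shared database file on the file server.
DB_PATH = r"\\fileserver\share\wiki.db"

# A generous busy timeout makes writers wait for the file lock instead of
# failing immediately with "database is locked" while another user writes.
conn = sqlite3.connect(DB_PATH, timeout=30.0)
conn.execute("PRAGMA busy_timeout = 30000")  # milliseconds

with conn:  # commits on success, rolls back on error
    conn.execute(
        "UPDATE pages SET body = ? WHERE title = ?",  # 'pages' is an invented table
        ("New content", "HomePage"),
    )
conn.close()
```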

The advantage of not needing a database server is that the time to set up something like a company Wiki or similar can be reduced from several months to just days. It often takes several months because an IT department needs to order the server, it needs to conform to company policies and security rules, and it needs to be placed at the outsourced server hosting facility, which then puts it in the wrong location, and so on.

Therefore, I had an idea for a kind of roaming database server. The process would be as follows: a user on a company computer edits something on a Wiki page (the Wiki uses this database as its backend). To do this, his machine reads a file on the local hard disk containing the IP address of the last desktop computer to act as database server, and tries to contact that computer directly via TCP/IP. If it does not answer, the machine reads a file on the file server containing the IP address of the last desktop to act as database server. If that machine does not answer either, the user's own desktop becomes the database server and registers its IP address in the same file. The SQL update statement can then be executed, and other desktop computers can connect to that machine directly.
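
As a sketch only, the lookup order described above might look roughly like this; the file paths, port number, and function names are all invented for illustration.

```python
import socket

LOCAL_HINT = r"C:\wiki\last_server.txt"          # hypothetical local hint file
SHARED_HINT = r"\\fileserver\share\last_server.txt"  # hypothetical hint on the file server
DB_PORT = 5050                                   # arbitrary port the ad-hoc server listens on


def is_alive(ip, port=DB_PORT, timeout=2.0):
    """Return True if something answers on ip:port."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def read_hint(path):
    """Read an IP address from a hint file, or None if unavailable."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except OSError:
        return None


def find_or_become_server(my_ip):
    """Try the local hint, then the shared hint, else become the server."""
    for hint in (read_hint(LOCAL_HINT), read_hint(SHARED_HINT)):
        if hint and is_alive(hint):
            return hint  # connect to the existing server
    # Nobody answered: this desktop becomes the server and registers itself.
    with open(SHARED_HINT, "w") as f:
        f.write(my_ip)
    return my_ip
```

Note that this sketch deliberately ignores two hard problems: two desktops may decide to become the server at the same moment, and the elected desktop may be shut down mid-transaction. These are exactly the failure modes the answers below point to.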

The point of this architecture is that the higher the load, the better it functions, because each desktop computer will usually already know the IP address of the current database server. With this setup, I believe a database placed on a file server could serve hundreds of desktop computers instead of the current 50 or so. I also do not believe the load on the single desktop that has become the database server would ever be noticeable, as there would be no hard disk operations on that desktop, only on the file server.

Is this idea feasible? Does it already exist? What kind of database could support such an architecture?

Edit: I should point out that this idea is not pretty, stable, best practice, or something I would really be proud of. The reason I am still interested in its feasibility is that some of my clients are banks, and the bureaucracy involved in gaining access to a database is enormous. Because of their extreme security concerns around server access, the project sponsor on such projects often needs to be above Vice President level. Needless to say, this makes setting up a Wiki a lot of work. If the Wiki later proves successful, it should, of course, be migrated onto a proper database server.

Edit 2: The reason for this idea is to reduce the risk of writer starvation when using SQLite with the database placed on the file server; the problem is described in section 5.1 here. Using a desktop computer to cache the most frequently accessed information (i.e. Wiki pages) would reduce the load on the file server dramatically, which in turn should improve the user experience. Do you really think I am still way off with this idea?

David
  • Someone's jumped in their DeLorean and gone back to the 80's! :) – Chopper3 Mar 05 '11 at 10:37
  • :)))))))))))))) – David Mar 05 '11 at 10:52
  • Regarding your last question, yes, you're still way off base. – EEAA Mar 05 '11 at 11:16
  • @ErikA Ok. Unless I get some overwhelming support stating that it is a brilliant idea within a couple of days, I will accept an answer stating that it is not feasible. – David Mar 05 '11 at 11:27
  • @David - you will not get an answer affirming your "plan". Or rather, if someone does give you that answer, it'll be downvoted to oblivion. Your end goals are sound, but your solution is poorly thought-out and untenable. – EEAA Mar 05 '11 at 11:32
  • @david, if nothing else, give up the idea of using SQLite in a client/server system. It's just not designed for that; its intended use is embedded applications. – John Gardeniers Mar 05 '11 at 11:39
  • @ErikA. With three highly upvoted answers from highly reputed users, stating in polite ways that this is plain stupid, I did understand my odds. Guess my Scandinavian self-irony was not obvious enough. Thank you for your time! – David Mar 05 '11 at 11:46
  • Depending on your application, it sure seems like you could come up with some way to use a DVCS system (e.g. git). You would probably have to version at the record level. – Zoredache Mar 05 '11 at 11:57
  • Errr… I think you're asking for Lotus Domino. :) – MikeyB Oct 18 '11 at 00:29

8 Answers

12

Is this idea feasible?

No.

Does it already exist?

Not that I know of.

What kind of database could support such an architecture?

See above.

Honestly, this is a really bad idea on many levels. There's a reason companies keep critical data within the datacenter: you don't want business applications to depend on some number of desktop machines being up and running. Another issue is firewalls. In all but small environments, there is no guarantee that Desktop X can communicate with Desktop Y, and good luck getting that firewall change past your network team.

Is there any reason your company doesn't have a central well-maintained database server that this app can use? There's no reason a company wiki should need its own database server.

EEAA
  • I have never had problems with firewalls blocking internal traffic, at least on the ports I have tried, even in very paranoid companies. – David Mar 05 '11 at 10:56
  • Sure it exists! It's in a slightly different form though, but all the ideas are the same. Lotus Domino. – MikeyB Oct 18 '11 at 02:34
8

This question isn't really related to system administration, but when I read it so many warning alarms went off that I just have to answer.

I really have to tell you that your entire concept is so far off the mark that you won't find anyone else doing it. For starters, SQLite is unsuited to such jobs, and the fact that you've had some success with it is due more to good luck than anything else.

Your plan has so many holes in it that I really don't know where to start, but I will tell you that it would be an overly complex system that proves incredibly unreliable and performs poorly.

Your comment

time to set up something like a company Wiki or similar can be reduced from several months to just days

Tells me much. Setting up a wiki normally takes only minutes, and any decent wiki system will have aids to speed up importing data from other systems.

I suggest you abandon your current design ideas and have a look at how such things are being done by others. Use any one of the common wiki systems (I prefer MediaWiki) with a regular database system (MySQL being very popular) and you'll not only save massive amounts of time but you'll end up with a system that is both more usable and more robust, plus much cheaper to implement.

In short, stop trying to reinvent the wheel, because your current design is going to end up more like a square with a hole roughly in the middle.

John Gardeniers
4

You could actually build a good distributed database environment if you partition (or target) your reads and writes at different databases. We do this kind of work, and the trick is very simple: keep the master database on a file server and target all writes at it, and keep a local copy of the database on every user's computer and target all reads at it.

You then need a synchronization mechanism between the master database and the local copies. This can be done in several ways. One way is to keep a "delta" table in the master database containing the transactions that have been applied to the master. Whenever the user's application performs a read or write, it first checks the master's delta table and applies locally any transactions not yet applied (which can be determined by timestamp). You could even have a background process doing this continuously. The delta could be flushed daily or weekly; if a user has not logged on for a week or so, you simply copy the whole database over to the user's computer again.

The advantage of having a local copy is that users can query data even when they are offline, and, believe it or not, reads stay fast even while you are online updating data.
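
No schema is given here, so the following is only a sketch of the replay step under assumed names: a delta table on the master with applied_at and statement columns holding the SQL that was applied there.

```python
import sqlite3

def sync_from_master(master_path, local_path, last_sync_ts):
    """Replay transactions recorded in the master's (assumed) delta table
    onto the local copy, and return the new high-water timestamp."""
    master = sqlite3.connect(master_path)
    local = sqlite3.connect(local_path)
    try:
        # Fetch only the transactions applied on the master since our last
        # successful sync, ordered so replay is deterministic.
        rows = master.execute(
            "SELECT applied_at, statement FROM delta "
            "WHERE applied_at > ? ORDER BY applied_at",
            (last_sync_ts,),
        ).fetchall()
        with local:  # one local transaction for the whole batch
            for applied_at, statement in rows:
                local.execute(statement)  # replay the write locally
        return rows[-1][0] if rows else last_sync_ts
    finally:
        master.close()
        local.close()
```

Storing the raw SQL text per transaction is just one way to record the delta; a real implementation would also have to deal with non-deterministic statements and clock skew between machines.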

Kirikera
3

As mentioned, this question is outside the scope of systems administration. That said, distributed databases and distributed data stores are used in some very recognizable places. While SQLite's strengths do not generally lend themselves to this type of application, it is not unheard of. Look, for example, at the Fossil project: even though it is a distributed source control system based on SQLite, it also provides a distributed wiki and a blogging application, and it might actually do the trick for you. While you should probably look beyond SQLite, that does not mean you need to abandon open source. Consider implementing your project on Apache CouchDB or a Hadoop-based data store. An even more novel approach is to create applications in a distributed user-space virtual environment like Inferno.

2

Your description sounds a lot like how POS (point of sale) systems work. One master terminal is declared at startup and does the database processing. A copy of the database is synced between the master and all the slave terminals for backup.

If the master were to fail, all the other terminals pop up a message saying "Make me the new master?". You press yes and everything continues. This can go on until there is only one terminal left standing.

It works and is sort of idiot-proof, but ending the day with a corrupted database is common. Luckily, the terminals only store that day's sales, so your daily totals may be off a little when some order didn't get saved right. That is still preferable to the system going down for a few hours and losing sales.

After a big network or power outage, the end-of-day cleanup is what overtime is for, because the current day's sales can end up spread across several different terminals and you have to sort it all out. I am glad I no longer do that work.

Stick to one big database server with good backups.

Porch
  • Really good info, about a scenario where this idea is actually a good fit for the business needs, but requires cleanup when it has to be invoked. – mfinni Mar 05 '11 at 21:13
  • That might be how *some* POS systems operate but those will also be in the stores where they have no end of trouble with terminals going off-line. In general the terminals communicate directly with a server, not through another terminal. – John Gardeniers Mar 05 '11 at 21:54
  • Every POS system does it differently and I have seen some real hack jobs. I was referring to how Aloha brand POS systems work. With the right hardware and lots of good PM, we had a very low terminal failure rate. Most of the time it was the touch screen going bad. When you put a computer in a hot and greasy kitchen, it's not going to live a long life. – Porch Mar 05 '11 at 22:05
  • I wasn't talking about the actual failure rate of the terminals but rather how frequently they are off-line. It's a natural consequence of daisy chaining through other terminals. – John Gardeniers Mar 06 '11 at 08:29
  • Isn't @JohnGardeniers' point about terminals constantly going offline only reinforced by the fact that computers in hot, greasy kitchens don't live long? For that reason alone I'd hesitate to design a system that relies on keeping data on those terminals; I'd rather sync all data, as much as possible, to a centralized server in a safe location where it can be backed up and considered authoritative. But that's me... – Bart Silverstrim Oct 18 '11 at 10:56
1

It's not altogether clear from your question where the data ultimately resides. Does it live on the centralized file server? If so, moving the database engine around a multitude of desktops while using the centralized file server as the disk store will probably not buy you much performance. If anything, the distance between the disk and the engine will probably make it run worse, if at all.

If the data is not centralized, data consistency will be a problem if you've got multiple desktops all containing different bits of data.

Similar problems exist around database configuration and security, none of which is trivial. And finally, running a database server that serves 100+ active remote users will have a noticeable effect on that desktop machine's performance.

growse
  • The data would be centralized on the file server. I cannot see how this could perform worse than just placing an SQLite database on the file server, as I believe disk performance would be the bottleneck. – David Mar 04 '11 at 23:30
  • You would get worse performance because there are more steps involved in getting to the data. With SQLite, the client simply requests the data from the fileserver. In your proposal, the client has to ask the fileserver where the database is, connect to the database, and request data from the database 'server', which then pulls the data from the fileserver before routing it back to the client. A lot more steps, a lot more latency, a lot less performance. – growse Mar 04 '11 at 23:35
1

Have you seen http://litereplica.io/ ? They have a sqlite3 driver for nodejs and it seems fairly well architected.

0

I recently completed development of a distributed database layer for a SOA/ESB/RESTful middleware framework that needed to be proprietary and not rely on existing database infrastructure. It is built in C# with a wrapper around SQLite.

My database layer operates as a cluster of nodes: witness nodes (master and failover), data write/commit nodes (again master and failover), and replication nodes, which basically store replicated data.

On write operations, the selected write node generates the unique IDs and foreign keys, and it indexes which nodes each write was successfully stored on. This ensures that replicated data keeps the same IDs and foreign keys. Threaded/parallel processes maintain replication. Foreign keys are not strictly enforced, but they do work.
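
This layer is written in C#, but a rough, language-agnostic sketch of the write path described here (IDs generated once by the write node, plus an index of which replicas accepted each write) could look like the following; all class and method names are invented for illustration.

```python
import uuid


class InMemoryReplica:
    """Stand-in for a replication node; a real node would write to its own SQLite file."""

    def __init__(self):
        self.tables = {}

    def insert(self, table, row):
        self.tables.setdefault(table, []).append(row)


class WriteNode:
    """Sketch of a write node: assigns the ID once and records which replicas hold each row."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.location_index = {}  # row id -> replicas that accepted the write

    def write(self, table, row):
        row = dict(row, id=str(uuid.uuid4()))  # same ID reused on every replica
        stored_on = []
        for replica in self.replicas:
            try:
                replica.insert(table, row)
                stored_on.append(replica)
            except Exception:
                pass  # a background repair process would re-replicate later
        self.location_index[row["id"]] = stored_on
        return row["id"]


# Usage: three replicas receive the same row with an identical ID.
node = WriteNode([InMemoryReplica() for _ in range(3)])
new_id = node.write("messages", {"body": "hello"})
```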

I also wrote a client wrapper for this data layer that provides connection-string failover between witnesses.

So far, testing and benchmarking seem to prove the concept. I've tested with different sizes of data and it seems to handle them well. Obviously, as my database layer is designed as RESTful middleware, its speed is less important than high availability. Furthermore, the structure of your data is the primary factor in whether this approach will work or not.

My next revision will look at distributing retrieval of large datasets across the replication nodes as the dataset is streamed to the client framework, a kind of data grid with JSON idea.

Marty
  • Wow, so much work... Just to reproduce what good RDBMSs already do. I can't imagine why someone thought this was a good idea in the first place, but I'm sure it has something to do with bureaucracy, politics, or incompetence. I'll bet maintaining such a spaghetti mess is even worse than writing it in the first place. – Chris S Oct 18 '11 at 12:56
  • I chose SQLite because it's lightweight and a lot quicker than MySQL and PostgreSQL, since it has hardly any overhead. As I said, most of my datasets are small; I use it for ESB messaging middleware and needed to include distribution to aid failover. – Marty Oct 18 '11 at 19:10