Your problem(s)
You've stated your main problem as:

> when one of the developers runs non-safe code (let's assume it is PHP code), like an infinite loop, the server completely crashes for everyone, and I'm not able to identify who the faulty developer is!
Because development code can always crash or hang, you are effectively dealing with two different problems:
- Processes of different users should not interfere with each other or consume all resources, starving others or even crashing the server.
- Identifying the user who "did it".
If you solve the second problem, you still need to solve the first; but if you solve the first, you no longer need to solve the second. So I focus on solving the first one here (also, solving problems technically is always easier than confronting people about mistakes, and it causes less friction).
Possible solutions
You have three main paths to choose from when isolating user actions from one another:
- Processes & file system: The simplest and oldest way of separation is to give resources to each different user on a multi-user UNIX system.
  - uptime365 has already given suggestions for `ulimit`; I just want to add that you have to make sure your applications actually run under those user accounts (instead of under generic daemon users per service) and that they do not need to start processes as other users. Because of this downside the approach is seldom used on its own today, except for simple things like build scripts without a continuous delivery infrastructure. (A minimal `limits.conf` sketch follows after this list.)
  - File system quotas are more useful today, as storage is still a primary resource and they work with nearly all applications. (See the quota sketch after this list.)
- Applications: Most of the software you access already has built-in user and quota management, but you need to manage and configure it separately for each application. For example, the database can set quotas and permissions per user and limit what he can do (a MySQL sketch follows after this list). The downside, besides the separate configuration (which could be unified by scripts or tooling), is that not all software is equally well equipped; GitLab, for example, still does not seem to support disk quotas. You will have to look at your actual failure causes and see whether the built-in features are good enough for your case.
- Containers & VMs: Leveraging virtualization (either full virtualization like KVM or ESXi; or containerization like Solaris Zones, FreeBSD Jails, Linux LXC, or application-based like Docker et al.) allows you to present each user with a full virtual system where he can do anything you allow him to, but limit his resources so that he does not disturb the other users. It is the most advanced form of segregation, because essentially each user has his own machine and does not interfere at all (if configured correctly), so you also have different configuration and runtime options, different networking, different hard drives and so on. The downside of this solution is that it has the highest overhead in configuration and resource usage of those three.
Of course, combinations are always possible, as the approaches work on different layers. For example, you could use containers so that everyone has his own web server, but let all users share the same database instance to save resources (a single shared instance is also faster than several separate databases), limited by database quotas.
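For the container path, the sketch below starts one PHP container per developer with hard resource caps; the image, names, paths and limits are illustrative, not a recommendation:

```
docker run -d --name dev-alice \
  --memory 512m --memory-swap 512m \
  --cpus 1.0 \
  --pids-limit 100 \
  -v /srv/dev/alice:/var/www/html \
  -p 8081:80 \
  php:apache
# --memory/--memory-swap: hard RAM cap; the kernel OOM-kills the offender instead of dragging the host down
# --cpus / --pids-limit: at most one CPU core and 100 processes, so an infinite loop or fork bomb stays contained
```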
What to do in your case?
As you see, there are many possibilities and each of them takes considerable effort to implement, so you need to narrow it down:
- First, analyze exactly which problems you are encountering. Take a look at your process (or, if you don't have one, ask your colleagues how they do their work) and identify the problems that arise (disk space runs out when running test cases, endless loops eat too much memory, the web server crashes because of uncaught exceptions or bad architecture, etc.).
- After you know what you are up against, think about how to mitigate the problems while keeping it simple. Configuration takes roughly the same time for 10 PCs as for 100, so first try the simple stuff (most of the time, that is already enough). Disk space can be limited by user quotas, while endless loops and memory over-consumption can be caught early by running the tests as a local user instead of on the web server (see the sketch after this list).
- If this is not enough, you can think about implementing bigger and better solutions, like containers. Initially, you will lose time, but in the long run it may pay off - depending on your specific situation, of course.
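For that second step, a crude but effective guard is to run each developer's test suite under a wall-clock timeout and a memory cap before anything touches the shared web server. A minimal sketch, assuming a Composer-installed PHPUnit; the 5-minute and ~1 GB values are arbitrary examples:

```
ulimit -v 1048576                  # ~1 GB of address space for this shell and its children (value in KB)
timeout 300 ./vendor/bin/phpunit \
  || echo "test run aborted: timeout or resource limit hit"
```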