I'm curious: in theory, how can one know whether, for example, the kernel that is distributed with Ubuntu Linux is really what is on https://github.com/torvalds/linux and not some modified kernel that contains tracking code, etc.?
Actually, what is distributed with your distribution usually isn't what's distributed via Torvalds' git tree. Your Linux distro's kernel is probably not a vanilla kernel.
The reason for this is that distribution vendors usually backport fixes, from various sources, that may affect their customers. In Fedora, for example (these examples were picked purely because I have recent experience custom-building these packages):
- The kernel (currently `3.11.4-201.fc19.x86_64` at time of writing) contains a number of patches for v4l, iommu fixes, drm-intel-next, iwl (Intel wireless) drivers and so on.
- Grub2 contains a number of patches to resolve various (breaking) issues with its build, and patches to make EFI builds work.
On Fedora (and other yum-based systems), `yumdownloader --source kernel` will pull down the source kernel RPM. Open it up and you'll find everything used to build the kernel packages.
So, no, your Linux (and even your packages) might not be vanilla vendor packages.
Now, next:
The real question is, can you be sure that distributed open source software is really what you expect it to be?
Well, the point of the above was that your package database contents might not be what you expect them to be, but the question, I guess, is: could they contain malicious content?
The answer is that there is little way for you to tell as an end user unless you inspect the source and rebuild the package from that.
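As a rough illustration of what "inspect the source" can mean in practice, here is a minimal Python sketch that compares two already-unpacked source trees file by file using SHA-256, for example an upstream checkout against the tree from a distribution's source package. The names `hash_tree` and `diff_trees` and the two directory arguments are my own placeholders, not part of any packaging tool. It only tells you which files differ, not whether a difference is benign, so you still have to read the changes.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each regular file under root (by relative path) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        # is_file() follows symlinks; directories and broken links are skipped.
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(upstream_dir, distro_dir):
    """Return (only_upstream, only_distro, changed) as sorted lists of relative paths."""
    upstream = hash_tree(upstream_dir)
    distro = hash_tree(distro_dir)
    only_upstream = sorted(upstream.keys() - distro.keys())
    only_distro = sorted(distro.keys() - upstream.keys())
    # Files present in both trees whose contents differ.
    changed = sorted(
        p for p in upstream.keys() & distro.keys() if upstream[p] != distro[p]
    )
    return only_upstream, only_distro, changed
```

Calling `diff_trees("linux-upstream", "linux-distro")` (both directory names hypothetical) gives you the list of files worth reading by hand. For a distribution kernel, expect that list to be non-empty, since vendor patches are the norm.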
However, most package submission processes involve a little rigour. Usually, you can't just turn up and become a maintainer of a popular package. You would need to be known for contributions and be approved by whoever grants the right to push packages (a staff member, a committee member, or similar).
I can see this being a possible attack vector in obscure software, but for major packages I think it is sufficiently likely that an attack would be detected eventually. Most packages are built from source and receive scrutiny due to bugs, user issues and so on. Development continues and packages are rebuilt. Patches may fail. Somebody, somewhere is likely to notice an oddity, basically.
On this basis, I think you have to make a risk-based decision. Is what you intend to protect so valuable that you believe highly capable and persistent adversaries are going to try to backdoor a package in a Linux distribution to get to you? If so, go ahead and build your own distribution (there is no reason you can't steal an existing distributor's packages and just carefully audit/compile your own, by the way - that's the beauty of open source).
If not, I would (personally) go with accepting the outside risk.