How does Jami (formerly Ring.cx) really work, and how secure is it?

Question

Jami calls itself "ultimate privacy and control for your voice, video and chat communications". But forums online mentioned in passing (little depth) that it uses bad cryptography protocols and has messy source code. What exactly is insecure about its architecture and why are those aspects issues?

It does not look like they offer perfect forward security for either voice or chat conversations. ... They are not using a protocol like OTR or Axolotl that is designed for real time communications. ... Their description of their encryption confirms my concerns. They are not using a protocol like ZRTP to prevent MITM attacks. (link)

... other software, such as WhatsApp, Ring, ChatSecure, and Signal are not secure than you think. ... "Ring" is using SIP protocol, which is very old and not P2P friendly. They claimed it as a "decentralized" software, and it's bulls***. (link)

Ring, on the other hand seems to want to use "established standards", which are full of loopholes that strip out security.

This does not reflect reality of secure protocols, and actual security experts typically consider "DIY security" dangerous and rely on established primitives or even protocols. (link)

Greetings, Ring developer here. Ring is a distributed communication platform, its nodes are part of a DHT network, so their IP is indeed exposed. (link)

They support crypto that is considered to be broken. https://en.wikipedia.org/wiki/POODLE Also they're patching their crypto lib – there's something deeply wrong if you have to patch the crypto lib you're using.. And no, it's not about patches itself, but about "how broken is your crypto lib, so that you have to patch it?" – ring-daemon @ 8ca874d790be92649187aabcb55fa998dae045df I'd look more into stuff, but the more I looked the worse it got. Not comparable to Tox at all.

(…) it takes a slightly different approach from Tox's by building not only on established cryptographic libraries, but also on established protocols. If that's enough (as it's being made to appear) to make it unsound security-wise... well, to put it simply, it is not enough.

Isn't it build on top of broken stuff? Namely, TLS/SSL. ... It's a matter of complexity – the more complex something is to work with, the more bugs there will be as a result. Using something as complex as TLS/SSL really helps to have buggy crypto code. IMO anything using SSL/TLS is going to be broken, be it sooner or later. One of reasons why Tox uses NaCl/libsodium. (link)

`SIP protocol, which is very old and not P2P friendly` - SIP is by definition a P2P protocol so I would take at least some of those comments with a pinch of salt. — msam, Dec 05 '16 at 12:37

score 21 · Answer 1 · edited Jun 16 '20 at 09:49

As a Jami (formerly Ring) developer I will try to answer this. Of course this answer is inevitably biased and incomplete, but it might answer some concerns or misunderstandings about how Jami works, and help to understand its architecture, and the limits of this architecture.

Building a secure distributed real-time communication platform is often a trade-off between ease of use, performance and privacy. Jami's security model is not perfect and will likely evolve, and we are open to comments, suggestions, and criticism. I will try to edit this answer to add more details if requested.

It does not look like they offer perfect forward security for either voice or chat conversations. ... They are not using a protocol like OTR or Axolotl that is designed for real time communications. ... Their description of their encryption confirms my concerns. They are not using a protocol like ZRTP to prevent MITM attacks.

Jami establishes an authenticated peer-to-peer TLS 1.3 session (using GnuTLS) with PFS cypher suites enforced, typically (TLS1.3)-(ECDHE-SECP384R1)-(RSA-PSS-RSAE-SHA384)-(AES-256-GCM) is used.
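As a rough illustration of what "enforcing PFS cipher suites" can look like, here is a minimal sketch using Python's standard `ssl` module. This is an assumption-laden stand-in: Jami itself uses GnuTLS, not Python or OpenSSL, and the helper name `make_tls13_only_context` is hypothetical.

```python
import ssl


def make_tls13_only_context() -> ssl.SSLContext:
    """Build a client-side context that refuses anything older than TLS 1.3."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Every certificate-authenticated TLS 1.3 handshake uses an ephemeral
    # (EC)DHE key exchange, so pinning the minimum version is one simple
    # way to rule out non-forward-secret key exchanges.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = make_tls13_only_context()
print(ctx.minimum_version.name)
```

The point is only that forward secrecy here is a property of which handshakes the endpoint is willing to accept, not of any extra protocol layered on top.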

This P2P authenticated TLS channel is used for SIP signaling, and temporary SRTP media encryption keys are negotiated with SIP on this channel. This means that real-time communications on Jami (audio and video) are PFS and end-to-end encrypted. However Jami doesn't offer other guarantees like repudiation.
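To make "SRTP keys negotiated with SIP" more concrete, the sketch below builds an SDES-style `a=crypto` line as defined in RFC 4568. That Jami transports keys this way is an assumption on my part (the answer does not name the mechanism), and `make_sdes_crypto_line` is a hypothetical helper.

```python
import base64
import secrets


def make_sdes_crypto_line(tag: int = 1) -> str:
    """Return an SDP crypto attribute carrying a fresh SRTP master key and salt."""
    # AES_CM_128_HMAC_SHA1_80 uses a 16-byte master key and a 14-byte master salt,
    # concatenated and base64-encoded after "inline:".
    key_and_salt = secrets.token_bytes(16 + 14)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} AES_CM_128_HMAC_SHA1_80 inline:{inline}"


line = make_sdes_crypto_line()
print(line)
```

Because the key material sits in plain text inside the signaling message, its confidentiality depends entirely on the channel that carries the SIP traffic, which is why the authenticated P2P TLS session matters.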

In "classic" (server-based) SIP, negotiating media keys in SIP is an issue because SIP servers can see those keys in plain-text, breaking end-to-end encryption and allowing MITM attacks. That's what ZRTP solves, by performing DH key exchange on the RTP (media) channel. Using SIP on an authenticated P2P channel makes using ZRTP unnecessary.
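For readers unfamiliar with the DH exchange mentioned above, here is a toy sketch of the idea in Python. The modulus and generator are illustrative assumptions and far too weak for real use; ZRTP uses standardized groups and additionally authenticates the exchange (for example with a short authentication string), which this sketch omits.

```python
import secrets

# Toy parameters: 2**127 - 1 is a Mersenne prime, but much too small to be secure.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Generate an ephemeral (private, public) pair."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    """Combine our private value with the peer's public value."""
    return pow(their_public, my_private, P)


alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()

# Only the public values cross the wire; both sides derive the same secret.
alice_secret = shared_secret(alice_priv, bob_pub)
bob_secret = shared_secret(bob_priv, alice_pub)
print(alice_secret == bob_secret)
```

Without authentication, an attacker in the middle can run this exchange separately with each side. ZRTP handles that on the media path, whereas the design described here relies on the TLS session already being authenticated.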

"Ring" is using SIP protocol, which is very old and not P2P friendly.

The SIP protocol can be used for P2P just fine, in a similar way that one can use HTTP for P2P communications, and it's an established and robust telephony protocol, even if it's not perfect and some people prefer XMPP or others.

They claimed it as a "decentralized" software, and it's bulls***.

Jami is fully distributed, which is stronger than decentralized, by using OpenDHT (a Kademlia distributed hash table similar to what's used by BitTorrent) to find contact IP addresses, which are stored on the DHT in the form of encrypted ICE candidates, used to establish the authenticated TLS P2P channel. Usernames (name<->public key mappings) are registered on a distributed blockchain (Ethereum contract). To my knowledge, no other real-time communication system provides that level of distribution.
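The Kademlia lookup idea can be sketched in a few lines of Python. This is a simplified model, not OpenDHT's code: the 160-bit SHA-1 identifiers follow the original Kademlia design and are an assumption here, and the node labels and function names are hypothetical.

```python
import hashlib


def node_id(label: str) -> int:
    """Derive a 160-bit identifier from a label (assumed SHA-1, as in classic Kademlia)."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR interpreted as an integer."""
    return a ^ b


def closest_nodes(target: int, nodes: list[int], k: int = 3) -> list[int]:
    """Return the k known nodes whose IDs are nearest to the target key."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]


nodes = [node_id(f"node-{i}") for i in range(20)]
target = node_id("alice-public-key")
nearest = closest_nodes(target, nodes)
print([hex(n)[:10] for n in nearest])
```

A value (here, encrypted connection candidates) is stored on the nodes nearest to its key, and a lookup repeatedly asks the nearest known nodes for even nearer ones until it converges.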

Ring developer here. Ring is a distributed communication platform, its nodes are part of a DHT network, so their IP are indeed exposed.

That's a downside of using a distributed network: DHT nodes' IP addresses are exposed on the distributed network, which is a valid privacy concern. The current design doesn't allow DHT nodes to be cryptographically linked to Jami public keys, and we are working to make this separation as strong as possible.

Also they're patching their crypto lib – there's something deeply wrong if you have to patch the crypto lib you're using..

Isn't it build on top of broken stuff? Namely, TLS/SSL. ... It's a matter of complexity – the more complex something is to work with, the more bugs there will be as a result. Using something as complex as TLS/SSL really helps to have buggy crypto code. IMO anything using SSL/TLS is going to be broken, be it sooner or later. One of reasons why Tox uses NaCl/libsodium.

(comment above made by a Tox developer)

We don't "patch our crypto lib"; when a bug or vulnerability is found, the library is updated.

We live in the real world where no crypto library (and no library in general) is perfect. With time, every crypto ends up being broken - that's not a prediction but an observation.

TLS complexity is a downside, however an advantage is the ability to negotiate cyphersuites, so when a cyphersuite begins to be considered weak, transition to new cyphersuites can be done gracefully, without breaking existing implementations.
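The graceful-transition argument can be shown with a deliberately simplified negotiation model in Python. Picking the first mutually supported suite in server preference order is an assumption of this sketch (real TLS stacks have more rules), and `negotiate` is a hypothetical function; the suite names are standard TLS 1.3 identifiers.

```python
from typing import Optional, Sequence


def negotiate(server_preference: Sequence[str], client_offer: Sequence[str]) -> Optional[str]:
    """Pick the server's most preferred suite that the client also offers."""
    offered = set(client_offer)
    for suite in server_preference:
        if suite in offered:
            return suite
    return None  # no overlap: the handshake would fail


server = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
]
old_client = ["TLS_AES_128_GCM_SHA256"]
new_client = ["TLS_CHACHA20_POLY1305_SHA256", "TLS_AES_128_GCM_SHA256"]

print(negotiate(server, old_client))
print(negotiate(server, new_client))
```

Old and new clients both connect, each with the strongest suite it shares with the server. Retiring a weakened suite then amounts to removing it from the preference list once enough peers have upgraded.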

Libsodium (NaCl) is a great, high quality library but is not directly comparable to TLS. NaCl provides elementary crypto building blocks, while TLS is a protocol that performs the whole thing: authentication, key exchange, encryption etc. in a way that is standard and very well reviewed.

When building Jami we tried to not reinvent the wheel as much as possible, to avoid the risk of adding vulnerabilities. Instead of building our own protocol using libsodium, we preferred to rely on TLS, and focused on using it the best possible way.

score 2 · Answer 2 · edited Jun 16 '20 at 09:49

aberaud said:

Ring establishes an authenticated peer-to-peer DTLS 1.2 session (using GnuTLS) with PFS cypher suites enforced, typically TLS_ECDHE_RSA_AES_256_GCM_SHA384 is used.

This P2P authenticated TLS channel is used for SIP signaling, and temporary SRTP media encryption keys are negotiated with SIP on this channel. This means that real-time communications on Ring (audio and video) are PFS and end-to-end encrypted. However Ring doesn't offer other guarantees like non-repudiation.

The claim that "communications on Ring (audio and video) are PFS" is true or false depending on what you mean by PFS. From one conversation to the next, i.e. between two successive connection establishments (key exchanges), communications are indeed perfectly forward secure AND backward secure. However, within a single conversation in which multiple messages are exchanged, every message is encrypted with the same key.

Therefore, the messages within a single conversation are exchanged in neither a PFS nor a backward-secure manner.

To achieve PFS and BS within a single conversation, something like OTR or Axolotl should indeed be used, although these are better suited to text messages than to audio/video. If we restrict the question to audio/video encryption, maybe SCIMP (one of the ratchets from Axolotl) could be used to make communications PFS (but not BS) within a single conversation. For group messaging, Ring should use something like GOTR, mpOTR, or even Axolotl with 1-to-1 secure channels between all participants. However, Axolotl doesn't handle any group concerns by itself, and GOTR was designed to outmatch mpOTR, so GOTR might be the best fit. It is worth mentioning that the Ring developer team has been looking into these situations since the beginning of the year.
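The "PFS but not BS" property of a hash ratchet can be sketched as follows. This is a generic symmetric-key ratchet written for illustration, not SCIMP's or Axolotl's exact key schedule; the HMAC labels and the function names are assumptions.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive one message key and the next chain key from the current chain key."""
    # The two labels are arbitrary domain-separation constants for this sketch.
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def derive_message_keys(initial_chain_key: bytes, count: int) -> list[bytes]:
    """Advance the ratchet `count` times, collecting one key per message."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        message_key, chain_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys


# Stand-in for the secret both peers share after the initial key exchange.
session_secret = hashlib.sha256(b"example shared secret from the handshake").digest()
sender_keys = derive_message_keys(session_secret, 3)
receiver_keys = derive_message_keys(session_secret, 3)
print(sender_keys == receiver_keys)
```

Each message gets its own key, and because the derivation is one-way, stealing the current chain key does not reveal earlier message keys (forward secrecy). It does reveal all later ones, which is why backward security needs fresh key-exchange material mixed in, as the DH ratchet in OTR and Axolotl does.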

score 1 · Answer 3 · answered Oct 17 '17 at 05:52

Here is a bit more of a reason why ring.cx is not particularly secure (in fact it barely functions properly).

Security in large part is about attack vectors, and reducing the number into your system. You will never reduce that to zero, provably so because you always must trust something, and the trust in that can be misplaced (the other user you are connecting to forwards everything you say from their speakers to somebody else, e.g.), but the idea with security is to say that your vectors are small and few, versus large and many, and to be able to prepare specific countermeasures against those you know about (and perhaps those you don't).

The reason people say that the messy source code is a problem is twofold.

Where there is a lack of order it is very easy to hide bugs, and any bug is a potential exploit. Of course no program, even one that looks nice, is entirely bug free (unless somehow proven otherwise by a mathematically sound security audit), but one which is messy is likely to have very many bugs, null pointers, etc., in it.
- The team is still struggling to make the program clients function correctly and the program is still regularly crashing, experiencing errors, and lacking functionality.
- A cursory glance at the source code shows no tests for any of the clients.
- Most of the code is written in C++. It is not that this language cannot be secured, but it is notorious for stack overflows, pointer errors, memory overflows, etc. There are better languages, such as Rust or even Go, which offer more basic protections against these sorts of problems.
- The style of the code is drastically different between client projects. I am not saying that styles can't be different, but this betrays the fact that very different developers with different skills and coding standards are contributing, and there is no competent gate-keeper to say 'no, this code doesn't meet our standards'.
- The build system is overly complicated and full of potential issues. While you may have a particular feeling about one or the other build system, this one utilizes up to 5 or 6 - python, cmake, autotools, bash, g++, VC++. This means it would be easier to inject an illegitimate dependency into the build itself, and increases the chance of project tainting, purely by the numbers.
- It pulls in a large number of external dependencies, which may themselves be of questionable quality. This makes sense, but it also means that, since they haven't developed and tested their own libraries for e.g. video decoding/encoding, their quality and security is only as good as that of their third-party libraries. Many past Linux security vulnerabilities have had to do with an obscure library with an obscure capability having an exploitable bug that turned an otherwise secure system into an insecure POC (Pile Of Crap).
Fragmented development without clear direction, or instruction
- The client is not one client, but 6. There are 2 Windows clients, 2 mobile clients, 1 Linux client (although there may be 2; confusingly, the site states there is a GNOME and a KDE Linux client), and one Electron/Web client. There is no common code between these clients (disregarding ring-lrc and ring-daemon), and while the team argues this is a better 'design' for not constraining any particular platform, it means they cannot make as good a security guarantee about their platform as if they supported 3 clients, e.g. Android, iOS, desktop.
- They are making use of multiple non-complementary libraries within their code. Within the same Linux application, they are using GTK + Qt, both of which offer the same functionality and are independent toolkits with no supported compatibility. This is a major potential source of bugs. There isn't a lot to be done about the fundamental incompatibility between Android and iOS (one could argue only having one hardware mobile platform would be more secure, if properly designed of course). The Windows clients consist of one Qt + GTK application and one Universal Windows Platform client.
- At the time of this writing, there is only a wiki page describing how to build the project. There is no direction on even a basic general architecture of how the clients should talk to the client lib, or the daemon, or how bugs should be fixed or new functionality implemented. This means it will be very difficult for any new developer to make not just a sensible decision about how to implement something, but also a decision that will end up being a secure decision.

From a networking/security implementation perspective, some of the libraries they are using may indeed be insecure, out of date, or just plain broken. The above should be a fairly convincing argument that the harder work, the network-security-level work, is no better off, and probably in a worse state, than the client UI layer.

Some of the ideas, such as DHT or double ratchet algorithms, may be sound security ideas, and may even have secure primitives available, but this is why I would not consider this application secure in the least. If I cannot count on the client doing the correct thing, who is to say it won't be hijacked by some unprivileged user process on the desktop, and any valid security guarantee made by the networking lib is then gone? Or, if they are using an old, insecure networking library, that there isn't a way to turn on some feature and thereby defeat the use of the secure algorithm?

All this being said, it is an interesting project if you want to just explore multimedia on many platforms, with a focus on security algorithms as a sort of 'research' project, but is it securely implemented? Almost assuredly not.
