I have a user activity logging and querying system for an ISP with a very high event rate (5,000-10,000 events/second). It needs to correlate both RADIUS session logs and NAT syslogs on a common InternalIP field. Each session produces two events, Start and Stop. Roughly 24 hours of data from 3,000 users comes to about 20 million records, and volume is expected to grow.
My solution consists of two parsing-and-persisting agents, one per log type, written in Go with a PostgreSQL backend. I am experiencing several issues on both sides:

- Parsing and storage can't keep up with the high event rate, even after buffering syslog events in memory.
- To save space, I group each session into a single record and identify the NAT session's user from the other log, which is implemented through a trigger.
- The buffers consume system RAM until the process is eventually killed.
- Writes to PostgreSQL are slow because of the user identification trigger and the indexes on the table.
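For context on the direction I'm considering: instead of one INSERT per event, accumulate events into fixed-size batches and flush each batch as a single multi-row write (e.g. COPY). This is a minimal sketch of the batching logic only; the `Event` fields, `Batcher` type, and the stand-in flush function are my own illustrative names, and a real flush would call the database:

```go
package main

import "fmt"

// Event is a hypothetical parsed log record (field names are assumptions).
type Event struct {
	InternalIP string
	Raw        string
}

// Batcher accumulates events and flushes them in groups, so the database
// sees one multi-row write instead of thousands of single-row INSERTs.
// Because the buffer is capped at batchSize, memory stays bounded
// regardless of the incoming event rate.
type Batcher struct {
	batchSize int
	buf       []Event
	flush     func([]Event) // in production: COPY or a multi-row INSERT
}

func NewBatcher(size int, flush func([]Event)) *Batcher {
	return &Batcher{batchSize: size, buf: make([]Event, 0, size), flush: flush}
}

// Add buffers one event and flushes when the batch is full.
func (b *Batcher) Add(e Event) {
	b.buf = append(b.buf, e)
	if len(b.buf) >= b.batchSize {
		b.Flush()
	}
}

// Flush writes out any buffered events and resets the buffer.
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = b.buf[:0]
}

// runDemo feeds 2500 events through a 1000-event batcher and returns
// how many events reached the flush function (should be all of them).
func runDemo() int {
	flushed := 0
	bt := NewBatcher(1000, func(batch []Event) { flushed += len(batch) })
	for i := 0; i < 2500; i++ {
		bt.Add(Event{InternalIP: "10.0.0.1", Raw: "syslog line"})
	}
	bt.Flush() // drain the partial tail batch
	return flushed
}

func main() {
	fmt.Println(runDemo()) // 2500
}
```

In a real agent, Flush would also run on a timer so a slow trickle of events doesn't sit in the buffer indefinitely.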
I want to revisit my approach and am looking for suggestions on how to improve performance. Whatever approach I take, I need to identify the NAT user from the RADIUS session logs before persisting the data to the database.