There are endless debates about whether it is better to store all of your logs in your data lake (skeptics call it the grave :-) ) or to keep only those that are relevant for operations or business analytics. Either way, syslog-ng has many benefits as a data collection, processing and filtering tool in a Hadoop environment. A single application can collect logs and other data from many sources that complement each other well. Processing can be done close to the source in efficient C code, reducing the load on the processing side of your Hadoop infrastructure. And before storing your messages in HDFS, you can use filters to throw away irrelevant messages or simply to route messages to the right files.
Read more about it on my blog at https://czanik.blogs.balabit.com/2016/02/filling-your-data-lake-with-log-messages-the-syslog-ng-hadoop-hdfs-destination/