
Using Grep-Like Commands for Non-Text Files

In the previous article, I showed how to use the grep command, which is great at finding text files that contain a string or pattern. The idea of searching directly in a “grep-like” way is so useful that there are additional commands that let you search right inside PDF documents and handle XML files more naturally. Things do not stop there: you could consider raw network traffic a collection of data that you want to “grep” for information, too.

Grep on PDF files

Packages are available in Debian and Fedora Linux for pdfgrep. For testing, I grabbed version 1.2 of the Open Document Format specification. Running the following command finds the matches for “ruby” in the specification. Adding the -H option will print the filename for each match (just as regular grep does). The -n option is slightly different from regular grep: in regular grep, -n prints the number of the line that matches; in pdfgrep, -n shows the page number instead.

$ pdfgrep ruby OpenDocument-v1.2.pdf 
  6.4 <text:ruby
     6.4.2 <text:ruby
     6.4.3 <text:ruby
 17.10 <style:ruby-properties>..................................................................................................
     19.874.30 <text:ruby
     19.874.31 <text:ruby-text>..................................................................................................
 20.303 style:layout-grid-ruby-below......................................................................................... 783
 20.304 style:layout-grid-ruby-height........................................................................................ 783
 20.341 style:ruby
 20.342 style:ruby

$ pdfgrep -Hn ruby OpenDocument-v1.2.pdf 
OpenDocument-v1.2.pdf:10:   6.4 <text:ruby
OpenDocument-v1.2.pdf:10:      6.4.2 <text:ruby
OpenDocument-v1.2.pdf:10:      6.4.3 <text:ruby
OpenDocument-v1.2.pdf:26:  17.10 <style:ruby

Many other command-line options in pdfgrep operate like the ones in regular grep. You can use -i to ignore case when searching, -c to see just the number of times the pattern was found, -C to show a given amount of context around matches, -r/-R to search recursively, --include and --exclude to limit recursive searches, and -m to stop searching a file after a given number of matches.
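
For example, here is a quick sketch of those options in action, reusing the specification file from above; the ~/pdfs directory is just a placeholder, and the options behave as described in the pdfgrep man page:

$ pdfgrep -ic ruby OpenDocument-v1.2.pdf          # just count matches, ignoring case
$ pdfgrep -i -C 50 ruby OpenDocument-v1.2.pdf     # show some context around each match
$ pdfgrep -r --include "*.pdf" -m 3 ruby ~/pdfs   # recursive search, stop after 3 matches per file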

Because PDF files can be encrypted, pdfgrep also has a --password option that lets you supply the decryption password. You might consider the --unac option to be loosely grouped with -i (case-insensitive): with it, accents and ligatures are removed from both the search pattern and the content as it is searched, so the single character æ is treated as “ae”. This makes it simpler to find things when typing at a console. Another interesting option in pdfgrep is -p, which shows the number of matches on each page.
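
As a sketch, assuming a password-protected file named secret.pdf and a pdfgrep build compiled with unac support (the file names and password here are placeholders):

$ pdfgrep --password s3cret ruby secret.pdf     # search an encrypted PDF
$ pdfgrep --unac encyclopaedia old-book.pdf     # also matches "encyclopædia"
$ pdfgrep -p ruby OpenDocument-v1.2.pdf         # show match counts per page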

Grep for XML

On Fedora Linux, you can dnf install xgrep to get access to the xgrep command. The first thing you might like to do with xgrep is search using -x to look for an XPath expression as shown below.

$ cat sample.xml 
<root>
 <sampledata>
   <foo>Foo Text</foo>
   <bar name="barname">Bar text</bar>
 </sampledata>
</root>

$ xgrep -x '//foo[contains(.,"Foo")]' sample.xml 
<!--         Start of node set (XPath: //foo[contains(.,"Foo")])                 -->
<!--         Node   0 in node set               -->

   <foo>Foo Text</foo>

<!--         End of node set                    -->

$ xgrep -x '//bar[@name="barname"]' sample.xml 
<!--         Start of node set (XPath: //bar[@name="barname"])                 -->
<!--         Node   0 in node set               -->

   <bar name="barname">Bar text</bar>

<!--         End of node set                    -->

The xgrep -s option lets you poke around in XML elements looking for a regular expression. This may work slightly differently than you expect at first: the format for the pattern is to pick the element you are interested in and then use one or more subelement/regex/ expressions to limit the matches.

The example below always prints an entire sampledata element, and we limit the search to only those elements that have a bar subelement matching the ‘Bar’ regular expression. I didn’t find a way to pick off just the bar element, so it seems you are always looking for a specific XML element and limiting the results based on matches in its subelements.

$ xgrep -s 'sampledata:bar/Bar/' sample.xml 
<!--         Start of node set (Search: sampledata:bar/Bar/)                 -->
<!--         Node   0 in node set               -->

 <sampledata>
   <foo>Foo Text</foo>
   <bar name="barname">Bar text</bar>
 </sampledata>

<!--         End of node set                    -->

As you can see from this example, xgrep is more about finding matching structure in an XML document. As such, it doesn’t implement many of the normal grep command-line options. There is also no support for filesystem recursion built into xgrep, so you have to combine it with the find command, as shown in the previous article, if you want to dig around.
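
For example, a small sketch that pairs the two commands to run the earlier XPath query over every XML file under the current directory:

$ find . -name '*.xml' -exec xgrep -x '//bar[@name="barname"]' {} \;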

grep your network with ngrep

The ngrep project provides many of the features of grep but works directly on network traffic instead of files. Just running ngrep with the pattern you are after will sift through network packets until something matching is seen, and then you will get a message showing the network packet that matched and the hosts and ports that were communicating. Unless you have specially set up network permissions, you will likely have to run ngrep as the root user to get full access to raw network traffic.

# ngrep foobar
interface: eth1 (192.168.10.0/255.255.255.0)
filter: ((ip || ip6) || (vlan && (ip || ip6)))
match: foobar
###########...######
T 192.168.10.2:738 -> 192.168.10.77:2049 [AP]
.......2...........foobar.. 
...
################################################################
464 received, 0 dropped

Note that the pattern is an extended regular expression, not just a string, so you could find many types of foo using, for example, fooba[rz].

Similar to regular grep, ngrep supports -i for case-insensitive searching, -w for whole-word matching, -v to invert the result (showing only packets that do not match your pattern), and -n to match only a given number of packets before exiting.
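
As a sketch of those options together (the pattern is a placeholder; as before, ngrep will likely need to run as root):

# ngrep -i -w 'fooba[rz]'    # case-insensitive, whole-word matching
# ngrep -v foobar            # show only packets that do NOT match
# ngrep -n 10 foobar         # exit after 10 matching packets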

The ngrep tool also supports options to timestamp matching packets: -t prints a timestamp when a match occurs, and -T shows the time delta between matches. You can also shut down connections using the -K option to kill TCP connections that match your pattern. If you are looking at low-level packets, you might like to use -x to dump the packet as hexadecimal.
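
For example (the trailing BPF filters are optional and just narrow which traffic ngrep inspects; the host and port here are placeholders):

# ngrep -t 'fooba[rz]' port 2049        # timestamp each match
# ngrep -T foobar host 192.168.10.77    # show time elapsed between matches
# ngrep -x foobar                       # dump matching packets as hexadecimal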

A very handy use for ngrep is to check whether network communications you think are secure really have any protection at all. This can easily be an issue with apps on a phone. If you start ngrep with a reasonably rare string like “mysecretsarehere” and then send that same string in the app, you shouldn’t see it being found by ngrep. Just because you can’t see it in ngrep doesn’t mean the app or communication is secure, but at least the app is doing something to protect the data it sends over the Internet.
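
A sketch of that test follows; the -q option simply suppresses the “#” progress markers so that only matching packets are printed. If the string appears in the output while you use the app, the traffic carrying it is not encrypted:

# ngrep -q mysecretsarehere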

grep your mail

While the name might be a little misleading, mboxgrep can search mbox files and maildir folders. I found that I had to use the -m option to tell mboxgrep that I wanted to look inside a maildir folder instead of a mailbox. The following command will directly search for matching messages in a maildir folder.

$ cd ~/mail/.Software
$ mboxgrep -m maildir "open source program delight" .

The mboxgrep tool has options to search only the headers or body of emails, and you can choose between fcntl, flock, or no file locking during the search, depending on your needs. mboxgrep can also recurse into your mail folders, which is handy if you have many subfolders you would like to search.
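
For example, the following sketch assumes the -H (headers only), -B (body only), and -r (recursive) options described in the mboxgrep man page:

$ mboxgrep -m maildir -H "^Subject:.*delight" .    # search only message headers
$ mboxgrep -m maildir -B "open source" .           # search only message bodies
$ mboxgrep -m maildir -r "open source" ~/mail      # recurse into subfolders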

Wrap up

The grep tool has been a go-to tool for searching text files for decades. There is now a growing collection of grep-like tools that let you use the same syntax to find matches in other places. Those places might be specific file formats, like XML and PDF, or you might consider searching network traffic for interesting events.

Learn more about Linux through the free “Introduction to Linux” course from The Linux Foundation and edX.

Q&A with Arpit Joshipura, Head of Networking for The Linux Foundation

Arpit Joshipura became the Linux Foundation’s new general manager for networking and orchestration in December 2016. He’s tasked with a pretty tall order. He needs to harmonize all the different Linux Foundation open source groups that are working on aspects of network virtualization.

Joshipura may be the right person for the job as his 30 years of experience is broad — ranging from engineering, to management, to chief marketing officer (CMO) roles. Most recently he was VP of marketing with Prevoty, an application security company. Prior to that he served as VP of marketing at Dell after the company acquired Force10 Networks, where he had been CMO.

Read more at SDx Central

Kubernetes Logging As Easy As 1..2..3

If you’re considering or have already started using Kubernetes, one thing you absolutely cannot get around is having proper logging throughout your system to debug your applications in case of unexpected errors. 

At Wercker, we use three technologies to set up our logging infrastructure:

  • AWS ElasticSearch Service to quickly and easily run logging aggregation infrastructure with minimal overhead above what we already manage
  • Fluentd to collect and aggregate our logs
  • Last but certainly not least, Wercker to bring everything together and deploy our brand spanking new logging infrastructure!

Read more at Wercker

A Complete Beginner’s Guide To Blockchain

A blockchain is a distributed database, meaning that the storage devices for the database are not all connected to a common processor.  It maintains a growing list of ordered records, called blocks. Each block has a timestamp and a link to a previous block.

Cryptography ensures that users can only edit the parts of the blockchain that they “own” by possessing the private keys necessary to write to the file. It also ensures that everyone’s copy of the distributed blockchain is kept in sync.

Imagine a digital medical record: each entry is a block.

Read more at Forbes

New to Programming? Check out these Outstanding Open Source Programming Books

Computer programming offers a fascinating career path. It’s full of challenges, offers a great way of collaborating, teaches you how to think, and, most importantly, gives you a way to improve your life. Become more productive, efficient, and effective in life by learning the discipline of coding.

Anyone wanting to become a programmer needs a kick-start. There are so many questions to contemplate. What’s the best way of building a solid programming foundation? What’s the best way to learn? Should I read one of those ‘Teach Yourself [insert programming language] in 24 Hours’ books?

This is a compilation of useful free programming books, and free is meant in the sense of respecting freedom and community: all of the books are released under an open source license, so you are free to copy, distribute, study, and display them to your heart’s content.

Read the complete article
 

How to Set Up MariaDB SSL and Secure Connections from Clients

The number of hijacked database servers grows every day, so it is important to secure the communication between the MariaDB server and its clients and webservers.

MariaDB is a database server that offers drop-in replacement functionality for MySQL server. MariaDB is built by some of the original authors of MySQL, with assistance from the broader community of free and open source software developers. In addition to the core functionality of MySQL, MariaDB offers a rich set of feature enhancements including alternate storage engines, server optimizations, and patches. In this tutorial, I am going to give instructions on how to set up a MariaDB server with SSL and how to establish secure connections from the console and from PHP/Python scripts.
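
As a quick, hedged sanity check once SSL is in place (the host and user names here are placeholders), you can ask the server about its SSL state from the console; have_ssl should report YES, and Ssl_cipher should be non-empty on an encrypted connection:

$ mysql --ssl -h db.example.com -u dbuser -p -e "SHOW VARIABLES LIKE 'have_ssl';"
$ mysql --ssl -h db.example.com -u dbuser -p -e "SHOW STATUS LIKE 'Ssl_cipher';"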

Read the complete article

6 Key Points about Intel’s Hot New Linux Distro

The great thing about Linux is that anyone possessing the wherewithal and dedication can produce a distribution to satisfy their own needs. That’s also the bad thing, as it means many Linux distributions, even those with name backing, fight to distinguish themselves or to be recognized at all.

With Intel Clear Linux, the name-brand recognition is only a small part of what matters. Yes, it’s significant that the kingpin chipmaker is adding an entry to the Linux distro makers club, but why and to what end?

Intel wants Clear Linux to be known for high performance when running cloud workloads.

Read more at InfoWorld

OCI’s Push For Open Container Standards Continues in 2017

Chris Aniszczyk is The Linux Foundation’s Vice President of Developer Relations and Programs where he serves as the Executive Director of the Open Container Initiative and COO of the Cloud Native Computing Foundation.

As we kick off 2017 and look ahead to the coming year, I want to take some time to reflect back on what the Open Container Initiative (OCI) community accomplished in 2016 and how far we’ve come in a short time since we were founded as a Linux Foundation project a little over a year ago.

The community has been busy working toward our mission to create open industry standards around container formats and runtime! Last year the project saw 3000+ commits from 128 different authors across 36 different organizations. With the addition of the Image Format specification project, we expanded our initial scope from just the runtime specification. Our membership grew to nearly 50 members with the addition of Anchore, ContainerShip, EasyStack, and Replicated, which add an even more diverse perspective to the community. We also added two new developer tools projects, runtime-tools and image-tools, which serve as repositories for conformance testing tools and have been instrumental in gearing up for the upcoming v1.0 release.


We’ve also recently created a new project within OCI called go-digest (which was donated and migrated from docker/go-digest). It provides a strong hash-identity implementation in Go and serves as a common digest package to be used across the container ecosystem.

In terms of early adoption, we have seen Docker support the OCI technology in its container runtime (libcontainer) and contribute it to the OCI project. Additionally, Docker has committed to adopting OCI technology in its latest containerd announcement. The Cloud Foundry community has been an early consumer of OCI by embedding runc via Garden as the cornerstone of its container runtime technology. The Kubernetes project is incubating a new Container Runtime Interface (CRI) that adopts OCI components via implementations like CRI-O and rktlet. The rkt community is adopting OCI technology already and is planning to leverage the reference OCI container runtime runc in 2017. The Apache Mesos community is currently building out support for the OCI image specification.

Speaking of the v1.0 release, we are getting close to launch! The milestone release of the OCI Runtime and Image Format Specifications version 1.0 will be available this first quarter of 2017, drawing the industry that much closer to standardization and true portability. To that end, we’ll be launching an official OCI Certification program once the v1.0 release is out. With OCI certification, folks can be confident that their OCI-certified solutions meet a high set of criteria that deliver agile, interoperable solutions.

We’ll be looking into the possibility of adding more projects in the coming year, and we hope to showcase even more demonstrations of the specs in action under different scenarios. We’ll be onsite at several industry events, so please be on the lookout and check out our events page for details.

There is still much work to be done!  The success of our community depends on a wide array of contributions from all across the industry; the door is always open, so please come join us in shaping the future of container technology! In particular, if you’re interested in contributing to the technology, we recommend joining the OCI developer community which is open to everyone. If you’re building products on OCI technology, we recommend joining as a member and participating in the upcoming certification program.

Want to learn more about container standards? Watch the free re-play of The Linux Foundation webinar, “Container Standards on the Horizon.” Watch now!

This blog originally appeared on the OCI website.

How to Keep Hackers out of Your Linux Machine Part 3: Your Questions Answered

Articles one and two in this series covered the five easiest ways to keep hackers out of your Linux machine, and know if they have made it in. This time, I’ll answer some of the excellent security questions I received during my recent Linux Foundation webinar. Watch the free webinar on-demand.

How can I store a passphrase for a private key if private key authentication is used by automated systems?

This is tough. This is something that we struggle with on our end, especially when we are doing Red Teams because we have stuff that calls back automatically. I use Expect but I tend to be old-school on that. You are going to have to script it and, yes, storing that passphrase on the system is going to be tough; you are going to have to encrypt it when you store it.

My Expect script encrypts the passphrase stored and then decrypts, sends the passphrase, and re-encrypts it when it’s done. I do realize there are some flaws in that, but it’s better than having a no-passphrase key.

If you do have a no-passphrase key and you really do need to use it, then I would suggest limiting the user that requires it to almost nothing. For instance, if you are doing some automated log transfers or automated software installs, limit that account’s access to only what it requires to perform those functions.

You can run commands by SSH, so don’t give them a shell; make it so they can only run that one command. That will actually prevent somebody who steals the key from doing anything other than running that single command.
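
As a sketch, OpenSSH supports exactly this kind of lockdown with a forced command in the automation account’s ~/.ssh/authorized_keys; the script path, key material, and comment below are placeholders:

# One line in ~/.ssh/authorized_keys on the target machine:
command="/usr/local/bin/transfer-logs",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA... automation@example.com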

What do you think of password managers such as KeePass2?

Password managers, for me, are a very juicy target. With the advent of GPU cracking and some of the cracking capabilities in EC2, they become pretty easy to get past.  I steal password vaults all the time.

Now, our success rate at cracking those, that’s a different story. We are still in about the 10 percent range of crack versus no crack. If a person doesn’t do a good job at keeping a secure passphrase on their password vault, then we tend to get into it and we have a large amount of success. It’s better than nothing but still you need to protect those assets. Protect the password vault as you would protect any other passwords.

Do you think it is worthwhile from a security perspective to create a new Diffie-Hellman moduli and limit them to 2048 bit or higher in addition to creating host keys with higher key lengths?

Yeah. There have been weaknesses in SSH products in the past where you could actually decrypt the packet stream, and with that, you can pull all kinds of data across. People use SSH as a safe way to transfer files and passwords, trusting it thoughtlessly as an encryption mechanism. Doing what you can to use strong encryption and to change your keys is important. I rotate my SSH keys (not as often as I do my passwords) but I rotate them about once a year. And, yeah, it’s a pain, but it gives me peace of mind. I would recommend doing everything you can to make your encryption technology as strong as you possibly can.
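
As a sketch of that server-side hardening with standard OpenSSH tooling (paths may differ by distribution):

# awk '$5 >= 2047' /etc/ssh/moduli > /etc/ssh/moduli.strong    # keep only 2048-bit and larger DH groups
# ssh-keygen -G moduli-2048.candidates -b 2048                 # generate fresh candidate moduli
# ssh-keygen -T moduli-2048 -f moduli-2048.candidates          # screen candidates for safe primes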

Is using four completely random English words (around 100k words) for a passphrase okay?

Sure. My passphrase is actually a full phrase. It’s a sentence. With punctuation and capitalization. I don’t use anything longer than that.

I am a big proponent of having passwords that you can remember that you don’t have to write down or store in a password vault. A password that you can remember that you don’t have to write down is more secure than one that you have to write down because it’s funky.

Using a phrase or using four random words that you will remember is much more secure than having a string of numbers and characters and having to hit shift a bunch of times. My current passphrase is roughly 200 characters long. It’s something that I can type quickly and that I remember.

Any advice for protecting Linux-based embedded systems in an IoT scenario?

IoT is a new space; it is the frontier of systems and security, and it is different every single day. Right now, I try to keep as much offline as I possibly can. I don’t like people messing with my lights and my refrigerator. I purposely did not buy a connected refrigerator because I have friends who are hackers, and I know that I would wake up to inappropriate pictures every morning. Keep them locked down. Keep them locked up. Keep them isolated.

The current malware for IoT devices is dependent on default passwords and backdoors, so just do some research into what devices you have and make sure that there’s nothing there that somebody could particularly access by default. Then make sure that the management interfaces for those devices are well protected by a firewall or another such device.

Can you name a firewall/UTM (OS or application) to use in SMB and large environments?

I use pfSense; it’s a BSD derivative, and I like it a lot. There are a lot of modules for it, and there’s actually commercial support for it now, which is pretty fantastic for small business. For larger devices and environments, it depends on what admins you can get hold of.

I have been a CheckPoint admin for most of my life, but Palo Alto is getting really popular, too. Those types of installations are going to be much different from a small business or home use. I use pfSense for any small networks.

Is there an inherent problem with cloud services?

There is no cloud; there are only other people’s computers. There are inherent issues with cloud services. Just know who has access to your data and know what you are putting out there. Realize that when you give something to Amazon or Google or Microsoft, then you no longer have full control over it and the privacy of that data is in question.

What preparation would you suggest to get an OSCP?

I am actually going through that certification right now. My whole team is. Read their materials. Keep in mind that OSCP is going to be the offensive security baseline. You are going to use Kali for everything. If you don’t — if you decide not to use Kali — make sure that you have all the tools installed to emulate a Kali instance.

It’s going to be a heavily tools-based certification. It’s a good look into methodologies. Take a look at something called the Penetration Testing Framework because that would give you a good flow of how to do your test and their lab seems to be great. It’s very similar to the lab that I have here at the house.

Watch the full webinar on demand, for free. And see parts one and two of this series for five easy tips to keep your Linux machine secure.

Mike Guthrie works for the Department of Energy doing Red Team engagements and penetration testing.

The World of 100G Networking

Capacity and speed requirements keep increasing for networking, but going from where we are now to 100G networking isn’t a trivial matter, as Christopher Lameter and Fernando Garcia discussed recently in their LinuxCon Europe talk about the world of 100G networking. It may not be easy, but with recently developed machine learning algorithms combined with new, more powerful servers, the idea of 100G networking is becoming feasible and cost effective.

Lameter talked about the challenge of processing the massive amount of data generated by a 100G network. He says that “a 1500 byte packet takes 115 nanoseconds. There is no time for you to process that. You can get 60 of those maximum packets within the 10 microsecond window. You will never be able to process this stuff at full speed, so this means the existing mechanism that can compensate for this in the 10G timeframe must either become more sophisticated or you must find other ways to process this data.” (For scale: 1,500 bytes is 12,000 bits, and at 100 Gbit/s a packet of that size occupies the wire for roughly 120 nanoseconds, in line with the quoted figure.)

One thing making 100G possible now is hardware with processors like Intel Skylake and IBM Power8 that are capable of sustaining 100G to memory. In addition to server resources, Lameter mentioned that we have also seen the development of a large amount of machine learning, artificial intelligence, and algorithms that can help process the data more quickly. There is also funding from the U.S. Department of Energy for new developments in the computer industry, with the intent to build a much more powerful supercomputer that can do an extra petaflop of computation.

Moving forward, 100G is maturing, but the software, including the operating system network stack, needs to mature to handle these speeds. In particular, Lameter said that in addition to memory throughput, ongoing issues like proper APIs and deeper integration of CPU, memory, and I/O must be addressed to make 100G networking a reality.

For all of the technical details, including Garcia’s section on testing and measurement, watch the entire video of the talk. 

Interested in speaking at Open Source Summit North America on September 11-13? Submit your proposal by May 6, 2017. Submit now>>

Not interested in speaking but want to attend? Linux.com readers can register now with the discount code, LINUXRD5, for 5% off the all-access attendee registration price. Register now to save over $300!