April 13, 2018

I want to parse a text corpus of 1.3 billion lines using a shallow parser. I have divided my corpus into 900 text files. What is an efficient way to parse all of the files? In other words, how can I run the parser on all the files in parallel?

I am trying the following script:
shallow_parser.py file1 & shallow_parser.py file2 & ... and so on.
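Here is a minimal sketch of one common approach. It assumes that shallow_parser.py is a Python script that takes a single input path (as in the command above) and writes its parse to stdout. Rather than backgrounding all 900 jobs at once with &, which would oversubscribe the CPUs, the driver keeps only as many parser processes running as there are CPU cores and starts the next file as each one finishes. The names run_parser and parse_all, and the "<path>.parsed" output naming, are hypothetical and chosen only for illustration.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Dict, List, Optional, Tuple


def run_parser(script: str, path: str) -> Tuple[str, int]:
    """Run the parser script on one input file as a separate OS process.

    Assumption: the script is run with the current Python interpreter and
    takes the input path as its only argument. Its stdout is redirected to a
    hypothetical per-file output "<path>.parsed" so that concurrent runs do
    not interleave their output.
    """
    with open(path + ".parsed", "w") as out:
        result = subprocess.run([sys.executable, script, path], stdout=out)
    return path, result.returncode


def parse_all(script: str, paths: List[str],
              workers: Optional[int] = None) -> Dict[str, int]:
    """Parse every file, keeping at most `workers` parser processes alive at once.

    Returns a mapping from input path to the parser's exit code.
    """
    # Default to one parser process per CPU core.
    workers = workers or os.cpu_count() or 1
    # Threads are sufficient here: each thread only waits on its child
    # process, and the actual parsing runs in separate processes, so the
    # GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(partial(run_parser, script), paths))
```

For example, parse_all("shallow_parser.py", sorted(glob.glob("corpus/*.txt"))) would process every .txt file in a hypothetical corpus/ directory, and any file with a nonzero exit code can be re-run afterwards. If the parser spends a long time loading a model at startup, a multiprocessing.Pool whose initializer loads the model once per worker would avoid reloading it 900 times, but that depends on the parser exposing an importable function.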
