Best Grep Tool for OS X

Sift

Sift is a grep alternative built with Go (Golang), which means that it’s widely available on Linux, Windows, OS X, and others. It’s ridiculously fast, and it has some cool use cases that replace grep + awk combinations for extracting data. I suggest that you check out the samples to learn about sift’s powerful features.

Ack has been repackaged for most Linux distributions and OS X. On Debian-derived distributions, it is called 'ack-grep' because 'ack' was already a package for Kanji translation. Please note that the maintainers of ack have nothing to do with these packages.

Welcome to the first post in the “Know Your Tools” series!

Without further ado…

Have you ever wondered if/how *nix command line utilities may differ across distributions? Perhaps it never even occurred to you that the tools could be any different. I mean, they’re basic command line tools. How and why could/would they possibly differ?

Well, I’m here to say… thy basic command line utilities art not the same across different distributions. And the differences range from simple nuisances to ones that can cause you to overlook critical data.

Rather than rehashing aspects of this discussion that have already been covered, such as how Linux and BSD generally differ, I would like to focus on a few core utilities commonly used for DFIR artifact analysis and some caveats that may cause you headaches or even prevent you from getting the full set of results you’d expect. Along the way, I will share some workarounds I’ve learned and developed over the years, and at the end I’ll describe an overarching solution: installing the GNU core utilities on your Mac (should you want to go that route).

Let’s get to it.

Grep

Grep is one of the most useful command-line utilities for searching within files/content, particularly because it lets you search/match with regular expressions. For some of you, this may be the first time you’ve even heard that term (or its shorthand, “regex”). Some of you may have been using them for a while. And nearly everyone at some point feels like…

Amirite?

Whether this is your first time hearing about regular expressions or you use them regularly (albeit with some level of discomfort), I HIGHLY suggest you take the time to learn them and/or get better at using them – they will be your most powerful friend when using grep. Though there is a definite regex learning curve (it’s really not that bad), knowing how to use regular expressions translates directly into effective and efficient searches for artifacts during an investigation.

Nonetheless, even if you feel like a near master of regular expressions, equally critical to an expression’s success is how it is implemented within a given tool. Specifically for grep, you may or may not be aware that regex quantifiers can match in two different ways, which can greatly affect the usefulness (and, more important, the validity) of the results returned: Greedy vs. Lazy Matching. Let’s explore what each of these means/does.

At a very high level, a greedy quantifier (such as “+” or “*”) matches as much text as it can, so the match runs to the last (longest) possible end point, while a lazy quantifier matches as little as it can and stops at the first (shortest) possible end point. How a regex engine gets there (for example, by backtracking) is a separate discussion. Suffice to say, using an incorrect, unintended, and/or unexpected matching method can cause you to overlook critical data or, at the very least, give you an inefficient or invalid set of results.

Now having established some foundational knowledge about how grep searches can work, we will drop the knowledge bomb – the exact same grep expression on Linux (using GNU grep) may produce completely different or no results on Mac (using BSD grep), especially when using these different types of matching.

…What? Why?

The first time I found this out I spent an inordinate and unnecessary amount of time banging my head against a wall typing and re-typing the same expression across systems but seeing different results. I didn’t know what I didn’t know. And, well, now I hope to let you know what I didn’t know but painfully learned.

While there is an explanation of why, it doesn’t necessarily matter for this discussion. Rather, I will get straight to the point of what you need to know and consider when using this utility across systems to perform effective searches. While GREEDY searches execute pretty much the same across systems, the main difference comes when you are attempting to perform a LAZY search with grep.

We’ll start with GREEDY searches as there is essentially little to no difference between the systems. Let’s perform a greedy search (find the last/longest possible match) for any string/line ending in “is” using grep’s Extended Regular Expressions option (“-E”).

(Linux GNU)$ echo "thisis" | grep -Eo '.+is'
thisis
(Mac BSD)$ echo "thisis" | grep -Eo '.+is'
thisis

Both systems yield the same output using a completely transferrable command. Easy peasy.

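Note: When specifying Extended Regular Expressions, you can (and I often do) just use “egrep”, which implies the “-E” option. (Recent GNU grep releases warn that egrep is obsolescent, so “grep -E” is the more future-proof spelling.)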

Now, let’s look at LAZY searches. First, how do we even specify a lazy search? Put simply, you append a “?” to the quantifier (e.g., “+?” or “*?”). Using the same search as before, we’ll instead use lazy matching (find the first/shortest match) for the string “is” on both the Linux (GNU) and Mac (BSD) versions of grep and see what each yields.

(Linux GNU)$ echo "thisis" | grep -Eo '.+?is'
thisis
(Mac BSD)$ echo "thisis" | grep -Eo '.+?is'
this

Here the fun begins. We did the exact same command on both systems and it returned different results.

Well, Linux (GNU) grep does NOT recognize lazy quantifiers unless you specify the “-P” option (short for PCRE, which stands for Perl Compatible Regular Expressions); with “-E”, it does not treat “+?” as lazy, so the match stays greedy. So, we’ll supply “-P” this time:

(Linux GNU)$ echo "thisis" | grep -Po '.+?is'
this

There we go. That’s what we expected and hoped for.

*Note: You cannot use the implied Extended expression syntax of “egrep” here as you will get a “conflicting matchers specified” error. Extended regex and PCRE are mutually exclusive in GNU grep.

Note that Mac (BSD), on the other hand, WILL do a lazy search by default with Extended grep. No changes necessary there.
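If you hop between systems a lot, one way to paper over this is a small shell function that picks the right switch for you. This is a minimal sketch, assuming that a grep which accepts “-P” (GNU grep built with PCRE support) should get “-P”, and that any other grep (such as the BSD grep on a Mac) honors lazy quantifiers with “-E” as described above. The function name lazy_grep is hypothetical.

```shell
#!/usr/bin/env bash
# lazy_grep: hypothetical wrapper that runs grep with a flag that honors
# lazy quantifiers such as '+?'.
# Assumption: if this grep accepts -P (GNU grep with PCRE), use it;
# otherwise fall back to -E (the BSD grep behavior described above).
lazy_grep() {
  if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    grep -P "$@"
  else
    grep -E "$@"
  fi
}

# Same test string as above; should print "this" on either system.
echo "thisis" | lazy_grep -o '.+?is'
```

The probe costs one extra grep run per call; in a longer script you could run it once and store the chosen flag in a variable instead.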

While not knowing this likely won’t lead to catastrophic misses of data, it can (and in my experience will very likely) lead to massive amounts of false positives due to greedy matches that you have to unnecessarily sift through. Ever performed a grep search and got a ton of very imprecise and unnecessarily large (though technically correct) results? This implementation difference and issue could certainly have been the cause. If only you knew then what you know now…

So, now that we know how these searches differ across systems (and what we need to modify to make them do what we want), let’s see a few examples where using lazy matching can significantly help us (note: I am using my Mac for these searches, thus the successful use of Extended expressions using “egrep” to allow for both greedy and lazy matching)…

User-Agent String Matching
Let’s say I want to identify and extract the OS version from Mozilla user-agent strings in a set of logs; I know the format starts with “Mozilla/” and then contains the OS version in parentheses. The following shows some examples:

  • Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
  • Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36' | egrep -o 'Mozilla.+\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)

Lazy Matching
(Mac BSD)$ echo 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36' | egrep -o 'Mozilla.+?\)'
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)
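A related trick sidesteps the GNU/BSD difference entirely: replace the lazy “.+?” with a negated bracket expression such as “[^)]*” (“anything except a closing parenthesis”). This is a different technique from lazy matching, but for a delimiter-bounded field like this one it gives the same result with plain Extended regex on either grep. A sketch using the four sample user agents above:

```shell
#!/usr/bin/env bash
# The four sample user-agent strings from above, one per line.
uas='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36'

# [^)]* cannot run past the first ')', so only the OS section is matched,
# e.g. "Mozilla/5.0 (Windows NT 6.1)".
printf '%s\n' "$uas" | grep -Eo 'Mozilla/[^)]*\)'

# To keep just the text inside the parentheses, a sed capture group works too,
# e.g. "Windows NT 6.3; WOW64".
printf '%s\n' "$uas" | sed -nE 's/^Mozilla\/[0-9.]+ \(([^)]*)\).*/\1/p'
```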

Searching for Malicious Eval Statements
Let’s say I want to identify and extract all of the base64 eval statements from a possibly infected web page for analysis, so that I can then pipe them into sed to extract only the base64 element and decode it for plaintext analysis.

Greedy Matching (matches more than we wanted – fails)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode('DQplcnJvcl9yZ=')); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+\)'
eval(base64_decode('DQplcnJvcl9yZ=')); var ua = navigator.userAgent.toLowerCase()

Lazy Matching (matches exactly what we want)
(Mac BSD)$ echo "date=new Date(); eval(base64_decode('DQplcnJvcl9yZ=')); var ua = navigator.userAgent.toLowerCase();" | egrep -o 'eval\(base64_decode\(.+?\)'
eval(base64_decode('DQplcnJvcl9yZ=')

There you have it. Hopefully you are now a bit more informed not only about the differences between Lazy and Greedy matching, but also about the difference in requirements across systems.
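The eval example mentions piping the match into sed to isolate the base64 element and decode it. Here is a minimal sketch of that pipeline, assuming the payload is a single-quoted string inside eval(base64_decode(...)). The function name extract_b64 and the stand-in payload (which decodes to alert(1)) are hypothetical; the sample string above is not valid padded base64, so it would not decode cleanly. It uses a negated bracket expression rather than a lazy quantifier so that it behaves the same with GNU and BSD tools.

```shell
#!/usr/bin/env bash
# extract_b64: hypothetical helper that prints the base64 argument of every
# eval(base64_decode('...')) call found on stdin, one per line.
extract_b64() {
  # [^']* stops at the closing quote, so each call is matched separately.
  grep -Eo "eval\(base64_decode\('[^']*'\)\)" |
    sed -E "s/eval\(base64_decode\('([^']*)'\)\)/\1/"
}

page="date=new Date(); eval(base64_decode('YWxlcnQoMSk=')); var ua = navigator.userAgent.toLowerCase();"

# Decode each payload for plaintext analysis (prints: alert(1)).
# --decode is accepted by GNU coreutils base64 and, as far as I know,
# by the macOS base64 as well.
printf '%s\n' "$page" | extract_b64 | while IFS= read -r b64; do
  printf '%s' "$b64" | base64 --decode
  echo
done
```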

Strings

Strings is an important utility for extracting “human-readable” strings from files/binaries. It is particularly useful for extracting strings from (suspected) malicious binaries/files to gain some insight into what may be contained within the file, its capabilities, hard-coded domains/URLs, commands, … the list goes on.

However, not all strings are created equal. Sometimes Unicode strings exist within a file/program/binary for various reasons, and those are also important to identify and extract. By default, the GNU (Linux) strings utility searches for simple ASCII encoding, but its “-e” (“--encoding”) option lets you specify other encodings to search for, including 16-bit and 32-bit little- and big-endian characters (e.g., UTF-16). Very useful.
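To make that concrete: with GNU strings, “-e s” is the default 7-bit (ASCII) mode, while “-e l” and “-e b” look for 16-bit little- and big-endian characters (UTF-16 text written by Windows is usually little-endian). A sketch that builds a small hypothetical sample file containing one ASCII string and one UTF-16LE string, then runs both modes; it requires GNU binutils strings, and make_sample is a made-up helper name.

```shell
#!/usr/bin/env bash
# make_sample: hypothetical helper that writes an ASCII string followed by a
# UTF-16LE string (every character followed by a 0x00 byte) to stdout.
make_sample() {
  printf 'ASCIITEXT\0\0'
  printf 'W\0I\0D\0E\0S\0T\0R\0I\0N\0G\0\0\0'
}

sample=$(mktemp)
make_sample > "$sample"

echo "default encoding (-e s):"
strings -a "$sample"           # prints ASCIITEXT only

echo "16-bit little-endian (-e l):"
strings -a -e l "$sample"      # prints WIDESTRING only

rm -f "$sample"
```

Neither mode alone sees everything, which is why it is worth running both (or one of the tools listed below) over the same file.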

By default, the Mac (BSD) strings utility also searches for simple ASCII encoding; however, I regret to inform you that the Mac (BSD) version of strings does NOT have the native capability to search for Unicode strings. Do not ask why. I highly encourage you to avoid the rabbit hole of lacking logic that I endured when I first found this out. Instead, let’s move on and just ask ourselves, “What does this mean to me?” Well, if you’ve only been using a Mac to perform string searches using the native BSD utility, you have been MISSING ALL UNICODE STRINGS. Of all the pandas, this is a very sad one.

So, what are our options?

There are several options, but I personally use one of the following (depending on the situation and my mood) when I need to extract both Unicode and ASCII strings from a file on a Mac (BSD) system:
1. Willi Ballenthin’s Python strings tool to extract both ASCII and Unicode strings from a file
2. FireEye’s FLOSS tool (though intended for binary analysis, it can also work against other types of files)
3. GNU strings*

*Wait a minute. I just went through saying how GNU strings isn’t available as a native utility on a Mac. So, how can I possibly use GNU strings on it? Well, my friends, at the end of this post I will revisit exactly how this can be achieved using a nearly irreplaceable third-party package manager.

Now, go back and re-run the above tools against files and binaries from previous investigations in which you used the Mac command line. You may be delighted at the new Unicode strings you find 🙂

Sed

Sed (short for “stream editor”) is another utility for performing all sorts of useful text transformations. Though there are many uses for it, I tend to use it mostly for substitution, deletion, and permutation (switching the order of certain things), which can be incredibly useful for log files with a bunch of text.

For example, let’s say I have a messy IIS log file that somehow lost all of its newline separators, and I want to put each entry (timestamp, HTTP status code, method, and URI) back on its own line to restore readability:

…2016-08-0112:31:16HTTP200GET/owa2016-08-0112:31:17HTTP200GET/owa/profile2016-08-0112:31:18HTTP404POST/owa/test…

Looking at the pattern, we’d like to insert a newline before each instance of the date, beginning with “2016-…”. Lucky for us, we’re on a Linux box with GNU sed and it can easily handle this:

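(Linux GNU)$ sed -E 's/(.)2016/\1\n2016/g' logfile.txt
2016-08-0112:31:16HTTP200GET/owa
2016-08-0112:31:17HTTP200GET/owa/profile
2016-08-0112:31:18HTTP404POST/owa/test
…

You can see that it handles ANSI-C escape sequences (e.g., \n, \r, \t) in the replacement. Note that GNU sed does not support lazy quantifiers such as “+?”; instead, the capture group “(.)” grabs the single character just before each “2016” and the back-reference “\1” puts it back ahead of the newline. This statement also utilizes sed variables (capture groups and back-references), the understanding of which I will leave to the reader to explore.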

Sweet. Let’s try that on a Mac…

(Mac BSD)$ sed 's/(.+?)(.+)/1n2016/g' logfile.txt
2016-08-0112:31:16HTTP200GET/owa2016-08-0112:31:17HTTP200GET/owa/profile2016-08-0112:31:18HTTP404POST/owa/test

… Ugh. No luck.

Believe it or not, there are actually two common problems here. The first is the lack of interpretation of ANSI-C escape sequences. BSD sed simply doesn’t recognize any (except for \n, and even that not within the replacement portion of the statement), which means we have to find a different way of getting a properly interpreted newline into the statement.

Below are a few options that will work around this issue (and there are more clever ways to do it as well).

1. Use the literal (i.e., for a newline, type a backslash and then literally insert a new line in the expression)
(Mac BSD)$ sed 's//\
> /g'

2. Use bash ANSI-C Quoting (I find this the easiest and least effort, but YMMV)
(Mac BSD)$ sed 's//\'$'\n/g'
3. Use Perl
(Mac BSD)$ perl -pe 's||\n|g'

Unfortunately, this only solves the first of two problems, the second being that BSD sed still does not allow for lazy matching (from my testing, though I am possibly just missing something). So, even if you use #1 or #2 above, it will only match the last found pattern and not all the patterns we need it to.

“So, should I bother with using BSD sed or not?”

Well, I leave that up to your judgment. Sometimes yes, sometimes no. In cases like this where you need to use both lazy matching and ANSI-C escape sequences, it may just be easier to skip the drama and use Perl (or perhaps you know of another extremely clever solution to this issue). Options are always good.

Note: There are also other issues with BSD sed like line numbers and using the “-i” parameter. Should you be interested beyond the scope of this post, this StackExchange thread actually has some useful information on the differences between GNU and BSD sed. Though, I’ve found that YMMV on posts like this where the theory and “facts” may not necessarily match up to what you find in testing. So, when in doubt, always test for yourself.
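One concrete example of the “-i” difference: GNU sed takes an optional backup suffix attached directly to “-i”, while BSD sed requires a suffix argument (commonly written “-i ''” when you want no backup). As far as I know, attaching a non-empty suffix with no space, as in “-i.bak”, is accepted by both. A sketch on a throwaway temporary file (its contents are hypothetical):

```shell
#!/usr/bin/env bash
# Create a throwaway file to edit in place.
tmp=$(mktemp)
printf 'status=404\n' > "$tmp"

# -i.bak (suffix attached, no space) edits the file and keeps "$tmp.bak";
# this spelling works with both GNU and BSD sed.
sed -i.bak 's/404/200/' "$tmp"

cat "$tmp"        # status=200
cat "$tmp.bak"    # status=404 (the original)
rm -f "$tmp" "$tmp.bak"
```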

Find

Of all commands, you might wonder how something as basic as find could differ across *nix operating systems. I mean, what could possibly differ? It’s just find, the path, the type, the name… how or why could that even be complicated? Well, for the most part they are the same, except in one rather important use case – using find with regular expressions (regex).

Let’s take for example a regex to find all current (non-archived/rotated) log files.

On a GNU Linux system this is fairly straightforward:

(Linux GNU)$ find /var/log -type f -regextype posix-extended -regex '/var/log/[a-zA-Z.]+(/[a-zA-Z.]+)*'

You can see here that rather than using the standard “-name” parameter, we instead used the “-regextype” flag to enable extended expressions (remember egrep from earlier?) and then used the “-regex” flag to supply our expression, which has to match the entire path rather than just the file name (hence the leading /var/log/). And that’s it. Bless you, GNU!

Obviously, Mac BSD is not this straightforward, otherwise I wouldn’t be writing about it. It’s not exactly SUPER complicated, but it’s different enough to cause substantial frustration, and your Google searches will show that the internet is very confused about how to do this properly. I know. Shocking. Nonetheless, there is value in traveling down the path of frustration here so that you don’t have to when it really matters. So, let’s just transfer the command verbatim over to a Mac and see what happens.

(Mac BSD)$ find /var/log -type f -regextype posix-extended -regex '/var/log/[a-zA-Z.]+(/[a-zA-Z.]+)*'
find: -regextype: unknown primary or operator

Great, because why would BSD find use the same operators, right? That would be too easy. By doing a “man find” (on the terminal, not in Google, as that will produce very different results from what we are looking for here) you will see that BSD find does not use that operator. Though, it still does use the “-regex” operator. Easy enough, we’ll just remove that bad boy:

(Mac BSD)$ find /var/log -type f -regex '/var/log/[a-zA-Z.]+(/[a-zA-Z.]+)*'
(Mac BSD)$

No results. Ok. Let’s look at the manual again… ah ha, to enable extended regular expressions (where “+” and grouping parentheses are special rather than literal), we need to use the “-E” option. Easy enough:

(Mac BSD)$ find /var/log -E -type f -regex '/var/log/[a-zA-Z.]+(/[a-zA-Z.]+)*'
find: -E: unknown primary or operator

Huh? The manual says the “-E” parameter is needed, yet we get the same error message we got earlier about the parameter being an unknown option. I’ll spare you a bit of frustration and tell you that it is VERY picky about where this flag is put – it must be BEFORE the path, like so:

(Mac BSD)$ find -E /var/log -type f -regex '/var/log/[a-zA-Z.]+(/[a-zA-Z.]+)*'
/var/log/alf.log
/var/log/appfirewall.log
/var/log/asl/StoreData
/var/log/CDIS.custom
/var/log/corecaptured.log
/var/log/daily.out
/var/log/DiagnosticMessages/StoreData
/var/log/displaypolicyd.log
/var/log/displaypolicyd.stdout.log
/var/log/emond/StoreData
/var/log/install.log
/var/log/monthly.out
/var/log/opendirectoryd.log
/var/log/powermanagement/StoreData
/var/log/ppp.log
/var/log/SleepWakeStacks.bin
/var/log/system.log
/var/log/Tunnelblick/tunnelblickd.log
/var/log/vnetlib
/var/log/weekly.out
/var/log/wifi.log

Success. And that’s that. Nothing earth-shattering here, but different (and unnecessarily difficult) enough to be aware of when switching between systems.
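If you switch back and forth often, a small wrapper can hide the difference. This is a minimal sketch, assuming that a find which accepts “-E” before the path is BSD find and anything else is GNU find; the function name find_ere and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# find_ere PATH REGEX [PRIMARIES...]
# Hypothetical wrapper: run find with an Extended regex on either BSD or
# GNU find. -regex is matched against the whole path, as in the examples above.
find_ere() {
  local path=$1 regex=$2
  shift 2
  if find -E . -maxdepth 0 >/dev/null 2>&1; then
    # BSD find: -E must come BEFORE the path
    find -E "$path" "$@" -regex "$regex"
  else
    # GNU find: -regextype must precede the -regex test it applies to
    find "$path" -regextype posix-extended "$@" -regex "$regex"
  fi
}

# Usage, mirroring the /var/log example above:
#   find_ere /var/log '/var/log/[a-zA-Z.]+(/[a-zA-Z.]+)*' -type f
```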

So, now what?

Are you now feeling a bit like you know too much about these little idiosyncrasies? Well, there’s no going back now. If for no other reason, maybe you can use them to sound super smart or win bets or something.

These are just a few examples relevant to the commands and utilities often used in performing DFIR. There are still plenty of other utilities that differ as well that can make life a pain. So, now that we know this, what can we do about it? Are we doomed to live in constant translation of GNU <—> BSD and live without certain GNU utility capabilities on our Macs? Fret not, there is a light at the end of the tunnel…

If you would like to not have to deal with many of these cross-platform issues on your Mac, you may be happy to know that the GNU core utilities can be rather easily installed on OS X. There are a few options to do this, but I will go with my personal favorite method (for a variety of reasons) called Homebrew.

Homebrew (or brew) has been termed “The missing package manager for OS X”, and rightfully so. It allows simple command-line installation of a huge set of incredibly useful utilities (using Formulas) that aren’t installed by default and/or easily installed via other means. And, the GNU core utilities are no exception.

As a resource, Hong’s Technology Blog provides a great walk-through of installation and considerations.

You may already be thinking, “Great! But wait… how will the system know which utility I want to run if both the BSD and GNU versions are installed?” Great question! By default, Homebrew installs the binaries to /usr/local/bin (on Apple Silicon Macs, the default location is /opt/homebrew/bin instead). So, you have a couple of options, depending on which utility in particular you are using. Some GNU utilities (such as sed) are installed with a “g” prefix and can be run without conflict (e.g., “gsed” will launch GNU sed). Others may not have the “g” prefix. In those cases, you will need to make sure that /usr/local/bin is in your path (or has been added to it) AND that it precedes the standard BSD utilities’ locations of /usr/bin, /bin, etc. So, your path should look something like this:

$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

With that done, it will by default now launch the GNU version installed in /usr/local/bin instead of the standard system one located in /usr/bin. And, to use the native system utilities when there is a GNU version installed with the same name, you will just need to provide their full path (i.e., “/usr/bin/<utility>”).
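To check which implementation a bare command name will actually run, bash’s “type -a” lists every match on PATH in lookup order. Below is a sketch with two hypothetical helpers: path_prepend adds a directory to the front of PATH only if it isn’t already there, and is_gnu rests on an assumption, namely that GNU tools print “(GNU ” on the first line of “--version” (GNU grep, sed and findutils find do, as far as I know), whereas the Mac’s BSD tools either reject “--version” or print something else. On Apple Silicon, substitute /opt/homebrew/bin for /usr/local/bin.

```shell
#!/usr/bin/env bash
# path_prepend DIR: put DIR at the front of PATH unless it is already present.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                        # already on PATH: leave it alone
    *) PATH="$1${PATH:+:$PATH}" ;;
  esac
  export PATH
}

# is_gnu TOOL: succeed if the first TOOL on PATH reports itself as GNU.
# Assumption: GNU tools print "(GNU " on the first line of --version.
is_gnu() {
  "$1" --version 2>/dev/null | head -n 1 | grep -q '(GNU '
}

path_prepend /usr/local/bin             # Homebrew prefix on Intel Macs
for tool in grep sed find; do
  type -a "$tool"                       # every match, in lookup order
  if is_gnu "$tool"; then echo "$tool: GNU"; else echo "$tool: not GNU"; fi
done
```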

Please feel free to sound off in the comments with any clever/ingenious solutions not covered here or stories of epic failure in switching between Linux and Mac systems 😃

/JP

This page replaces the old page entitled Unix grep for OS X and Nisus Writer Express, which is now obsolete. (2007-09-26)

Version 0.7.2 (2007-09-26)

Introduction

As I wrote elsewhere (cf. my page rtf to UTF-8 text and pTeX), Mac OS X lacks a good word-processing program (a problem that can be solved in part by using LaTeX), but it also lacks a good program for searching the text contents of files. The best search program for Classic Mac OS, MgrepApp, still works in the Classic environment of OS X (MgrepApp shows a warning at start-up saying that it has expired, but you can get rid of it by hitting the Return key and clicking the close box of the window); but it will not work on Intel Macs! Mac OS X comes with a very powerful and fast search utility, GNU grep, but you have to use Terminal to run it, and this is not easy at all. Moreover, GNU grep can only search UTF-8 text files; since many files used in our studies, such as the CBETA Taisho files or the SAT Taisho files, are in either Big5-Eten or Shift_JIS, this is very inconvenient (although CBETA is beginning to distribute UTF-8 files as well; but be aware that CBETA UTF-8 files use Windows line endings, so they may cause problems in OS X applications).

Therefore, I wrote an AppleScript droplet that converts all files in a folder into UTF-8 files (please see my other page Batch Convert Files to UTF-8), and another AppleScript droplet that can be used as an interface for GNU grep. Used in combination with TextWrangler and/or Jedit X, this latter droplet can simulate, to a certain extent, the behavior of MgrepApp: you get a list of results, and when you double-click an item, TextWrangler or Jedit X opens the target file and selects the matched word. It is this droplet, named unix_grep.app, that I would like to present on this page. The files MUST be in UTF-8 encoding; files with Mac line endings cannot be searched, but those with Windows line endings can be used (of course, those with Unix line endings are preferable).
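For readers comfortable with Terminal, another way around the encoding limitation is to convert each file on the fly before grep sees it. This is a minimal sketch, assuming your iconv knows the encoding names SHIFT_JIS and BIG5 (check iconv -l; the exact name for the Big5-Eten variant may differ). grep_legacy is a hypothetical helper, not part of unix_grep.app. It strips the CR of Windows (CRLF) line endings; files with old Mac (CR-only) line endings would need tr '\r' '\n' instead.

```shell
#!/usr/bin/env bash
# grep_legacy ENCODING PATTERN FILE...
# Hypothetical helper: convert each file to UTF-8 on the fly, drop the CR of
# Windows (CRLF) line endings, and print matches as FILE:LINE:TEXT.
grep_legacy() {
  local enc=$1 pat=$2 f
  shift 2
  for f in "$@"; do
    iconv -f "$enc" -t UTF-8 "$f" | tr -d '\r' | grep -n -- "$pat" |
      while IFS= read -r hit; do
        printf '%s:%s\n' "$f" "$hit"
      done
  done
}

# Demo with a throwaway Shift_JIS file that uses CRLF line endings.
demo=$(mktemp)
printf 'first line\r\n其行平等。尊大自在。\r\n' | iconv -f UTF-8 -t SHIFT_JIS > "$demo"
grep_legacy SHIFT_JIS '大自在' "$demo"     # prints <file>:2:其行平等。尊大自在。
rm -f "$demo"
```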

TextWrangler is a free text editor, very powerful, very fast, and very useful. Unfortunately, it cannot handle styled text.
Jedit X is a shareware text editor (2940 yen or $28); it is also very powerful; it is perhaps a little slower than TextWrangler, but it can handle styled text as well as TextEdit.

In this ReadMe, I assume that we are using unix_grep.app to search for words in Taisho text files, and I will take my examples from this kind of use. To follow these examples, you should first have converted at least some folders of CBETA files to UTF-8 text files, using my other utility, 'batch_conv2utf8_encoding_check.app'. But of course, you can use unix_grep.app for any other UTF-8 text files or folders containing UTF-8 text files.

Notes on the new version 0.7.1:

After I released the first version of my unix_grep.app, I asked a friend, Hamid Haji, to test it. He discovered a serious problem with Arabic text files: in many cases, my droplet fails to find the searched words, and it often crashes. I tried to find the culprit, and discovered that AppleScript's choose from list command is unable to handle long strings of Arabic text. Moreover, AppleScript's droplet mechanism cannot deal correctly with Arabic file names. I think the same is true for other languages (scripts) using ligatures, e.g. Hebrew or Devanagari, and perhaps other languages/scripts as well.

To avoid the problem with AppleScript's choose from list, I wrote a new version in which I added a new option, special_scripts. If you set this option to 1, the list selecting window will be skipped, and the result of the search will be opened right away in your default application. This option can be used for other languages/scripts as well, if you don't need the list selecting window. It is certainly faster and more robust than using the list selecting window, so if you expect the search result to be very large, it is probably better to use this option.

As for the problem of file names in Arabic or other scripts using ligatures, the only way to avoid it is either to rename the files with Roman names, or to put the files in a folder that has a Roman name (and set the option recursive to 1 if the files are in sub-folders [I think the sub-folders can have Arabic names]).

The new version is improved in other ways as well: it can now, in theory, accept any number of files or folders (so the save symlink mode is perhaps a little less useful in this version [see below]).

I also changed the format of the result file: its first line now summarizes the search result.

Finally, I changed the two scripts for TextWrangler and Jedit X, named open_file_fromGrepRes.scpt, so that they now not only open the target file and select the target line, but also select the target word.

I rewrote the following documentation to fit with the new version.

End of notes for version 0.7.1.

Notes for version 0.7.2:

I fixed one bug in the interface: once you had entered '*' in the ext field (standing for 'all files'), it was impossible to get rid of it. This bug, reported by John McRae, has been fixed.

End of notes for version 0.7.2.

Requirements, Contents of the package and How to install:

Requirements:

  1. OS X 10.4 and later
  2. Jedit X (optional; a demo version that works for one month can be downloaded free of charge)
  3. a bunch of folders containing UTF-8 text files

Contents:

When expanded, the package that you will download from this page (see at the bottom of the page) will contain:
  • unix_grep_AppleScript/
    • (Don't change this file) settings.txt
    • ReadMe.rtf this file
    • grep_symlinks_folder an empty folder
    • unix_grep.app
    • unix_grep_res.txt an empty UTF-8 file
    • Put_in_App_script_folder/
      • for_Jedit_X/
        • open_file_fromGrepRes.scpt
      • for_TextWrangler/
        • open_file_fromGrepRes.scpt

How to install:

To use the two scripts named open_file_fromGrepRes.scpt, one for TextWrangler and the other for Jedit X, you have to copy them into their respective 'Scripts' folders.

For TextWrangler, it is easy:

  1. Locate the Scripts folder for TextWrangler:
    /Users/[your_account]/Library/Application Support/TextWrangler/Scripts/
  2. Click on the script open_file_fromGrepRes.scpt in the folder for_TextWrangler, press the Option key, and drag the script into that folder.
  3. You should also set the default file encoding of TextWrangler to Unicode (UTF-8, no BOM) if it is not already set:
    Launch TextWrangler, and choose the menu-item TextWrangler > Preferences; in the left side pane, click Text encodings. At the bottom of the window, set the popup menu under If file's encoding can't be guessed, use: to Unicode (UTF-8, no BOM).
This is all for TextWrangler.

For Jedit X:

  1. Launch Jedit X, and select Window > Script Window or Macros > Show Script Window in Jedit X to display the Script window.
  2. Click on the Macro Menu tab of the Script window;
  3. Drag the script open_file_fromGrepRes.scpt in the folder for_Jedit_X from the Finder to the desired location in the Script window to save it there.
  4. You will be asked whether you want to copy the script file or the alias file. Click Copy, and the script file will automatically be saved in the following scripts folder:
    /Users/[your_account]/Library/Application Support/Jedit X/scripts/
  5. For Jedit X too, you should set the default file encoding to Unicode (UTF-8) at this moment:
    Choose the menu-item Jedit X > Preferences; press the icon Encoding at the top of the window; set the pop-up menu under Default Encoding and Line Endings for Plain Text to Unicode (UTF-8) (and the Line Endings to Unix (lf)).

For details, you can refer to Jedit X's help: Chapter 11.2: 'Script Window', and Chapter 2.4: 'Encoding'.

After you have installed these two scripts, you can place the folder unix_grep_AppleScript anywhere you want (preferably on your desktop?), but you should NOT change the structure of this folder. In particular, unix_grep.app, the folder grep_symlinks_folder, and the text file unix_grep_res.txt must stay in the same folder.

How to use:

To see how unix_grep.app works, first, make sure that you have all the needed pieces:
  • TextWrangler
  • Jedit X (although this is optional, I would recommend downloading it if you don't have it already)
  • one or a bunch of folders full of text files in UTF-8 (which you may have created with my other utility batch_convert2utf8.app [see the page Batch Convert Files to UTF-8]), for example, a folder named 'T01', containing all the files of volume 1 of the Taisho Canon.
    Hereafter, I will use folders of the Taisho Canon as examples.

    and of course

  • unix_grep.app

Here are the basic steps:

  1. Drag and drop your folder 'T01' onto the icon of unix_grep.app.
  2. A dialog will appear, asking you to enter a search word.
  3. Enter for example '大自在' (without quotes)
  4. You will see almost immediately a list selecting dialog, with the title:
    Found 2 matches..
    with the following prompt:
    Choose one to open the file with the default_app.. or.. Press OK with no selection to save the result.
    And in the list selecting window, you will see two lines: one beginning with T01n0022.txt and the other with T01n0081.txt (here, I use the CBETA files as example..).
  5. For this time, select the first item, the one beginning with T01n0022.txt, and press OK (you can do this by pressing the Down Arrow key once, then the Return key; alternatively, you can double-click the item with the mouse).
  6. TextWrangler will launch, open the file T01n0022.txt and select the word '大自在' in the line 403 (if the line contains more than one occurrence of the target word, only the FIRST one will be selected):
    T01n0022_p0275b07(03)||其行平等。尊大自在。心念無畏。以一身化無數身。
  7. The same list selecting window of unix_grep.app will appear again, in front. This will repeat indefinitely unless you take one of the following steps. So, to get rid of this list selecting window and return to TextWrangler, click either the Cancel button or the OK button.
    • If you click the Cancel button, unix_grep.app will quit without doing anything (and the result of the grep search will be discarded);
    • If you click the OK button, the result of the search will be written in a file, the file named unix_grep_res.txt (located in the same folder as unix_grep.app), and this file will be opened by TextWrangler.
  • If you don't select any line at the first list selecting window, and press the OK button (or hit the Return key), the file unix_grep_res.txt will be opened by TextWrangler, and unix_grep.app will quit.
  • If you press the Cancel button at the first list selecting window, unix_grep.app will quit, discarding the search result.
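The droplet passes the actual search to egrep. The Python sketch below shows what that step amounts to. It reads every UTF-8 text file at the first level of the dropped folder and reports each matching line, prefixed with its file name. The function names and the exact "file:line" layout are assumptions made for illustration. They are not the internals of unix_grep.app.

```python
import os
import re


def grep_text(name, text, pattern):
    """Return 'name:line' for every line of text matching pattern (a regex)."""
    regex = re.compile(pattern)
    return [f"{name}:{line}" for line in text.splitlines() if regex.search(line)]


def grep_folder(folder, pattern, ext="txt"):
    """Hypothetical helper: search first-level files of folder with extension ext."""
    results = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or not name.endswith("." + ext):
            continue
        # The source files are assumed to be UTF-8, as this page requires.
        with open(path, encoding="utf-8") as f:
            results.extend(grep_text(name, f.read(), pattern))
    return results
```

For example, grep_folder('T01', '大自在') would return one entry per matching line, each beginning with the file name. These entries resemble the two lines listed in step 4.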

Be warned that the search result in the file unix_grep_res.txt is overwritten each time, so you should close this window after each search. If you want to keep a result, save it in another file.

You can use the results in unix_grep_res.txt to open a file and select your search word in it:

  • Select one entire line of the result file, and run the script open_file_fromGrepRes from the AppleScript menu of TextWrangler (or the Macro menu if you use Jedit X).
  • The target file will open, and the target word will be selected.
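The open_file_fromGrepRes script has to recover the file name from a result line and locate the first occurrence of the search term. The Python sketch below assumes that each line of unix_grep_res.txt has the form "file_name:matched line". That format is an assumption, because it is not documented here. The name locate_first_match is hypothetical.

```python
def locate_first_match(result_line, term):
    """Hypothetical: split a 'file:line' result; return (file_name, line, column)."""
    # Split on the first colon only: the matched text itself may contain colons.
    file_name, sep, line = result_line.partition(":")
    if not sep:
        raise ValueError("result line has no 'file:' prefix")
    # Only the FIRST occurrence is used, as described for TextWrangler above.
    column = line.find(term)
    return file_name, line, column
```

The returned column is the offset at which an editor would start its selection. A value of -1 means the term does not appear in the line.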

This is the basic use of the application.

How to configure the settings:

To see the possible settings, open the file named (Don't change this file) settings.txt with TextEdit or any other text editor. You will see the following default settings:
ignore_case:0
recursive:0
ext:txt
----------------
default_app:TextWrangler
----------------
save_symlink:0
add_to_symlink:0
----------------
special_scripts:0
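The settings file is a plain list of key:value lines, separated by dashed lines. It is not known how unix_grep.app reads this file internally. The Python sketch below only illustrates the format. It parses the file into a dictionary and turns the 0/1 flags into integers.

```python
def parse_settings(text):
    """Illustrative parser: 'key:value' lines, skipping blanks and '----' separators."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("-"):
            continue  # blank line or dashed separator
        key, _, value = line.partition(":")
        # 0/1 flags become integers; other values (txt, *, TextWrangler) stay strings.
        settings[key] = int(value) if value.isdigit() else value
    return settings
```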

The file (Don't change this file) settings.txt is there only to show you the default settings of the droplet. If you don't need it, you can move it anywhere.

You will see the same list if you double-click the icon of unix_grep.app, and you can change these settings from there:

Double-click the icon of unix_grep.app: a list selecting window shows the current settings. To leave the settings unchanged, simply hit the Return key without selecting any item, or click the Cancel button.

  • If you select an item and hit the Return key, a new dialog asks you to enter the value you want for that item (see below for the possible values of each item, with some explanation).
  • When you click the OK button, another dialog asks: Have you finished your changes? It has three buttons: Cancel, Finished and Not yet.. (the default button).
  • If you press Not yet.., the same list selecting window will appear, asking to select one item, and this will repeat until you press Finished (or Cancel -- in which case, all the changes made will be discarded..).
  • If you press Finished, a new list selecting window appears to let you confirm or reject the changes: press the OK button to save them, or the Cancel button to discard them.

Now, a few words about each option:

  1. ignore_case: 0 for a case-sensitive search, or 1 for a case-insensitive search (note that for kanji searches, ignore_case has no meaning).
  2. recursive: 0 searches only the first-level files in the folder dropped on unix_grep.app; 1 also searches all the files in the nested folders of the dropped folder.
  3. ext: the extension of the files to be searched, for example txt, html, xml, or pl [for Perl source code files], or '*', which means all extensions. Note that files with no extension at all are never searched. It is *possible* to search other kinds of files, for example 'doc' or 'rtf' files, but the result will be garbled and meaningless. You should always specify an extension of text files in UTF-8 encoding (preferably with Unix line endings).
  4. default_app: either TextWrangler or Jedit X. Jedit X behaves exactly the same way as TextWrangler, although it is slower to open large files. If you don't have Jedit X and set default_app to it, the application will quit with a warning (but I could not test this situation..). It seems that Jedit X sometimes fails to open the target file; in such cases, I recommend using TextWrangler instead.
    Latest note: I think I have fixed this problem.
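The first three options determine which files are searched and how the pattern is matched. The Python sketch below rests on a few assumptions. An ext of '*' means any extension. Files without an extension are always skipped. ignore_case maps to a case-insensitive regular expression. The names wanted, select_files and compile_pattern are hypothetical. The recursive walk follows symlinked folders, because the symlink workflows described further down depend on that.

```python
import os
import re


def wanted(file_name, ext):
    """Hypothetical: True if file_name has an extension matching ext ('*' = any)."""
    stem, dot, suffix = file_name.rpartition(".")
    if not dot or not stem or not suffix:
        return False  # no extension at all: never searched
    return ext == "*" or suffix == ext


def select_files(folder, ext="txt", recursive=0):
    """First-level files only, or every nested file when recursive=1."""
    if recursive:
        paths = []
        # followlinks=True so that symlinked sub-folders are searched too.
        for root, _dirs, files in os.walk(folder, followlinks=True):
            paths.extend(os.path.join(root, f) for f in files if wanted(f, ext))
        return sorted(paths)
    return sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, f)) and wanted(f, ext)
    )


def compile_pattern(pattern, ignore_case=0):
    """ignore_case=1 gives a case-insensitive regex (no effect on kanji)."""
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)
```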

The two options save_symlink and add_to_symlink are somewhat special and need some explanation. I use egrep as the search engine of my application, and egrep can perform 'OR' searches.

For example, to search for lines which contain '尸棄' OR '光明' in T09, you would:

  1. Drag & drop the folder T09 onto the icon of unix_grep.app
  2. Type '尸棄|光明' in the dialog asking for the search term. You will get a list of 1253 matched lines containing '尸棄', '光明', or both. The operator '|' means 'OR'.
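In egrep's extended regular expressions, '|' separates alternatives, so a line matches if it contains either term. Python's re module uses the same operator. The check below therefore illustrates the regex behaviour on a few made-up sample lines. It does not reproduce the droplet itself.

```python
import re

# '|' means OR: the pattern matches a line containing either term (or both).
or_pattern = re.compile("尸棄|光明")

lines = ["尸棄", "光明", "尸棄 光明", "other"]  # made-up sample lines
matches = [line for line in lines if or_pattern.search(line)]
print(matches)
```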

But grep and egrep have no 'AND' operator. For example, you might want to find the files which contain both '尸棄' AND '光明'; a single grep or egrep search cannot do this. To achieve it, you first have to find the files containing (for example) '尸棄', then search for '光明' in those files only. The save_symlink option is meant for such cases.

  1. First, you will set the option save_symlink to 1 (double-click on unix_grep.app, select the save_symlink option, enter 1, press Finished, press OK..); then
  2. You will drag and drop the same T09 onto the icon of unix_grep.app
  3. Type (for example) '尸棄' in the first dialog.
  4. You will see a list window showing 10 lines matching the word '尸棄' in T09; the title of the window will display:
    Save_symlink mode: Found 10 match(es) in 3 file(s)..
    and the Prompt of the window will say:
    Press OK to save the symlink files (existing symlink file[s] will be deleted..)
    -- So, there are only 3 files in T09 in which the word '尸棄' occurs.
  5. Hitting the Return key saves symbolic links to the matched files in your folder grep_symlinks_folder (selecting an item in the list has no meaning in Save_symlink mode!).
    Opening grep_symlinks_folder, you will find 3 files, named T09n0262.txt, T09n0264.txt, and T09n0278.txt. Each has a little arrow at the lower left corner of its icon, indicating that it is a symbolic link. A symbolic link is a kind of alias file used in Unix; it is very small [only 4 KB each], and double-clicking its icon opens the original file it points to.
  6. Now, set the option save_symlink to 0;
  7. Drag and drop the folder grep_symlinks_folder onto unix_grep.app
  8. Enter the word '光明' in the first dialog;
  9. You will get a list of 1085 matched lines, all containing '光明' and all in files that also contain '尸棄'.

This shows that 'OR' search is extensive, while 'AND' search is restrictive.
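Put differently, a file-level 'AND' search is the intersection of two ordinary searches. Take the set of files containing the first term and keep only those that also contain the second. The Python sketch below works on a hypothetical in-memory mapping from file names to text. The name files_containing_all is also hypothetical.

```python
def files_containing_all(texts, terms):
    """Hypothetical: names of files whose text contains every term (AND search)."""
    selected = set(texts)
    for term in terms:
        # Each pass keeps only the files that also contain this term,
        # just as the second search runs only on the files found by the first.
        selected = {name for name in selected if term in texts[name]}
    return sorted(selected)
```

Each term added to the list can only shrink the result, which is why the 'AND' search is restrictive.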

Now, for the other option, add_to_symlink:
This option is meaningful only when save_symlink is set to 1. If add_to_symlink is set to 0, all the symbolic links in grep_symlinks_folder are deleted at each search in Save_symlink mode; if you set it to 1, the symbolic links already in grep_symlinks_folder are kept.

This can be useful when you want to gather symbolic links to files satisfying some condition across several search sessions (with the previous version, which accepted only one folder, this was more crucial).
For example, in the previous example you gathered symbolic links to the files in the folder T09 that contain '尸棄'. To add links to the files satisfying the same condition in the folder T10, set add_to_symlink to 1, drop the folder T10 onto unix_grep.app, and perform the same search. You will then get 3 more files in grep_symlinks_folder: T10n0279.txt, T10n0293.txt and T10n0294.txt. You can run any other searches on these files by dropping grep_symlinks_folder onto unix_grep.app (you should probably set both save_symlink and add_to_symlink to 0 first).
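The save_symlink and add_to_symlink options can be modelled with the Python sketch below. Every matched file gets a symbolic link in grep_symlinks_folder. Existing links are removed first unless add_to_symlink is 1. The function save_symlinks and its arguments are hypothetical, and only the folder name comes from the text above. The sketch assumes a POSIX system such as OS X.

```python
import os


def save_symlinks(matched_paths, link_folder, add_to_symlink=0):
    """Hypothetical Save_symlink mode: link each matched file into link_folder."""
    os.makedirs(link_folder, exist_ok=True)
    if not add_to_symlink:
        # add_to_symlink = 0: delete the symlinks left by the previous search.
        for name in os.listdir(link_folder):
            path = os.path.join(link_folder, name)
            if os.path.islink(path):
                os.remove(path)
    for target in matched_paths:
        link = os.path.join(link_folder, os.path.basename(target))
        if not os.path.lexists(link):
            # The link keeps the original file name, e.g. T09n0262.txt.
            os.symlink(os.path.abspath(target), link)
    return sorted(os.listdir(link_folder))
```

Calling it once for the T09 matches, then again for the T10 matches with add_to_symlink=1, leaves links from both volumes in the folder. That matches the example above.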

The last option, special_scripts, was explained above, in the 'Notes on the new version 0.7.1'.
If you want to perform searches in Arabic (and certainly Hebrew, and probably Devanagari or other languages/scripts using ligatures), you have to set this option to 1. The list selecting window is then skipped, and the search result file opens directly in your default application. Select one line of this result file and run the script open_file_fromGrepRes to open the target file and select the target term. This is due to a bug in AppleScript, and it was the only workaround I could find.

Note that you can also set this option to 1 for other languages/scripts if you don't need the list selecting window. It is certainly faster and more robust than the list selecting window, so if you expect a very large search result, this option is probably the better choice.

Supplementary notes:

A. You can use the recursive option to search files in nested folders inside one folder. For example, if you have a folder named 'Taisho' containing folders such as T01, T02, T03.. T85, you can search all the files in these sub-folders with recursive set to 1. A search for the term '摩訶迦羅天' in all the CBETA Taisho files, which finds 10 matched lines, takes less than one minute on my machine, a now rather slow PowerPC G4 Dual 867 MHz. The search time seems to depend more on the number of hits than on the number of files to be parsed.

You can also drop more than one folder or file onto unix_grep.app. But you can perform more sophisticated searches with symlinked folders, which you can create with another of my utilities, make_symlink.app, available on my page Make Symlink. For example, you can do the following:

  1. Make a new empty folder where you want, and name it, for example, 'agama';
  2. Locate your folders T01 and T02, and drag and drop them onto the icon of make_symlink.app;
  3. A folder choosing dialog will ask you to select the destination folder: select the newly created folder 'agama'.
That's all: you now have symbolic links to your T01 and T02 folders inside 'agama'. Drag and drop 'agama' onto unix_grep.app to search all the files in your original T01 and T02 folders (the option recursive must be set to 1).
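If you reproduce this workflow with your own scripts, one detail matters. A recursive search must follow symbolic links to folders, otherwise the linked T01 and T02 folders are skipped. Python's os.walk, for instance, does not descend into symlinked folders unless followlinks=True is passed. The sketch below assumes a POSIX system. link_folders and walk_text_files are hypothetical helpers that stand in for make_symlink.app and the recursive search.

```python
import os


def link_folders(folders, destination):
    """Hypothetical stand-in for make_symlink.app: link each folder into destination."""
    for folder in folders:
        name = os.path.basename(os.path.normpath(folder))
        os.symlink(os.path.abspath(folder), os.path.join(destination, name))


def walk_text_files(root, ext="txt"):
    """Recursively list files with the given extension, following symlinked folders."""
    found = []
    # Without followlinks=True, os.walk would not enter the symlinked T01/T02.
    for dirpath, _dirs, files in os.walk(root, followlinks=True):
        found.extend(os.path.join(dirpath, f) for f in files if f.endswith("." + ext))
    return sorted(found)
```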

You can use the same technique for other kinds of searches: for example, you could locate all the files whose translator is 鳩摩羅什, gather symbolic links to them in a folder named 'translations_kumarajiva', and search for terms in these files.

B. I recommend checking the settings of unix_grep.app each time before you use it. To do this, double-click its icon; the list selecting window shows the current settings. If you are satisfied with them, simply hit the Return key; otherwise, select an item to change it.

C. You should also learn how egrep works and which wildcard (regular expression) characters can be used. Please have a look at, for example:
http://www.wellho.net/regex/grep.html

D. Due to a bug in AppleScript's droplet mechanism, file or folder names in Arabic (or other 'special' languages) will not be recognized. In such cases, the simplest solution is to rename these files/folders with Roman names. However, the search itself still works if you put your files/folders with 'special' language names inside a folder with a Roman name. You can also put symlinked files/folders in a folder with a Roman name (don't forget to set recursive to 1 if the text files are inside sub-folders).

E. A final warning: I think unix_grep.app is rather robust, but it is a simple AppleScript utility: you should NEVER search for words which may occur more than one or two thousand times. For example, NEVER try to search for '佛' in the whole Taisho Canon! That would certainly crash the application, and perhaps even the system!
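If you script your own searches, you can avoid building an enormous result list by stopping once a limit is reached. The Python sketch below does this. The max_hits limit and the name first_hits are hypothetical, and unix_grep.app has no such option.

```python
import re
from itertools import islice


def first_hits(lines, pattern, max_hits=1000):
    """Hypothetical: at most max_hits matching lines, plus a 'truncated' flag."""
    regex = re.compile(pattern)
    matching = (line for line in lines if regex.search(line))
    # Read one extra match to learn whether the results were cut off.
    hits = list(islice(matching, max_hits + 1))
    return hits[:max_hits], len(hits) > max_hits
```

Because the generator is consumed lazily, the scan stops as soon as the limit is exceeded. It never holds every match in memory.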

Download

Please download the package from this link (171K download).

I would appreciate any feedback, comments, bug reports or requests.

Thank you!
