Pig – Not Bacon
Apache Pig is a platform for analyzing large data sets, and it works with Hadoop. Online documentation: http://pig.apache.org/docs/r0.12.1/
Unix and Windows users need the following:
- Hadoop 0.20.2, 0.20.203, 0.20.204, 0.20.205, 1.0.0, 1.0.1, 0.23.0, or 0.23.1 – http://hadoop.apache.org/common/releases.html (To run Pig with a different version of Hadoop, set HADOOP_HOME to the directory where that version is installed. If HADOOP_HOME is not set, Pig runs with its embedded version, currently Hadoop 1.0.0.)
- Java 1.6 – http://java.sun.com/javase/downloads/index.jsp (set JAVA_HOME to the root of your Java installation)
Windows users also need to install Cygwin and the Perl package: http://www.cygwin.com/
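As a rough sketch of the environment setup described above, the following lines export both variables from a Unix shell (or from Cygwin on Windows). The two install paths are hypothetical placeholders, not values taken from the Pig documentation; substitute the directories where Java and Hadoop actually live on your machine.

```shell
# Hypothetical install locations (assumptions): replace with your own paths.
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk"   # root of the Java installation
export HADOOP_HOME="/opt/hadoop-1.0.1"           # directory where Hadoop is installed

# Print the values to confirm what Pig will pick up.
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```

Leaving HADOOP_HOME unset is also fine; Pig then falls back to its embedded Hadoop version.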
Pig Latin provides operators that can help you debug your Pig Latin statements:
- Use the DUMP operator to display results to your terminal screen.
- Use the DESCRIBE operator to review the schema of a relation.
- Use the EXPLAIN operator to view the logical, physical, or MapReduce execution plans used to compute a relation.
- Use the ILLUSTRATE operator to view the step-by-step execution of a series of statements.
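The four operators above can be tried together in one small script. The sketch below is only an illustration: the file names student.txt and debug_demo.pig, the sample rows, and the relation names A and B are all made up for this example. It writes a Pig Latin script from the shell and runs it in local mode if a pig launcher happens to be on the PATH.

```shell
# Create a tiny tab-separated input file (made-up sample data).
printf 'alice\t21\t3.5\nbob\t19\t3.1\ncarol\t25\t3.9\n' > student.txt

# Write the Pig Latin script; the quoted 'EOF' stops the shell from expanding anything.
cat > debug_demo.pig <<'EOF'
A = LOAD 'student.txt' AS (name:chararray, age:int, gpa:float);
B = FILTER A BY age > 20;
-- Review the schema of B.
DESCRIBE B;
-- View the execution plans used to compute B.
EXPLAIN B;
-- View the step-by-step execution on sample rows.
ILLUSTRATE B;
-- Display the contents of B on the terminal.
DUMP B;
EOF

# Run in local mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
  pig -x local debug_demo.pig
else
  echo "pig not found on PATH; script written to debug_demo.pig"
fi
```

DESCRIBE is the cheapest of the four since it only inspects the schema, so it is a reasonable first check before running DUMP on a large relation.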
Shortcuts for Debugging Operators
Pig provides shortcuts for the frequently used debugging operators (DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE). These shortcuts can be used in the Grunt shell or within Pig scripts. Pig supports the following shortcuts:
- \d alias – shortcut for the DUMP operator. If the alias is omitted, the last defined alias is used.
- \de alias – shortcut for the DESCRIBE operator. If the alias is omitted, the last defined alias is used.
- \e alias – shortcut for the EXPLAIN operator. If the alias is omitted, the last defined alias is used.
- \i alias – shortcut for the ILLUSTRATE operator. If the alias is omitted, the last defined alias is used.
- \q – quits the Grunt shell.
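Since the shortcuts are also accepted inside scripts, a script that uses them might look like the sketch below. The file names, sample rows, and relation names are hypothetical, and the trailing semicolons simply mirror how the full operators are written; that detail is an assumption rather than something the shortcut list states.

```shell
# Made-up sample data, tab-separated.
printf 'alice\t21\t3.5\nbob\t19\t3.1\n' > student.txt

# The quoted 'EOF' keeps the backslashes in the shortcuts intact.
cat > shortcut_demo.pig <<'EOF'
A = LOAD 'student.txt' AS (name:chararray, age:int, gpa:float);
B = FILTER A BY age > 20;
-- Shortcut for DESCRIBE B;
\de B;
-- No alias given, so the last defined alias (B) should be dumped.
\d;
EOF

# Run in local mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
  pig -x local shortcut_demo.pig
else
  echo "pig not found on PATH; script written to shortcut_demo.pig"
fi
```

\q is left out here because it is meant for leaving an interactive Grunt session.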