eBay Inc (NASDAQ:EBAY)’s Platform and Infrastructure team, which stores and manages the terabytes of logs generated every day by thousands of eBay application servers, has recently announced that it has open sourced Oink, which it describes as a self-service solution for Apache Pig.
The team uses Apache Pig to analyze the large data sets the company collects on its servers. Pig pairs a high-level language (Pig Latin) for expressing data analysis programs with the infrastructure needed to evaluate those programs. Alongside Pig, the team relies on Apache Hadoop, which gives it a variety of tools for searching and viewing logs and for generating reports on application behavior.
Because Pig is primarily used from the command line to spawn jobs, it was difficult for the team to work with: the cluster that housed the application logs was shared with other teams, which created problems of scalability, governance and change management.
Oink solved this problem by exposing Pig execution through a REST (Representational State Transfer) interface that lets users register jars, run Pig requests, check their status, view their output, and cancel a running request. The team applied the patch from PIG-3866 so that Oink runs as a servlet inside a web container and lets users run multiple requests in parallel within a single JVM instance.
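To make that request lifecycle concrete, here is a minimal Python sketch of the kind of bookkeeping a service like Oink has to do behind its REST endpoints: accept requests, track their status, return their output, and cancel them, with several requests running in parallel inside one process (standing in for the single JVM). Everything in it is hypothetical: `PigRequestManager`, its method names, and the status strings are illustrative assumptions, not Oink's actual API, and each "Pig request" is simulated by a Python callable rather than a real Pig script.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: a "script" stands in for a Pig request. It receives a stop
# event it should check periodically, and returns its output as a string.
Script = Callable[[threading.Event], str]


class PigRequestManager:
    """Hypothetical sketch: runs many requests in parallel in one process,
    loosely modeled on the operations the article says Oink exposes."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._requests: Dict[str, Tuple[Future, threading.Event]] = {}
        self._jars: List[str] = []
        self._lock = threading.Lock()

    def register_jar(self, path: str) -> None:
        # Record a jar that later requests may depend on (nothing is loaded here).
        with self._lock:
            if path not in self._jars:
                self._jars.append(path)

    def registered_jars(self) -> List[str]:
        with self._lock:
            return list(self._jars)

    def submit(self, script: Script) -> str:
        # Queue the request on the shared pool and return an id the client
        # can later use for status, output and cancel calls.
        request_id = uuid.uuid4().hex
        stop = threading.Event()
        future = self._pool.submit(script, stop)
        with self._lock:
            self._requests[request_id] = (future, stop)
        return request_id

    def _get(self, request_id: str) -> Tuple[Future, threading.Event]:
        with self._lock:
            return self._requests[request_id]

    def status(self, request_id: str) -> str:
        future, stop = self._get(request_id)
        if future.done():
            if future.cancelled() or stop.is_set():
                return "CANCELLED"
            return "FAILED" if future.exception() is not None else "SUCCEEDED"
        return "RUNNING" if future.running() else "QUEUED"

    def cancel(self, request_id: str) -> bool:
        future, stop = self._get(request_id)
        if future.done():
            return False  # already finished; nothing to cancel
        stop.set()        # ask a running script to stop cooperatively
        future.cancel()   # drop it outright if it has not started yet
        return True

    def output(self, request_id: str, timeout: Optional[float] = None) -> str:
        # Block until the request finishes; re-raises the script's error, if any.
        future, _ = self._get(request_id)
        return future.result(timeout=timeout)

    def wait_for(self, request_id: str, timeout: Optional[float] = None) -> str:
        # Block until the request is done (or the timeout expires), then report status.
        future, _ = self._get(request_id)
        wait([future], timeout=timeout)
        return self.status(request_id)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    manager = PigRequestManager(max_workers=2)
    manager.register_jar("hdfs:///udfs/example-udfs.jar")  # hypothetical path
    request = manager.submit(lambda stop: "42 rows")
    print(manager.wait_for(request, timeout=5), manager.output(request))
    manager.close()
```

Cancellation in this sketch is cooperative: a running script has to notice the `stop` event, because a thread cannot be killed safely from outside. A real service would presumably need to kill the underlying Hadoop jobs instead, which the sketch does not attempt.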
With Oink, the team has on-boarded more than 100 different use cases onto its cluster. Running Pig jobs is now fully automated, and the team no longer needs to intervene manually in any of the 6,000 Pig jobs it runs every day.