Apache Drill is an open source, low-latency query engine for Hadoop that delivers secure, interactive SQL analytics at petabyte scale. Because it can discover schemas on the fly, Drill is a pioneer in self-service exploration of data stored in files or NoSQL databases in a variety of formats.
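Discovering schemas on the fly means the engine learns the structure of the data while reading it, instead of requiring a schema to be declared up front. The following is a hypothetical sketch of that idea, assuming records have already been parsed (for example from JSON) into Java maps. The class `SchemaInference`, its type names, and its widening rules are illustrative assumptions, not Drill's internals.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of schema-on-read: infer a column -> type map from
// schema-less records at read time, with no schema declared in advance.
final class SchemaInference {

    // Map a Java value to a SQL-like type name (names are illustrative only).
    static String typeOf(Object v) {
        if (v == null) return "NULL";
        if (v instanceof Long || v instanceof Integer) return "BIGINT";
        if (v instanceof Double || v instanceof Float) return "DOUBLE";
        if (v instanceof Boolean) return "BOOLEAN";
        if (v instanceof String) return "VARCHAR";
        if (v instanceof List) return "ARRAY";
        if (v instanceof Map) return "MAP";
        return "ANY";
    }

    // Combine the type already seen for a field with a newly observed one.
    // Assumed policy: NULL defers to the other type, BIGINT + DOUBLE widens
    // to DOUBLE, and any other conflict widens to ANY.
    static String merge(String seen, String observed) {
        if (seen == null || seen.equals("NULL")) return observed;
        if (observed.equals("NULL") || seen.equals(observed)) return seen;
        boolean numeric = (seen.equals("BIGINT") && observed.equals("DOUBLE"))
                || (seen.equals("DOUBLE") && observed.equals("BIGINT"));
        return numeric ? "DOUBLE" : "ANY";
    }

    // Walk one record, flattening nested maps into dotted column paths.
    @SuppressWarnings("unchecked")
    static void observe(String prefix, Map<String, Object> record, Map<String, String> schema) {
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object v = e.getValue();
            if (v instanceof Map) {
                observe(path, (Map<String, Object>) v, schema);
            } else {
                schema.put(path, merge(schema.get(path), typeOf(v)));
            }
        }
    }

    // Infer a schema by scanning the records as they are read.
    static Map<String, String> infer(List<Map<String, Object>> records) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map<String, Object> r : records) {
            observe("", r, schema);
        }
        return schema;
    }
}
```

Widening conflicting types is only one possible policy; any engine that reads data without a declared schema has to choose some rule for fields whose type changes from record to record.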
- Query languages: This layer parses the user’s query and constructs an execution plan. The initial goal is to support the SQL-like language used by Dremel and Google BigQuery. It will also support full ANSI SQL:2003.
- Low-latency distributed execution engine: This is Drill’s heart. It provides the scalability and fault tolerance needed to efficiently query petabytes of data across 10,000 servers. Drill’s execution engine builds on research in distributed execution engines such as Dremel, Dryad, Hyracks, CIEL, and Stratosphere, and on research in columnar storage.
- Nested data formats: This layer supports the various data formats Drill can read. The initial goal is to support the columnar format used by Dremel. Drill is designed to support schema-based formats such as Protocol Buffers/Dremel, Avro/AVRO-806/Trevni, and CSV, and schema-less formats such as JSON, BSON (Binary JSON), and YAML.
- Scalable data sources: This layer supports the data sources Drill queries. The initial focus is on using Hadoop as a data source.
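The execution layer described above spreads a query over many servers and benefits from columnar storage. A toy, single-JVM sketch of that scatter-gather pattern for `SELECT region, SUM(sales) FROM t GROUP BY region` is shown here: each fragment holds a slice of the table stored column by column, computes a partial aggregate in parallel, and a coordinator merges the partials. The classes `ColumnarFragment` and `ScatterGather`, the table, and its columns are hypothetical; threads stand in for servers, and the distribution and fault tolerance of the real engine are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical slice of a table, stored column by column (one array per column).
final class ColumnarFragment {
    final String[] region;
    final long[] sales;

    ColumnarFragment(String[] region, long[] sales) {
        if (region.length != sales.length) {
            throw new IllegalArgumentException("column length mismatch");
        }
        this.region = region;
        this.sales = sales;
    }

    // Partial aggregation for SUM(sales) GROUP BY region; it reads only the
    // two columns the query needs, which is the point of a columnar layout.
    Map<String, Long> partialSum() {
        Map<String, Long> acc = new HashMap<>();
        for (int i = 0; i < sales.length; i++) {
            acc.merge(region[i], sales[i], Long::sum);
        }
        return acc;
    }
}

// Scatter: compute a partial aggregate for each fragment in parallel.
// Gather: merge the partial aggregates into the final result.
final class ScatterGather {
    static Map<String, Long> sumByRegion(List<ColumnarFragment> fragments, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (ColumnarFragment f : fragments) {
                Callable<Map<String, Long>> task = f::partialSum;
                partials.add(pool.submit(task));
            }
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> p : partials) {
                p.get().forEach((k, v) -> total.merge(k, v, Long::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because SUM is associative and commutative, the partial sums can be merged in any order. An aggregate that does not decompose this way, such as an exact median, would need a different merge strategy.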
The distributed execution engine is written in Java.
- Rapid time-to-value for business analysts: SQL specialists and BI analysts can query any dataset, including complex nested data, immediately instead of waiting several weeks for IT to prepare the data.
- Efficiency with easy governance for IT: IT can avoid unnecessary ETL cycles and schema maintenance while still ensuring governance through easy-to-deploy, granular access controls.
- Accelerated big data adoption for businesses: Organizations can use their large existing base of SQL talent and tools to rapidly discover new business insights from big data.