Why should you use Apache Hive

Hortonworks Hadoop Hive

This article describes how to connect Tableau to a Hortonworks Hadoop Hive database and how to set up the data source.

Requirements

First, collect this connection information:

  • Name of the server hosting the database that you want to connect to

  • Authentication method:

    • No authentication

    • Kerberos

    • User name

    • User name and password

    • Microsoft Azure HDInsight service (from version 10.2.1)

  • Transport options, which depend on the authentication method selected

  • The credentials depend on the authentication method selected and can include the following:

    • User name

    • Password

    • Realm

    • Host FQDN

    • Service name

    • HTTP path

  • Whether you are connecting to an SSL server

  • (Optional) Initial SQL statement that is run every time Tableau connects

Driver required

This connector requires a driver to communicate with the database. The driver may already be installed on your computer. If it is not, Tableau displays a message in the connection dialog with a link to the Driver Download page, where you will find driver links and installation instructions.

Note: Make sure you are using the latest available drivers. For information on getting the latest drivers, see Hortonworks Hadoop Hive on the Tableau Driver Download page.

Establishing the connection and setting up the data source

  1. Start Tableau and under Connect, select Hortonworks Hadoop Hive. For a complete list of data connections, select More under To a Server. Then do the following:

    1. Enter the name of the server hosting the database.

    2. Select the authentication method from the Authentication drop-down list.

    3. Enter the requested information. The information you are asked for depends on the selected authentication method.

    4. (Optional) Select Initial SQL to specify a SQL command to run at the start of every connection, for example when you open a workbook, refresh an extract, sign in to Tableau Server, or publish content to Tableau Server. For more information, see Executing Initial SQL.

    5. Select Sign In.

      When connecting to an SSL server, select Require SSL.

      If Tableau cannot connect, verify that your credentials are correct. If you still cannot connect, your computer is having trouble locating the server. Contact your network administrator or database administrator.

  2. On the data sources page, do the following:

    1. (Optional) Select the default data source name at the top of the page, then enter a unique data source name to use in Tableau. For example, use a data source naming convention that helps other users determine which data source to connect to.

    2. From the Schema drop-down list, select the search icon, or type the schema name in the text box and select the search icon, and then select the schema.

    3. In the Table text box, select the search icon, or type the table name and select the search icon, and then select the table.

    4. Drag the table onto the work area, then click the sheet tab to start your analysis.

      Use custom SQL to connect to a specific query rather than the entire data source. For more information, see Connect to a Custom SQL Query.

      Note: This database type supports only equal (=) join operations.

Sign in on a Mac

If you are using Tableau Desktop on a Mac, enter a fully qualified domain name (for example, "mydb.test.ourdomain.lan") instead of a relative domain name (for example, "mydb" or "mydb.test").

Alternatively, you can add the domain to the list of search domains for the Mac computer so that you only need to provide the server name to connect. To update the list of search domains, go to System Preferences > Network > Advanced, then open the DNS tab.
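Before entering a server name in Tableau, it can help to confirm that the name resolves from the Mac at all. The following is a minimal sketch using only the Python standard library. The function name `resolves` is hypothetical, and the sketch assumes that Python uses the same system resolver (including the configured search domains) as other applications on the machine.

```python
import socket


def resolves(host: str) -> bool:
    """Return True if the system resolver can find an address for host."""
    try:
        # Asks the operating system's resolver; no connection is opened.
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: name not found; UnicodeError: malformed host name.
        return False
```

For example, if `resolves("mydb")` returns False but `resolves("mydb.test.ourdomain.lan")` returns True, the short name is not covered by a search domain and the fully qualified name should be used.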

Working with Hadoop Hive data

Working with date/time data

Tableau natively supports the TIMESTAMP and DATE types. However, if you store date/time data as a string in Hive, it must be in ISO format (YYYY-MM-DD). You can create a calculated field that uses the DATEPARSE or DATE function to convert a string to a date/time format. Use DATEPARSE() when working with an extract; otherwise, use DATE(). For more information, see Date Functions.

For more information on Hive data types, see the Dates section on the Apache Hive website.
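If date strings are cleaned up before they are loaded into Hive, the ISO requirement above can be met at the source. The following is a minimal sketch, not part of Tableau or Hive. The function name `to_iso_date` and the list of input formats are assumptions chosen for illustration and should be adapted to the formats actually present in your data.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats, tried in order; adjust to match your data.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso_date(raw: str) -> Optional[str]:
    """Return raw as a YYYY-MM-DD string, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # Try the next candidate format.
    return None
```

Values for which the function returns None can be reviewed by hand instead of being loaded in a format that cannot be converted later.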

NULL value returned

In Tableau 9.0.1 and later, and in 8.3.5 and later 8.3.x releases, a NULL value is returned when you open a workbook that was created in an earlier version and that contains date/time data stored as a string in a format Hive does not support. To resolve this problem, change the field type back to String and create a calculated field that uses DATEPARSE() or DATE() to convert the date. Use DATEPARSE() when working with an extract; otherwise, use DATE().

High latency limitation

Hive is a batch-oriented system and is not yet able to answer simple queries with very quick turnaround. This limitation makes it difficult to explore a new data set or to experiment with calculated fields. Some of the newer SQL-on-Hadoop technologies (for example, Impala from Cloudera and the Stringer project from Hortonworks) are designed to address this limitation.

Truncated columns in Tableau

The default string column length for Hortonworks Hadoop Hive is 255 characters, so longer values can appear truncated in Tableau. For more information on Hortonworks Hive ODBC driver configuration options, particularly DefaultStringColumnLength, see the Hive ODBC Driver User Guide from Hortonworks.
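To judge whether the 255-character default is likely to affect a data set, you can scan a sample of rows for longer string values. The following is a minimal sketch. The function name `columns_at_risk` and the row layout (one dictionary per row) are assumptions for illustration, and Python's `len` counts characters in a way that may not match how the driver measures length.

```python
from typing import Any, Dict, Iterable, Mapping

DEFAULT_STRING_COLUMN_LENGTH = 255  # Default stated in the text above.


def columns_at_risk(
    rows: Iterable[Mapping[str, Any]],
    limit: int = DEFAULT_STRING_COLUMN_LENGTH,
) -> Dict[str, int]:
    """Map each column whose longest string exceeds limit to that length."""
    longest: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            # Only string values are subject to the string length setting.
            if isinstance(value, str) and len(value) > longest.get(name, 0):
                longest[name] = len(value)
    return {name: size for name, size in longest.items() if size > limit}
```

Any column the function reports is a candidate for a larger DefaultStringColumnLength value in the driver configuration.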

See also