
.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.
FAQ
===

Can I query/join multiple tables at one time?
---------------------------------------------

Not directly, no. A Superset SQLAlchemy datasource can only be a single table
or a view.

When working with tables, the solution would be to materialize
a table that contains all the fields needed for your analysis, most likely
through some scheduled batch process.

A view is a simple logical layer that abstracts an arbitrary SQL query as
a virtual table. This can allow you to join and union multiple tables, and
to apply some transformation using arbitrary SQL expressions. The limitation
there is your database performance, as Superset effectively will run a query
on top of your query (view). A good practice may be to limit yourself to
joining your main large table to one or many small tables only, and to avoid
using ``GROUP BY`` where possible, as Superset will do its own ``GROUP BY`` and
doing the work twice might slow down performance.

Whether you use a table or a view, the important factor is whether your
database is fast enough to serve it in an interactive fashion to provide
a good user experience in Superset.

How BIG can my data source be?
------------------------------

It can be gigantic! As mentioned above, the main criterion is whether your
database can execute queries and return results in a time frame that is
acceptable to your users. Many distributed databases out there can execute
queries that scan through terabytes in an interactive fashion.

How do I create my own visualization?
-------------------------------------

We are planning on making it easier to add new visualizations to the
framework. In the meantime, we've tagged a few pull requests as
``example`` to give people examples of how to contribute new
visualizations.

https://github.com/airbnb/superset/issues?q=label%3Aexample+is%3Aclosed

Can I upload and visualize csv data?
------------------------------------

Yes, using the ``Upload a CSV`` button under the ``Sources`` menu item.
This brings up a form that allows you to specify the required information.
After creating the table from CSV, it can then be loaded like any
other on the ``Sources -> Tables`` page.

Why are my queries timing out?
------------------------------

There are many reasons why a long-running query may time out.

- When running a long query from SQL Lab, by default Superset allows it to run
  for up to 6 hours before it is killed by celery. If you want to increase the
  time allowed for a running query, you can specify the timeout in your
  configuration. For example:

  ``SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6``

- Superset runs on the gunicorn web server, which may time out web requests.
  If you want to increase the default (50), you can specify the timeout when
  starting the web server with the ``-t`` flag, which is expressed in seconds.

  ``superset runserver -t 300``

- If you are seeing timeouts (504 Gateway Time-out) when loading a dashboard
  or exploring a slice, you are probably behind a gateway or proxy server
  (such as Nginx). If the gateway does not receive a timely response from the
  Superset server (which is processing long queries), it will send a 504
  status code directly to clients. Superset has a client-side timeout limit
  to address this issue. If a query doesn't come back within the client-side
  timeout (60 seconds by default), Superset will display a warning message to
  avoid the gateway timeout message. If you have a longer gateway timeout
  limit, you can change the timeout settings in ``superset_config.py``:

  ``SUPERSET_WEBSERVER_TIMEOUT = 60``
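The two configuration keys above can live together in ``superset_config.py``; a minimal sketch, using the default values discussed above:

```python
# superset_config.py -- values shown here are just the defaults
# discussed above; raise them to match your workload.

# Maximum run time for an async SQL Lab query before celery kills it
SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6  # 6 hours

# Client-side timeout after which Superset shows a warning; keep this
# below your gateway/proxy timeout so users see the warning rather
# than a raw 504 from the proxy
SUPERSET_WEBSERVER_TIMEOUT = 60  # seconds
```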

Why is the map not visible in the mapbox visualization?
-------------------------------------------------------

You need to register at mapbox.com, get an API key, and configure it as
``MAPBOX_API_KEY`` in ``superset_config.py``.
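For example, in ``superset_config.py`` (the token value below is a placeholder; use the access token from your mapbox.com account):

```python
# superset_config.py
# Placeholder value -- substitute your own mapbox.com access token
MAPBOX_API_KEY = "your_mapbox_access_token_here"
```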

How to add dynamic filters to a dashboard?
------------------------------------------

It's easy: use the ``Filter Box`` widget, build a slice, and add it to your
dashboard.

The ``Filter Box`` widget allows you to define a query to populate dropdowns
that can be used for filtering. To build the list of distinct values, we
run a query, and sort the result by the metric you provide, sorting
descending.

The widget also has a checkbox ``Date Filter``, which enables time filtering
capabilities to your dashboard. After checking the box and refreshing, you'll
see a ``from`` and a ``to`` dropdown show up.

By default, the filtering will be applied to all the slices that are built
on top of a datasource that shares the column name that the filter is based
on. It's also a requirement for that column to be checked as "filterable"
in the column tab of the table editor.

But what about if you don't want certain widgets to get filtered on your
dashboard? You can do that by editing your dashboard, and in the form,
editing the ``JSON Metadata`` field, more specifically the
``filter_immune_slices`` key, which receives an array of sliceIds that should
never be affected by any dashboard level filtering.

.. code-block:: json

    {
        "filter_immune_slices": [324, 65, 92],
        "expanded_slices": {},
        "filter_immune_slice_fields": {
            "177": ["country_name", "__time_range"],
            "32": ["__time_range"]
        },
        "timed_refresh_immune_slices": [324]
    }

In the json blob above, slices 324, 65 and 92 won't be affected by any
dashboard level filtering.

Now note the ``filter_immune_slice_fields`` key. This one allows you to
be more specific and define, for a specific slice_id, which filter fields
should be disregarded.

Note the use of the ``__time_range`` keyword, which is reserved for dealing
with the time boundary filtering mentioned above.

But what happens with filtering when dealing with slices coming from
different tables or databases? If the column name is shared, the filter will
be applied, it's as simple as that.

How to limit the timed refresh on a dashboard?
----------------------------------------------

By default, the dashboard timed refresh feature allows you to automatically re-query every slice
on a dashboard according to a set schedule. Sometimes, however, you won't want all of the slices
to be refreshed - especially if some data is slow moving, or runs heavy queries. To exclude specific
slices from the timed refresh process, add the ``timed_refresh_immune_slices`` key to the dashboard
``JSON Metadata`` field:

.. code-block:: json

    {
        "filter_immune_slices": [],
        "expanded_slices": {},
        "filter_immune_slice_fields": {},
        "timed_refresh_immune_slices": [324]
    }

In the example above, if a timed refresh is set for the dashboard, then every slice except 324 will
be automatically re-queried on schedule.

Slice refresh will also be staggered over the specified period. You can turn off this staggering
by setting ``stagger_refresh`` to ``false`` and modify the stagger period by setting
``stagger_time`` to a value in milliseconds in the ``JSON Metadata`` field:

.. code-block:: json

    {
        "stagger_refresh": false,
        "stagger_time": 2500
    }

Here, the entire dashboard will refresh at once if periodic refresh is on. The stagger time of
2.5 seconds is ignored.

Why does 'flask fab' or superset freeze/hang/not respond when started (my home directory is NFS mounted)?
-------------------------------------------------------------------------------------------------------------

By default, superset creates and uses an sqlite database at ``~/.superset/superset.db``. Sqlite is known to `not work well if used on NFS`__ due to broken file locking implementation on NFS.

__ https://www.sqlite.org/lockingv3.html

You can override this path using the ``SUPERSET_HOME`` environment variable.

Another workaround is to change where superset stores the sqlite database by adding ``SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db'`` in superset_config.py (create the file if needed), then adding the directory where superset_config.py lives to the PYTHONPATH environment variable (e.g. ``export PYTHONPATH=/opt/logs/sandbox/airbnb/``).
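Put together, a minimal sketch of that workaround (the target path below is a placeholder; pick any directory on local, non-NFS disk):

```python
# superset_config.py -- place this file in a directory that is on
# PYTHONPATH, e.g. `export PYTHONPATH=/path/to/config/dir`
# (placeholder path).

# Move the metadata database off the NFS-mounted home directory onto
# local disk, where sqlite file locking works reliably.
SQLALCHEMY_DATABASE_URI = "sqlite:////local/disk/superset.db"
```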

What if the table schema changed?
---------------------------------

Table schemas evolve, and Superset needs to reflect that. It's pretty common
in the life cycle of a dashboard to want to add a new dimension or metric.
To get Superset to discover your new columns, all you have to do is
go to ``Menu -> Sources -> Tables``, click the ``edit`` icon next to the
table whose schema has changed, and hit ``Save`` from the ``Detail`` tab.
Behind the scenes, the new columns will get merged in. Following this,
you may want to re-edit the table to configure the ``Column`` tab, check the
appropriate boxes and save again.

How do I go about developing a new visualization type?
------------------------------------------------------

Here's an example as a Github PR with comments that describe what the
different sections of the code do:
https://github.com/airbnb/superset/pull/3013

What database engine can I use as a backend for Superset?
---------------------------------------------------------

To clarify, the *database backend* is an OLTP database used by Superset to store its internal
information like your list of users, slices and dashboard definitions.

Superset is tested using Mysql, Postgresql and Sqlite for its backend. It's recommended you
install Superset on one of these database servers for production.

Using column-store, non-OLTP databases like Vertica, Redshift or Presto as a database backend simply won't work as these databases are not designed for this type of workload. Installation on Oracle, Microsoft SQL Server, or other OLTP databases may work but isn't tested.

Please note that pretty much any database that has a SqlAlchemy integration should work perfectly fine as a datasource for Superset, just not as the OLTP backend.

How can I configure OAuth authentication and authorization?
-----------------------------------------------------------

You can take a look at this Flask-AppBuilder `configuration example
<https://github.com/dpgaspar/Flask-AppBuilder/blob/master/examples/oauth/config.py>`_.

How can I set a default filter on my dashboard?
-----------------------------------------------

Easy. Simply apply the filter and save the dashboard while the filter
is active.

How do I get Superset to refresh the schema of my table?
--------------------------------------------------------

When adding columns to a table, you can have Superset detect and merge the
new columns in by using the "Refresh Metadata" action in the
``Source -> Tables`` page. Simply check the box next to the tables
you want the schema refreshed, and click ``Actions -> Refresh Metadata``.

Is there a way to force the use of specific colors?
---------------------------------------------------

It is possible on a per-dashboard basis by providing a mapping of
labels to colors in the ``JSON Metadata`` attribute using the
``label_colors`` key.

.. code-block:: json

    {
        "label_colors": {
            "Girls": "#FF69B4",
            "Boys": "#ADD8E6"
        }
    }

Does Superset work with [insert database engine here]?
------------------------------------------------------

The community over time has curated a list of databases that work well with
Superset in the :ref:`ref_database_deps` section of the docs. Database
engines not listed on this page may work too. We rely on the
community to contribute to this knowledge base.

.. _SQLAlchemy dialect: https://docs.sqlalchemy.org/en/latest/dialects/
.. _DBAPI driver: https://www.python.org/dev/peps/pep-0249/

For a database engine to be supported in Superset through the
SQLAlchemy connector, it requires having a Python compliant
`SQLAlchemy dialect`_ as well as a
`DBAPI driver`_ defined.

Databases that have limited SQL support may
work as well. For instance it's possible to connect
to Druid through the SQLAlchemy connector even though Druid does not support
joins and subqueries. Another key element for a database to be supported is through
the Superset `Database Engine Specification
<https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs.py>`_
interface. This interface allows for defining database-specific configurations
and logic that go beyond the SQLAlchemy and DBAPI scope. This includes features like:

* date-related SQL functions that allow Superset to fetch different
  time granularities when running time-series queries
* whether the engine supports subqueries. If false, Superset may run 2-phase
  queries to compensate for the limitation
* methods around processing logs and inferring the percentage of completion
  of a query
* technicalities as to how to handle cursors and connections if the driver
  is not standard DBAPI
* more, read the code for more details
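The general shape of such a spec can be sketched as follows. This is an illustrative stand-in only: the real base class lives in ``superset/db_engine_specs.py``, and while the names ``BaseEngineSpec``, ``time_grain_functions`` and ``convert_dttm`` mirror that module, the simplified base class, the ``ExampleEngineSpec`` subclass and its SQL expressions are made up for this sketch:

```python
from datetime import datetime


class BaseEngineSpec:
    """Simplified stand-in for superset.db_engine_specs.BaseEngineSpec."""

    engine = None               # name used in SQLAlchemy URIs
    time_grain_functions = {}   # time grain -> SQL expression template

    @classmethod
    def convert_dttm(cls, target_type, dttm):
        raise NotImplementedError


class ExampleEngineSpec(BaseEngineSpec):
    """Hypothetical spec for a made-up engine named 'exampledb'."""

    engine = "exampledb"
    # Map ISO 8601 durations to SQL expressions that truncate a column
    # to that time grain (the expressions here are illustrative)
    time_grain_functions = {
        None: "{col}",
        "PT1H": "DATE_TRUNC('hour', {col})",
        "P1D": "DATE_TRUNC('day', {col})",
    }

    @classmethod
    def convert_dttm(cls, target_type, dttm):
        # Render a Python datetime as a SQL literal for this engine
        return "'{}'".format(dttm.isoformat())
```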

Beyond the SQLAlchemy connector, it's also possible, though much more
involved, to extend Superset and write
your own connector. The only example of this at the moment is the Druid
connector, which is getting superseded by Druid's growing SQL support and
the recent availability of a DBAPI and SQLAlchemy driver. If the database
you are considering integrating has any kind of SQL support, it's probably
preferable to go the SQLAlchemy route. Note that for a native connector to
be possible the database needs to have support for running OLAP-type queries
and should be able to do things that are typical in basic SQL:

- aggregate data
- apply filters (==, !=, >, <, >=, <=, IN, ...)
- apply HAVING-type filters
- be schema-aware, expose columns and types