..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.

Installation & Configuration
============================

Getting Started
---------------

Superset has deprecated support for Python ``2.*`` and supports
only ``~=3.6`` to take advantage of the newer Python features and reduce
the burden of supporting previous versions. We run our test suite
against ``3.6``, but ``3.7`` is fully supported as well.

Cloud-native!
-------------

Superset is designed to be highly available. It is
"cloud-native" as it has been designed to scale out in large,
distributed environments, and works well inside containers.
While you can easily
test drive Superset on a modest setup or simply on your laptop,
there's virtually no limit around scaling out the platform.

Superset is also cloud-native in the sense that it is
flexible and lets you choose your web server (Gunicorn, Nginx, Apache),
your metadata database engine (MySQL, Postgres, MariaDB, ...),
your message queue (Redis, RabbitMQ, SQS, ...),
your results backend (S3, Redis, Memcached, ...), your caching layer
(Memcached, Redis, ...), works well with services like NewRelic, StatsD and
DataDog, and has the ability to run analytic workloads against
most popular database technologies.

Superset is battle tested in large environments with hundreds
of concurrent users. Airbnb's production environment runs inside
Kubernetes and serves 600+ daily active users viewing over 100K charts a
day.

The Superset web server and the Superset Celery workers (optional)
are stateless, so you can scale out by running on as many servers
as needed.

Start with Docker
-----------------

.. note ::
    The Docker-related files and documentation are actively maintained and
    managed by the core committers working on the project. Help and contributions
    around Docker are welcomed!

If you know Docker, this is the shortcut to get a development environment up and running: ::

    git clone https://github.com/apache/incubator-superset/
    cd incubator-superset
    # you can run this command every time you need to start superset now:
    docker-compose up

After several minutes of initialization, you can open a browser and go to
`http://localhost:8088` to start your journey.

From there, the container server will reload on modification of the Superset Python
and JavaScript source code.
Don't forget to reload the page to take the new frontend into account though.

See also `CONTRIBUTING.md#building <https://github.com/apache/incubator-superset/blob/master/CONTRIBUTING.md#building>`_
for an alternative way of serving the frontend.

It is currently not recommended to run docker-compose in production.

If you are attempting to build on a Mac and the build exits with code 137, you need to increase your Docker resources.
OSX instructions: https://docs.docker.com/docker-for-mac/#advanced (search for memory)

Or, if you're curious and want to install Superset from the bottom up, read on.

See also `docker/README.md <https://github.com/apache/incubator-superset/blob/master/docker/README.md>`_

OS dependencies
---------------

Superset stores database connection information in its metadata database.
For that purpose, we use the ``cryptography`` Python library to encrypt
connection passwords. Unfortunately, this library has OS level dependencies.

You may want to attempt the next step
("Superset installation and initialization") and come back to this step if
you encounter an error.

Here's how to install them:

For **Debian** and **Ubuntu**, the following command will ensure that
the required dependencies are installed: ::

    sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev

**Ubuntu 18.04** If you have python3.6 installed alongside python2.7, as is default on **Ubuntu 18.04 LTS**, also run this command: ::

    sudo apt-get install build-essential libssl-dev libffi-dev python3.6-dev python-pip libsasl2-dev libldap2-dev

otherwise the build for ``cryptography`` fails.

For **Fedora** and **RHEL-derivatives**, the following command will ensure
that the required dependencies are installed: ::

    sudo yum upgrade python-setuptools
    sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel

**Mac OS X** If possible, you should upgrade to the latest version of OS X as issues are more likely to be resolved for that version.
You *will likely need* the latest version of XCode available for your installed version of OS X. You should also install
the XCode command line tools: ::

    xcode-select --install

System python is not recommended. Homebrew's python also ships with pip: ::

    brew install pkg-config libffi openssl python
    env LDFLAGS="-L$(brew --prefix openssl)/lib" CFLAGS="-I$(brew --prefix openssl)/include" pip install cryptography==2.4.2

**Windows** isn't officially supported at this point, but if you want to
attempt it, download `get-pip.py <https://bootstrap.pypa.io/get-pip.py>`_, and run ``python get-pip.py`` which may need admin access. Then run the following: ::

    C:\> pip install cryptography

    # You may also have to create C:\Temp
    C:\> md C:\Temp

Python virtualenv
-----------------

It is recommended to install Superset inside a virtualenv. Python 3 already ships
with virtualenv. If it's not installed in your environment for some reason, you can
install it via the package manager for your operating system, or from pip: ::

    pip install virtualenv

You can create and activate a virtualenv with: ::

    # virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
    # See https://docs.python.org/3.6/library/venv.html
    python3 -m venv venv
    . venv/bin/activate

On Windows the syntax for activating it is a bit different: ::

    venv\Scripts\activate

Once you have activated your virtualenv, everything you do is confined to the virtualenv.
To exit the virtualenv, just type ``deactivate``.

Python's setup tools and pip
----------------------------

Put all the chances on your side by getting the very latest ``pip``
and ``setuptools`` libraries: ::

    pip install --upgrade setuptools pip

Superset installation and initialization
----------------------------------------

Follow these few simple steps to install Superset: ::

    # Install superset
    pip install apache-superset

    # Initialize the database
    superset db upgrade

    # Create an admin user (you will be prompted to set a username, first and last name before setting a password)
    export FLASK_APP=superset
    superset fab create-admin

    # Load some data to play with
    superset load_examples

    # Create default roles and permissions
    superset init

    # To start a development web server on port 8088, use -p to bind to another port
    superset run -p 8088 --with-threads --reload --debugger

After installation, you should be able to point your browser to the right
hostname:port `http://localhost:8088 <http://localhost:8088>`_, login using
the credentials you entered while creating the admin account, and navigate to
`Menu -> Admin -> Refresh Metadata`. This action should bring in all of
your datasources for Superset to be aware of, and they should show up in
`Menu -> Datasources`, from where you can start playing with your data!

A proper WSGI HTTP Server
-------------------------

While you can set up Superset to run on Nginx or Apache, many use
Gunicorn, preferably in **async mode**, which allows for impressive
concurrency and is fairly easy to install and configure. Please
refer to the
documentation of your preferred technology to set up this Flask WSGI
application in a way that works well in your environment. Here's an **async**
setup known to work well in production: ::

    gunicorn \
        -w 10 \
        -k gevent \
        --timeout 120 \
        -b 0.0.0.0:6666 \
        --limit-request-line 0 \
        --limit-request-field_size 0 \
        --statsd-host localhost:8125 \
        "superset.app:create_app()"

Refer to the
`Gunicorn documentation <https://docs.gunicorn.org/en/stable/design.html>`_
for more information.

Note that the development web
server (`superset run` or `flask run`) is not intended for production use.

If not using gunicorn, you may want to disable the use of flask-compress
by setting `ENABLE_FLASK_COMPRESS = False` in your `superset_config.py`.

Flask-AppBuilder Permissions
----------------------------

By default, every time the Flask-AppBuilder (FAB) app is initialized the
permissions and views are added automatically to the backend and associated with
the ‘Admin’ role. The issue, however, is that when you are running multiple concurrent
workers this creates a lot of contention and race conditions when defining
permissions and views.

To alleviate this issue, the automatic updating of permissions can be disabled
by setting `FAB_UPDATE_PERMS = False` (defaults to True).

In a production environment, initialization could take on the following form: ::

    superset init
    gunicorn -w 10 ... superset:app
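
With that approach, permissions are only synced when ``superset init`` is run explicitly
(for example as a deploy step). A minimal ``superset_config.py`` for the long-running web
and worker processes could then contain the setting mentioned above; this is a sketch only:

.. code-block:: python

    # superset_config.py -- sketch; adjust to your environment
    # Skip automatic permission/view syncing on every app start; run
    # `superset init` manually when permissions need to be refreshed.
    FAB_UPDATE_PERMS = False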

Configuration behind a load balancer
------------------------------------

If you are running superset behind a load balancer or reverse proxy (e.g. NGINX
or ELB on AWS), you may need to utilise a healthcheck endpoint so that your
load balancer knows if your superset instance is running. This is provided
at ``/health``, which will return a 200 response containing "OK" if the
webserver is running.

If the load balancer is inserting X-Forwarded-For/X-Forwarded-Proto headers, you
should set `ENABLE_PROXY_FIX = True` in the superset config file to extract and use
the headers.
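
For example, a deployment that terminates TLS at the proxy and forwards these headers
might add the following to ``superset_config.py`` (a minimal sketch; only
``ENABLE_PROXY_FIX`` is the documented setting, the comment describes the assumed setup):

.. code-block:: python

    # superset_config.py -- running behind a proxy that sets
    # X-Forwarded-For / X-Forwarded-Proto headers
    ENABLE_PROXY_FIX = True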

In case the reverse proxy is used for providing SSL encryption,
an explicit definition of the `X-Forwarded-Proto` header may be required.
For the Apache webserver this can be set as follows: ::

    RequestHeader set X-Forwarded-Proto "https"

Configuration
-------------

To configure your application, you need to create a file (module)
``superset_config.py`` and make sure it is in your PYTHONPATH. Here are some
of the parameters you can copy / paste in that configuration module: ::

    #---------------------------------------------------------
    # Superset specific config
    #---------------------------------------------------------
    ROW_LIMIT = 5000

    SUPERSET_WEBSERVER_PORT = 8088
    #---------------------------------------------------------

    #---------------------------------------------------------
    # Flask App Builder configuration
    #---------------------------------------------------------
    # Your App secret key
    SECRET_KEY = '\2\1thisismyscretkey\1\2\e\y\y\h'

    # The SQLAlchemy connection string to your database backend
    # This connection defines the path to the database that stores your
    # superset metadata (slices, connections, tables, dashboards, ...).
    # Note that the connection information to connect to the datasources
    # you want to explore are managed directly in the web UI
    SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db'

    # Flask-WTF flag for CSRF
    WTF_CSRF_ENABLED = True
    # Add endpoints that need to be exempt from CSRF protection
    WTF_CSRF_EXEMPT_LIST = []
    # A CSRF token that expires in 1 year
    WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

    # Set this API key to enable Mapbox visualizations
    MAPBOX_API_KEY = ''

All the parameters and default values defined in
https://github.com/apache/incubator-superset/blob/master/superset/config.py
can be altered in your local ``superset_config.py``.
Administrators will want to
read through the file to understand what can be configured locally
as well as the default values in place.

Since ``superset_config.py`` acts as a Flask configuration module, it
can be used to alter the settings of Flask itself,
as well as Flask extensions like ``flask-wtf``, ``flask-cache``,
``flask-migrate``, and ``flask-appbuilder``. Flask App Builder, the web
framework used by Superset, offers many configuration settings. Please consult
the `Flask App Builder Documentation
<https://flask-appbuilder.readthedocs.org/en/latest/config.html>`_
for more information on how to configure it.

Make sure to change:

* *SQLALCHEMY_DATABASE_URI*: by default it is stored at *~/.superset/superset.db*
* *SECRET_KEY*: to a long random string

In case you need to exempt endpoints from CSRF, e.g. you are running a custom
auth postback endpoint, you can add them to *WTF_CSRF_EXEMPT_LIST*: ::

    WTF_CSRF_EXEMPT_LIST = ['']

.. _ref_database_deps:

Database dependencies
---------------------

Superset does not ship bundled with connectivity to databases, except
for Sqlite, which is part of the Python standard library.
You'll need to install the required packages for the database you
want to use as your metadata database as well as the packages needed to
connect to the databases you want to access through Superset.

Here's a list of some of the recommended packages.

+------------------+---------------------------------------+-------------------------------------------------+
| Database         | PyPI package                          | SQLAlchemy URI prefix                           |
+==================+=======================================+=================================================+
| Amazon Athena    | ``pip install "PyAthenaJDBC>1.0.9"``  | ``awsathena+jdbc://``                           |
+------------------+---------------------------------------+-------------------------------------------------+
| Amazon Athena    | ``pip install "PyAthena>1.2.0"``      | ``awsathena+rest://``                           |
+------------------+---------------------------------------+-------------------------------------------------+
| Amazon Redshift  | ``pip install sqlalchemy-redshift``   | ``redshift+psycopg2://``                        |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Drill     | ``pip install sqlalchemy-drill``      | For the REST API:                               |
|                  |                                       | ``drill+sadrill://``                            |
|                  |                                       | For JDBC:                                       |
|                  |                                       | ``drill+jdbc://``                               |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Druid     | ``pip install pydruid``               | ``druid://``                                    |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Hive      | ``pip install pyhive``                | ``hive://``                                     |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Impala    | ``pip install impyla``                | ``impala://``                                   |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Kylin     | ``pip install kylinpy``               | ``kylin://``                                    |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Pinot     | ``pip install pinotdb``               | ``pinot+http://CONTROLLER:5436/``               |
|                  |                                       | ``query?server=http://CONTROLLER:5983/``        |
+------------------+---------------------------------------+-------------------------------------------------+
| Apache Spark SQL | ``pip install pyhive``                | ``jdbc+hive://``                                |
+------------------+---------------------------------------+-------------------------------------------------+
| BigQuery         | ``pip install pybigquery``            | ``bigquery://``                                 |
+------------------+---------------------------------------+-------------------------------------------------+
| ClickHouse       | ``pip install sqlalchemy-clickhouse`` |                                                 |
+------------------+---------------------------------------+-------------------------------------------------+
| CockroachDB      | ``pip install cockroachdb``           | ``cockroachdb://``                              |
+------------------+---------------------------------------+-------------------------------------------------+
| Dremio           | ``pip install sqlalchemy_dremio``     | ``dremio://user:pwd@host:31010/``               |
+------------------+---------------------------------------+-------------------------------------------------+
| Elasticsearch    | ``pip install elasticsearch-dbapi``   | ``elasticsearch+http://``                       |
+------------------+---------------------------------------+-------------------------------------------------+
| Exasol           | ``pip install sqlalchemy-exasol``     | ``exa+pyodbc://``                               |
+------------------+---------------------------------------+-------------------------------------------------+
| Google Sheets    | ``pip install gsheetsdb``             | ``gsheets://``                                  |
+------------------+---------------------------------------+-------------------------------------------------+
| IBM Db2          | ``pip install ibm_db_sa``             | ``db2+ibm_db://``                               |
+------------------+---------------------------------------+-------------------------------------------------+
| MySQL            | ``pip install mysqlclient``           | ``mysql://``                                    |
+------------------+---------------------------------------+-------------------------------------------------+
| Oracle           | ``pip install cx_Oracle``             | ``oracle://``                                   |
+------------------+---------------------------------------+-------------------------------------------------+
| PostgreSQL       | ``pip install psycopg2``              | ``postgresql+psycopg2://``                      |
+------------------+---------------------------------------+-------------------------------------------------+
| Presto           | ``pip install pyhive``                | ``presto://``                                   |
+------------------+---------------------------------------+-------------------------------------------------+
| Snowflake        | ``pip install snowflake-sqlalchemy``  | ``snowflake://``                                |
+------------------+---------------------------------------+-------------------------------------------------+
| SQLite           |                                       | ``sqlite://``                                   |
+------------------+---------------------------------------+-------------------------------------------------+
| SQL Server       | ``pip install pymssql``               | ``mssql://``                                    |
+------------------+---------------------------------------+-------------------------------------------------+
| Teradata         | ``pip install sqlalchemy-teradata``   | ``teradata://``                                 |
+------------------+---------------------------------------+-------------------------------------------------+
| Vertica          | ``pip install                         | ``vertica+vertica_python://``                   |
|                  | sqlalchemy-vertica-python``           |                                                 |
+------------------+---------------------------------------+-------------------------------------------------+
| Hana             | ``pip install hdbcli sqlalchemy-hana``| ``hana://``                                     |
|                  | or                                    |                                                 |
|                  | ``pip install apache-superset[hana]`` |                                                 |
+------------------+---------------------------------------+-------------------------------------------------+

Note that many other databases are supported, the main criteria being the
existence of a functional SqlAlchemy dialect and Python driver. Googling
the keyword ``sqlalchemy`` in addition to a keyword that describes the
database you want to connect to should get you to the right place.
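
If you are unsure whether a dialect and driver pair is wired up correctly, a quick way to
test it outside of Superset is to open a connection directly with SQLAlchemy. This is a
minimal sketch; the URI, credentials and database name are placeholders:

.. code-block:: python

    # Sketch: verify that a SQLAlchemy dialect + driver works before
    # registering the database in Superset. Replace the URI with your own.
    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
    with engine.connect() as conn:
        # Should print 1 if the dialect and driver are installed correctly
        print(conn.execute("SELECT 1").scalar())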

Hana
------------

The connection string for Hana looks like this: ::

    hana://{username}:{password}@{host}:{port}

(AWS) Athena
------------

The connection string for Athena looks like this: ::

    awsathena+jdbc://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

Where you need to escape/encode at least the s3_staging_dir, i.e., ::

    s3://... -> s3%3A//...

You can also use the `PyAthena` library (no Java required) like this: ::

    awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&...

See `PyAthena <https://github.com/laughingman7743/PyAthena#sqlalchemy>`_.

(Google) BigQuery
-----------------

The connection string for BigQuery looks like this: ::

    bigquery://{project_id}

Additionally, you will need to configure authentication via a
Service Account. Create your Service Account via the Google
Cloud Platform control panel, provide it access to the appropriate
BigQuery datasets, and download the JSON configuration file
for the service account. In Superset, add a JSON blob to
the "Secure Extra" field in the database configuration page
with the following format: ::

    {
        "credentials_info": <contents of credentials JSON file>
    }

The resulting file should have this structure: ::

    {
        "credentials_info": {
            "type": "service_account",
            "project_id": "...",
            "private_key_id": "...",
            "private_key": "...",
            "client_email": "...",
            "client_id": "...",
            "auth_uri": "...",
            "token_uri": "...",
            "auth_provider_x509_cert_url": "...",
            "client_x509_cert_url": "..."
        }
    }

You should then be able to connect to your BigQuery datasets.

To be able to upload data, e.g. sample data, the python library `pandas_gbq` is required.

Elasticsearch
-------------

The connection string for Elasticsearch looks like this: ::

    elasticsearch+http://{user}:{password}@{host}:9200/

Using HTTPS: ::

    elasticsearch+https://{user}:{password}@{host}:9200/

Elasticsearch has a default limit of 10000 rows, so you can increase this limit on your cluster
or set Superset's row limit in the config: ::

    ROW_LIMIT = 10000

You can query multiple indices in SQL Lab, for example: ::

    select timestamp, agent from "logstash-*"

But, to use visualizations for multiple indices you need to create an alias index on your cluster: ::

    POST /_aliases
    {
        "actions" : [
            { "add" : { "index" : "logstash-**", "alias" : "logstash_all" } }
        ]
    }

Then register your table with the ``alias`` name ``logstash_all``.

Snowflake
---------

The connection string for Snowflake looks like this: ::

    snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}

The schema is not necessary in the connection string, as it is defined per table/query.
The role and warehouse can be omitted if defaults are defined for the user, i.e. ::

    snowflake://{user}:{password}@{account}.{region}/{database}

Make sure the user has privileges to access and use all required
databases/schemas/tables/views/warehouses, as the Snowflake SQLAlchemy engine does
not test for user rights during engine creation.

See `Snowflake SQLAlchemy <https://github.com/snowflakedb/snowflake-sqlalchemy>`_.

Teradata
---------

The connection string for Teradata looks like this: ::

    teradata://{user}:{password}@{host}

*Note*: It's required to have Teradata ODBC drivers installed and environment variables configured for the SQLAlchemy dialect to work properly. Teradata ODBC drivers are available here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux

Required environment variables: ::

    export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
    export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini

See `Teradata SQLAlchemy <https://github.com/Teradata/sqlalchemy-teradata>`_.

Apache Drill
------------

At the time of writing, the SQLAlchemy dialect is not available on pypi and must be downloaded here:
`SQLAlchemy Drill <https://github.com/JohnOmernik/sqlalchemy-drill>`_

Alternatively, you can install it completely from the command line as follows: ::

    git clone https://github.com/JohnOmernik/sqlalchemy-drill
    cd sqlalchemy-drill
    python3 setup.py install

Once that is done, you can connect to Drill in two ways, either via the REST interface or by JDBC. If you are connecting via JDBC, you must have the
Drill JDBC Driver installed.

The basic connection string for Drill looks like this: ::

    drill+sadrill://{username}:{password}@{host}:{port}/{storage_plugin}?use_ssl=True

If you are using JDBC to connect to Drill, the connection string looks like this: ::

    drill+jdbc://{username}:{password}@{host}:{port}/{storage_plugin}

For a complete tutorial about how to use Apache Drill with Superset, see this tutorial:
`Visualize Anything with Superset and Drill <http://thedataist.com/visualize-anything-with-superset-and-drill/>`_

Caching
-------

Superset uses `Flask-Cache <https://pythonhosted.org/Flask-Cache/>`_ for
caching purposes. Configuring your caching backend is as easy as providing
a ``CACHE_CONFIG`` constant in your ``superset_config.py`` that
complies with the Flask-Cache specifications.

Flask-Cache supports multiple caching backends (Redis, Memcached,
SimpleCache (in-memory), or the local filesystem). If you are going to use
Memcached please use the `pylibmc` client library as `python-memcached` does
not handle storing binary data correctly. If you use Redis, please install
the `redis <https://pypi.python.org/pypi/redis>`_ Python package: ::

    pip install redis

For setting your timeouts, this is done in the Superset metadata and goes
up the "timeout searchpath", from your slice configuration, to your
data source's configuration, to your database's, and ultimately falls back
to your global default defined in ``CACHE_CONFIG``.

.. code-block:: python

    CACHE_CONFIG = {
        'CACHE_TYPE': 'redis',
        'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
        'CACHE_KEY_PREFIX': 'superset_results',
        'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    }

It is also possible to pass a custom cache initialization function in the
config to handle additional caching use cases. The function must return an
object that is compatible with the `Flask-Cache <https://pythonhosted.org/Flask-Cache/>`_ API.

.. code-block:: python

    from custom_caching import CustomCache

    def init_cache(app):
        """Takes an app instance and returns a custom cache backend"""
        config = {
            'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
            'CACHE_KEY_PREFIX': 'superset_results',
        }
        return CustomCache(app, config)

    CACHE_CONFIG = init_cache

Superset has a Celery task that will periodically warm up the cache based on
different strategies. To use it, add the following to the `CELERYBEAT_SCHEDULE`
section in `config.py`:

.. code-block:: python

    CELERYBEAT_SCHEDULE = {
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute=0, hour='*'),  # hourly
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 5,
                'since': '7 days ago',
            },
        },
    }

This will cache all the charts in the top 5 most popular dashboards every hour.
For other strategies, check the `superset/tasks/cache.py` file.

Deeper SQLAlchemy integration
-----------------------------

It is possible to tweak the database connection information using the
parameters exposed by SQLAlchemy. In the ``Database`` edit view, you will
find an ``extra`` field as a ``JSON`` blob.

.. image:: _static/images/tutorial/add_db.png
   :scale: 30 %

This JSON string contains extra configuration elements. The ``engine_params``
object gets unpacked into the
`sqlalchemy.create_engine <https://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine>`_ call,
while the ``metadata_params`` get unpacked into the
`sqlalchemy.MetaData <https://docs.sqlalchemy.org/en/rel_1_2/core/metadata.html#sqlalchemy.schema.MetaData>`_ call. Refer to the SQLAlchemy docs for more information.

.. note:: If you are using CTAS in SQL Lab with PostgreSQL,
   take a look at :ref:`ref_ctas_engine_config` for specific ``engine_params``.

Schemas (Postgres & Redshift)
-----------------------------

Postgres and Redshift, as well as other databases,
use the concept of **schema** as a logical entity
on top of the **database**. For Superset to connect to a specific schema,
there's a **schema** parameter you can set in the table form.

External Password store for SQLAlchemy connections
--------------------------------------------------

It is possible to use an external store for your database passwords. This is
useful if you are running a custom secret distribution framework and do not wish
to store secrets in Superset's meta database.

Example:
Write a function that takes a single argument of type ``sqla.engine.url`` and returns
the password for the given connection string. Then set ``SQLALCHEMY_CUSTOM_PASSWORD_STORE``
in your config file to point to that function. ::

    def example_lookup_password(url):
        secret = <<get password from external framework>>
        return secret

    SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_lookup_password

A common pattern is to use environment variables to make secrets available.
``SQLALCHEMY_CUSTOM_PASSWORD_STORE`` can also be used for that purpose. ::

    def example_password_as_env_var(url):
        # assuming the uri looks like
        # mysql://localhost?superset_user:{SUPERSET_PASSWORD}
        return url.password.format(os.environ)

    SQLALCHEMY_CUSTOM_PASSWORD_STORE = example_password_as_env_var

SSL Access to databases
-----------------------

This example worked with a MySQL database that requires SSL. The configuration
may differ with other backends. This is what was put in the ``extra``
parameter: ::

    {
        "metadata_params": {},
        "engine_params": {
            "connect_args": {
                "sslmode": "require",
                "sslrootcert": "/path/to/my/pem"
            }
        }
    }

Druid
-----

* From the UI, enter the information about your clusters in the
  `Sources -> Druid Clusters` menu by hitting the + sign.

* Once the Druid cluster connection information is entered, hit the
  `Sources -> Refresh Druid Metadata` menu item to populate

* Navigate to your datasources

Note that you can run the ``superset refresh_druid`` command to refresh the
metadata from your Druid cluster(s).

Dremio
------

Install the following dependencies to connect to Dremio:

* Dremio SQLAlchemy: ``pip install sqlalchemy_dremio``
* Dremio's ODBC driver: https://www.dremio.com/drivers/

Example SQLAlchemy URI: ``dremio://dremio:dremio123@localhost:31010/dremio``

Presto
------

By default Superset assumes the most recent version of Presto is being used when
querying the datasource. If you're using an older version of Presto, you can configure
it in the ``extra`` parameter: ::

    {
        "version": "0.123"
    }

Exasol
---------

The connection string for Exasol looks like this: ::

    exa+pyodbc://{user}:{password}@{host}

*Note*: It's required to have Exasol ODBC drivers installed for the SQLAlchemy dialect to work properly. Exasol ODBC drivers are available here: https://www.exasol.com/portal/display/DOWNLOAD/Exasol+Download+Section

Example config (odbcinst.ini can be left empty): ::

    $ cat $/.../path/to/odbc.ini
    [EXAODBC]
    DRIVER = /.../path/to/driver/EXASOL_driver.so
    EXAHOST = host:8563
    EXASCHEMA = main

See `SQLAlchemy for Exasol <https://github.com/blue-yonder/sqlalchemy_exasol>`_.

CORS
----

The extra CORS dependency must be installed:

.. code-block:: text

    pip install apache-superset[cors]

The following keys in `superset_config.py` can be specified to configure CORS
(see the sketch below):

* ``ENABLE_CORS``: Must be set to True in order to enable CORS
* ``CORS_OPTIONS``: options passed to Flask-CORS (`documentation <https://flask-cors.corydolphin.com/en/latest/api.html#extension>`_)
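
For example, a configuration that enables CORS for a single trusted origin might look like
the following sketch. The origin and the specific ``CORS_OPTIONS`` keys shown are
illustrative Flask-CORS options, not values mandated by Superset:

.. code-block:: python

    # superset_config.py -- illustrative CORS settings
    ENABLE_CORS = True
    CORS_OPTIONS = {
        # Options are forwarded to Flask-CORS; see its documentation for the full list.
        'origins': ['https://dashboards.example.com'],
        'supports_credentials': True,
    }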

Domain Sharding
---------------

Chrome allows up to 6 open connections per domain at a time. When there are more
than 6 slices in a dashboard, a lot of fetch requests are queued up waiting for
the next available socket. `PR 5039 <https://github.com/apache/incubator-superset/pull/5039>`_ adds domain sharding to Superset,
and this feature is enabled by configuration only (by default Superset
doesn't allow cross-domain requests). See the sketch below for an example.

* ``SUPERSET_WEBSERVER_DOMAINS``: list of allowed hostnames for the domain sharding feature. Default `None`.
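
A minimal sketch, assuming you have several hostnames pointing at the same Superset web
servers (the hostnames below are placeholders):

.. code-block:: python

    # superset_config.py -- hypothetical shard hostnames that all resolve
    # to this Superset deployment
    SUPERSET_WEBSERVER_DOMAINS = [
        'superset-1.example.com',
        'superset-2.example.com',
        'superset-3.example.com',
    ]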

Middleware
----------

Superset allows you to add your own middleware. To add your own middleware, update the ``ADDITIONAL_MIDDLEWARE`` key in
your `superset_config.py`. ``ADDITIONAL_MIDDLEWARE`` should be a list of your additional middleware classes.

For example, to use AUTH_REMOTE_USER from behind a proxy server like nginx, you have to add a simple middleware class to
add the value of ``HTTP_X_PROXY_REMOTE_USER`` (or any other custom header from the proxy) to Gunicorn's ``REMOTE_USER``
environment variable: ::

    class RemoteUserMiddleware(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            user = environ.pop('HTTP_X_PROXY_REMOTE_USER', None)
            environ['REMOTE_USER'] = user
            return self.app(environ, start_response)

    ADDITIONAL_MIDDLEWARE = [RemoteUserMiddleware, ]

*Adapted from http://flask.pocoo.org/snippets/69/*

Event Logging
-------------

Superset by default logs special action events in its database. These logs can be accessed on the UI by navigating to
"Security" -> "Action Log". You can freely customize these logs by implementing your own event log class.

Example of a simple JSON-to-stdout class: ::

    import json

    # AbstractEventLogger lives in Superset's logging utilities
    # (superset.utils.log in recent versions)
    from superset.utils.log import AbstractEventLogger

    class JSONStdOutEventLogger(AbstractEventLogger):

        def log(self, user_id, action, *args, **kwargs):
            records = kwargs.get('records', list())
            dashboard_id = kwargs.get('dashboard_id')
            slice_id = kwargs.get('slice_id')
            duration_ms = kwargs.get('duration_ms')
            referrer = kwargs.get('referrer')

            for record in records:
                log = dict(
                    action=action,
                    json=record,
                    dashboard_id=dashboard_id,
                    slice_id=slice_id,
                    duration_ms=duration_ms,
                    referrer=referrer,
                    user_id=user_id
                )
                print(json.dumps(log))

Then in Superset's config, pass an instance of the logger type you want to use: ::

    EVENT_LOGGER = JSONStdOutEventLogger()

Upgrading
---------

Upgrading should be as straightforward as running: ::

    pip install apache-superset --upgrade
    superset db upgrade
    superset init

We recommend following standard best practices when upgrading Superset, such
as taking a database backup prior to the upgrade, upgrading a staging
environment prior to upgrading production, and upgrading production while fewer
users are active on the platform.

.. note ::
   Some upgrades may contain backward-incompatible changes, or require
   scheduling downtime; when that is the case, contributors attach notes in
   ``UPDATING.md`` in the repository. It's recommended to review this
   file prior to running an upgrade.

Celery Tasks
------------

On large analytic databases, it's common to run queries that
execute for minutes or hours.
To enable support for long running queries that
execute beyond the typical web request's timeout (30-60 seconds), it is
necessary to configure an asynchronous backend for Superset which consists of:

* one or many Superset workers (each implemented as a Celery worker), which
  can be started with the ``celery worker`` command; run
  ``celery worker --help`` to view the related options
* a celery broker (message queue) for which we recommend using Redis
  or RabbitMQ
* a results backend that defines where the worker will persist the query
  results

Configuring Celery requires defining a ``CELERY_CONFIG`` in your
``superset_config.py``. Both the worker and web server processes should
have the same configuration.

.. code-block:: python

    from celery.schedules import crontab

    class CeleryConfig(object):
        BROKER_URL = 'redis://localhost:6379/0'
        CELERY_IMPORTS = (
            'superset.sql_lab',
            'superset.tasks',
        )
        CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
        CELERYD_LOG_LEVEL = 'DEBUG'
        CELERYD_PREFETCH_MULTIPLIER = 10
        CELERY_ACKS_LATE = True
        CELERY_ANNOTATIONS = {
            'sql_lab.get_sql_results': {
                'rate_limit': '100/s',
            },
            'email_reports.send': {
                'rate_limit': '1/s',
                'time_limit': 120,
                'soft_time_limit': 150,
                'ignore_result': True,
            },
        }
        CELERYBEAT_SCHEDULE = {
            'email_reports.schedule_hourly': {
                'task': 'email_reports.schedule_hourly',
                'schedule': crontab(minute=1, hour='*'),
            },
        }

    CELERY_CONFIG = CeleryConfig

* To start a Celery worker to leverage the configuration, run: ::

    celery worker --app=superset.tasks.celery_app:app --pool=prefork -O fair -c 4

* To start a job which schedules periodic background jobs, run: ::

    celery beat --app=superset.tasks.celery_app:app

To set up a result backend, you need to pass an instance of a derivative
of ``werkzeug.contrib.cache.BaseCache`` to the ``RESULTS_BACKEND``
configuration key in your ``superset_config.py``. It's possible to use
Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache),
memory or the file system (in a single server-type setup or for testing),
or to write your own caching interface. Your ``superset_config.py`` may
look something like:

.. code-block:: python

    # On S3
    from s3cache.s3cache import S3Cache
    S3_CACHE_BUCKET = 'foobar-superset'
    S3_CACHE_KEY_PREFIX = 'sql_lab_result'
    RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)

    # On Redis
    from werkzeug.contrib.cache import RedisCache
    RESULTS_BACKEND = RedisCache(
        host='localhost', port=6379, key_prefix='superset_results')

For performance gains, `MessagePack <https://github.com/msgpack/msgpack-python>`_
and `PyArrow <https://arrow.apache.org/docs/python/>`_ are now used for results
serialization. This can be disabled by setting ``RESULTS_BACKEND_USE_MSGPACK = False``
in your configuration, should any issues arise. Please clear your existing results
cache store when upgrading an existing environment.

**Important notes**

* It is important that all the worker nodes and web servers in
  the Superset cluster share a common metadata database.
  This means that SQLite will not work in this context since it has
  limited support for concurrency and
  typically lives on the local file system.

* There should only be one instance of ``celery beat`` running in your
  entire setup. If not, background jobs can get scheduled multiple times,
  resulting in weird behaviors like duplicate delivery of reports,
  higher than expected load / traffic, etc.

* SQL Lab will only run your queries asynchronously if you enable
  "Asynchronous Query Execution" in your database settings.

Email Reports
-------------

Email reports allow users to schedule email reports for

* chart and dashboard visualization (attachment or inline)
* chart data (CSV attachment on inline table)

**Setup**

Make sure you enable email reports in your configuration file

.. code-block:: python

    ENABLE_SCHEDULED_EMAIL_REPORTS = True

Now you will find two new items in the navigation bar that allow you to schedule email
reports:

* Manage -> Dashboard Emails
* Manage -> Chart Email Schedules

Schedules are defined in crontab format and each schedule
can have a list of recipients (all of them can receive a single mail,
or separate mails). For audit purposes, all outgoing mails can have a
mandatory bcc.

In order to get picked up, you need to configure a celery worker and a celery beat
(see section above "Celery Tasks"). Your celery configuration also
needs an entry ``email_reports.schedule_hourly`` for ``CELERYBEAT_SCHEDULE``.

To send emails you need to configure SMTP settings in your configuration file, e.g.

.. code-block:: python

    EMAIL_NOTIFICATIONS = True

    SMTP_HOST = "email-smtp.eu-west-1.amazonaws.com"
    SMTP_STARTTLS = True
    SMTP_SSL = False
    SMTP_USER = "smtp_username"
    SMTP_PORT = 25
    SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD")
    SMTP_MAIL_FROM = "insights@komoot.com"

To render dashboards you need to install a local browser on your superset instance:

* `geckodriver <https://github.com/mozilla/geckodriver>`_ and Firefox is preferred
* `chromedriver <http://chromedriver.chromium.org/>`_ is a good option too

You need to adjust the ``EMAIL_REPORTS_WEBDRIVER`` accordingly in your configuration
(see the sketch after this list of notes).

You also need to specify on behalf of which username to render the dashboards. In general
dashboards and charts are not accessible to unauthorized requests, which is why the
worker needs to take over the credentials of an existing user to take a snapshot. ::

    EMAIL_REPORTS_USER = 'username_with_permission_to_access_dashboards'

**Important notes**

* Be mindful of the concurrency setting for celery (using ``-c 4``).
  Selenium/webdriver instances can consume a lot of CPU / memory on your servers.

* In some cases, if you notice a lot of leaked ``geckodriver`` processes, try running
  your celery processes with: ::

    celery worker --pool=prefork --max-tasks-per-child=128 ...

* It is recommended to run separate workers for ``sql_lab`` and
  ``email_reports`` tasks. This can be done by using the ``queue`` field in ``CELERY_ANNOTATIONS``.

* Adjust ``WEBDRIVER_BASEURL`` in your config if celery workers can't access superset via its
  default value ``http://0.0.0.0:8080/`` (notice the port number 8080, many other setups use
  port 8088).
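
Putting the webdriver-related keys together, a report-rendering worker might carry settings
along the lines of the sketch below. The hostname is a placeholder, and ``'firefox'`` assumes
geckodriver is installed as suggested above:

.. code-block:: python

    # superset_config.py -- settings used by the report-rendering worker (sketch)
    EMAIL_REPORTS_WEBDRIVER = 'firefox'  # or 'chrome' if you installed chromedriver
    EMAIL_REPORTS_USER = 'reports_user'  # an existing user with access to the dashboards
    WEBDRIVER_BASEURL = 'http://superset-web:8088/'  # where workers can reach the web server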

SQL Lab
-------

SQL Lab is a powerful SQL IDE that works with all SQLAlchemy compatible
databases. By default, queries are executed in the scope of a web
request so they may eventually timeout as queries exceed the maximum duration of a web
request in your environment, whether it be a reverse proxy or the Superset
server itself. In such cases, it is preferred to use ``celery`` to run the queries
in the background. Please follow the examples/notes mentioned above to get your
celery setup working.

Also note that SQL Lab supports Jinja templating in queries and that it's
possible to overload
the default Jinja context in your environment by defining the
``JINJA_CONTEXT_ADDONS`` in your superset configuration. Objects referenced
in this dictionary are made available for users to use in their SQL.

.. code-block:: python

    JINJA_CONTEXT_ADDONS = {
        'my_crazy_macro': lambda x: x*2,
    }

SQL Lab also includes a live query validation feature with pluggable backends.
You can configure which validation implementation is used with which database
engine by adding a block like the following to your config.py:

.. code-block:: python

    FEATURE_FLAGS = {
        'SQL_VALIDATORS_BY_ENGINE': {
            'presto': 'PrestoDBSQLValidator',
        }
    }

The available validators and names can be found in `sql_validators/`.

**Scheduling queries**

You can optionally allow your users to schedule queries directly in SQL Lab.
This is done by adding extra metadata to saved queries, which are then picked
up by an external scheduler (like `Apache Airflow <https://airflow.apache.org/>`_).

To allow scheduled queries, add the following to your `config.py`:

.. code-block:: python

    FEATURE_FLAGS = {
        # Configuration for scheduling queries from SQL Lab. This information is
        # collected when the user clicks "Schedule query", and saved into the `extra`
        # field of saved queries.
        # See: https://github.com/mozilla-services/react-jsonschema-form
        'SCHEDULED_QUERIES': {
            'JSONSCHEMA': {
                'title': 'Schedule',
                'description': (
                    'In order to schedule a query, you need to specify when it '
                    'should start running, when it should stop running, and how '
                    'often it should run. You can also optionally specify '
                    'dependencies that should be met before the query is '
                    'executed. Please read the documentation for best practices '
                    'and more information on how to specify dependencies.'
                ),
                'type': 'object',
                'properties': {
                    'output_table': {
                        'type': 'string',
                        'title': 'Output table name',
                    },
                    'start_date': {
                        'type': 'string',
                        'title': 'Start date',
                        # date-time is parsed using the chrono library, see
                        # https://www.npmjs.com/package/chrono-node#usage
                        'format': 'date-time',
                        'default': 'tomorrow at 9am',
                    },
                    'end_date': {
                        'type': 'string',
                        'title': 'End date',
                        # date-time is parsed using the chrono library, see
                        # https://www.npmjs.com/package/chrono-node#usage
                        'format': 'date-time',
                        'default': '9am in 30 days',
                    },
                    'schedule_interval': {
                        'type': 'string',
                        'title': 'Schedule interval',
                    },
                    'dependencies': {
                        'type': 'array',
                        'title': 'Dependencies',
                        'items': {
                            'type': 'string',
                        },
                    },
                },
            },
            'UISCHEMA': {
                'schedule_interval': {
                    'ui:placeholder': '@daily, @weekly, etc.',
                },
                'dependencies': {
                    'ui:help': (
                        'Check the documentation for the correct format when '
                        'defining dependencies.'
                    ),
                },
            },
            'VALIDATION': [
                # ensure that start_date <= end_date
                {
                    'name': 'less_equal',
                    'arguments': ['start_date', 'end_date'],
                    'message': 'End date cannot be before start date',
                    # this is where the error message is shown
                    'container': 'end_date',
                },
            ],
            # link to the scheduler; this example links to an Airflow pipeline
            # that uses the query id and the output table as its name
            'linkback': (
                'https://airflow.example.com/admin/airflow/tree?'
                'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
            ),
        },
    }

This feature flag is based on `react-jsonschema-form <https://github.com/mozilla-services/react-jsonschema-form>`_
and will add a button called "Schedule Query" to SQL Lab. When the button is
clicked, a modal will show up where the user can add the metadata required for
scheduling the query.

This information can then be retrieved from the endpoint `/savedqueryviewapi/api/read`
and used to schedule the queries that have `scheduled_queries` in their JSON
metadata. For schedulers other than Airflow, additional fields can be easily
added to the configuration file above.

Celery Flower
-------------

Flower is a web based tool for monitoring the Celery cluster which you can
install from pip: ::

    pip install flower

and run via: ::

    celery flower --app=superset.tasks.celery_app:app

Building from source
---------------------

More advanced users may want to build Superset from source. That
would be the case if you fork the project to add features specific to
your environment. See `CONTRIBUTING.md#setup-local-environment-for-development <https://github.com/apache/incubator-superset/blob/master/CONTRIBUTING.md#setup-local-environment-for-development>`_.

Blueprints
----------

`Blueprints are Flask's reusable apps <https://flask.palletsprojects.com/en/1.0.x/tutorial/views/>`_.
Superset allows you to specify an array of Blueprints
in your ``superset_config`` module. Here's
an example of how this can work with a simple Blueprint. By doing
so, you can expect Superset to serve a page that says "OK"
at the ``/simple_page`` url. This can allow you to run other things such
as custom data visualization applications alongside Superset, on the
same server.

.. code-block:: python

    from flask import Blueprint
    simple_page = Blueprint('simple_page', __name__,
                            template_folder='templates')

    @simple_page.route('/', defaults={'page': 'index'})
    @simple_page.route('/<page>')
    def show(page):
        return "Ok"

    BLUEPRINTS = [simple_page]

StatsD logging
--------------

Superset is instrumented to log events to StatsD if desired. Most endpoints hit
are logged as well as key events like query start and end in SQL Lab.

To set up StatsD logging, it's a matter of configuring the logger in your
``superset_config.py``.

.. code-block:: python

    from superset.stats_logger import StatsdStatsLogger
    STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')

Note that it's also possible to implement your own logger by deriving
``superset.stats_logger.BaseStatsLogger``.
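
As a sketch of what such a custom logger could look like, assuming ``BaseStatsLogger``
exposes the same ``incr``/``decr``/``timing`` hooks that the bundled ``StatsdStatsLogger``
implements (the class name and use of Python's standard ``logging`` module are purely
illustrative):

.. code-block:: python

    import logging

    from superset.stats_logger import BaseStatsLogger

    class LogOnlyStatsLogger(BaseStatsLogger):
        """Writes metrics to the application log instead of a StatsD server (sketch)."""

        def incr(self, key):
            logging.info("[stats] incr %s", key)

        def decr(self, key):
            logging.info("[stats] decr %s", key)

        def timing(self, key, value):
            logging.info("[stats] timing %s %s", key, value)

        # any remaining hooks defined by BaseStatsLogger follow the same pattern

    STATS_LOGGER = LogOnlyStatsLogger()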
Install Superset with Helm in Kubernetes
----------------------------------------

You can install Superset into Kubernetes with `Helm <https://helm.sh/>`_. The chart is
located in ``install/helm``.

To install Superset into your Kubernetes cluster:

.. code-block:: bash

    helm upgrade --install superset ./install/helm/superset

Note that the above command will install Superset into the ``default`` namespace of your Kubernetes cluster.
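If you need a different namespace or want to override the chart defaults, a
variation such as the following may help; the values file name is hypothetical
and the available settings depend on the chart:

.. code-block:: bash

    # install into a dedicated namespace and override chart values
    # (my-values.yaml is a hypothetical file containing your overrides)
    helm upgrade --install superset ./install/helm/superset \
        --namespace superset \
        --values my-values.yaml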
Custom OAuth2 configuration
---------------------------

Beyond the FAB-supported providers (github, twitter, linkedin, google, azure), it's easy to connect Superset with other OAuth2 Authorization Server implementations that support "code" authorization.

The first step: Configure authorization in Superset ``superset_config.py``.

.. code-block:: python

    AUTH_TYPE = AUTH_OAUTH
    OAUTH_PROVIDERS = [
        {
            'name': 'egaSSO',
            'token_key': 'access_token',  # Name of the token in the response of access_token_url
            'icon': 'fa-address-card',    # Icon for the provider
            'remote_app': {
                'consumer_key': 'myClientId',    # Client Id (identifies the Superset application)
                'consumer_secret': 'MySecret',   # Secret for this Client Id (identifies the Superset application)
                'request_token_params': {
                    'scope': 'read'              # Scope for the Authorization
                },
                'access_token_method': 'POST',   # HTTP Method to call access_token_url
                'access_token_params': {         # Additional parameters for calls to access_token_url
                    'client_id': 'myClientId'
                },
                'access_token_headers': {        # Additional headers for calls to access_token_url
                    'Authorization': 'Basic Base64EncodedClientIdAndSecret'
                },
                'base_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/',
                'access_token_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/token',
                'authorize_url': 'https://myAuthorizationServer/oauth2AuthorizationServer/authorize'
            }
        }
    ]

    # Will allow user self-registration, creating Flask users from authorized users
    AUTH_USER_REGISTRATION = True

    # The default user self-registration role
    AUTH_USER_REGISTRATION_ROLE = "Public"
Second step: Create a ``CustomSsoSecurityManager`` that extends ``SupersetSecurityManager`` and overrides ``oauth_user_info``:

.. code-block:: python

    import logging

    from superset.security import SupersetSecurityManager


    class CustomSsoSecurityManager(SupersetSecurityManager):

        def oauth_user_info(self, provider, response=None):
            logging.debug("Oauth2 provider: {0}.".format(provider))
            if provider == 'egaSSO':
                # As an example, this line issues a GET to base_url + 'userDetails' with Bearer authentication,
                # and expects the authorization server to check the token and respond with the user details
                me = self.appbuilder.sm.oauth_remotes[provider].get('userDetails').data
                logging.debug("user_data: {0}".format(me))
                return {'name': me['name'], 'email': me['email'], 'id': me['user_name'], 'username': me['user_name'], 'first_name': '', 'last_name': ''}
        ...

This file must be located in the same directory as ``superset_config.py``, with the name ``custom_sso_security_manager.py``.

Then we can add these two lines to ``superset_config.py``:

.. code-block:: python

    from custom_sso_security_manager import CustomSsoSecurityManager
    CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
Feature Flags
-------------

Because Superset serves a wide variety of users, some features are not enabled by default. For example, some users have stronger security restrictions, while others may not. Superset therefore allows users to enable or disable certain features through configuration. For feature owners, this means you can add optional functionality to Superset that will only affect a subset of users.

You can enable or disable features with flags in ``superset_config.py``:

.. code-block:: python

    DEFAULT_FEATURE_FLAGS = {
        'CLIENT_CACHE': False,
        'ENABLE_EXPLORE_JSON_CSRF_PROTECTION': False,
        'PRESTO_EXPAND_DATA': False,
    }

Here is a list of flags and descriptions:

* ENABLE_EXPLORE_JSON_CSRF_PROTECTION

  * For some security concerns, you may need to enforce CSRF protection on all query requests to the explore_json endpoint. In Superset, we use `flask-csrf <https://sjl.bitbucket.io/flask-csrf/>`_ to add CSRF protection to all POST requests, but this protection doesn't apply to the GET method.
  * When ENABLE_EXPLORE_JSON_CSRF_PROTECTION is set to true, your users cannot make GET requests to explore_json. The default value for this feature is False (current behavior): explore_json accepts both GET and POST requests. See `PR 7935 <https://github.com/apache/incubator-superset/pull/7935>`_ for more details.

* PRESTO_EXPAND_DATA

  * When this feature is enabled, nested types in Presto will be expanded into extra columns and/or arrays. This is experimental and doesn't work with all nested types.
SIP-15
------

`SIP-15 <https://github.com/apache/incubator-superset/issues/6360>`_ aims to ensure that time intervals are handled in a consistent and transparent manner for both the Druid and SQLAlchemy connectors.

Prior to SIP-15, SQLAlchemy used inclusive endpoints; however, these may behave like exclusive endpoints for string columns (due to lexicographical ordering) if no formatting was defined and the column formatting did not conform to an ISO 8601 date-time (refer to the SIP for details).

To remedy this, rather than having to define the date/time format for every non-ISO 8601 date-time column, one can define a default column mapping on a per-database level via the ``extra`` parameter ::

    {
        "python_date_format_by_column_name": {
            "ds": "%Y-%m-%d"
        }
    }

**New deployments**

All new Superset deployments should enable SIP-15 via,

.. code-block:: python

    SIP_15_ENABLED = True

**Existing deployments**

Given that it is not apparent whether the chart creator was aware of the time range inconsistencies (and adjusted the endpoints accordingly), changing the behavior of all charts is overly aggressive. Instead SIP-15 provides a soft transition, allowing producers (chart owners) to see the impact of the proposed change and adjust their charts accordingly.

Prior to enabling SIP-15, existing deployments should communicate to their users the impact of the change and define a grace period end date (exclusive of course) after which all charts will conform to the [start, end) interval, i.e.,

.. code-block:: python

    from datetime import date

    SIP_15_ENABLED = True
    SIP_15_GRACE_PERIOD_END = date(<YYYY>, <MM>, <DD>)

To aid with transparency, the current endpoint behavior is explicitly called out in the chart time range (post SIP-15 this will be [start, end) for all connectors and databases). One can override the defaults on a per-database level via the ``extra``
parameter ::

    {
        "time_range_endpoints": ["inclusive", "inclusive"]
    }

Note in a future release the interim SIP-15 logic will be removed (including the ``time_range_endpoints`` form-data field) via a code change and Alembic migration.