Changelog

1.18.2 - September 2nd, 2017

  • Silently pass on cancelled futures in as_completed (GH#1366)
  • Fix unicode keys error in Python 2 (GH#1370)
  • Support numeric worker names
  • Add dask-mpi executable (GH#1367)
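
The as_completed entry above can be pictured with a minimal sketch. It uses the standard library's concurrent.futures rather than distributed's own as_completed, and the helper name completed_results is hypothetical; it only illustrates what "silently pass on cancelled futures" could look like to a caller.

```python
from concurrent.futures import Future, as_completed


def completed_results(futures):
    """Yield results of finished futures, skipping cancelled ones.

    Hypothetical helper built on the standard library; it is not
    distributed's implementation.
    """
    for fut in as_completed(futures):
        if fut.cancelled():
            continue  # pass over cancelled futures instead of raising
        yield fut.result()


done = Future()
done.set_result(1)

cancelled = Future()
cancelled.cancel()
# Executors normally make this call; it marks the cancellation as
# delivered so that as_completed treats the future as finished.
cancelled.set_running_or_notify_cancel()

results = list(completed_results([done, cancelled]))
```

Without the cancelled() check, calling result() on the cancelled future would raise CancelledError in the consuming loop.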

1.18.1 - August 25th, 2017

  • Clean up forgotten keys in fire-and-forget workloads (GH#1250)
  • Handle missing extensions (GH#1263)
  • Allow recreate_exception on persisted collections (GH#1253)
  • Add asynchronous= keyword to blocking client methods (GH#1272)
  • Restrict to horizontal panning in bokeh plots (GH#1274)
  • Rename client.shutdown to client.close (GH#1275)
  • Avoid blocking on event loop (GH#1270)
  • Avoid cloudpickle errors for Client.get_versions (GH#1279)
  • Yield on Tornado IOStream.write futures (GH#1289)
  • Assume async behavior if inside a sync statement (GH#1284)
  • Avoid error messages on closing (GH#1297), (GH#1296) (GH#1318)
  • Add timeout= keyword to get_client (GH#1290)
  • Respect timeouts when restarting (GH#1304)
  • Clean file descriptor and memory leaks in tests (GH#1317)
  • Deprecate Executor (GH#1302)
  • Add timeout to ThreadPoolExecutor.shutdown (GH#1330)
  • Clean up AsyncProcess handling (GH#1324)
  • Allow unicode keys in Python 2 scheduler (GH#1328)
  • Avoid leaking stolen data (GH#1326)
  • Improve error handling on failed nanny starts (GH#1337), (GH#1331)
  • Make Adaptive more flexible
  • Support --contact-address and --listen-address in worker (GH#1278)
  • Remove old dworker, dscheduler executables (GH#1355)
  • Exit workers if nanny process fails (GH#1345)
  • Auto pep8 and flake (GH#1353)

1.18.0 - July 8th, 2017

  • Multi-threading safety (GH#1191), (GH#1228), (GH#1229)
  • Improve handling of byte counting (GH#1198) (GH#1224)
  • Add get_client, secede functions, refactor worker-client relationship

1.17.1 - June 14th, 2017

  • Remove Python 3.4 testing from travis-ci (GH#1157)
  • Remove ZMQ Support (GH#1160)
  • Fix memoryview nbytes issue in Python 2.7 (GH#1165)
  • Re-enable counters (GH#1168)
  • Improve scheduler.restart (GH#1175)

1.17.0 - June 9th, 2017

  • Reevaluate worker occupancy periodically during scheduler downtime (GH#1038) (GH#1101)
  • Add AioClient asyncio-compatible client API (GH#1029) (GH#1092) (GH#1099)
  • Update Keras serializer (GH#1067)
  • Support TLS/SSL connections for security (GH#866) (GH#1034)
  • Always create new worker directory when passed --local-directory (GH#1079)
  • Support pre-scattering data when using joblib frontend (GH#1022)
  • Make workers more robust to failure of sizeof function (GH#1108) and writing to disk (GH#1096)
  • Add is_empty and update methods to as_completed (GH#1113)
  • Remove _get coroutine and replace with get(..., sync=False) (GH#1109)
  • Improve API compatibility with async/await syntax (GH#1115) (GH#1124)
  • Add distributed Queues (GH#1117) and shared Variables (GH#1128) to enable inter-client coordination
  • Support direct client-to-worker scattering and gathering (GH#1130) as well as performance enhancements when scattering data
  • Style improvements for bokeh web dashboards (GH#1126) (GH#1141) as well as a removal of the external bokeh process
  • HTML reprs for Future and Client objects (GH#1136)
  • Support nested collections in client.compute (GH#1144)
  • Use normal client API in asynchronous mode (GH#1152)
  • Remove old distributed.collections submodule (GH#1153)

1.16.3 - May 5th, 2017

  • Add bokeh template files to MANIFEST (GH#1063)
  • Don’t set worker_client.get as default get (GH#1061)
  • Clean up logging on Client().shutdown() (GH#1055)

1.16.2 - May 3rd, 2017

  • Support async with Client syntax (GH#1053)
  • Use internal bokeh server for default diagnostics server (GH#1047)
  • Improve styling of bokeh plots when empty (GH#1046) (GH#1037)
  • Support efficient serialization for sparse arrays (GH#1040)
  • Prioritize newly arrived work in worker (GH#1035)
  • Prescatter data with joblib backend (GH#1022)
  • Make client.restart more robust to worker failure (GH#1018)
  • Support preloading a module or script in dask-worker or dask-scheduler processes (GH#1016)
  • Specify network interface in command line interface (GH#1007)
  • Client.scatter supports a single element (GH#1003)
  • Use blosc compression on all memoryviews passing through comms (GH#998)
  • Add concurrent.futures-compatible Executor (GH#997)
  • Add as_completed.batches method and return results (GH#994) (GH#971)
  • Allow worker_clients to optionally stay within the thread pool (GH#993)
  • Add bytes-stored and tasks-processing diagnostic histograms (GH#990)
  • Run supports non-msgpack-serializable results (GH#965)
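
The "concurrent.futures-compatible Executor" entry refers to the standard library's Executor interface (submit, map, shutdown). As a rough sketch of what that compatibility buys, the function below is written only against that interface; ThreadPoolExecutor is a stand-in here, and the function name run_with is hypothetical. How distributed exposes its own executor is not shown.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def run_with(executor: Executor, values):
    """Use only the generic Executor interface: submit and map."""
    single = executor.submit(pow, 2, 10)            # one task, returns a Future
    mapped = list(executor.map(lambda v: v * v, values))  # many tasks, in order
    return single.result(), mapped


# Stand-in executor from the standard library; any object that follows
# the same interface could be passed instead.
with ThreadPoolExecutor(max_workers=2) as pool:
    result = run_with(pool, [1, 2, 3])
```

Code written this way does not need to know which executor implementation it receives.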

1.16.1 - March 22nd, 2017

  • Use inproc transport in LocalCluster (GH#919)
  • Add structured and queryable cluster event logs (GH#922)
  • Use connection pool for inter-worker communication (GH#935)
  • Robustly shut down spawned worker processes at shutdown (GH#928)
  • Worker death timeout (GH#940)
  • More visual reporting of exceptions in progressbar (GH#941)
  • Render disk and serialization events to task stream visual (GH#943)
  • Support async for / await protocol (GH#952)
  • Ensure random generators are re-seeded in worker processes (GH#953)
  • Upload sourcecode as zip module (GH#886)
  • Replay remote exceptions in local process (GH#894)
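
The re-seeding entry above addresses a general hazard: a child process that starts from a copy of its parent's generator state produces the same "random" numbers as its siblings. The sketch below simulates that with the standard library only, copying state between random.Random objects instead of forking. The function name child_draw is hypothetical, and fixed seeds are used purely to keep the example deterministic; a real worker would more likely seed from an entropy source.

```python
import random

parent = random.Random(0)
inherited_state = parent.getstate()  # what each simulated child starts with


def child_draw(state, reseed_with=None):
    """Simulate a child process drawing one number from an inherited RNG."""
    rng = random.Random()
    rng.setstate(state)
    if reseed_with is not None:
        rng.seed(reseed_with)  # re-seed so children diverge
    return rng.random()


# Without re-seeding, every child repeats the same draw.
same = [child_draw(inherited_state) for _ in range(2)]

# With re-seeding, the children produce different values.
different = [child_draw(inherited_state, reseed_with=i) for i in (1, 2)]
```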

1.16.0 - February 24th, 2017

  • First come first served priorities on client submissions (GH#840)
  • Can specify Bokeh internal ports (GH#850)
  • Allow stolen tasks to return from either worker (GH#853), (GH#875)
  • Add worker resource constraints during execution (GH#857)
  • Send small data through Channels (GH#858)
  • Better estimates for SciPy sparse matrix memory costs (GH#863)
  • Avoid stealing long running tasks (GH#873)
  • Maintain fortran ordering of NumPy arrays (GH#876)
  • Add --scheduler-file keyword to dask-scheduler (GH#877)
  • Add serializer for Keras models (GH#878)
  • Support uploading modules from zip files (GH#886)
  • Improve titles of Bokeh dashboards (GH#895)

1.15.2 - January 27th, 2017

  • Fix a bug where arrays with large dtypes or shapes were being improperly compressed (GH#830 GH#832 GH#833)
  • Extend as_completed to accept new futures during iteration (GH#829)
  • Add --nohost keyword to dask-ssh startup utility (GH#827)
  • Support scheduler shutdown of remote workers, useful for adaptive clusters (GH#811 GH#816 GH#821)
  • Add Client.run_on_scheduler method for running debug functions on the scheduler (GH#808)

1.15.1 - January 11th, 2017

  • Make compatible with Bokeh 0.12.4 (GH#803)
  • Avoid compressing arrays if not helpful (GH#777)
  • Optimize inter-worker data transfer (GH#770) (GH#790)
  • Add --local-directory keyword to worker (GH#788)
  • Enable workers to arrive to the cluster with their own data. Useful if a worker leaves and comes back (GH#785)
  • Resolve thread safety bug when using local_client (GH#802)
  • Resolve scheduling issues in worker (GH#804)

1.15.0 - January 2nd, 2017

  • Major Worker refactor (GH#704)
  • Major Scheduler refactor (GH#717) (GH#722) (GH#724) (GH#742) (GH#743)
  • Add check (default is False) option to Client.get_versions to raise if the versions don’t match on client, scheduler & workers (GH#664)
  • Future.add_done_callback executes in separate thread (GH#656)
  • Clean up numpy serialization (GH#670)
  • Support serialization of Tornado v4.5 coroutines (GH#673)
  • Use cPickle instead of pickle in Python 2 (GH#684)
  • Use Forkserver rather than Fork on Unix in Python 3 (GH#687)
  • Support abstract resources for per-task constraints (GH#694) (GH#720) (GH#737)
  • Add TCP timeouts (GH#697)
  • Add embedded Bokeh server to workers (GH#709) (GH#713) (GH#738)
  • Add embedded Bokeh server to scheduler (GH#724) (GH#736) (GH#738)
  • Add more precise timers for Windows (GH#713)
  • Add Versioneer (GH#715)
  • Support inter-client channels (GH#729) (GH#749)
  • Scheduler Performance improvements (GH#740) (GH#760)
  • Improve load balancing and work stealing (GH#747) (GH#754) (GH#757)
  • Run Tornado coroutines on workers
  • Avoid slow sizeof call on Pandas dataframes (GH#758)

1.14.3 - November 13th, 2016

  • Remove custom Bokeh export tool that implicitly relied on nodejs (GH#655)
  • Clean up scheduler logging (GH#657)

1.14.2 - November 11th, 2016

  • Support more numpy dtypes in custom serialization (GH#627), (GH#630), (GH#636)
  • Update Bokeh plots (GH#628)
  • Improve spill to disk heuristics (GH#633)
  • Add Export tool to Task Stream plot
  • Reverse frame order in loads for very many frames (GH#651)
  • Add timeout when waiting on write (GH#653)

1.14.0 - November 3rd, 2016

  • Add Client.get_versions() function to return software and package information from the scheduler, workers, and client (GH#595)
  • Improved Windows support (GH#577) (GH#590) (GH#583) (GH#597)
  • Clean up rpc objects explicitly (GH#584)
  • Normalize collections against known futures (GH#587)
  • Add key= keyword to map to specify keynames (GH#589)
  • Custom data serialization (GH#606)
  • Refactor the web interface (GH#608) (GH#615) (GH#621)
  • Allow user-supplied Executor in Worker (GH#609)
  • Pass Worker kwargs through LocalCluster

1.13.3 - October 15th, 2016

  • Schedulers can retire workers cleanly
  • Add Future.add_done_callback for concurrent.futures compatibility
  • Update web interface to be consistent with Bokeh 0.12.3
  • Close streams explicitly, avoiding race conditions and supporting more robust restarts on Windows.
  • Improved shuffle performance for dask.dataframe
  • Add adaptive allocation cluster manager
  • Reduce administrative overhead when dealing with many workers
  • dask-ssh --log-directory . no longer errors
  • Microperformance tuning for the scheduler
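
The add_done_callback entry above mirrors the method of the same name on standard library futures. The sketch below shows that standard library behavior with a ThreadPoolExecutor; it is not distributed's Future, and details such as which thread runs the callback may differ there.

```python
from concurrent.futures import ThreadPoolExecutor

seen = []


def record(future):
    """Callback invoked with the finished future as its only argument."""
    seen.append(future.result())


with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(sum, [1, 2, 3])
    fut.add_done_callback(record)
# Leaving the with-block waits for the task, so the callback has run by now.
```

If the future has already finished when the callback is added, the standard library calls it immediately.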

1.13.2

  • Revert dask_worker to use fork rather than subprocess by default
  • Scatter retains type information
  • Bokeh always uses subprocess rather than spawn

1.13.1

  • Fix critical Windows error with dask_worker executable

1.13.0

  • Rename Executor to Client (GH#492)
  • Add --memory-limit option to dask-worker, enabling spill-to-disk behavior when running out of memory (GH#485)
  • Add --pid-file option to dask-worker and dask-scheduler (GH#496)
  • Add upload_environment function to distribute conda environments. This is experimental, undocumented, and may change without notice. (GH#494)
  • Add workers= keyword argument to Client.compute and Client.persist, supporting location-restricted workloads with Dask collections (GH#484)
  • Add optional dask_worker= keyword to client.run functions that gets provided the worker or nanny object
  • Add nanny=False keyword to Client.run, allowing for the execution of arbitrary functions on the nannies as well as normal workers

1.12.2

This release adds some new features and removes dead code.

  • Publish and share datasets on the scheduler between many clients (GH#453). See Publish Datasets.
  • Launch tasks from other tasks (experimental) (GH#471). See Launch Tasks from Tasks.
  • Remove unused code, notably the Center object and older client functions (GH#478)
  • Executor() and LocalCluster() are now robust to Bokeh’s absence (GH#481)
  • Removed s3fs and boto3 from requirements. These have moved to Dask.

1.12.1

This release is largely a bugfix release, recovering from the previous large refactor.

  • Fixes from previous refactor
    • Ensure idempotence across clients
    • Stress test losing scattered data permanently
  • IPython fixes
    • Add start_ipython_scheduler method to Executor
    • Add %remote magic for workers
    • Clean up code and tests
  • Pool connections to enable reuse and reduce the number of open file handles
  • Re-implement work stealing algorithm
  • Support cancellation of tuple keys, such as occur in dask.arrays
  • Start synchronizing against worker data that may be superfluous
  • Improve bokeh plots styling
    • Add memory plot tracking number of bytes
    • Make the progress bars more compact and align colors
    • Add workers/ page with workers table, stacks/processing plot, and memory
  • Add this release notes document

1.12.0

This release was largely a refactoring release. Internals were changed significantly without many new features.

  • Major refactor of the scheduler to use transitions system
  • Tweak protocol to traverse down complex messages in search of large bytestrings
  • Add dask-submit and dask-remote
  • Refactor HDFS writing to align with changes in the dask library
  • Executor reconnects to scheduler on broken connection or failed scheduler
  • Support sklearn.externals.joblib as well as normal joblib