vscode-jupyter: Jupyter for vscode continues to be slow (for large notebooks with markdown cells & large outputs)

Every few months I try to use vscode for jupyter because I would really love to just use vscode for everything. Every few months, I am disappointed and switch back to the web version.

There are two reasons for this:

1) Jupyter for vscode continues, stubbornly, to be almost always slower than traditional JupyterLab on localhost. Look at the run times in the screenshots below: it took me a minute to run imports; when I ran the exact same code in the localhost version, it took 7.7 seconds (pictures attached). This is an extremely consistent theme in vscode jupyter. Cells will sometimes randomly take minutes to run, and will sometimes not run at all until you press shift-enter on them twice. This has been true for me across multiple computers, in many different dev environments.
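One quick way to tell editor-side delay from kernel-side delay (a diagnostic sketch, not from the original report; the imports are placeholders) is to time the cell from inside the kernel:

%%time
# if this reports milliseconds-to-seconds while the cell visibly runs for a
# minute, the time is being lost in the editor/extension layer, not in Python
import pandas as pd
import numpy as np

If %%time reports a few seconds but the cell spinner runs for a minute, the gap is exactly the overhead being described here.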

[Screenshots: the same imports taking about a minute in VS Code vs. 7.7 seconds in JupyterLab]

Cells also just randomly take forever to run, for god knows what reason. Here is a screenshot of assigning a string to a variable taking 27.4 seconds: [screenshot]

Note that I am not trying to blame the team here; I am just frustrated because this is so close to being a great product, but this one thing holds it back, and it has gone unfixed for years on end. The very first thing I would do if I were the product manager in charge of vscode-jupyter is pause all current tasks and plan, with the team, a multi-month effort to speed things up and get cells to run effectively instantly (or as close to the time the Python execution itself takes as possible), every time.

2) Jupyter for vscode sucks at inline documentation, the equivalent of Shift+Tab in browser Jupyter. I am aware of the trigger parameter hints and show hover commands in the keyboard shortcuts. These are extremely unreliable, and actually show documentation when I press the key maybe 1/5 of the time. When they do show documentation, there is a "loading" tag for a while. Browser Jupyter, on the other hand, is immediate with this, basically every time. Below is an example.

[screenshot]

The other issue with inline documentation is that, as far as I can tell, hover documentation for methods on instantiated variables simply doesn't work. When I am using pandas, for instance, typing df.unique( and then pressing the show-hover hotkey while my caret is to the right of the parenthesis pops up a documentation window containing exactly nothing. In contrast, in the web version, typing the same thing produces full documentation, as expected.

I don't understand how these two issues aren't the team's top priority. Everyone I've spoken to who uses jupyter has had exactly the same experience as I have, and they all use the web version exclusively because of exactly these issues. Even Kaggle notebooks are better. I love Copilot and it'd be great to bring it into my jupyter notebook experience, but switching has just never been viable if I don't want a workflow where I wait 30 seconds every time I press command-enter, or frustratingly make a new cell above the current one and type function? just to see documentation.
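For reference, the function? workaround described above looks like this in a scratch cell (pd.read_csv is just an example target):

import pandas as pd

pd.read_csv?     # IPython's help operator: prints the docstring in the output area
pd.read_csv??    # shows the source too, when available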

These issues have been ongoing since vscode jupyter started. They are the only things holding me, and everyone else I've spoken to, back from using it. Without fixing these issues the whole thing is unusable, and no other feature you put in matters. Why work on anything besides this when these are the only things anyone I know cares about?

I should note that this is all running in a Docker container with access to 7 of my 8 CPUs and 10 GB of RAM, on a 2022 MacBook Air. I realize that this is a rant, so thank you for reading it. Nothing personal; I just think this product has a bunch of potential and I hate to see it unusable for so long.

About this issue

  • State: closed
  • Created 9 months ago
  • Reactions: 62
  • Comments: 131 (46 by maintainers)

Most upvoted comments

Hi, I think this issue should not be closed because it is not solved.

Or does someone have a solution?

When I run notebooks in JupyterLab in the browser everything is instant, but in vscode everything runs with a delay.

Closing this issue as it's been over 4 weeks since the information was requested. We'll be happy to reopen the issue when the requested information has been provided.

I'm having the exact same issue as everyone here! I don't know how this could not be related to VS Code, since it is happening to all of us when using the editor.

In my experience, jupyter notebook performance degrades very quickly with notebook size. This is especially true for plotly.express plots, and is independent of whether I am using a .ipynb file or the interactive cell views for a .py file.

Describing the experience for a .py file: when there are no plots and no LaTeX in the interactive window, everything is snappy. But if I have even just a handful of plots (or many lines of rendered LaTeX from Markdown cells), then multiple seconds pass between when I press Shift+Enter and when the interactive window starts running the command. If I click "clear all", everything is quick again. This seems to depend largely on how many plots are in the interactive window, not how many are currently visible.

Some other observations:

  • Making fancy interactive .js plots slows vscode way more than making png plots
  • When my notebooks contain many interactive plotly.express plots, the .ipynb files saved from vscode can be ~10x larger (100+MB rather than 10MB) than equivalent notebooks saved from Google Colab
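A quick way to see which outputs are bloating an .ipynb (a rough sketch; the file path is a placeholder) is to measure each cell's serialized output size directly, since a notebook file is just JSON:

import json

with open("notebook.ipynb", encoding="utf-8") as f:   # placeholder path
    nb = json.load(f)

for i, cell in enumerate(nb.get("cells", [])):
    size = len(json.dumps(cell.get("outputs", [])))
    if size > 1_000_000:                              # flag outputs over ~1 MB
        print(f"cell {i}: {size / 1e6:.1f} MB of serialized output")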

Experiencing the same issue. I disabled "all" extensions and am still getting lags/freezes/delays, even on markdown cells.

When a notebook is initially opened it is smoother; it gets worse after a few minutes. Restarting vscode is the only thing that helps, and only temporarily, which makes it practically impossible to work.

I have exactly the same issues. The notebooks get especially slow as they get bigger. But many of the problems already exist in an empty notebook.

This issue has been closed automatically because it needs more information and has not had recent activity. See also our issue reporting guidelines.

Happy Coding!

I'm experiencing the same issue and I have exactly the same observations as @JasonGross … which pushed me to switch to the web-based version…

@JasonGross All that is true, but the bug where it gets stuck on a cell does not depend on notebook size or plot complexity.

Not sure if this is what you’re seeing, but I’ve noticed a regression of an old bug. I have a code cell that should run in a fraction of a second. I run it. It’s stuck for about 1 minute. Then all of a sudden it runs.

Very annoying. Because of this and other, numerous bugs, I’m thinking to go back to Jupyter Notebook in a browser.

@DonJayamanne those getNeighborFiles & detectCellLanguage calls are from Copilot. This sounds like pretty similar behavior to what I was seeing with Copilot trying to gather all that context from a large notebook: https://github.com/microsoft/vscode/issues/211154

@rebornix was looking into reducing those calls at a certain point

I'm running into this as well and it is preventing me from continuing to work in VS Code. Thank you for trying to get to the bottom of the issue. In the meantime, is there any setting to toggle as a workaround to turn off the backup? I can only find "Autosave", which is already turned off. In JupyterLab I don't notice any slowdown at all for the same notebook.

Thanks @amunger, the code example and the referenced comment are very helpful. I can reproduce this, and it might explain why we see a performance slowdown for large notebooks, especially when we have widgets or rich media.


My hypothesis is:

  • Auto save or backup kicks in when we try to run code.
  • When that happens, it tries to convert the notebook document to bytes/buffers.
  • We have relatively large buffers for rich media/widgets.
  • Our code for handling them is not very smart and can block the UI.
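A toy illustration of that last point (an asyncio stand-in for the renderer's event loop, with synthetic data, not VS Code's actual code path): a synchronous multi-second serialization starves everything else scheduled on the loop.

import asyncio
import json
import time

async def ui_tick():
    # stands in for renderer work that should run every 100 ms
    while True:
        print("ui tick", time.strftime("%H:%M:%S"))
        await asyncio.sleep(0.1)

async def backup():
    await asyncio.sleep(0.3)
    nb = {"outputs": ["x" * 50_000_000] * 4}   # ~200 MB of output text
    json.dumps(nb)                             # synchronous: blocks the loop
    print("backup done")

async def main():
    tick = asyncio.create_task(ui_tick())
    await backup()
    tick.cancel()

asyncio.run(main())

The ticks stop printing while json.dumps runs, which is what a frozen UI looks like.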



When my notebooks contain many interactive plotly.express plots, the .ipynb files saved from vscode can be ~10x larger (100+MB rather than 10MB) than equivalent notebooks saved from Google Colab

This is also something we want to look into.

Since I disabled completions for copilot, it is fast (at least for now).
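For anyone wanting to try the same thing, the toggle looks roughly like this in settings.json (treat this as a sketch; the exact key depends on your Copilot extension version):

{
    // disable Copilot completions everywhere, including notebook cells
    "github.copilot.enable": { "*": false }
}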

This issue is making VSCode with Jupyter basically unworkable for me. It didn't use to be like this, though; I wonder when it changed.

Hi @DonJayamanne, @amunger

I have just tested with the largest notebook I have, which includes a lot of markdown cells, and it indeed runs faster. Some functions still take time to execute, but I guess that's just native to the libraries (someone else would need to confirm: sns.regplot and clustering functions). Also, I was monitoring resource use in macOS's Activity Monitor and noticed it now barely goes above 500 MB.

Regarding the issue with the completions, I guess these were solved, as there was no lag nor any problems with them after running all the cells (around 180 code cells alone), when before it would begin to get stuck after 70 or so.

The only caveat I would add is that this was done with only these extensions active and no changes to settings.json:

Extension           Author (truncated)   Version
python              ms-                  2024.5.11021008
vscode-pylance      ms-                  2024.4.101
jupyter             ms-                  2024.4.2024041101
jupyter-renderers   ms-                  1.0.17

So, my suggestion would be to start adding back our usual extensions to see whether any of them chokes the improvement, since most likely we all work with different ones. In my normal VS Code I run with these, plus several changes to settings.json (for font, font size, ligatures, colours, conda path, semantic highlighting, tree views, etc.):

Extension                    Author (truncated)   Version
catppuccin-vsc               Cat                  3.13.0
catppuccin-vsc-icons         Cat                  1.11.0
catppuccin-vsc-pack          cat                  1.0.2
vscode-pull-request-github   Git                  0.86.1
rainbow-csv                  mec                  3.11.0
black-formatter              ms-                  2024.2.0
debugpy                      ms-                  2024.4.0
python                       ms-                  2024.4.1
vscode-pylance               ms-                  2024.4.101
jupyter                      ms-                  2024.3.1
jupyter-renderers            ms-                  1.0.17
sqltools                     mtx                  0.28.1
sqltools-driver-pg           mtx                  0.5.2
material-icon-theme          PKi                  4.34.0

(2 theme extensions excluded)

Best

Something else suggesting that it is indeed the backup taking the time: if I try to exit VS Code while a large, slow notebook is open, I see this:

[screenshot]

@tlkaufmann @Liam3851 That's great news: great because we've been able to identify the cause and there's a workaround. We will work with the Copilot team to get this resolved.

Highlighting this in case it got lost:

But I can’t help but think that this profile doesn’t contain the relevant information—I’m waiting 40 seconds for an operation that should only take 4 seconds (or less), and yet this profile only claims to capture 1256ms/.266 = 4.7 seconds. Where are the other 35s spent?

@DonJayamanne do you have any idea what would cause almost 90% of the time to not show up in the profile at all?

@tlkaufmann The second JSON is empty. Also, what extensions do you have installed? There are two methods, getNeighborFiles & detectCellLanguage, that get invoked, and I'm not sure what extensions they are coming from. Please can you share the list of the extensions you have installed?

If saving is an issue, then please go to the bottom of the profile view, select the Bottom Up tab, sort the list by Self Time and Total Time as below, and send the screenshots.

Thank you for your patience and help,

I'd like to see the top items in the sorted list along with the names and file paths. [screenshot]

@ale-dg what did this notebook have that made it so large? Were there some image outputs? I would like to ensure we test with such large notebooks, but would like to get the content right, i.e. ensure we have the same types of outputs as you have (to have a more realistic dataset).

As always, thank you.

@DonJayamanne what I meant is that I loaded a CSV of 2.5 GB into a notebook (or a pandas data frame, if you'd like). The file is so large because it has over 25 million lines and 10 columns, so a little over 250 million data points. I haven't done anything with the data yet, but so far the notebook is 80 KB (not sure why…)
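One likely explanation, sketched below with a hypothetical file name: the CSV is read into the kernel's memory, and only rendered cell outputs are saved into the .ipynb.

import pandas as pd

df = pd.read_csv("data.csv")   # ~2.5 GB, ~25M rows x 10 columns, lives in kernel RAM
df.head()                      # only this small rendered preview ends up in the notebook file

The file would only balloon once cells start emitting large outputs, such as big plots.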

Best

Try the Insiders version. It has been working smoothly for me over the last few days.

Thanks for the extra info and repro notebooks @ale-dg, that sounds like something different than what I’m trying to solve here, so I’ll split it out into another issue.

@joelostblom does that error persist if you close all tabs and reload? Can you share the output from the console in Developer: Toggle Developer Tools?

Thanks @amunger! I thought I had restarted VS Code, but it turns out there was a window open on another desktop, and closing that fixed it. So far I'm noticing much better performance on notebooks with large interactive charts (Altair/Vega charts); thanks for all your work on this issue! I will report back if I run into issues when testing more with larger and longer-running notebooks.

@rabyj @ale-dg I've re-opened the issue; let's continue the discussion there, as you seem to be running into issues with completions.

I haven't had a chance to try the new solution… I was just giving a bit of feedback on the other one.

I’m fairly convinced that the big perf hit comes from serializing the notebook as part of the backup, in which case shrinking the file size isn’t really going to help.

  • In a notebook with large plots, making an edit will trigger a backup and cause a noticeable hiccup in the renderer.
    • If you open that same notebook as JSON in a text editor (no serialization necessary), the hiccup is much shorter.
  • Logs show a big difference in serialization time (the first delta: .001s vs ~3s):
[backup tracker] creating backup at 2024-03-29T21:23:55.074Z *.txt 
[backup tracker] storing backup at 2024-03-29T21:23:55.075Z *.txt 
[backup tracker] finished backup at 2024-03-29T21:23:58.148Z *.txt 

[backup tracker] creating backup at 2024-03-29T21:24:17.318Z *.ipynb 
[backup tracker] storing backup at 2024-03-29T21:24:20.239Z *.ipynb 
[backup tracker] finished backup at 2024-03-29T21:24:22.071Z *.ipynb 
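A minimal sketch of that difference with synthetic content (not VS Code's actual backup code): writing already-serialized bytes is nearly free, while re-serializing a large notebook structure first is what costs the seconds.

import json
import time

big_output = "0.123456789\t" * 10_000_000             # ~120 MB of plot-like text
nb = {"cells": [{"outputs": [{"text": big_output}]}]}

t0 = time.perf_counter()
raw = big_output.encode("utf-8")                      # the .txt case: store bytes as-is
t1 = time.perf_counter()
blob = json.dumps(nb).encode("utf-8")                 # the .ipynb case: serialize first
t2 = time.perf_counter()

print(f"raw encode: {t1 - t0:.3f}s, serialize + encode: {t2 - t1:.3f}s")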

Here are some perf snapshots of the backup process; the top one is for a text editor: [screenshots]

Some of the above hypotheses validated:

  • It's the backup for the interactive window in @amunger's scenario, in which I ran the code around 100 times.
    • The file snapshot size is Uint8Array(313597405), which is ~313 MB.
  • Running the same code in VS Code and Google Colab, VS Code's notebook is 3 times larger than Colab's.
    • In this particular case, the output contains mostly numbers (100000 x and 100000 y).
    • When VS Code / the ipynb extension stores the content, it seems the content is pretty-printed: there are a few \ts before each number and a line break after. [screenshot]
    • Google Colab, meanwhile, stores the numbers on the same line (technically it uses a different mimetype, which has a smaller footprint). [screenshot]
    • The larger the x and y axes, the more tabs/spaces and line breaks we generate in VS Code.
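The pretty-printing cost in that last point is easy to reproduce (synthetic data, a sketch rather than VS Code's exact output format):

import json

points = list(range(100_000))
pretty = json.dumps(points, indent=1)                # whitespace + one line per number
compact = json.dumps(points, separators=(",", ":"))  # everything on one line
print(len(pretty), "vs", len(compact), "bytes")

The absolute gap grows with the number of points, matching the observation that larger axes mean more whitespace in the saved file.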

I managed to repro the behavior in this comment by just running this cell ~50 times in the interactive window:

import plotly.express as px
import random

# build 100k noisy points; each run appends a large interactive plot output
x = []
y = []
for i in range(100000):
    random_number = random.randint(1, 10000)
    x.append(i + random_number)
    y.append(i + random_number)

# keyword arguments: px.scatter's first positional parameter is data_frame,
# so px.scatter(x, y) would not plot these lists against each other
fig = px.scatter(x=x, y=y)
fig.show()

It still doesn’t happen every time, but it’s enough to be able to investigate

[screen recording]

I have the same issue with Code 1.87.2 on Ubuntu 23.10.

I have the same problem working on Fedora 39 - Linux. It’s driving me nuts.

@MehmetDiyar @gdebrun2 please can you share a sample notebook that can be used to replicate the issue you are running into.

Will do in the next couple of days. Just to be clear, this issue is not exclusive to notebooks with markdown cells. I will try to provide example notebooks, both with and without markdown, that experience this unresponsiveness.

Hi @DonJayamanne

I have run a large notebook, both with MD and without MD. Find below the logs for both.

Best

1-Jupyter-no-MD.log 1-Jupyter-with-MD.log

Is this behavior related? I sometimes see code execution hang for multiple minutes while trying to write to the interactive window. Maybe there's a similar blocking IO write call that is deadlocked or something in the other cases?

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[161], line 12
     10             weights[q_tok, max_tok, n_copies_nonmax] = (max_tok - 1) ** n_copies_nonmax * math.comb(model.cfg.n_ctx - 1, n_copies_nonmax)
     11 for _, v in min_gaps_list_nosvd:
---> 12     weighted_histogram(v.flatten().detach().numpy(), weights.flatten().detach().numpy(),labels={"x":"gap", "y":"count * # sequences"}, num_bins=v.max().item()).show(RENDERER)

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/plotly/basedatatypes.py:3410, in BaseFigure.show(self, *args, **kwargs)
   3377 """
   3378 Show a figure using either the default renderer(s) or the renderer(s)
   3379 specified by the renderer argument
   (...)
   3406 None
   3407 """
   3408 import plotly.io as pio
-> 3410 return pio.show(self, *args, **kwargs)

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/plotly/io/_renderers.py:386, in show(fig, renderer, validate, **kwargs)
    383 fig_dict = validate_coerce_fig_to_dict(fig, validate)
    385 # Mimetype renderers
--> 386 bundle = renderers._build_mime_bundle(fig_dict, renderers_string=renderer, **kwargs)
    387 if bundle:
    388     if not ipython_display:

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/plotly/io/_renderers.py:294, in RenderersConfig._build_mime_bundle(self, fig_dict, renderers_string, **kwargs)
    291             if hasattr(renderer, k):
    292                 setattr(renderer, k, v)
--> 294         bundle.update(renderer.to_mimebundle(fig_dict))
    296 return bundle

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/plotly/io/_base_renderers.py:126, in ImageRenderer.to_mimebundle(self, fig_dict)
    125 def to_mimebundle(self, fig_dict):
--> 126     image_bytes = to_image(
    127         fig_dict,
    128         format=self.format,
    129         width=self.width,
    130         height=self.height,
    131         scale=self.scale,
    132         validate=False,
    133         engine=self.engine,
    134     )
    136     if self.b64_encode:
    137         image_str = base64.b64encode(image_bytes).decode("utf8")

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/plotly/io/_kaleido.py:143, in to_image(fig, format, width, height, scale, validate, engine)
    140 # Validate figure
    141 # ---------------
    142 fig_dict = validate_coerce_fig_to_dict(fig, validate)
--> 143 img_bytes = scope.transform(
    144     fig_dict, format=format, width=width, height=height, scale=scale
    145 )
    147 return img_bytes

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/kaleido/scopes/plotly.py:153, in PlotlyScope.transform(self, figure, format, width, height, scale)
    142     raise ValueError(
    143         "Invalid format '{original_format}'.\n"
    144         "    Supported formats: {supported_formats_str}"
   (...)
    148         )
    149     )
    151 # Transform in using _perform_transform rather than superclass so we can access the full
    152 # response dict, including error codes.
--> 153 response = self._perform_transform(
    154     figure, format=format, width=width, height=height, scale=scale
    155 )
    157 # Check for export error, later can customize error messages for plotly Python users
    158 code = response.get("code", 0)

File ~/guarantees-based-mechanistic-interpretability/.venv/lib/python3.10/site-packages/kaleido/scopes/base.py:305, in BaseScope._perform_transform(self, data, **kwargs)
    302 self._std_error = io.BytesIO()
    304 # Write and flush spec
--> 305 self._proc.stdin.write(export_spec)
    306 self._proc.stdin.write("\n".encode('utf-8'))
    307 self._proc.stdin.flush()

KeyboardInterrupt: 
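If the hang really is kaleido's blocking stdin write (the bottom frame above), one hedged workaround is to bypass the static-image export path and let the notebook front end render the figure itself:

import plotly.io as pio

# render via a mimetype bundle instead of kaleido's image-export subprocess;
# "vscode" is one of plotly's built-in renderer names
pio.renderers.default = "vscode"

fig.show() would then skip the image conversion that this traceback shows blocking.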

@DonJayamanne Could you record a video of the instructions above? I tried, and it is hard to follow each step. For example, when I type Developer: Set Log Level in the command palette, nothing pops up. If you have time to record a video, I would be very happy to test it. Thanks.

The output is attached. As far as I could see, the output only changed when the cell started executing. The time between me trying to execute and the actual execution seems not to be logged. logs.txt

I don't use the powertoys extension at all. Maybe it's also important to mention that the problems with jupyter notebooks are even more severe when developing on a remote server (via SSH or Kubernetes). However, they still persist when developing locally.