opencensus-python: AzureExporter not working with multiprocessing

Describe your environment.
macOS 10.14.6
Python 3.7.5
opencensus-ext-azure==1.0.4
opencensus-ext-requests==0.7.3

Steps to reproduce. I want to monitor dependency calls to Azure DevOps APIs. Our code runs multiprocessing using the Process class. When the exporter is run outside of multiprocessing, it sends telemetry to App Insights. When run inside a multiprocessing Process, it doesn't. I added a callback to print the SpanData, and it doesn't get called when using Process.

from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication

from multiprocessing import Process

from base_insights import BaseInsights


class TestInsights:
    def __init__(self):
        self.tracer = BaseInsights.tracer

    def process(self):
        organization_url = 'https://dev.azure.com/org'
        credentials = BasicAuthentication('', '')

        p1 = Process(target=self.my_loop, args=[organization_url, credentials])
        p1.start()
        p1.join()

    def my_loop(self, organization_url, credentials):
        # This span is created inside the child process; its SpanData never
        # reaches Application Insights, and the telemetry processor callback
        # is never invoked.
        with self.tracer.span(name='TestLoopProcessThreadingInside'):
            connection = Connection(base_url=organization_url, creds=credentials)
            core_client = connection.clients.get_core_client()

            org = core_client.get_project_collection("test")


if __name__ == '__main__':
    TestInsights().process()

BaseInsights (base_insights.py):

from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer

config_integration.trace_integrations(['requests'])


def singleton(cls):
    # Replace the class with a single shared instance.
    return cls()


@singleton
class BaseInsights:
    def __init__(self):
        # AzureExporter reads the instrumentation key / connection string
        # from its arguments or the environment.
        exporter = AzureExporter()
        exporter.add_telemetry_processor(self.callback_function)

        self.tracer = Tracer(exporter=exporter, sampler=AlwaysOnSampler())

    def callback_function(self, envelope):
        # Debug hook: print every envelope before it is sent.
        print(envelope)

What is the expected behavior? Span data gets sent to Application Insights

What is the actual behavior? Span data is not sent to Application Insights

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 22 (9 by maintainers)

Most upvoted comments

I tried running this code in four different environments.

CentOS 7.8 - Python 2.7.5
CentOS 7.8 - Python 3.6.8
macOS 10.14.6 - Python 2.7.16
macOS 10.14.6 - Python 3.7.5

I got the same results. Only INPROC: TestOutside is being logged in Application Insights.

[Screenshot: Application Insights results showing only the INPROC: TestOutside entry]

Any news on this? 👀 We're having the same problem with Celery and would really appreciate an official solution.

@lzchen Thanks for the pointers.

Right now we are using “multiprocessing” as the backend. The other backend we could use is “loky”; that has the same logging issue, even with log files on disk.

        with Parallel(n_jobs=4, backend="multiprocessing") as parallel:
            parallel(delayed(_handle_blob)(name) for name in blob_paths)

In general, due to other priorities, I can't spend much time on this issue at the moment. Considering that some flavor of multiprocessing is very common in Python applications and Microsoft uses this library for Azure Monitor/Application Insights, I thought there might be some workarounds or solutions.

Thanks for your support. Please keep me posted if you come across other solutions in the future.

I am facing the same issue while using multiprocessing through joblib. Mine is a pandas-based ML pipeline where work is distributed across multiple processes using joblib (with the “multiprocessing” backend).

        with Parallel(n_jobs=4, backend="multiprocessing") as parallel:
            parallel(delayed(_handle_blob)(name) for name in blob_paths)

joblib with the “multiprocessing” backend successfully logs to a file on disk across multiple processes. But when I add “AzureLogHandler”, I don't see any logs sent to Azure Application Insights.
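
For reference, here is a minimal sketch of the failing setup (the body of _handle_blob and the reliance on the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable are illustrative assumptions, not quoted from the thread):

import logging

from joblib import Parallel, delayed
from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger(__name__)
# The handler, and therefore its background export worker thread, is
# created in the parent process.
logger.addHandler(AzureLogHandler())
logger.setLevel(logging.INFO)


def _handle_blob(name):
    # This record reaches file/console handlers inside the worker process,
    # but is never exported to Application Insights.
    logger.info("processing %s", name)


if __name__ == "__main__":
    blob_paths = ["blob-a", "blob-b", "blob-c", "blob-d"]
    with Parallel(n_jobs=4, backend="multiprocessing") as parallel:
        parallel(delayed(_handle_blob)(name) for name in blob_paths)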

I have used the workaround proposed in the thread. It works, but in the end it is still a workaround.
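
The workaround itself is not quoted in this excerpt; one commonly suggested pattern for this class of problem (an assumption on my part, not necessarily the exact workaround from the thread) is to build the handler lazily inside each worker process, so every worker owns its own export queue and worker thread:

import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

_worker_logger = None  # one per process


def _get_worker_logger():
    # Create the logger and AzureLogHandler in the current process rather
    # than inheriting a non-functional copy from the parent.
    global _worker_logger
    if _worker_logger is None:
        _worker_logger = logging.getLogger("worker")
        _worker_logger.setLevel(logging.INFO)
        _worker_logger.addHandler(AzureLogHandler())
    return _worker_logger


def _handle_blob(name):
    _get_worker_logger().info("processing %s", name)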

It would be great if this issue could be prioritized and addressed in OpenCensus.

The worker is actually present, but it doesn't see the contents of the threading queue because it can't access the child process's copy of it. I did a quick PoC using a multiprocessing Queue and was able to send logs to Application Insights.
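
For context, a minimal illustration (not part of the PoC, and assuming the default fork start method used on Linux and these macOS Python versions) of why items put on an in-memory queue in a child process never reach the parent's worker, while a multiprocessing Queue does cross the boundary:

import queue
from multiprocessing import Process, Queue as MPQueue

thread_q = queue.Queue()  # in-memory; the child works on a private copy after fork
mp_q = MPQueue()          # pipe-backed; shared between parent and child


def child():
    thread_q.put("span")  # lands in the child's copy only
    mp_q.put("span")      # visible to the parent


if __name__ == "__main__":
    p = Process(target=child)
    p.start()
    print(mp_q.get())        # 'span' -- crossed the process boundary
    p.join()
    print(thread_q.qsize())  # 0 -- the parent's worker never sees this item

The PoC below patches the exporter's export and run methods along these lines: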

from multiprocessing import Queue as MP_Queue
import jsonpickle

...redacted...

    # queue, or shared workers among queues (e.g. queue for traces, queue
    # for logs).
    def export(self, items):
        # Serialize the SpanData so it can cross the process boundary.
        json_items = jsonpickle.encode(items)

        # Put items on both the multiprocessing queue (an MP_Queue created
        # in the redacted __init__) and the original threading queue.
        self._mp_queue.put(json_items)
        self._queue.puts(items, block=False)  # pragma: NO COVER


... redacted ...

    def run(self):  # pragma: NO COVER
        # Indicate that this thread is an exporter thread.
        # Used to suppress tracking of requests in this thread.
        execution_context.set_is_exporter(True)
        src = self.src
        dst = self.dst
        while True:
            batch = src.gets(dst.max_batch_size, dst.export_interval)

            # An empty tuple means the threading queue timed out; fall back
            # to the multiprocessing queue, which child processes can reach.
            if batch == ():
                try:
                    # Non-blocking get so the worker is not stalled when no
                    # child process has exported anything.
                    json_items = dst._mp_queue.get(block=False)
                    batch = tuple(jsonpickle.decode(json_items))
                except Exception:
                    pass

            if batch and isinstance(batch[-1], QueueEvent):

... redacted ...

This works and sends the multiprocessed SpanData to Application Insights.
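
Note that this relies on the multiprocessing queue being created before the worker processes fork, so every child inherits a handle to it and the parent's worker thread can drain it. Serializing with jsonpickle also means only plain strings cross the process boundary.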