ksh: Memory leak and eventual crash when defining and running function in virtual subshell

This bug was discovered and identified as separate while fixing #113.

When defining and running function in a (virtual/non-forked) subshell many times, a crash occurs.

Reproducer (minimal except for the counter):

for ((n=0; n < 50000; n++))
do	print -n $'\r'$n  # optional counter
	(function foo { :; }; foo)
done

When compiled with vmalloc, the reproducer consistently crashes at iteration #32766. This is 2^15-2, suggesting an overflow of a short somewhere (max value: 2^15-1, 32767).

When compiled without vmalloc and using the native malloc on macOS, it crashes slightly later and at unpredictable numbers of iterations, but all quite close to a short overflow: 32907, 33191, 32986, 33038, 32832, 32783, etc.

Backtraces of the native macOS malloc crash are here.

The consistent crash with vmalloc always has the same backtrace (edit: updated with backtrace with source line numbers after recompiling with -O0 -g):

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   ksh                                 0x000000010f5fd8d0 _sfpmove + 80 (sfpool.c:235)
1   ksh                                 0x000000010f5f16c5 sfclose + 837 (sfclose.c:96)
2   ksh                                 0x000000010f5b6469 stkclose + 121 (stk.c:317)
3   ksh                                 0x000000010f562bb2 sh_exec + 10674 (xec.c:1516)
4   ksh                                 0x000000010f564a99 sh_exec + 18585 (xec.c:2012)
5   ksh                                 0x000000010f55c7ed sh_subshell + 2605 (subshell.c:605)
6   ksh                                 0x000000010f564402 sh_exec + 16898 (xec.c:1886)
7   ksh                                 0x000000010f564a99 sh_exec + 18585 (xec.c:2012)
8   ksh                                 0x000000010f56571c sh_exec + 21788 (xec.c:2201)
9   ksh                                 0x000000010f564a99 sh_exec + 18585 (xec.c:2012)
10  ksh                                 0x000000010f4e7dbb exfile + 3243 (main.c:582)
11  ksh                                 0x000000010f4e8ce7 sh_main + 3367 (main.c:353)
12  ksh                                 0x000000010f4ce796 main + 38 (pmain.c:45)
13  libdyld.dylib                       0x00007fff670df3d5 start + 1

About this issue

Original URL
State: closed
Created 4 years ago
Comments: 22

Most upvoted comments

Specifying a (void*) cast is unnecessary when using free (free(tofree) is equivalent to free((void*)tofree)). In fact, the above free(buffer) is freeing a char * variable, yet it doesn’t have a cast: https://github.com/ksh93/ksh/blob/05ac1dbb411ee7c4ad5c6d7ae499e72c12be09f6/src/cmd/ksh93/bltins/misc.c#L222 https://github.com/ksh93/ksh/blob/05ac1dbb411ee7c4ad5c6d7ae499e72c12be09f6/src/cmd/ksh93/bltins/misc.c#L307-L308

This is the ksh2020 code that frees shp->st.filename (link)

    if (!np) {
        free(shp->st.filename);
        shp->st.filename = NULL;
    }

The extra = NULL in ksh2020 was added in att/ast@a6b0a828 to fix a Coverity warning (free doesn’t set pointers to NULL after freeing memory).

JohnoKing on Aug 13, 2020

The following patch fixes the POSIX function memory leak without reintroducing the kshdb crash. This makes ksh free the memory pointed to by a tofree variable, which avoids the kshdb crash since tofree always points to memory that can be freed:

--- a/src/cmd/ksh93/bltins/misc.c
+++ b/src/cmd/ksh93/bltins/misc.c
@@ -219,7 +219,7 @@ int    b_dot_cmd(register int n,char *argv[],Shbltin_t *context)
 	register int jmpval;
 	register Shell_t *shp = context->shp;
 	struct sh_scoped savst, *prevscope = shp->st.self;
-	char *filename=0, *buffer=0;
+	char *filename=0, *buffer=0, *tofree;
 	int	fd;
 	struct dolnod   *saveargfor;
 	volatile struct dolnod   *argsave=0;
@@ -282,8 +282,9 @@ int    b_dot_cmd(register int n,char *argv[],Shbltin_t *context)
 	shp->st.self = &savst;
 	shp->topscope = (Shscope_t*)shp->st.self;
 	prevscope->save_tree = shp->var_tree;
+	tofree = shp->st.filename;
 	if(np)
-		shp->st.filename = np->nvalue.rp->fname ? strdup(np->nvalue.rp->fname) : 0;
+		shp->st.filename = np->nvalue.rp->fname;
 	nv_putval(SH_PATHNAMENOD, shp->st.filename ,NV_NOFREE);
 	shp->posix_fun = 0;
 	if(np || argv[1])
@@ -307,7 +308,7 @@ int    b_dot_cmd(register int n,char *argv[],Shbltin_t *context)
 	if(buffer)
 		free(buffer);
 	if(!np)
-		free((void*)shp->st.filename);
+		free(tofree);
 	shp->dot_depth--;
 	if((np || argv[1]) && jmpval!=SH_JMPSCRIPT)
 		sh_argreset(shp,(struct dolnod*)argsave,saveargfor);

JohnoKing on Aug 13, 2020

The shcomp leak increases as you increase the number of iterations N. But at exactly N=6400, the shcomp leak magically goes away, and at larger numbers (such as N=10000, or even N=100000) it doesn’t return. Perhaps this is one of those vmalloc bizarritudes.

McDutchie on Aug 13, 2020

The memory leak in POSIX functions is a regression caused by bd88cc7f, specifically the strdup that was added to fix a crash: https://github.com/ksh93/ksh/blob/05ac1dbb411ee7c4ad5c6d7ae499e72c12be09f6/src/cmd/ksh93/bltins/misc.c#L285-L286

When I remove strdup the memory leak test has the following results:

test leaks begins at 2020-08-13+06:39:56
test leaks passed at 2020-08-13+06:39:56 [ 11 tests 0 errors ]
test leaks(C.UTF-8) begins at 2020-08-13+06:39:56
test leaks(C.UTF-8) passed at 2020-08-13+06:39:56 [ 11 tests 0 errors ]
test leaks(shcomp) begins at 2020-08-13+06:39:56
	shcomp-leaks.ksh[153]: ksh function defined in virtual subshell not freed after leaving subshell (leaked 784 bytes after 512 iterations)
	shcomp-leaks.ksh[159]: POSIX function defined in virtual subshell not freed after leaving subshell (leaked 896 bytes after 512 iterations)
test leaks(shcomp) failed at 2020-08-13+06:39:56 with exit code 2 [ 11 tests 2 errors ]
Total errors: 2

JohnoKing on Aug 13, 2020

The following patch, based on the check from name.c (above), fixes this bug for me while not introducing att#803. @JohnoKing, could you test and confirm this? Does this look good to you?

diff --git a/src/cmd/ksh93/sh/subshell.c b/src/cmd/ksh93/sh/subshell.c
index b839a92..a099f57 100644
--- a/src/cmd/ksh93/sh/subshell.c
+++ b/src/cmd/ksh93/sh/subshell.c
@@ -397,7 +397,7 @@ Dt_t *sh_subfuntree(int create)
 	return(sp->shp->fun_tree);
 }
 
-static void table_unset(register Dt_t *root,int fun)
+static void table_unset(Shell_t *shp,register Dt_t *root,int fun)
 {
 	register Namval_t *np,*nq;
 	int flag;
@@ -405,7 +405,9 @@ static void table_unset(register Dt_t *root,int fun)
 	{
 		nq = (Namval_t*)dtnext(root,np);
 		flag=0;
-		if(fun && np->nvalue.rp && np->nvalue.rp->fname && *np->nvalue.rp->fname=='/')
+		/* Check for autoloaded function: it must not be freed. */
+		if(fun && np->nvalue.rp && np->nvalue.rp->fname && shp->fpathdict
+		&& nv_search(np->nvalue.rp->fname,shp->fpathdict,0))
 		{
 			np->nvalue.rp->fdict = 0;
 			flag = NV_NOFREE;
@@ -693,7 +695,7 @@ Sfio_t *sh_subshell(Shell_t *shp,Shnode_t *t, volatile int flags, int comsub)
 		if(sp->sfun)
 		{
 			shp->fun_tree = dtview(sp->sfun,0);
-			table_unset(sp->sfun,1);
+			table_unset(shp,sp->sfun,1);
 			dtclose(sp->sfun);
 		}
 		n = shp->st.trapmax-savst.trapmax;

McDutchie on Aug 13, 2020

The memory leak appears to be fixed in ksh93v- 2014-12-24 beta. Running the same example in ksh2020 with ps instead of vmstate points to the memory leak returning after ksh93v- as a ksh2020 regression:

# This was also tested in ksh93v- with vmstate, although to test ksh2020
# 'ps -p $$ -o vsz' was used instead of vmstate (so this should be run on
# Linux or FreeBSD, not macOS).
$ ksh93v /tmp/issue114tmp.sh
   VSZ
 12788
49999
   VSZ
 12788

$ ksh93v-2013 /tmp/issue114tmp.sh
   VSZ
  9616
49999
   VSZ
 77200

$ ksh2020 /tmp/issue114tmp.sh    
   VSZ
  8556
49999
   VSZ
 15156

# ksh93u+m with stkref fix applied
$ arch/linux.i386-64/bin/ksh /tmp/issue114tmp.sh
   VSZ
  8264
49999
   VSZ
931752

JohnoKing on Aug 13, 2020

It can’t be an unsigned value since the stk code assumes it can be a negative number:

Hmm, I’m not convinced of that – that <= looks like it might as well be ==. Seems more like a ‘just to be extra sure’ kind of thing.

But that suggests stkref is supposed to decrement, and isn’t decrementing, as it overflows.

Which makes me wonder if there is a memory leak.

Test script:

builtin vmstate || exit
n=0; (function foo { :; }; foo)
vmstate
for ((n=0; n < 32765; n++))
do      print -n $'\r'$n  # optional counter
        (function foo { :; }; foo)
done
echo
vmstate

And sure enough:

$ ksh issue114b.sh 
region=0x10ea7da90 method=best flags=0 size=393216 segments=4 busy=(282816,167,65552) free=(107264,9,34304)
32764
region=0x10ea7da90 method=best flags=0 size=6193152 segments=181 busy=(5107360,65711,65552) free=(19856,5,19552)

McDutchie on Aug 12, 2020