google-play-scraper: TypeError: Cannot read property 'split' of undefined

version: 5.0.0

invoke params:

gPlay.list({
  category: gPlay.category.GAME,
  collection: gPlay.collection.NEW_FREE,
  lang: 'zh',
  country: 'CN',
  fullDetail: true,
  start: 0,
  num: 100
}).then(function (appList) {
  //console.log(appList);
  cb(null, appList);
}).catch(function (err) {
  console.error("try to spider googleplay app list failed, err=", err);
  cb(err);
});

log:

at parseFields (/googleplay/node_modules/google-play-scraper/lib/app.js:48:64)
    at tryCatcher (/googleplay/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/googleplay/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/googleplay/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/googleplay/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/googleplay/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/googleplay/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/googleplay/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (/googleplay/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 26 (14 by maintainers)

Most upvoted comments

@facundoolano it looks like Google is doing some A/B testing. The page structure and the CSS class naming differ from time to time. Unfortunately, it looks like they are now generating the CSS class names, which results in non-human-readable names (such as AHFaub). It could also be that those names change every time the website is updated 😦

@tanqhnguyen I’ve created #205 following the logic described above. I think that solution is good as long as Google doesn’t start changing the shape of the data.

I’ve added the first few fields to the result, now we just need to track down each of the remaining ones in the data object and add the paths (I’ll get to it eventually but any help to move this forward is appreciated)

So… this is what I have so far

let result = {
  title: {
    from: 'ds:3',
    path: '[0][0][0]'
  },
  description: {
    from: 'ds:3',
    path: '[0][10][0][1]'
  },
  logoUrl: {
    from: 'ds:3',
    path: '[0][12][1][3][2]'
  },
  categories: {
    from: 'ds:3',
    path: [
      '[0][12][13][0][0]'
    ]
  },
  ratingValue: {
    from: 'ds:10',
    path: '[0][0][1]' 
  },
  ratingCount: {
    from: 'ds:10',
    path: '[0][2][1]'
  },
  publisherName: {
    from: 'ds:3',
    path: '[0][12][5][1]'
  },
  screenshots: {
    from: 'ds:3',
    path: '[0][12][0]'
  }
};

This map contains the basic data needed to query fields from the weird Google data structure found on a game page

  • path is whatever is passed to _.get
  • from indicates which “node” the data should be read from

Unfortunately, this is purely manual work to find the correct node and its path 😦
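
To make the from/path idea concrete, here is a rough sketch of how such a map could be resolved. It is not the code from #205: getByPath is a small stand-in for _.get that only understands bracketed numeric paths, extractFields is a hypothetical helper, and the sample data is made up to match two of the paths above.

```javascript
// Minimal stand-in for lodash's _.get, limited to bracketed numeric paths
// such as '[0][12][5][1]'. Returns undefined when any step is missing.
function getByPath(data, path) {
  const indices = (path.match(/\d+/g) || []).map(Number);
  return indices.reduce(
    (node, index) => (node == null ? undefined : node[index]),
    data
  );
}

// Resolve every field of a { field: { from, path } } map against the parsed
// nodes. `nodes` is assumed to look like { 'ds:3': [...], 'ds:10': [...] }.
function extractFields(fieldMap, nodes) {
  const result = {};
  for (const [field, spec] of Object.entries(fieldMap)) {
    // A path may be a single string or an array of candidate paths
    // (as for `categories`); take the first candidate that resolves.
    const paths = Array.isArray(spec.path) ? spec.path : [spec.path];
    result[field] = paths
      .map((path) => getByPath(nodes[spec.from], path))
      .find((value) => value !== undefined);
  }
  return result;
}

// Made-up sample data shaped to match two of the paths above.
const sampleNodes = {
  'ds:3': [[['Example Game']]],
  'ds:10': [[[null, 4.5]]]
};
const sampleMap = {
  title: { from: 'ds:3', path: '[0][0][0]' },
  ratingValue: { from: 'ds:10', path: '[0][0][1]' }
};
console.log(extractFields(sampleMap, sampleNodes));
```

A field whose path does not resolve simply comes back as undefined, which makes it easier to spot paths that still need to be tracked down.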

And the regex is

/<script nonce="[\S]+">AF_initDataCallback\(([\s\S]*?)\);<\/script>/gi
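
For illustration, the regex could be applied roughly like this. The HTML fragment and nonce value are invented for the example (the real markup may differ), and findCallbackBodies is a hypothetical helper name.

```javascript
// The regex from the comment above, with one capture group for the argument
// passed to AF_initDataCallback.
const scriptRegex = /<script nonce="[\S]+">AF_initDataCallback\(([\s\S]*?)\);<\/script>/gi;

// Return the raw argument text of every AF_initDataCallback(...) call.
function findCallbackBodies(html) {
  // matchAll requires a global regex and iterates over all matches.
  return Array.from(html.matchAll(scriptRegex), (match) => match[1]);
}

// Made-up page fragment, only shaped like the scripts discussed in this thread.
const sampleHtml = `
<script nonce="abc123">AF_initDataCallback({key: 'ds:3', data:function(){return [["Example Game"]]}});</script>
<script nonce="abc123">AF_initDataCallback({key: 'ds:10', data:function(){return [[null, 4.5]]}});</script>
`;

console.log(findCallbackBodies(sampleHtml));
```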

I also need to use vm to execute the script content matched by the above regex to construct an array of {key: string, data: () => Array}

@tanqhnguyen just a suggestion, it would be neat to parse the scripts into a map object with ‘ds:3’, ‘ds:10’ as the keys, and the arrays as the values.

then you could express the field paths like this:

{
 title: ['ds:3', 0, 0, 0],
 description: ['ds:3', 0, 10, 0, 1]
}

There are ramda functions to facilitate extracting data from paths like those.
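
A rough sketch of that lookup, written without dependencies so it runs as-is. getField is a hypothetical stand-in for a ramda-style path lookup, and the data is made up so that only the title path resolves.

```javascript
// Dependency-free stand-in for a path lookup such as ramda's R.path:
// the first element selects the node, the rest index into it.
function getField(nodes, [key, ...indices]) {
  return indices.reduce(
    (node, index) => (node == null ? undefined : node[index]),
    nodes[key]
  );
}

const fieldPaths = {
  title: ['ds:3', 0, 0, 0],
  description: ['ds:3', 0, 10, 0, 1]
};

// Made-up data; only the title path resolves here.
const nodesByKey = { 'ds:3': [[['Example Game']]] };

const details = {};
for (const [field, path] of Object.entries(fieldPaths)) {
  details[field] = getField(nodesByKey, path);
}
console.log(details);
```

With the key folded into the path, each field needs a single array instead of a from/path pair.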

Also, don’t use vm or eval to extract the arrays. You can just remove AF_initDataCallback( and replace strings like data:function(){return with data: to get a proper JSON literal.
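
One possible variant of this, sketched under assumptions: instead of rewriting the whole body, it pulls the key and the returned array out with two regexes and runs JSON.parse on the array only. It assumes each body looks like the made-up samples below and that the array literal is valid JSON. parseCallbackBody and buildNodeMap are hypothetical names.

```javascript
// Parse one AF_initDataCallback argument without vm or eval.
// Assumes the body looks like: {key: 'ds:3', ..., data:function(){return [...]}}
// and that the returned array literal is valid JSON.
function parseCallbackBody(body) {
  const keyMatch = /key:\s*'([^']+)'/.exec(body);
  const dataMatch =
    /data:\s*function\s*\(\)\s*\{\s*return\s+([\s\S]*)\}\s*\}\s*$/.exec(body);
  if (!keyMatch || !dataMatch) return null;
  // JSON.parse throws if the array is not valid JSON.
  return { key: keyMatch[1], data: JSON.parse(dataMatch[1]) };
}

// Build the { 'ds:3': [...], 'ds:10': [...] } map suggested above.
function buildNodeMap(bodies) {
  const nodes = {};
  for (const body of bodies) {
    const parsed = parseCallbackBody(body);
    if (parsed) nodes[parsed.key] = parsed.data;
  }
  return nodes;
}

// Made-up bodies, shaped like the script arguments discussed in this thread.
const sampleBodies = [
  `{key: 'ds:3', isError: false, data:function(){return [["Example Game"]]}}`,
  `{key: 'ds:10', isError: false, data:function(){return [[null, 4.5]]}}`
];
console.log(buildNodeMap(sampleBodies));
```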

This applies only to parsing the application detail page. AFAIK the rest of the parsers are still working (or at least the tests are passing)

Hey folks, looks like some interesting progress around this.

Just a quick thought, is there any plan to support both the older format and the newer format as a fallback?

From what we’ve observed, it seems not everything on the Play Store has migrated to this newer markup (yet), so it might be good to try both parsers. Perhaps only newly published apps or app updates now generate the newer format.
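
A rough sketch of what such a fallback could look like. parseWithFallback, parseNewFormat and parseOldFormat are hypothetical placeholders that only illustrate the control flow; they are not functions from the library.

```javascript
// Try each parser in order and return the first result that has a title.
function parseWithFallback(html, parsers) {
  for (const parse of parsers) {
    try {
      const result = parse(html);
      if (result && result.title) return result;
    } catch (err) {
      // A parser that throws (for example with "Cannot read property 'split'
      // of undefined") is treated as "wrong format" and the next one is tried.
    }
  }
  throw new Error('No parser could handle this page');
}

// Placeholder parsers: the "new" one only accepts pages that contain
// AF_initDataCallback, the "old" one accepts anything.
const parseNewFormat = (html) => {
  if (!html.includes('AF_initDataCallback')) {
    throw new TypeError('new format not found');
  }
  return { title: 'from new format' };
};
const parseOldFormat = (html) => ({ title: 'from old format' });

const parsed = parseWithFallback('<div>old markup</div>', [
  parseNewFormat,
  parseOldFormat
]);
console.log(parsed.title);
```

Swallowing errors like this hides real bugs in a parser, so logging the caught error is probably worth it in practice.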