Configuration Options for Parsing from JSON

Overview

This vignette:

introduces the opts argument for reading JSON with the read_json_x() family of functions
outlines the creation of default options with opts_read_json()
provides extended examples of how these options control the parsing of JSON

The `opts` argument - Specifying options when reading JSON

All read_json_x() functions have an opts argument. opts takes a named list of options used to configure the way yyjsonr parses JSON into R objects.

The default argument for opts is an empty list, which internally sets the default options for parsing.

The default options for parsing can also be viewed by running opts_read_json().
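
As a minimal sketch of inspecting those defaults, the snippet below pulls out the option names and one individual value. It assumes the yyjsonr package is installed and that the object returned by opts_read_json() can be indexed like an ordinary named list; the variable name defaults is illustrative only.

```r
library(yyjsonr)

# Build the default set of read options (a named list of settings)
defaults <- opts_read_json()

# List the names of the available options
print(names(defaults))

# Look up a single default; this vignette states that str_specials
# defaults to 'string'
print(defaults$str_specials)
```

Looking up values this way can be a quick check of what a given installed version of the package actually uses as its defaults.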

The following three function calls are all equivalent ways of calling read_json_str() using the default options:

read_json_str(str)
read_json_str(str, opts = list())
read_json_str(str, opts = opts_read_json())

Setting arguments to override the default options

Setting a single option (and keeping all other options at their default value) can be done in a number of ways.

The following three function calls are all equivalent:

read_json_str(str, opts = list(str_specials = 'string'))
read_json_str(str, opts = opts_read_json(str_specials = 'string'))
read_json_str(str, str_specials = 'string')
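
Because the options are an ordinary R object, one set can be built once and reused across several calls. The sketch below assumes that a named argument passed directly takes precedence over the same option supplied through opts, which this vignette does not state explicitly; treat that precedence as an assumption to verify. The names json and my_opts are illustrative.

```r
library(yyjsonr)

json <- '["hello", "NA", null]'

# Build the options once so they can be shared between calls
my_opts <- opts_read_json(str_specials = 'special')

# Uses the shared options: the string "NA" is read as NA_character_
from_opts <- read_json_str(json, opts = my_opts)
print(from_opts)

# Assumption: the directly named argument wins over the value in my_opts
overridden <- read_json_str(json, opts = my_opts, str_specials = 'string')
print(overridden)
```

Keeping a shared options object is convenient when many files must be parsed consistently, with the occasional per-call exception.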

Option `promote_num_to_string` - mixtures of numeric and string types

By default, yyjsonr does not promote numeric values to strings, i.e. promote_num_to_string = FALSE.

If an array contains mixed types, then an R list will be returned, so that all JSON values retain their original type.

json <- '[1,2,3.1,"apple", null]'
read_json_str(json)

#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3.1
#> 
#> [[4]]
#> [1] "apple"
#> 
#> [[5]]
#> NULL

If promote_num_to_string is set to TRUE, then yyjsonr will promote numeric types to strings if the following conditions are met:

values are stored in a JSON array
the JSON array only contains numerics, strings or the JSON null value

yyjsonr::read_json_str(json, promote_num_to_string = TRUE)

#> [1] "1"        "2"        "3.100000" "apple"    NA

Option `df_missing_list_elem` - Missing list elements (when parsing data.frames)

When JSON data is parsed into an R data.frame, some columns become list-columns if there are mixed types in the original JSON.

Some values may be missing entirely from the JSON representation. The df_missing_list_elem option specifies the replacement for such missing values in the R data.frame. The default is df_missing_list_elem = NULL.

JSON to data.frame (no list columns needed)

str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)

#>   a b
#> 1 1 2
#> 2 3 4

JSON to data.frame - list-columns required

str <- '[{"a":1, "b":[1,2]}, {"a":3, "b":2}]'
read_json_str(str)

#>   a    b
#> 1 1 1, 2
#> 2 3    2

str <- '[{"a":1, "b":[1,2]}, {"a":2}]'
read_json_str(str)

#>   a    b
#> 1 1 1, 2
#> 2 2 NULL

read_json_str(str, df_missing_list_elem = NA)

#>   a    b
#> 1 1 1, 2
#> 2 2   NA

Option `obj_of_arrs_to_df` - Reading JSON as a data.frame

By default, JSON that looks like it represents a data.frame is loaded as one. That is, a JSON {} object which contains only [] arrays (all of equal length) is treated as a data.frame, i.e. the default is obj_of_arrs_to_df = TRUE.

If obj_of_arrs_to_df = FALSE then this data will be read in as a named list. In addition, if the [] arrays are not all the same length, then the data will also be read in as a named list as no inference of missing values will be done.

str <- '{"a":[1,2],"b":["apple", "banana"]}'
read_json_str(str)

#>   a      b
#> 1 1  apple
#> 2 2 banana

read_json_str(str, obj_of_arrs_to_df = FALSE)

#> $a
#> [1] 1 2
#> 
#> $b
#> [1] "apple"  "banana"

str_unequal <- '{"a":[1,2],"b":["apple", "banana", "carrot"]}'
read_json_str(str_unequal)

#> $a
#> [1] 1 2
#> 
#> $b
#> [1] "apple"  "banana" "carrot"

Option `arr_of_objs_to_df` - Reading JSON as a data.frame

As the examples below show, a JSON [] array which contains only {} objects is read in as a data.frame by default, with one row per object. Setting arr_of_objs_to_df = FALSE reads the same data as a list of named lists instead. When an object lacks a key that other objects have, the corresponding cell in the data.frame is filled with NA.

str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)

#>   a b
#> 1 1 2
#> 2 3 4

read_json_str(str, arr_of_objs_to_df = FALSE)

#> [[1]]
#> [[1]]$a
#> [1] 1
#> 
#> [[1]]$b
#> [1] 2
#> 
#> 
#> [[2]]
#> [[2]]$a
#> [1] 3
#> 
#> [[2]]$b
#> [1] 4

str <- '[{"a":1, "b":2}, {"a":3, "b":4, "c":99}]'
read_json_str(str)

#>   a b  c
#> 1 1 2 NA
#> 2 3 4 99

Option `str_specials` - Reading string `"NA"` from JSON

JSON only really has the value null for representing special missing values, and this is converted to an R NA_character_ value when it is encountered in a string-ish context.

When yyjsonr encounters a literal "NA" value in a string-ish context, its conversion to an R value is controlled by the str_specials option.

The possible values for the str_specials argument are:

'string': read in as the literal character string "NA" (the default behaviour)
'special': read in as NA_character_

str <- '["hello", "NA", null]'
read_json_str(str) # default: str_specials = 'string'

#> [1] "hello" "NA"    NA

read_json_str(str, str_specials = 'special')

#> [1] "hello" NA      NA

Option `num_specials` - Reading numeric `"NA"`, `"NaN"` and `"Inf"`

JSON only really has the value null for representing special missing values, and this is converted to an R NA_integer_ or NA_real_ value when it is encountered in a number-ish context.

When yyjsonr encounters a literal "NA", "NaN" or "Inf" value in a number-ish context, its conversion to an R value is controlled by the num_specials option.

The possible values for the num_specials argument are:

'special': read in as an actual numeric NA, NaN or Inf value (the default behaviour)
'string': read in as the literal character string "NA" etc.

str <- '[1.23, "NA", "NaN", "Inf", "-Inf", null]'
read_json_str(str) # default: num_specials = 'special'

#> [1] 1.23   NA  NaN  Inf -Inf   NA

read_json_str(str, num_specials = 'string')

#> [[1]]
#> [1] 1.23
#> 
#> [[2]]
#> [1] "NA"
#> 
#> [[3]]
#> [1] "NaN"
#> 
#> [[4]]
#> [1] "Inf"
#> 
#> [[5]]
#> [1] "-Inf"
#> 
#> [[6]]
#> NULL

Option `int64` - large integer support

JSON supports large integers outside the range of R’s 32-bit integer type.

When such a large value is encountered in JSON, the int64 option controls the value’s representation in R.

The possible values for the int64 option are:

'string': store the JSON integer as a string in R (the default behaviour)
'double': store the JSON integer as a double-precision numeric. If the integer is outside the range +/- 2^53, it may not be stored exactly in the double.
'bit64': convert to a 64-bit integer as supported by the {bit64} package

str <- '[1, 274877906944]'

# default: int64 = 'string'
# Since result is a mix of types, a list is returned
read_json_str(str)

#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "274877906944"

# Read large integer as double
robj <- read_json_str(str, int64 = 'double')
class(robj)

#> [1] "numeric"

robj

#> [1]            1 274877906944

# Read large integer as 'bit64::integer64' type
library(bit64)
read_json_str(str, int64 = 'bit64')

#> integer64
#> [1] 1            274877906944
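
The limits behind the int64 option can be checked directly in base R. The sketch below does not call yyjsonr; it only illustrates why the example value does not fit in an R integer, why it is still exact as a double, and what goes wrong beyond 2^53. All variable names are illustrative.

```r
# R's native integer type is 32-bit, so this is the largest value it holds
int_max <- .Machine$integer.max

# The value used in this section is 2^38: too big for an R integer,
# but well inside the range where doubles represent integers exactly
big <- 274877906944
fits_integer    <- big <= int_max
exact_as_double <- big == 2^38

# Beyond 2^53 neighbouring integers collapse onto the same double
limit <- 2^53
collapsed <- (limit + 1) == limit

print(c(fits_integer    = fits_integer,
        exact_as_double = exact_as_double,
        collapsed       = collapsed))
```

If the JSON may carry identifiers larger than 2^53, 'string' or 'bit64' is therefore the safer choice, since 'double' can silently merge distinct values.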

Option `length1_array_asis` - distinguishing scalars from length-1 vectors

JSON distinguishes scalar values from arrays, i.e. the JSON scalar 67 is different from the length-1 array [67].

R makes no such distinction: a scalar in R is simply a vector of length 1. The length1_array_asis option is for situations where it is important to preserve this distinction in R.

To assist in translating objects from JSON to R and back to JSON, setting length1_array_asis = TRUE will mark JSON arrays of length 1 with the class AsIs. This option defaults to FALSE.

read_json_str('67')   |> str()

#>  int 67

read_json_str('[67]') |> str()

#>  int 67

read_json_str('67'  , length1_array_asis = TRUE) |> str()

#>  int 67

read_json_str('[67]', length1_array_asis = TRUE) |> str() # Has 'AsIs' class

#>  'AsIs' int 67
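
The AsIs class reported by str() above carries the same class name that base R's I() function attaches. The following base-R sketch shows how that marker can be created and tested for. It does not call yyjsonr, and whether yyjsonr treats values wrapped manually with I() the same way as values it marked itself is not confirmed here. The names marked and plain are illustrative.

```r
# I() is base R's way of attaching the 'AsIs' class to an object
marked <- I(67L)
plain  <- 67L

print(class(marked))              # "AsIs"
print(inherits(marked, "AsIs"))   # TRUE
print(inherits(plain, "AsIs"))    # FALSE

# The underlying value is unchanged; only the class attribute differs
print(unclass(marked) == plain)   # TRUE
```

Since only the class attribute differs, the marker does not change the stored value and can be stripped again with unclass().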

This option works together with the auto_unbox option when writing JSON, which controls how length-1 R vectors are written. As shown below, if a length-1 vector is marked with the AsIs class when reading, then writing it out with auto_unbox = TRUE keeps it as a length-1 JSON array rather than unboxing it to a scalar.

In the following example, only the second value ([67]) is affected by the option length1_array_asis. When the option is TRUE the value is tagged with a class of AsIs. Then when the created R object is subsequently written out to a JSON string, its structure is determined by auto_unbox which understands how to handle this class.

str <- '{"a":67, "b":[67], "c":[1,2]}'

# Length-1 vectors output as JSON arrays
read_json_str(str) |>
  write_json_str(auto_unbox = FALSE) |>
  cat()

#> {"a":[67],"b":[67],"c":[1,2]}

# Length-1 vectors output as JSON scalars
read_json_str(str) |>
  write_json_str(auto_unbox = TRUE) |>
  cat()

#> {"a":67,"b":67,"c":[1,2]}

# Length-1 vectors output as JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
  write_json_str(auto_unbox = FALSE) |>
  cat()

#> {"a":[67],"b":[67],"c":[1,2]}

# !!!!
# Those values marked with 'AsIs' class when reading are output
# as length-1 JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
  write_json_str(auto_unbox = TRUE) |>
  cat()

#> {"a":67,"b":[67],"c":[1,2]}

Option `yyjson_read_flag` - internal `YYJSON` C library options

The yyjson C library supports a number of internal options for reading JSON.

These options are considered advanced, and the user should read the original yyjson documentation for further explanation on what they control.

Warning: some of these advanced options do not make sense for interfacing with R, or otherwise conflict with how this package converts JSON to R objects.

# A reference list of all the possible YYJSON options
yyjsonr::yyjson_read_flag

#> $YYJSON_READ_NOFLAG
#> [1] 0
#> 
#> $YYJSON_READ_INSITU
#> [1] 1
#> 
#> $YYJSON_READ_STOP_WHEN_DONE
#> [1] 2
#> 
#> $YYJSON_READ_ALLOW_TRAILING_COMMAS
#> [1] 4
#> 
#> $YYJSON_READ_ALLOW_COMMENTS
#> [1] 8
#> 
#> $YYJSON_READ_ALLOW_INF_AND_NAN
#> [1] 16
#> 
#> $YYJSON_READ_NUMBER_AS_RAW
#> [1] 32
#> 
#> $YYJSON_READ_ALLOW_INVALID_UNICODE
#> [1] 64
#> 
#> $YYJSON_READ_BIGNUM_AS_RAW
#> [1] 128

read_json_str(
  "[1, 2, 3, ] // A JSON comment not allowed by the standard",
  opts = opts_read_json(yyjson_read_flag = c(
    yyjson_read_flag$YYJSON_READ_ALLOW_TRAILING_COMMAS,
    yyjson_read_flag$YYJSON_READ_ALLOW_COMMENTS
  ))
)

#> [1] 1 2 3
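
The flag values in the reference list above are distinct powers of two, which is the usual layout for bit flags in C. The base-R sketch below illustrates how such values fold into a single integer with bitwise OR, and how an individual flag can be tested for. It illustrates the bit-flag idea only: how yyjsonr combines the vector passed to yyjson_read_flag internally is an assumption, and the names flags and combined are hypothetical.

```r
# Values copied from the yyjson_read_flag listing above
flags <- c(
  YYJSON_READ_ALLOW_TRAILING_COMMAS = 4L,
  YYJSON_READ_ALLOW_COMMENTS        = 8L
)

# Fold the individual flags into one integer with bitwise OR
combined <- Reduce(bitwOr, flags)
print(combined)

# A flag is "set" when its bit survives a bitwise AND
has_comments <- bitwAnd(combined, 8L) != 0L
has_inf_nan  <- bitwAnd(combined, 16L) != 0L
print(c(has_comments = has_comments, has_inf_nan = has_inf_nan))
```

Because each flag occupies its own bit, any subset of flags maps to a unique integer, so several flags can be requested at once without ambiguity.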
