Subject: [mongodb-user] Re: MongoDB Spark Connector - how to create an RDD using the python connector



> The documentation on the python connector seems to indicate that the mongo documents to be read into Spark using the python connector must have a defined schema.

Hi,

You don’t have to define a schema. For example, in PySpark you can run the following:

# sqlContext is the SQLContext available in the PySpark shell;
# the trailing backslashes continue the chained call across lines
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
                    .option("spark.mongodb.input.uri", "mongodb://host:port/dbname.collection") \
                    .load()
# Print the first record
df.first()
# Get an RDD from the DataFrame
myRDD = df.rdd
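
Under the hood the connector infers the schema by sampling documents from the collection, so nothing has to be declared up front. As a minimal sketch of working with the result (the field name "status" below is hypothetical; substitute one that exists in your documents):

# Show the schema the connector inferred by sampling the collection
df.printSchema()

# Each RDD element is a Row whose fields can be accessed by name;
# "status" is a hypothetical field used only for illustration
statuses = myRDD.map(lambda row: row["status"])
print(statuses.take(5))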

> I’ve tried out the MongoDB Spark connector and have run into issues with the python connector.

If you run into an issue using the MongoDB Spark Connector from Python, please provide:

  • MongoDB Spark Connector version
  • Spark version (one way to print these from a running session is sketched below)
  • A code snippet sufficient to reproduce the issue
  • Any error messages that you’re getting
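
For the version details, here is one way to grab them from a running PySpark session (the connector coordinate in the comment is only an example; use whatever you actually passed to --packages or put on the classpath):

# Launched, for example, as:
#   pyspark --packages org.mongodb.spark:mongo-spark-connector_2.11:2.1.0
# (the coordinate above is an example; the version in it is your connector version)

# Print the Spark version from inside the session
print(sc.version)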

Regards,
Wan.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/206f7ccb-cc63-48f5-9e7f-d249eb9f63c1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


