

I have a large number of CSV files that need to be converted to Parquet files using PySpark. My current solution converts one file at a time:

```python
for each_csv in same_folder:
    df = spark.read.csv(each_csv, header=True)
    df.write.parquet(output_folder)
```

Is there any way I can utilize Spark to do the batch processing?
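For context, here is a minimal runnable sketch of that per-file loop. The folder names are placeholders, and both the `glob` listing and the `append` write mode are my assumptions, since the snippet above doesn't show how `same_folder` is built or how repeated writes to one output folder are handled:

```python
import glob

from pyspark.sql import SparkSession

# Assumed setup: a local SparkSession and placeholder folder names.
spark = SparkSession.builder.master("local").appName("csv_to_parquet").getOrCreate()

for each_csv in glob.glob("same_folder/*.csv"):
    df = spark.read.csv(each_csv, header=True)
    # "append" keeps earlier files' rows instead of failing on the second write.
    df.write.mode("append").parquet("output_folder/")
```

This works, but it launches one small Spark job per file, which is exactly the overhead the question is trying to avoid.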

Based on QuickSilver's answer, here is my PySpark version. It reads every CSV in the folder in a single pass, tags each row with its source file, and writes the result out as Parquet in one batch:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.master("local").appName("csv_to_parquet").getOrCreate()

# Read csv files into a single data frame and add a column of input file names:
baseDf = spark.read.csv("input_folder/*.csv").withColumn("input_file_name", input_file_name())

# Collect the distinct source-file paths into a plain Python list:
filePathInfo = baseDf.select("input_file_name").distinct().collect()
filePathInfo_array = list(map(lambda row: row.input_file_name, filePathInfo))

# Write the combined data frame out as Parquet in one batch:
baseDf.write.parquet("output_folder/")
```
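The snippet collects the distinct file paths but doesn't show what they are used for. One plausible continuation, purely my assumption, is to split the combined frame back out and write one Parquet directory per source CSV; `out_name` and the `output_folder/<name>` layout are hypothetical:

```python
import os

from pyspark.sql.functions import col

for file_path in filePathInfo_array:
    # Derive a per-file output name from the CSV's base name (hypothetical layout).
    out_name = os.path.splitext(os.path.basename(file_path))[0]
    (baseDf.filter(col("input_file_name") == file_path)
           .drop("input_file_name")
           .write.mode("overwrite")
           .parquet(f"output_folder/{out_name}"))
```

Either way, the heavy lifting happens in one Spark job over the whole folder rather than one job per file.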
