Pricing of New York City Taxi Rides
cs229.stanford.edu/proj2016/poster/antoniadesfadavifobaa...
Christophoros Antoniades | Delara Fadavi
Objective
This project uses publicly available taxi data from the New York City Taxi & Limousine Commission to extract insights about ride fare and duration. This information can help drivers decide which rides to accept to increase profit, or help passengers choose times of day that minimize fare or ride time.
Methods
Forward Search and Lasso
Linear regression
• Predicting duration and fare
Additional Model Modifications
• Transformation of latitude/longitude coordinates
• Traffic modeling by considering rides per hour (yields small prediction improvement)

Conclusions and Future direction
• The Random Forest model performs the best, because of the nonlinear influence of location patterns on trip duration and fare
• Prediction accuracy flattens with more variables from this dataset, implying need for additional predictive variables
• Analyze more data to infer traffic conditions or other variabilities that can affect duration and fare
• Consider modelling traffic between pickup and dropoff locations
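
The rides-per-hour traffic proxy named under Additional Model Modifications could be built with a simple count, as sketched below. The poster does not say how the count was bucketed, so grouping by calendar date and hour is an assumption here, and the function name `rides_per_hour` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def rides_per_hour(pickup_times):
    """Count how many rides start in each (date, hour) bucket."""
    return Counter((t.date(), t.hour) for t in pickup_times)


# Hypothetical pickup timestamps, for illustration only.
pickups = [
    datetime(2016, 5, 2, 8, 5),
    datetime(2016, 5, 2, 8, 40),
    datetime(2016, 5, 2, 9, 15),
    datetime(2016, 5, 3, 8, 20),
]

counts = rides_per_hour(pickups)

# Attach the bucket count to every ride as a traffic-proxy feature.
rides_in_hour = [counts[(t.date(), t.hour)] for t in pickups]
print(rides_in_hour)  # [2, 2, 1, 1]
```

A count like this only reflects taxi demand, not road congestion, which may be one reason the improvement reported on the poster is small.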
Taxi Pickups (blue) and Dropoffs (yellow) [1]

[1] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva, "Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips," IEEE Transactions on Visualization and Computer Graphics, 2013. [Online]. Available: https://vgc.poly.edu/~juliana/pub/taxivis-tvcg2013.pdf. Accessed: Dec. 11, 2016.

Dataset
Each observation represents a single taxi ride and includes feature information such as pickup/dropoff location, time of ride, fare, tip, payment type, and more. The dataset was cleaned to have clear covariates delineating exact times and dates of each ride. Data from May 2016 was used, which contained approximately 12 million observations of taxi rides. 8,000 observations were used as training data and 2,000 observations were used as a validation set.

Covariates
trip_distance  pickup_longitude  pickup_latitude  dropoff_longitude
dropoff_latitude  fare_amount  extra  mta_tax
tip_amount  tolls_amount  improvement_surcharge  total_amount
manhattan_dist  shortest_dist  pickup_month  dropoff_month
pickup_year  dropoff_year  pickup_day  dropoff_day
pickup_weekday  dropoff_weekday  pickup_hour  dropoff_hour
pickup_minute  dropoff_minute  passenger_count  RatecodeID
payment_type
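
The covariates `manhattan_dist` and `shortest_dist` are derived from the pickup and dropoff coordinates. The poster does not give the formulas, so the following is only one plausible sketch: it assumes `shortest_dist` is the great-circle (haversine) distance and `manhattan_dist` is the sum of a north-south leg and an east-west leg. The helper names, the Earth radius constant, and the sample coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """North-south leg plus east-west leg (axes not rotated to the street grid)."""
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    east_west = haversine_km(lat2, lon1, lat2, lon2)
    return north_south + east_west


# Hypothetical ride, using the dataset's column names for the inputs.
ride = {
    "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.7527, "dropoff_longitude": -73.9772,
}
coords = (ride["pickup_latitude"], ride["pickup_longitude"],
          ride["dropoff_latitude"], ride["dropoff_longitude"])
ride["shortest_dist"] = haversine_km(*coords)
ride["manhattan_dist"] = manhattan_km(*coords)
print(ride["shortest_dist"], ride["manhattan_dist"])
```

Because Manhattan's street grid is rotated relative to true north, a grid-aligned version would first rotate the coordinates; that refinement is left out here.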

Validation RMSE (red), Training RMSE (blue)
• Forward search suggests keeping nearly all variables
• Trip distance and rides in hour are the most important variables
• Lasso resulted in a small lambda parameter and hence no significant increase in prediction accuracy
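
A forward search of the kind described above can be sketched as a greedy loop that adds one variable at a time and keeps it only if validation RMSE drops. The stopping rule, the use of ordinary least squares via NumPy, and the synthetic data are assumptions of this sketch; the poster does not state which criterion was actually used.

```python
import numpy as np


def fit_predict(X_tr, y_tr, X_va):
    """Ordinary least squares with an intercept; returns validation predictions."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_va = np.column_stack([np.ones(len(X_va)), X_va])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_va @ coef


def rmse(y, pred):
    return float(np.sqrt(np.mean((y - pred) ** 2)))


def forward_search(X_tr, y_tr, X_va, y_va):
    """Greedily add the column that lowers validation RMSE the most."""
    remaining = list(range(X_tr.shape[1]))
    chosen = []
    # Starting point: predict the training mean for every ride.
    best = rmse(y_va, np.full(len(y_va), y_tr.mean()))
    while remaining:
        scores = {}
        for j in remaining:
            cols = chosen + [j]
            scores[j] = rmse(y_va, fit_predict(X_tr[:, cols], y_tr, X_va[:, cols]))
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining variable improves validation RMSE
        best = scores[j_best]
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen, best


# Synthetic stand-in data: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=1000)
chosen, best = forward_search(X[:800], y[:800], X[800:], y[800:])
print(chosen, best)
```

The order in which columns enter `chosen` gives a rough importance ranking, which is how a result such as "trip distance and rides in hour are most important" could be read off.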
• Linear regression gives reasonable results, but has a limit to its accuracy
• The coordinate variables relate nonlinearly to fare and duration and therefore do not give significant results in a linear model

Absolute prediction error is proportional to ride duration

Random Forest
• 500 trees and m = n/3 predictors per split
• Random Forest outperforms all linear regressions and Lasso
• Manages to model nonlinearity in location coordinates
• Error likely to depend on traffic and individual driving characteristics
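
The stated configuration (500 trees, m = n/3 predictors per split) maps directly onto scikit-learn's `RandomForestRegressor`, as in the sketch below. The poster does not name the software that was used, so scikit-learn is an assumption, and the data are synthetic stand-ins with a deliberately nonlinear target rather than taxi records.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data with a nonlinear dependence on the first two columns.
n_rows, n_features = 500, 6
X = rng.uniform(-1.0, 1.0, size=(n_rows, n_features))
y = np.abs(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=n_rows)
X_tr, X_va = X[:400], X[400:]
y_tr, y_va = y[:400], y[400:]

# m = n/3 predictors considered at each split, with at least one.
m = max(1, n_features // 3)

forest = RandomForestRegressor(n_estimators=500, max_features=m, random_state=0)
forest.fit(X_tr, y_tr)

linear = LinearRegression().fit(X_tr, y_tr)


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


rf_rmse = rmse(y_va, forest.predict(X_va))
lin_rmse = rmse(y_va, linear.predict(X_va))
print(rf_rmse, lin_rmse)
```

On a target like this one, where the dependence on the inputs is not linear, the forest should reach a lower validation RMSE than the linear fit, which mirrors the qualitative finding on the poster.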

Results

Model                          RMSE Validation   RMSE Train
Fare, Baseline Mean            $10.45            $10.36
Fare, Linear Regression        $3.52             $3.04
Fare, Random Forest            $2.28             $2.16
Duration, Baseline Mean        11.95 min         11.43 min
Duration, Linear Regression    6.51 min          6.17 min
Duration, Random Forest        5.24 min          5.09 min

Random Forest Validation RMSE
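
The RMSE values in the results table can be put in context with a short calculation. The sketch below assumes that "Baseline Mean" means predicting the mean of the training targets for every ride, which the poster does not spell out. The sample fares are hypothetical; the two `relative_reduction` calls use the validation RMSE values from the table.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, model_rmse):
    """Fraction of the baseline RMSE removed by the model."""
    return 1 - model_rmse / baseline_rmse


# Hypothetical fares in dollars, for illustration only.
train_fares = [6.0, 8.5, 12.0, 20.0, 7.5]
valid_fares = [9.0, 15.0, 6.5]

# Assumed baseline: predict the training mean for every validation ride.
baseline = sum(train_fares) / len(train_fares)
baseline_rmse = rmse(valid_fares, [baseline] * len(valid_fares))
print(baseline, baseline_rmse)

# Validation RMSE values taken from the results table.
fare_gain = relative_reduction(10.45, 2.28)      # fare, Random Forest
duration_gain = relative_reduction(11.95, 5.24)  # duration, Random Forest
print(round(fare_gain, 3), round(duration_gain, 3))
```

By this measure the Random Forest removes roughly 78% of the baseline validation error for fare and roughly 56% for duration.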