cs229.stanford.edu/proj2016/poster/AntoniadesFadaviFobaA... · 2017. 9. 23.

Pricing of New York City Taxi Rides

Christophoros Antoniades | Delara Fadavi

Objective

This project uses publicly available taxi data from the New York City Taxi & Limousine Commission to extract insights about ride fare and duration. This information can help drivers decide which rides to accept to increase profit, or help passengers choose times of day that minimize fare or ride time.

Methods

Forward Search and Lasso

Linear Regression
• Predicting duration and fare

Additional Model Modifications
• Transformation of latitude/longitude coordinates
• Traffic modeling by considering rides per hour (yields a small prediction improvement)

Conclusions and Future Direction

• The Random Forest model performs best because of the nonlinear influence of location patterns on trip duration and fare

• Prediction accuracy plateaus as more variables from this dataset are added, implying a need for additional predictive variables

• Analyze more data to infer traffic conditions or other sources of variability that can affect duration and fare

• Consider modeling traffic between pickup and dropoff locations
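The Additional Model Modifications listed above do not specify which coordinate transformation or traffic measure was used. One plausible reading, sketched here in Python purely as an assumption, does two things. It rotates locally projected coordinates onto Manhattan's street grid so that positions line up with avenues and streets. It also counts the rides that share the same pickup hour as a crude traffic proxy. The reference point, the approximate 29° grid angle, the 69 miles-per-degree factor, and the function names `to_grid_coords` and `rides_per_hour` are all hypothetical.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical reference point (roughly midtown Manhattan); not taken from the poster.
REF_LAT, REF_LON = 40.758, -73.9855
# Assumed angle: Manhattan's avenues run roughly 29 degrees east of true north.
GRID_ANGLE_DEG = 29.0
MILES_PER_DEG_LAT = 69.0  # approximate


def to_grid_coords(lat, lon):
    """Project (lat, lon) onto local planar miles, then rotate onto the street grid.

    Returns (across, along): `along` runs parallel to the avenues,
    `across` runs perpendicular to them (parallel to the streets).
    """
    miles_per_deg_lon = MILES_PER_DEG_LAT * math.cos(math.radians(REF_LAT))
    x = (lon - REF_LON) * miles_per_deg_lon  # miles east of the reference point
    y = (lat - REF_LAT) * MILES_PER_DEG_LAT  # miles north of the reference point
    theta = math.radians(GRID_ANGLE_DEG)
    along = x * math.sin(theta) + y * math.cos(theta)
    across = x * math.cos(theta) - y * math.sin(theta)
    return across, along


def rides_per_hour(pickup_times):
    """For each ride, count the rides whose pickup falls in the same calendar hour.

    `pickup_times` holds 'YYYY-MM-DD HH:MM:SS' strings; the count is a crude traffic proxy.
    """
    hours = [
        datetime.strptime(t, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
        for t in pickup_times
    ]
    counts = Counter(hours)
    return [counts[h] for h in hours]


print(rides_per_hour(["2016-05-01 08:05:00", "2016-05-01 08:55:00", "2016-05-01 09:10:00"]))
# prints [2, 2, 1]
```

Because the counts depend on which rides are passed in, computing them over the full month rather than the small training sample would better reflect actual traffic volume.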


[1] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva, "Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips," IEEE Transactions on Visualization and Computer Graphics, 2013. [Online]. Available: https://vgc.poly.edu/~juliana/pub/taxivis-tvcg2013.pdf. Accessed: Dec. 11, 2016.

Taxi Pickups (blue) and Dropoffs (yellow) [1]

Dataset

Each observation represents a single taxi ride and includes features such as pickup/dropoff location, time of ride, fare, tip, payment type, and more. The dataset was cleaned so that covariates clearly delineate the exact time and date of each ride. Data from May 2016, approximately 12 million taxi-ride observations, were used: 8,000 observations served as training data and 2,000 as a validation set.

Covariates

trip_distance, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount, manhattan_dist, shortest_dist, pickup_month, dropoff_month, pickup_year, dropoff_year, pickup_day, dropoff_day, pickup_weekday, dropoff_weekday, pickup_hour, dropoff_hour, pickup_minute, dropoff_minute, passenger_count, RatecodeID, payment_type
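The dataset-preparation step described above can be approximated as follows. This Python sketch is not the authors' pipeline. It assumes each ride is a dict with hypothetical `pickup_datetime`/`dropoff_datetime` strings. Because the poster defines neither distance, it also assumes that `shortest_dist` means great-circle distance and that `manhattan_dist` means a north-south leg plus an east-west leg. `derive_features` and `train_validation_split` are illustrative names.

```python
import math
import random
from datetime import datetime

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def derive_features(ride):
    """Return a copy of one ride record (a dict) with time and distance covariates added.

    Assumes hypothetical keys 'pickup_datetime' and 'dropoff_datetime' holding
    TIME_FMT strings, plus the four coordinate columns listed above.
    """
    out = dict(ride)
    times = {}
    for prefix in ("pickup", "dropoff"):
        t = datetime.strptime(ride[f"{prefix}_datetime"], TIME_FMT)
        times[prefix] = t
        out[f"{prefix}_year"] = t.year
        out[f"{prefix}_month"] = t.month
        out[f"{prefix}_day"] = t.day
        out[f"{prefix}_weekday"] = t.weekday()  # Monday = 0 ... Sunday = 6
        out[f"{prefix}_hour"] = t.hour
        out[f"{prefix}_minute"] = t.minute
    # Duration target in minutes.
    out["duration_min"] = (times["dropoff"] - times["pickup"]).total_seconds() / 60
    lat1, lon1 = ride["pickup_latitude"], ride["pickup_longitude"]
    lat2, lon2 = ride["dropoff_latitude"], ride["dropoff_longitude"]
    # Assumed definitions: straight-line (great-circle) distance, and an
    # L1-style distance made of a north-south leg plus an east-west leg.
    out["shortest_dist"] = haversine_mi(lat1, lon1, lat2, lon2)
    out["manhattan_dist"] = haversine_mi(lat1, lon1, lat2, lon1) + haversine_mi(lat2, lon1, lat2, lon2)
    return out


def train_validation_split(rides, n_train=8000, n_val=2000, seed=0):
    """Draw disjoint random training and validation samples (without replacement)."""
    sample = random.Random(seed).sample(rides, n_train + n_val)
    return sample[:n_train], sample[n_train:]
```

Applied to the May 2016 file, one would call `derive_features` on each parsed row and then draw the 8,000/2,000 split from the resulting list.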

Validation RMSE (red), Training RMSE (blue)

• Forward search suggests keeping nearly all variables

• Trip distance and rides per hour are the most important variables

• Lasso resulted in a small lambda parameter and hence no significant increase in prediction accuracy
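Greedy forward search of the kind summarized above can be sketched with NumPy least squares. Starting from no covariates, each step adds whichever remaining covariate lowers validation RMSE the most, and both training and validation RMSE are recorded so the two curves can be compared. This is a generic illustration, not the authors' code; `forward_search` and its helpers are hypothetical names. The Lasso is not shown here: it adds an L1 penalty weighted by λ to the least-squares objective, so a small selected λ leaves the fit close to plain least squares, which is consistent with the bullet above.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def forward_search(X_tr, y_tr, X_val, y_val, names):
    """Greedy forward selection scored by validation RMSE.

    At each step, add the covariate whose inclusion gives the lowest validation RMSE.
    Returns (name, training RMSE, validation RMSE) tuples in the order added,
    i.e. the two curves one would plot against the number of variables.
    """
    remaining = list(range(X_tr.shape[1]))
    chosen, path = [], []
    while remaining:
        best = None
        for j in remaining:
            cols = chosen + [j]
            coef = fit_ols(X_tr[:, cols], y_tr)
            val = rmse(y_val, coef[0] + X_val[:, cols] @ coef[1:])
            if best is None or val < best[0]:
                train = rmse(y_tr, coef[0] + X_tr[:, cols] @ coef[1:])
                best = (val, train, j)
        val, train, j = best
        chosen.append(j)
        remaining.remove(j)
        path.append((names[j], train, val))
    return path
```

Reading the returned path from top to bottom shows where validation RMSE stops improving, which is how a flattening curve like the one described above would appear.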

• Linear regression gives reasonable results but has limited accuracy

• The coordinate variables relate nonlinearly to fare and duration and therefore do not give significant results in the linear model

Absolute prediction error is proportional to ride duration
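The comparison behind these observations can be sketched as follows: a mean-only baseline is set against ordinary least squares, and the sketch then checks whether absolute error grows with ride length. The function names are hypothetical, and Pearson correlation is used here only as a simple stand-in for the poster's error analysis, whose exact form is not stated.

```python
import numpy as np


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def compare_to_baseline(X_tr, y_tr, X_val, y_val):
    """Validation RMSE of the mean-only baseline vs. ordinary least squares.

    Returns (baseline_rmse, linear_rmse, linear_predictions_on_validation).
    """
    baseline = rmse(y_val, np.full(len(y_val), y_tr.mean()))  # always predict the training mean
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    y_hat = np.column_stack([np.ones(len(X_val)), X_val]) @ coef
    return baseline, rmse(y_val, y_hat), y_hat


def error_duration_correlation(duration, y_hat):
    """Pearson correlation between true duration and absolute prediction error.

    A clearly positive value is consistent with error growing with ride length.
    """
    abs_err = np.abs(duration - y_hat)
    return float(np.corrcoef(duration, abs_err)[0, 1])
```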

Random Forest
• 500 trees and m = n/3 candidate predictors per split, where n is the number of predictors

• Random Forest outperforms all linear regressions and the Lasso

• Manages to model the nonlinearity in the location coordinates

• Error likely depends on traffic and individual driving characteristics
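The Random Forest configuration above (500 trees, m = n/3 candidate predictors per split) can be illustrated with a small NumPy implementation. Each tree is grown on a bootstrap sample, each split considers only a random subset of max(1, ⌊n/3⌋) predictors, and predictions are averaged across trees. This is a teaching sketch under assumptions, not the authors' implementation; `min_leaf=5` and `max_depth=20` are hypothetical stopping rules. In practice, a library estimator such as scikit-learn's `RandomForestRegressor(n_estimators=500, max_features=1/3)` covers the same configuration.

```python
import numpy as np


def _best_split(X, y, features, min_leaf):
    """Return (feature, threshold) minimizing the children's summed squared error, or None."""
    n = len(y)
    sizes = np.arange(min_leaf, n - min_leaf + 1)  # candidate left-child sizes
    left = sizes - 1  # index of the last sorted sample in the left child
    best, best_sse = None, np.inf
    for j in features:
        order = np.argsort(X[:, j], kind="stable")
        xs, ys = X[order, j], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        sse = (csq[left] - csum[left] ** 2 / sizes) + (
            (csq[-1] - csq[left]) - (csum[-1] - csum[left]) ** 2 / (n - sizes))
        sse[xs[left] == xs[sizes]] = np.inf  # tied values cannot be separated
        k = int(np.argmin(sse))
        if sse[k] < best_sse:
            best_sse, best = sse[k], (int(j), xs[left[k]])
    return best


def _grow(X, y, m, rng, min_leaf, max_depth, depth=0):
    """Grow one regression tree; leaves are floats, internal nodes are (j, thr, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or y.max() == y.min():
        return float(y.mean())
    features = rng.choice(X.shape[1], size=m, replace=False)  # random predictor subset
    split = _best_split(X, y, features, min_leaf)
    if split is None:
        return float(y.mean())
    j, thr = split
    mask = X[:, j] <= thr
    return (j, thr,
            _grow(X[mask], y[mask], m, rng, min_leaf, max_depth, depth + 1),
            _grow(X[~mask], y[~mask], m, rng, min_leaf, max_depth, depth + 1))


def _predict_one(node, x):
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


class RandomForest:
    """Average of regression trees, each grown on a bootstrap sample with
    m = max(1, n_features // 3) randomly chosen candidate predictors per split."""

    def __init__(self, n_trees=500, min_leaf=5, max_depth=20, seed=0):
        self.n_trees, self.min_leaf, self.max_depth, self.seed = n_trees, min_leaf, max_depth, seed

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        rng = np.random.default_rng(self.seed)
        m = max(1, X.shape[1] // 3)
        self.trees = []
        for _ in range(self.n_trees):
            idx = rng.integers(0, len(y), size=len(y))  # bootstrap: rows drawn with replacement
            self.trees.append(_grow(X[idx], y[idx], m, rng, self.min_leaf, self.max_depth))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([np.mean([_predict_one(t, x) for t in self.trees]) for x in X])
```

For example, `RandomForest(n_trees=500).fit(X_train, y_train).predict(X_val)` mirrors the poster's tree count. Pure-Python tree growth is slow, so this sketch suits small samples better than a library implementation does.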

Model                        Validation RMSE   Training RMSE
Fare, Baseline Mean          $10.45            $10.36
Fare, Linear Regression      $3.52             $3.04
Fare, Random Forest          $2.28             $2.16
Duration, Baseline Mean      11.95 min         11.43 min
Duration, Linear Regression  6.51 min          6.17 min
Duration, Random Forest      5.24 min          5.09 min
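Assuming the conventional RMSE definition (observed values y_i, predictions ŷ_i, N validation rides), the validation column of the results table implies the following relative reductions for the Random Forest. The first line compares it against the baseline mean; the second compares it against linear regression.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}

\begin{aligned}
\text{RF vs. baseline:}\quad & 1 - \tfrac{2.28}{10.45} \approx 0.782 \ (\text{fare}), && 1 - \tfrac{5.24}{11.95} \approx 0.562 \ (\text{duration}) \\
\text{RF vs. linear:}\quad & 1 - \tfrac{2.28}{3.52} \approx 0.352 \ (\text{fare}), && 1 - \tfrac{5.24}{6.51} \approx 0.195 \ (\text{duration})
\end{aligned}
```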

Results

Random Forest Validation RMSE
